OpenAI’s New Advanced Reasoning GenAI Models See Mixed Views

By Amrit Mehra

Updated on Fri, Sep 13, 2024

As artificial intelligence (AI) and generative artificial intelligence (GenAI) tools continue to dazzle users around the world, AI companies have found it important to keep updating and upgrading their offerings.

As such, generative AI industry leader OpenAI has announced its newest model series, which is designed to be especially strong at reasoning.

So, what’s the company’s newest offering about and what did users say about it? Let’s explore!


What Is OpenAI’s New Model About?

 
  • Through blog posts published on its website, OpenAI announced the release of two new AI models: OpenAI o1-preview and OpenAI o1-mini.

  • The goal was to develop a series of models that spend more time thinking before they respond, much like a person would.

  • The models are built to reason through complex tasks and are more capable than previous models at solving problems in science, coding, math and more.

  • The models can even refine their thinking, explore different strategies to provide responses and even recognize mistakes in responses.

  • Noam Brown, a researcher at OpenAI specializing in reasoning research and tools, announced the launch of the new model and what it offers users through a series of posts on X.

  • “Today, I’m excited to share with you all the fruit of our effort at @OpenAI to create AI models capable of truly general reasoning: OpenAI's new o1 model series! (aka 🍓 [strawberry emoji]),” read the first post in a series of nine.

  • The post also confirmed that the new model is the previously reported “strawberry” project the company was working on.

  • OpenAI o1 will be available to ChatGPT Plus and Team users and will have to be manually selected through the model picker to be used.

  • To begin with, o1-preview will have a weekly rate limit of 30 messages and o1-mini a weekly limit of 50 messages.

  • Access for ChatGPT Enterprise and Edu users is expected to come in a week after launch.

  • Developers who qualify for API usage tier 5 can prototype both models in the API with a rate limit of 20 RPM (requests per minute).

  • OpenAI is looking to increase the rate limits and bring the model to free users soon and plans to “add browsing, file and image uploading and other features to make them more useful to everyone.”

  • OpenAI o1-mini is available to tier 5 API users, for whom it is 80% cheaper than OpenAI o1-preview, as well as to ChatGPT Plus, Team, Enterprise and Edu users.

  • The model is optimized for STEM (science, technology, engineering and mathematics) reasoning.
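For developers prototyping against the 20 RPM limit mentioned above, one simple approach is to space requests out on the client side. The following is a minimal sketch under stated assumptions, not OpenAI's recommended method: the class name `MinIntervalThrottle` and its parameters are hypothetical, and the actual API call is left out entirely.

```python
import time


class MinIntervalThrottle:
    """Hypothetical helper: spaces out calls so that request starts
    average no more than `max_per_minute` per minute."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        # 20 RPM works out to one request every 3 seconds.
        self.interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request may start; return seconds waited."""
        now = self._clock()
        if self._next_allowed is None:
            start = now
        else:
            start = max(now, self._next_allowed)
        delay = start - now
        if delay > 0:
            self._sleep(delay)
        self._next_allowed = start + self.interval
        return delay
```

In use, a script would create `MinIntervalThrottle(20)` once and call `wait()` before each request. Spacing requests evenly is only one option; a real client would also need to handle rate-limit errors returned by the server.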


What Do OpenAI’s New Series Of Models Offer?

 
  • In addition to excelling in math and coding, OpenAI also found that the next model update performs similarly to PhD students on benchmark tasks in physics, chemistry and biology.

  • As per the tests, where GPT-4o correctly solved only 13% of the problems in a qualifying exam for the International Mathematics Olympiad (IMO), the new model hit a score of 83%.

  • Furthermore, its coding abilities reached the 89th percentile in Codeforces competitions.

  • However, the model can’t yet browse the web for information or accept file and image uploads, much like other early models.

  • As per the blog post detailing OpenAI o1, the models are most useful for users tackling complex problems in fields such as genetics, economics, cognition, quantum physics, coding, reasoning, math, science, logic puzzles and more.

  • The company even published a wide range of videos explaining how the models can be used in these fields.

  • The OpenAI o1-mini model offers users a faster and cheaper reasoning model that is also adept at coding.

  • As for safety, the company has introduced a new and improved safety training approach that uses the models' reasoning capabilities to ensure they adhere to safety and alignment guidelines.

  • On one of OpenAI’s hardest jailbreak tests, where GPT-4o scored 22 out of 100, the o1-preview model scored 84.


TechDogs-"An Image Showing The Comparison Of OpenAI's GPT-4o, o1 Preview And o1 Model On Challenging Reasoning Benchmarks"

While the model is poised to enable advanced reasoning capabilities, users experimenting with the model have presented mixed reactions.


How Did Users React?

 
  • One user on Reddit reported the model was able to solve the day’s Connections game in the New York Times, getting all four groupings right in two tries, whereas GPT-4o's best game yielded only one correct grouping.

  • Some even complimented the model’s use of its Memory feature, while also commending its improved ability to create jokes.

  • Some users tried the classic “how many Rs are there in the word strawberry” test. In some cases, the new model responded with the correct answer, while some users reported it still replied with an incorrect answer of 2.

  • While there’s no way to verify if the screenshots showing the correct and incorrect answers are genuine, we asked the model the question ourselves and its response was correct (i.e. 3), even after multiple attempts.

  • However, the model faced difficulties with some simple questions. One user asked “1 + 1 = ”, a question the model “thought for 7 seconds” about. Again, without the ability to verify the post’s credibility, we tried the same question.

  • The response we got was correct but it took “a few seconds.” When the question was modified to “1 + 1 = ?, Please Think Carefully What I Asked!”, the response took 10 seconds to answer with, “Certainly! Mathematically, 1 + 1 equals 2. However, if you're referring to binary numbers, 1 + 1 equals 10 in binary notation.”

  • On the other hand, one user noted that the model “still fails miserably at trivial questions”, having got a bizarre answer to the question “The surgeon, who is the boy's father says, "I can't operate on the boy, he's my son!" Who is the surgeon to the boy?”

  • Again, while this is unverifiable, we did try it out ourselves and the first answer we got was “The surgeon is the boy's mother.”

  • Upon tweaking, the response received was “Since the surgeon is the boy's father, and he refers to the boy as his son, this means the boy is actually his grandson. Therefore, the surgeon is the boy's grandfather.”

  • This was accompanied by a rather confusing explanation (see below).

 

TechDogs-"An Image Showing The Response Received To A Question Asked On The OpenAI o1 Preview Model"


TechDogs-"An Image Showing The Explanation For A Response Received To A Question Asked On The OpenAI o1 Preview Model"
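For reference, the two simplest checks users tried above, the letter count in “strawberry” and the binary form of 1 + 1, can be verified in a couple of lines of Python. This is only an illustrative sanity check of the expected answers; the variable names are our own.

```python
# Count the letter "r" in "strawberry" (the expected answer is 3).
word = "strawberry"
r_count = word.count("r")
print(r_count)  # 3

# 1 + 1 equals 2 in decimal, which is written as "10" in binary notation.
binary_sum = format(1 + 1, "b")
print(binary_sum)  # 10
```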


Do you think OpenAI’s new model will help it capture a commanding position in the generative AI market? Do you think its competitors need to make similar moves soon to remain competitive?

Let us know in the comments below!

First published on Fri, Sep 13, 2024
