OpenAI Reveals GPT-4o That Brings Real-Time Reasoning Across Audio, Vision And Text

By Amrit Mehra

Updated on Tue, May 14, 2024

As the race to lead the generative artificial intelligence (GenAI) space continues, artificial intelligence (AI) companies are striving to build and release powerful AI tools that can help them capture a stronger customer base, ranging from enterprises to individual users.

In line with this, OpenAI, the powerhouse behind the most popular chatbot, ChatGPT, has made a new announcement aimed at strengthening its position in the AI sector.

So, what did the AI company announce? Let’s explore!
 

What Did OpenAI Announce?

 
  • Through a post on the social networking platform X and a release published on its website, OpenAI revealed its latest and most powerful model yet – GPT-4o.

  • The new model is capable of carrying on realistic voice conversations and interacting through text, audio and images.

  • Along with the post introducing the new model on X, OpenAI also released a range of videos (also available through its website release) showing the model’s capabilities, including some from a live demonstration of the model in front of an audience.

  • “Today we are introducing our newest model, GPT-4o, and will be rolling out more intelligence and advanced tools to ChatGPT for free,” reads a release by the company.

  • The model also offers a range of voice options for users to converse and interact with.

  • According to the company, the “o” in GPT-4o stands for “omni”, marking a step towards more natural human-computer interaction: the model accepts any combination of text, audio and image inputs and can generate any combination of text, audio and image outputs.

  • According to OpenAI, the model can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response times in conversation.

  • Additionally, the new model matches GPT-4 Turbo’s performance on English text and code, while significantly improving on text in non-English languages. OpenAI’s ChatGPT currently supports over 50 languages.

  • Furthermore, the new model is much faster, uses fewer tokens for many non-English languages and is 50% cheaper in the API.

  • As per the release, “With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.”

  • The company has begun rolling out GPT-4o to ChatGPT Plus and Team users, while the model will be made available for Enterprise users soon.

  • The new model will also be available to ChatGPT Free users, but with usage limits. Plus users will get message limits up to 5x higher than Free users, while Team and Enterprise users will have even higher limits.


[Image: Test evaluation of the new model against other models, as shown in the announcement]

What Were The Released Videos About?

 
  • OpenAI posted the videos on X as well as on its website, with a disclaimer stating, “All videos on this page are at 1x real time.”

  • One of the videos showcased the model’s real-time conversational speech, featuring an entire spoken conversation between a tester and the model.

  • During the conversation, the model was also able to take in real-time visual input and offer suggestions based on the tester’s behavior.

  • Another video showed the model giving interview preparation tips based on visual input. The tester, dressed in a simple dark-grey t-shirt, asked the model whether he looked okay. The model replied that the look was simple but could work in his favor, though it suggested he could fix his hair.

  • All of the model’s responses were delivered in a casual, conversational tone of voice.

  • The tester then put on a hat without mentioning the change, and the model replied that he would certainly stand out, though not in the way he intended.

  • Other videos showed the model describing the user and their surroundings, translating in real time between two people speaking different languages, singing “Happy Birthday”, responding with sarcasm on request, solving math problems, commentating on a game of rock-paper-scissors like a professional commentator and more.

  • Essentially, the live demo and recorded videos showcased the model’s ability to take in text, audio and visual input and respond in real time.


Do you think this move by OpenAI will help it secure a stronger position in the GenAI sector? Do you think its competitors need to make similar moves soon?

Let us know in the comments below!

First published on Tue, May 14, 2024
