OpenAI Reveals GPT-4o That Brings Real-Time Reasoning Across Audio, Vision And Text

As the race to lead the generative artificial intelligence (GenAI) game continues, artificial intelligence (AI) companies are striving to release powerful AI tools that can help them capture a stronger customer base, ranging from enterprises to individual users.

In keeping with this trend, OpenAI, the powerhouse behind the most popular chatbot – ChatGPT, has made a new announcement aimed at strengthening its position in the artificial intelligence sector.

So, what did the AI company announce? Let’s explore!

What Did OpenAI Announce?

Through a post on the social networking platform X and a release published on its website, OpenAI revealed its latest and most powerful model yet – GPT-4o.
The new model will be capable of carrying on realistic voice conversations and interacting through text, audio and images.
Along with the post introducing the new model on X, OpenAI also released a range of videos (also available through its website release) showing the model’s capabilities, including some from a live demonstration of the model in front of an audience.
“Today we are introducing our newest model, GPT-4o, and will be rolling out more intelligence and advanced tools to ChatGPT for free,” reads a release by the company.
The model also offers a range of voice options for users to converse and interact with.
As per the company, the “o” in GPT-4o stands for “omni”, marking a step towards more natural human-computer interaction. The model accepts any combination of text, audio and image inputs and generates any combination of text, audio and image outputs.
OpenAI says the model can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response times in conversation.
Additionally, the new model matches GPT-4 Turbo’s performance on text in English and code, while significantly improving on text in other languages. Currently, OpenAI’s ChatGPT supports over 50 languages.
Furthermore, compared with GPT-4 Turbo, the new model is much faster, uses fewer tokens for various non-English languages and is 50% cheaper in the API.
As per the release, “With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.”
The company has begun rolling out GPT-4o to ChatGPT Plus and Team users, while the model will be made available for Enterprise users soon.
The new model will also be available to ChatGPT Free users, with usage limits. The limit for Plus users will be five times that of Free users, while Team and Enterprise users will have higher limits still.

[Image: Test evaluation of the new model against other models, as used in the announcement]

What Were The Released Videos About?

OpenAI posted the videos on X as well as its website, with a disclaimer stating, “All videos on this page are at 1x real time.”
One of the videos showcased the model’s real-time conversational speech, which included an entire conversation between a tester and the model using audio responses.
Through the conversation, the model was also able to gather real-time visual feedback and provide suggestions based on the tester’s behavior.
Another video showed the model giving interview preparation tips based on visual feedback. The tester, dressed in a simple dark-grey t-shirt, asked the model if he looked okay. The model replied that the look was simple but could work in his favor, though it suggested he could fix his hair.
All the responses by the model carried a casual-conversational style of voice.
The tester then threw on a hat but didn’t specify what he changed, to which the model replied saying he would stand out but not in the way the tester initially intended.
Other videos included asking the model to describe the user and the environment around the user, real-time translations between two users speaking different languages, identifying and singing “happy birthday”, responding with sarcasm when requested to, solving math problems, commentating on a game of rock-paper-scissors as professionals do and more.
Essentially, the live demo and the recorded videos displayed the model’s ability to take in text, audio and visual input and respond in real time.

Do you think this move by OpenAI will help it gather a better position in the GenAI sector? Do you think its competitors need to make similar moves soon?

Let us know in the comments below!

First published on Tue, May 14, 2024

Amrit Mehra

Senior Tech News Reporter, TechDogs

Amrit Mehra is an experienced tech journalist with nearly a decade of reporting across IT, MarTech, and enterprise technology. He focuses on helping readers understand why tech developments matter—not just what they are. His writing stands out for its clarity, contextual relevance, and thoughtful simplicity. In addition to covering industry shifts, Amrit enjoys injecting humor and personality into his work, often through witty observations and meme-worthy takes that make tech more enjoyable to follow.


