Even before Monday’s conference, OpenAI co-founder and CEO Sam Altman announced that what his company was about to show the world would be a breakthrough. “For me it’s magic,” he explained. And he was right, as anyone who has watched the technology demos of the new GPT-4o language model shared on YouTube, among other platforms, will likely confirm.
OpenAI demonstrated groundbreaking technology. “Magic”
Let’s start with the fact that the letter “o” in the model’s name is no accident: it stands for “omni”. GPT-4o is a true “omnimodel”, combining in a single system capabilities that previously required several separate models.
The new model can communicate with us not only through text commands, but can also analyze sound and images in real time. As OpenAI explains, GPT-4o responds to audio input in an average of 320 milliseconds, which matches the typical response time in a conversation between two people.
Voice conversations were already possible with GPT-3.5 and GPT-4, but the feature was limited and much slower, because it chained together separate models for speech recognition, text generation, and speech synthesis. Now the entire process takes place within a single neural network, which instantly makes GPT-4o the most “human” voice assistant available, leaving solutions such as Apple’s Siri or Google Assistant far behind.
At Monday’s presentation, we could see, among other things, how ChatGPT running the new GPT-4o model helps a student solve a math problem. The conversation flows naturally: the model reacts quickly when we interrupt it mid-sentence and feed it new information. When we show it a piece of paper with an equation written on it, it does not limit itself to describing what it sees, but explains step by step how to solve it, like an efficient and patient teacher.
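The demo itself runs inside the ChatGPT app, but the same mix of image and text input is exposed through OpenAI’s public API. Below is a minimal sketch of such a request, assuming the official openai Python SDK; the file name and prompt are illustrative, not taken from OpenAI’s demo.

```python
import base64

from openai import OpenAI  # official OpenAI Python SDK

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Hypothetical photo of the handwritten equation (the file name is illustrative).
with open("equation.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# A single request mixing text and image content, addressed to GPT-4o.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Walk me through solving this equation step by step.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```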
The new model also performs impressively as a translator. In the technology demo, we could see how ChatGPT handles live interpretation between Italian and English. Everything happened “on the fly”, without a separate transcription step.
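The live voice interpreting shown on stage is a feature of OpenAI’s own apps; through the public API, the text side of the same capability can be approximated with an ordinary chat request. A rough sketch, again assuming the openai Python SDK, with an illustrative Italian sentence of our own:

```python
from openai import OpenAI

client = OpenAI()

# The system prompt and the Italian sentence below are our own examples,
# not OpenAI's demo script.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are an interpreter. Translate every Italian message into English.",
        },
        {"role": "user", "content": "Che ne pensi del nuovo modello?"},
    ],
)

print(response.choices[0].message.content)  # e.g. "What do you think of the new model?"
```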
GPT-4o can also analyze its surroundings in real time, recognizing the people and objects it “sees” through the camera and referring back to that information later in the conversation.
Perhaps the most “magical” capability of the new language model, and the one that brings it closest to visions previously known only from science-fiction movies, is its ability to recognize and name human emotions. During the presentation, one of OpenAI’s employees pointed the smartphone camera at his face, and the AI responded with the following question:
“Would you like to share the reason for your good mood?”
A surprise from OpenAI. GPT-4o will be available for free
Importantly, the new GPT-4o features will also be available to users of the free version of the ChatGPT application. Free users will additionally be able to create their own chatbots and gain access to the GPT Store, both of which were previously reserved for premium subscribers.
At Monday’s conference, OpenAI also presented a ChatGPT desktop application for Apple computers running macOS. For now, only subscribers of the paid ChatGPT Plus plan can use it. To invoke the OpenAI assistant on a Mac, simply press the Option+Space keyboard shortcut.
It is worth adding that for many weeks the industry has been buzzing with rumors of a strategic partnership between OpenAI and the company from Cupertino. It is possible that such a deal will be announced during the upcoming WWDC 2024 conference.
Source: Gazeta
