OpenAI confirmed it won’t be rolling out advanced voice features in ChatGPT until later this year but has continued to provide insights into what we can expect. The latest shows off GPT-4o’s impressive linguistic capabilities, teaching users Portuguese.
GPT-4o was unveiled at OpenAI's spring update earlier this year, along with its impressive advanced voice capabilities. The company also revealed vision and screen-sharing features that we now know won't arrive until much later in the year, or possibly even early next year.
One of the big selling points included in that original demo was GPT-4o’s ability to act as a live translation device, but what we’re starting to see from some of the new demos is that it can also be an incredible language teacher. This is something I’ve experienced for myself to a lesser degree with the current voice model.
In a new OpenAI video, a native English speaker trying to learn Portuguese and a Spanish speaker with a basic understanding of the language used ChatGPT to help them improve their skills. At different points they ask it to slow down or explain terms, and it does so perfectly.
Learning languages with GPT-4o
What makes the new ChatGPT-4o advanced voice so exciting is the fact that it's natively speech-to-speech. Unlike previous models, which first convert speech into text and then convert the text response back into speech, GPT-4o understands what you're saying directly.
The ability to natively understand speech and audio enables some exciting features, including working across multiple languages, putting on different accents, and changing the speed, tone and vibrance of a voice, essentially making it the perfect teacher.
Its native speech capabilities allow it to listen to what you're saying, analyze the way you pronounce certain words, and even assess your accent. It can then offer direct feedback based on what it has heard, rather than assessing a transcript.
In addition to all of this, GPT-4o has impressive reasoning and problem-solving capabilities, so it can even identify where you're making a mistake in less obvious ways.
What else have we seen from GPT-4o?
There have been multiple demos of the new advanced voice features including some that weren’t meant to be released. One of these shows that it’s capable of creating sound effects while telling you a story and another reveals it is capable of using multiple different voices.
In the official videos shared by OpenAI on YouTube, we've seen it used as a math teacher. In one video, it works on an iPad with screen sharing enabled, and the AI offers advice and information on every aspect of a math problem.
Advanced voice mode and particularly the ability to understand speech natively feels like one of the most significant leaps in artificial intelligence since OpenAI put a chat interface on its GPT-3 model back in November 2022.