The Incarnation of Her: ChatGPT 4o, Voice AI, and applications
Hey folks, Kavir here. welcome to yet another edition of The Discourse. I am glad you could join me today. It’s a great time to be an AI optimist. Yesterday’s OpenAI event was fantastic and today’s Google IO event is much awaited to see what the sleeping AI giant has for consumers. Today’s piece is on yesterday’s announcements from OpenAI centered around Voice AI.
OpenAI Spring Update: Voice AI, GPT 4 ‘Omni’, ChatGPT Mac App and more
We knew before the event that voice was touted to be the big feature to be released. The Information covered it and there were many hints including Sam Altman liking a tweet mentioning that OpenAI was launching ‘her’.
And then after the event tweeting just the word ‘her’.
Voice is a useful medium and I’ve lately found myself using it regularly for use cases including writing first drafts (including this article), listing my next day’s tasks, brainstorming, venting, and more use cases. I’ll write more on these, so subscribe if you haven't.
But with the introduction of GPT-4o, particularly its advanced voice interaction and unified Omni model, represents a step function upgrade to voice interactions and poised to redefine interactive assistants' roles and capabilities in our daily lives and professional environments.
In this piece, we’ll discuss my personal experience with ChatGPT voice, the new omni model, comparison with existing voice assistants, the new ChatGPT Mac app and the new text model.
Let’s start with voice.
Personal Experience with ChatGPT Voice
I’ve used it extensively ever since it launched, from venting out my concerns and getting empathetic and supportive responses to brainstorming ideas, and thinking through solutions — often on walks when I don’t want to type and they usually are 5-10 minute conversations.
Since I speak slowly with pauses, I end up triggering ChatGPT voice to respond before I am done. So I end up holding onto the screen and using it as a walkie-talkie rather than a normal conversation.
There have been a few things that break the experience.
Sometimes the connection breaks and you have to repeat what you say. That’s a big no-no. The latency between you finishing speaking and the response is long enough to break the flow. In yesterday’s event, they deconstructed the way it used to work with the GPT4 model. The AI had to first transcribe your voice to text, feed the model with text, get an output, and then convert the output from text to speech.
I found that the voice interaction was already more human-like than any other interaction I had before.
This changed a bit when Hume, the empathic voice agent, launched. It was real-time, could decipher emotions through a combination of text and voice tone, was able to emote, and handle interruptions.
Yesterday’s update basically shipped all of that, with a way more powerful reasoning model.
The Omni Model
The simplified version of yesterday’s update is that they have managed to create a model that natively supports text, audio, vision — all in one. It takes the input as is, processes it, and returns some intelligence, reasoning, and response based on that.
And it’s super fast.
This unlocks a lot of use cases that I’ll talk about in a bit. But before that, let’s look at the existing ecosystem of voice assistants and how they have all flattered to deceive.
Comparisons to Siri, Alexa, and Other Assistants
Voice assistants are nothing new. Siri, Apple's virtual assistant, launched on October 4, 2011, alongside the iPhone 4S. 2011! That’s a whole 12.5 years ago now.
Amazon Alexa launched on November 6, 2014, with the release of the Amazon Echo smart speaker. Google Assistant was introduced on May 18, 2016, during Google's I/O developer conference and became available on the Google Pixel smartphones later that year.
It works for simple tasks like setting a timer, setting an alarm, telling me a joke, playing a song on Spotify, asking a fact, or interacting with your home devices — like switching on and off lights. I use Alexa for playing music and have set a night-time routine for sleep sounds.
Some do some things better than others. MKBHD released this comparison video in 2023. Worth a watch.
But none of them have leveled up their intelligence to the point of it being useful. All of them are built on previous tech. Here is Siri’s response to me asking it to call me an Uber.
ChatGPT voice would change things. And if rumor has it, if it’s integrated into Siri and onto 1.3B iPhones1, it would be a game changer. We’ll know more at WWDC on June 10th, 2024.
Applications of Voice AI
Some of the applications that OpenAI demoed were interesting. Live language translation, bedtime stories for kids, jokes, homework assistants, interview prep, and what I use it for: assistant to create tasks, write emails and messages, write first drafts, act as a life coach, act as a career coach, and more.
Apart from these, I would be really interested in applications that were demoed by Google 6 years ago!
Google Duplex allowed the AI to make calls for you - to make restaurant, hair cut, etc bookings.
And also allows you to screen calls and filter out scam and spam calls. The AI could talk to these callers and screen them for me, so I only pick up the worthwhile ones.
I would love to not have to speak on the phone for these tasks ever again.
Even though Google demoed this a while ago, we’ve not seen it live yet, which has been disappointing.
Maybe another feature for OpenAI to integrate into the iPhone. Of course, Google will be forced to finally launch this publicly soon at Google IO (May 14th, 2024).
The new ChatGPT Mac App
Now, coming to the laptop, they have announced a new ChatGPT app, which I’ve got access to. It’s an improved UI with a search filter and direct access to voice conversations, with a keyboard shortcut to access ChatGPT similar to Spotlight or Raycast. You can drag and drop images onto the chat and ask questions about them. This makes the previous ChatGPT apps on your Mac obsolete.
Here is how you can get access.
In a future iteration - You can also live share your screen, and the AI will respond, but that’s not yet available. This ability to view what’s on your screen and have an ambient assistant always available in terms of voice is going to be really powerful.
The new text model GPT 4o
Impressions on GPT 4o are that it’s extremely fast, faster than 3.5 turbo, and it seems to be more intelligent. I’ve used it in some of my business APIs, which have provided mixed results because I was used to the previous version. This changes things a bit—in some ways better, some ways worse — but I think that can be fixed with better prompt engineering and giving a few examples.
This update might reduce my usage of Claude now because things are more integrated—it has memory, it’s fast, it’s smarter, and it has code interpreter. It can use my writing style to write things and do a lot of things. It can access the GPT store, and I can create more GPTs for my use cases.
Final Thoughts
Overall, it’s not the big bang update. It’s not GPT-5. It’s not a search engine. But it is incremental and good in a way. It also allows free users to access GPT4, driving further adoption and acceptance of AI. 3.5 wasn’t cutting it anymore and it was now a lousy first impression of AI for people. For example, I can’t wait to get my parents on it. Everyone can benefit from an IQ boost and an intelligent partner.
We still don’t have access to the live voice updates yet, but when that’s out, I can’t wait to test it out and improve my quality of life with the new intelligent voice assistant.
Thanks to ChatGPT 4o and Lex.page for providing feedback on early drafts of this piece.
Thanks for reading The Discourse! Subscribe for free to receive new posts and support me.
That's it for today, thanks for reading!
What do you think ofVoice AI and the new updates?
Reply or comment below, and I'll reply to you. Give feedback and vote on the next topic here.
Talk to you soon!
Follow me on Twitter @kavirkaycee
1 Number of iPhone users https://backlinko.com/iphone-users
Subscribe to my newsletter
Read articles from Kavir Kaycee directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Kavir Kaycee
Kavir Kaycee
Product Leader, writing a product management newsletter https://thediscourse.co • alum: On Deck, ISB