
OpenAI is releasing a handful of upgrades, available through ChatGPT, that will help people interact more naturally with its technology. Many of the features, announced on Monday, could convince more people to speak to the AI rather than text with it.
Most changes arrived courtesy of a new, more powerful artificial intelligence model that enhances ChatGPT’s ability to listen and respond by voice, instead of text. OpenAI said it would offer people limited, free access to the new technology, along with tools that once required a subscription.
The company announced the updates in a live stream from its San Francisco headquarters the day before Google’s developer conference, where the search giant is expected to announce news about its own competing AI products.
Demonstrations conducted in an AI voice that sounded like Scarlett Johansson showed ChatGPT translating a live conversation from English to Italian and vice versa. The updated voice can mimic a wider range of human emotions and allows the user to interrupt. It chatted with users with fewer delays and identified an OpenAI executive’s emotion from a video chat in which he was grinning.
The new model, called GPT-4o (the “o” stands for “omni”), can interpret user instructions delivered via text, audio and image — and respond in all three modes as well. For example, users can show ChatGPT software code and the chatbot will describe out loud, in conversational English, what the code does. Users can also ask it to translate in real time, all through voice, and to respond aloud to handwritten math equations or written messages.
Users of ChatGPT Free, the company’s unpaid tier, will get limited access to GPT-4o, as well as limited access to capabilities that previously cost money, including browsing the web, advanced data analysis and the GPT Store (OpenAI’s version of the App Store). Subscribers will have a message limit that’s five times greater.
“We are a business and will find plenty of things to charge for, and that will help us provide free, outstanding AI service to (hopefully) billions of people,” CEO Sam Altman wrote on his personal blog after the event.
OpenAI has offered a choice of voices since the feature first launched, but for the demonstrations the company opted for the female one that sounds like Johansson, who voiced the omnipotent AI in the 2013 movie “Her.” Altman and others have teased the crossover.
These new capabilities will be rolled out in stages. Text and image input are available now in ChatGPT and through its API, or application programming interface, a software portal for businesses and developers. OpenAI said voice and video will follow in the coming weeks, and that accessing GPT-4o through the API would be twice as fast and 50 percent cheaper.
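OpenAI’s publicly documented chat-completions format accepts mixed text and image content parts in a single message. As a rough sketch of what a multimodal GPT-4o request looks like — the payload shape follows OpenAI’s public API documentation, though the example URL and helper function here are illustrative, and actually sending the request requires an API key — a developer might assemble one like this:

```python
import json


def build_gpt4o_request(prompt: str, image_url: str) -> dict:
    """Assemble a chat-completions payload mixing text and image input.

    This mirrors the request shape documented for OpenAI's chat
    completions endpoint; sending it is a separate step (an HTTP POST
    to https://api.openai.com/v1/chat/completions with an API key).
    """
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                # A single user turn can carry several content parts,
                # which is what lets GPT-4o see an image alongside text.
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


payload = build_gpt4o_request(
    "Describe, in conversational English, what this code does.",
    "https://example.com/code-screenshot.png",  # hypothetical image
)
print(json.dumps(payload, indent=2))
```

The same endpoint serves text-only requests, so existing integrations need only add image content parts to take advantage of the new model.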
The company also demonstrated its technology working as part of the mobile app Be My Eyes, which helps people with vision impairments. In the filmed demo, the AI helped a person hail a taxi by telling him when it was approaching and whether its availability light was lit.
In a briefing afterward, Mark Chen, OpenAI’s head of frontiers research, told The Washington Post that the improved capabilities were a result of building GPT-4o as a single model, while OpenAI previously used separate models for functions like text-to-speech and speech recognition.
Features like ChatGPT’s ability to identify emotion are part of OpenAI’s goal of growing intelligence beyond text, Chen said. “The way the user expresses themselves, you can tell a lot about their intent and make our AIs more helpful,” he said.
The most surprising moment from the event was listening to ChatGPT’s voice feature appear to flirt with Barret Zoph, OpenAI’s head of post-training, who participated in the demos. After having ChatGPT solve a math problem, Zoph asked the bot to read a handwritten message that said, “I love ChatGPT.” The bot correctly read the message and said, “That’s sweet,” sounding touched. When Zoph said he appreciated the AI’s help and tried to move on, ChatGPT interrupted and, unprompted, said, “Wow. That’s quite the outfit you’ve got on.”
Still, the demos appeared lackluster to observers who pay close attention to OpenAI, which already offered versions of many of the same features. The company, which releases upgrades faster than its rivals, has a habit of timing announcements in an attempt to upstage Google.
Last week, speculation circulated that OpenAI might release its own search engine after word leaked of a planned live event at its headquarters, but the company quashed those rumors ahead of Monday’s stream.
During the briefing, chief technology officer Mira Murati kept a tight lid on the contents of the new model’s training data — a massive corpus of information, typically scraped from the web. Generative AI models like ChatGPT or Google’s Gemini are not programmed; instead they learn probabilities from these data sets.
OpenAI is facing a number of copyright infringement allegations over its use of data, including a lawsuit from the New York Times. At the same time, the company has been announcing deals with more media companies to license their data.
Murati said GPT-4o was trained on licensed content from its partners; human feedback from the people who label data and safety test the models; and publicly available sources.
For the latter, Murati said OpenAI collects “industry standard machine learning data sets” and uses a web crawler to scrape information, “very similar to search engines.” But she emphasized that the company plays by the rules when it comes to scraping other people’s work. “We will filter out any stuff that’s behind paywalls. We will not use anything where people have opted out. We won’t use anything that’s against our content policies and or aggregators of [personally identifiable information],” she said.
During the demonstration, Murati touted GPT-4o’s enhanced abilities in 50 languages. She declined to specify whether the company had used more data in those languages, and Chen suggested the enhancement was a result of the model’s ability to generalize.
“We’re not disclosing the specific makeup of the data or where the data come from,” Murati said. “That’s sensitive trade secrets.”