In a notable development for AI, OpenAI debuted GPT-4o, a new flagship model that powers its well-known chatbot, ChatGPT. The “o” stands for “omni,” reflecting the model’s ability to handle text, speech, and video. GPT-4o will be rolled out progressively across OpenAI’s developer and consumer products over the coming weeks.
GPT-4o delivers GPT-4-level intelligence while significantly expanding the model’s capabilities across multiple modalities. “It reasons across voice, text, and vision,” explained Mira Murati, OpenAI’s CTO, during a presentation. “This is crucial for shaping the future of human-machine interaction.”
Previously, OpenAI’s most advanced model, GPT-4 Turbo, could analyze text and images, handling tasks such as extracting text from an image or describing its contents. GPT-4o adds speech recognition and understanding on top of that foundation, opening the door to a range of new possibilities.
GPT-4o substantially improves the experience of using ChatGPT. The chatbot has long offered a voice mode that reads out its responses using a text-to-speech model, but GPT-4o supercharges this, turning the interaction into something closer to a conversation with an assistant.
Users can, for instance, ask the GPT-4o-powered ChatGPT a question and interrupt it mid-answer. OpenAI says the model responds in real time, making conversations feel more natural and fluid. GPT-4o can also pick up on nuances in a user’s voice and adjust its responses accordingly, generating speech in a range of emotive styles, including singing.
GPT-4o also upgrades ChatGPT’s vision. Users can now show ChatGPT a photo or a screenshot from their computer screen and get quick answers, whether the question is “What brand of shirt is this?” or “What’s going on in this code?”
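As GPT-4o reaches OpenAI’s developer products, a request like the shirt example above could look something like the following through OpenAI’s chat completions API. This is a minimal, hypothetical sketch: the image URL and prompt are placeholders, and it assumes the OpenAI Python SDK is installed and an API key with access to the “gpt-4o” model is configured.

```python
# Minimal sketch: asking GPT-4o a question about an image via OpenAI's
# chat completions API. The image URL and prompt are illustrative placeholders;
# an OPENAI_API_KEY environment variable and access to "gpt-4o" are assumed.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What brand of shirt is this?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/shirt-photo.jpg"},
                },
            ],
        }
    ],
)

# Print the model's answer about the image.
print(response.choices[0].message.content)
```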
ChatGPT’s desktop app being used for a coding task. Image Credits: OpenAI
Murati emphasized that these capabilities will continue to evolve. While GPT-4o can currently translate a photo of a menu written in a foreign language, future versions could let ChatGPT “watch” a live sports game and explain the rules in real time.
“These models are becoming increasingly complex,” Murati acknowledged. “However, our primary goal is to streamline the user experience. We want interaction with ChatGPT to feel natural and effortless, where you can focus entirely on collaborating with the AI, not navigating a complex interface.”
Murati also pointed to a shift in OpenAI’s priorities. “For years, we’ve been laser-focused on boosting the intelligence of these models,” she explained. “With GPT-4o, however, we’re taking a significant leap forward in terms of user-friendliness.” That emphasis on ease of use suggests a future where interacting with AI assistants feels more seamless and intuitive, which could broaden adoption.
GPT-4o is now available in ChatGPT’s free tier, and subscribers to OpenAI’s premium ChatGPT Plus and Team plans get “5x higher” message limits. Note that when users reach the rate limit, ChatGPT automatically switches to GPT-3.5, an older and less capable model.
Source: here