The recent image generation capabilities of ChatGPT have challenged our previous understanding of AI-generated media. The recently announced GPT-4o model displays a remarkable ability to describe images with high accuracy and re-create them to viral effect, as seen in the wave of studio-inspired art styles. It has even mastered rendering text inside AI-generated images, something that was long difficult for AI. Now, OpenAI is launching two new models that can dissect images for cues, gathering information that might escape even the human eye.
OpenAI announced two new models earlier this week that take ChatGPT's reasoning abilities up a notch. Its new o3 model, which OpenAI calls its "most powerful reasoning model", improves on existing interpretation and perception capabilities and is getting better at "coding, mathematics, science, visual perception, and more", the company claims. Meanwhile, o4-mini is a smaller, faster model for cost-efficient reasoning along the same lines. The news follows OpenAI's recent launch of the GPT-4.1 family of models, which brings faster processing and deeper comprehension.
ChatGPT is now "thinking with images"
Along with their improved reasoning abilities, both models can now incorporate images into their reasoning process, enabling them to "think with images", OpenAI announced. With this change, both models can integrate images into their chain of thought. Going beyond basic image analysis, the o3 and o4-mini models can examine images more closely and even manipulate them through operations such as cropping, zooming, flipping, or enhancement, extracting visual cues that improve their chances of arriving at a solution.
In the announcement, OpenAI says the models blend visual and textual reasoning, which can be combined with other ChatGPT features such as web search, data analysis, and code generation, and are expected to form the basis for more advanced AI agents with multimodal analysis.
As for practical applications, you can expect to feed in anything from flowcharts or scribbles in handwritten notes to pictures of real-world objects, and have ChatGPT develop a deeper understanding for better output, even without a descriptive text prompt. With this, OpenAI inches closer to Google's Gemini, which offers the impressive ability to interpret the real world through live video.
Despite the bold claims, OpenAI is limiting access to paying members, possibly to prevent its GPUs from "melting" again as it struggles to keep up with the compute demands of the new reasoning models. For now, the o3, o4-mini, and o4-mini-high models will be exclusively available to ChatGPT Plus, Pro, and Team members, while Enterprise and Edu tier users will receive access in a week's time. Meanwhile, free users will get limited access to o4-mini when they select the "Think" button in the prompt bar.