This webpage announces the release of OpenAI's new flagship model, GPT-4o ("o" for "omni"), which can reason across audio, vision, and text in real time. It accepts any combination of text, audio, and images as input and generates any combination of those modalities as output. It can respond to audio inputs in as little as 232 milliseconds, which is similar to human response time in a conversation. Compared with previous models, it performs significantly better on non-English text, is faster, and is 50% cheaper in the API than GPT-4 Turbo, and it shows a marked improvement in vision and audio understanding.
GPT-4o is trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. It sets new benchmarks for multilingual, audio, and vision capabilities, delivers improved reasoning, uses a more efficient tokenizer that reduces token counts for many non-English languages, and has safety built in by design across modalities.
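To make the tokenization point concrete, here is a minimal sketch that compares token counts for a non-English string using the tiktoken library, assuming its cl100k_base encoding (used by earlier GPT-4 models) and the newer o200k_base encoding (used by GPT-4o); the sample sentence and the exact savings are illustrative only.

    import tiktoken

    # cl100k_base is the encoding used by GPT-4 Turbo; o200k_base is the one used by GPT-4o
    old_enc = tiktoken.get_encoding("cl100k_base")
    new_enc = tiktoken.get_encoding("o200k_base")

    text = "नमस्ते, आप कैसे हैं?"  # illustrative Hindi sentence; non-English text benefits most
    print(len(old_enc.encode(text)), "tokens with cl100k_base")
    print(len(new_enc.encode(text)), "tokens with o200k_base")

The announcement reports reductions ranging from about 1.1x for English to several-fold for some non-Latin-script languages.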
This model has undergone extensive external red teaming, and OpenAI will continue to mitigate new risks as they are discovered. GPT-4o's capabilities are being rolled out iteratively, with extended red-team access starting on the announcement date. Its text and image capabilities are available in ChatGPT's free tier and to Plus users with higher message limits, and developers can access GPT-4o in the API as a text and vision model. Over the coming weeks and months, OpenAI will work on the infrastructure, usability, and safety needed to release the other modalities.
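As a minimal sketch of the developer access mentioned above, the snippet below calls GPT-4o through the OpenAI Python SDK's Chat Completions endpoint with mixed text-and-image input; the image URL is a placeholder, and availability and pricing should be checked against OpenAI's documentation.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in one sentence."},
                    # Placeholder URL for illustration; replace with a real, accessible image
                    {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
                ],
            }
        ],
    )
    print(response.choices[0].message.content)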