OpenAI ChatGPT GPT-4 Turbo Gets A Mid-Life Boost, Here’s What You Should Know
In the most recent update, which has no fancy name, GPT-4 Turbo is now "significantly smarter and more pleasant to use", according to OpenAI co-founder and CEO Sam Altman. While he didn't elaborate, Altman seems to be talking primarily about changes that make the model's responses as a chatbot "more direct, less verbose, and more conversational", for which OpenAI provides the following example as proof:
The updated model also scores higher on most common AI benchmarks, including the Graduate-Level Google-Proof Q&A Benchmark (GPQA). That challenging dataset was designed to test the abilities of LLMs and consists of 448 multiple-choice questions spread across biology, physics, and chemistry. The questions are written by experts in the respective fields to judge not only how well LLMs can answer them, but also how well the models can be overseen by humans. This is GPT-4's weakest benchmark, and the new version improves its score from approximately 35% to just under 50%, which is an excellent result on such a difficult test.
Other benchmarks that see gains include the reasoning-focused MATH test, the Multilingual Grade School Math (MGSM) benchmark, and the Discrete Reasoning Over Paragraphs (DROP) benchmark. DROP in particular is one of the most taxing AI benchmarks, and GPT-4 Turbo was already one of the best models on it, but the new release improves its score to a bit over 80%, putting it in the exclusive category of models to reach such heights that includes, uh, itself. (The next best result is from Google's Gemini 1.5 Pro at 78.9%.)