NVIDIA GPUs Sweep MLPerf Training Benchmark, Flexing AI Dominance

Rows of NVIDIA servers.
There’s no doubt that NVIDIA is the dominant player in graphics processing units (GPUs), both for gaming and inside the data center—one only needs to look at NVIDIA’s earnings over the past year to see just how well it’s doing quarter after quarter, with monster gains becoming the norm. Beyond the dollars and cents, however, NVIDIA is now touting a clean sweep of MLPerf Training benchmarks, which highlights how powerful its GPU hardware is for emerging AI workloads.

This is important because AI is a burgeoning industry with billions of dollars at stake. From laptops (including Microsoft’s Copilot+ initiative) to smartphones and everything else, AI processing, training, and capabilities are key to bringing next-gen experiences into the mainstream.

While AI has been a hot topic recently, NVIDIA recognized the trend much earlier, which is why its data center business caught up to and surpassed its gaming business a long while ago. Those early investments in the data center are now paying off in a big way. It’s not just about releasing new hardware all the time, though—scalability and software development both play a key role, too.

NVIDIA slide showing triple the LLM training performance versus last year.

This is reflected in the latest round of MLPerf Training (version 4.0) benchmarks. NVIDIA’s EOS-DFW DGX SuperPOD was able to more than triple its performance on the large language model (LLM) benchmark that’s based on GPT-3 175B.

“Now featuring 11,616 NVIDIA H100 Tensor Core GPUs connected with NVIDIA Quantum-2 InfiniBand networking, EOS achieved this remarkable feat through larger scale and extensive full-stack engineering,” NVIDIA says.

NVIDIA slide showing Hopper's 27% MLPerf benchmark gain versus last year.

NVIDIA also points out that its submission with a 512 H100 GPU configuration is now up to 27% faster than it was just one year ago, thanks to various optimizations in the company’s software stack.

“This improvement highlights how continuous software enhancements can significantly boost performance, even with the same hardware. The result of this work is a 3.2x performance increase in just a year coming from a larger scale, and significant software improvements. This combination also delivered nearly perfect scaling—as the number of GPUs increased by 3.2x, so did the delivered performance,” NVIDIA adds.
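The scaling claim above is simple arithmetic worth making explicit: if GPU count and delivered performance both grow by the same factor, scaling efficiency is effectively 100%. A minimal sketch (the function name and figures-as-ratios framing are illustrative, not NVIDIA’s methodology):

```python
def scaling_efficiency(gpu_ratio: float, perf_ratio: float) -> float:
    """Delivered speedup divided by the ideal linear speedup.

    1.0 means perfect linear scaling; below 1.0 means some
    performance is lost to communication or other overhead.
    """
    return perf_ratio / gpu_ratio


# Per the article: GPU count grew ~3.2x and benchmark performance
# also grew ~3.2x, so efficiency lands at ~1.0 (near-perfect scaling).
print(scaling_efficiency(3.2, 3.2))
```

Anything below 1.0 would indicate that interconnect or synchronization overhead is eating into the added hardware, which is why NVIDIA highlights the result.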

NVIDIA MLPerf Training Benchmarks records (slide).

NVIDIA’s clean sweep also included five new records, spanning Graph Neural Network, LLM Fine-Tuning, LLM, Text-to-Image, and Object Detection. These are in addition to existing records NVIDIA had already set in Image Classification, NLP, Medical Imaging, and Recommendation, as outlined in the slide above.

NVIDIA slide showing HGX H200 driving TCO.

The upshot for businesses is that this all enables faster deployment of generative AI models, which in turn saves both time and money.

NVIDIA also points to widespread industry adoption, noting that no fewer than 10 of its partners submitted results, including ASUS, Dell, Fujitsu, Gigabyte, HP, Lenovo, Oracle, Quanta Cloud Technology, Supermicro, and Sustainable Metal Cloud. What this all means is that NVIDIA’s spirited push into the data center should continue to deliver monster gains, especially as AI gains more momentum.