AMD Zyphra GPU Cluster Gives Birth To ZAYA1 MoE AI Model, Smokes Llama 3.1

AMD Instinct MI300X render.
AMD is in a celebratory mood after AI research firm Zyphra successfully trained its cutting-edge, large-scale Mixture-of-Experts (MoE) model, ZAYA1, entirely on AMD’s accelerated computing platform, which consists of Instinct MI300X GPUs, Pensando Pollara 400 networking hardware, and the ROCm software stack.

What are MoEs, exactly? You can think of them as breaking up a single, very large language model into, say, eight individual experts, each with its own area of expertise - one handles language, one reasoning, one image recognition, and so forth.

Then there's an intermediary model (a router) sitting in front of those expert models that takes the input and basically decides, 'Okay, this workload needs experts two, four, six, and eight, with these weights on each of them'. It's an oversimplified explanation, but enough to get the gist.
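To make that a little more concrete, here's a minimal sketch of top-k expert routing in PyTorch. Everything here is illustrative - the TopKRouter and MoELayer names, the expert MLP shape, and the eight-experts-two-active setup are assumptions for the example, not Zyphra's actual ZAYA1 architecture - but it shows the basic idea: a small gate scores every expert for each token, and only the chosen few actually run.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    """The 'intermediary model': scores every expert for each token."""
    def __init__(self, hidden_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        logits = self.gate(x)                                # (tokens, num_experts)
        # Keep only the k highest-scoring experts per token...
        weights, expert_ids = torch.topk(logits, self.top_k, dim=-1)
        # ...and renormalize their scores into per-token mixing weights.
        weights = F.softmax(weights, dim=-1)
        return weights, expert_ids

class MoELayer(nn.Module):
    """A bank of small expert MLPs; only top_k of them run per token."""
    def __init__(self, hidden_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = TopKRouter(hidden_dim, num_experts, top_k)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_dim, 4 * hidden_dim), nn.GELU(),
                          nn.Linear(4 * hidden_dim, hidden_dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights, expert_ids = self.router(x)
        out = torch.zeros_like(x)
        # Each token only visits its top_k chosen experts -- the rest of the
        # parameters sit idle, which is why a model's "active" parameter
        # count can be a small fraction of its total.
        for slot in range(expert_ids.shape[-1]):
            for e, expert in enumerate(self.experts):
                mask = expert_ids[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Example: 16 tokens of width 512, 8 experts, 2 active per token.
layer = MoELayer(hidden_dim=512)
y = layer(torch.randn(16, 512))
```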

From AMD's vantage point, the achievement proves its platform is a viable, high-performance, and production-ready alternative for training frontier AI at scale. This is a point it made during the Advancing AI event in San Jose, California earlier this year. It also represents a successful collaboration between AMD, IBM Cloud, and of course, Zyphra, all of which worked closely together to deploy a large-scale training cluster.

"This jointly engineered cluster, powered by AMD Instinct MI300X GPUs and utilizing IBM Cloud’s high-performance networking fabric, delivered over 750 PFLOPs1 of Max Achievable FLOPS in training performance," AMD said in blog post.

Zyphra credited the massive 192GB of high-bandwidth memory (HBM) found on AMD's Instinct MI300X GPUs as playing a pivotal role in the achievement, as it allowed the firm to simplify its training setup, while also claiming 10x faster model save times using AMD's optimized distributed I/O. Time is money, and a reduction like that is certainly significant.
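For a rough sense of why distributed I/O speeds up checkpoint saves, here's a hedged sketch using PyTorch's torch.distributed.checkpoint module. This is a generic pattern, not Zyphra's actual pipeline, and the function name and paths are made up for illustration.

```python
import torch.distributed.checkpoint as dcp

# Illustrative only -- not Zyphra's actual I/O path. The general idea behind
# distributed checkpointing: with a sharded model (e.g. under FSDP), every
# rank writes just its own shard straight to shared storage in parallel,
# instead of gathering all of the parameters onto one rank and serializing
# them through a single writer. Assumes an initialized process group.
def save_checkpoint(model, step: int, root: str = "/checkpoints/zaya1"):
    state = {"model": model.state_dict()}                  # shard-local tensors
    dcp.save(state, checkpoint_id=f"{root}/step-{step}")   # ranks write in parallel
```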

ZAYA1 Base benchmarks graph.
Source: AMD and Zyphra

According to Zyphra and AMD, ZAYA1-Base was able to match or exceed the performance of competing models like Qwen3-4B (Alibaba), Gemma3-12B (Google), Llama-3-8B (Meta), and OLMoE using a fraction of the active parameters (8.3 billion total, only 760 million active per token). The firm also says it demonstrated performance approaching state-of-the-art reasoning models like Qwen3-4B-Thinking, even before explicit instruction tuning (SFT/RL).

Single node architecture diagram.

Cluster topology diagram.

To help achieve this, Zyphra's cluster employed eight dedicated AMD Pensando Pollara 400Gbps NICs per node, connected via a rails-only topology, to deliver an aggregate 3.2Tbps of bandwidth per node.
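For the curious, here's a back-of-the-envelope view of what "rails-only" means in practice. This is a simplified illustration, not IBM Cloud's actual fabric configuration.

```python
# Toy sketch of a rails-only layout: GPU i on every node talks over NIC i
# (its "rail"), so cross-node collectives ride a fixed, dedicated path
# rather than contending on a shared switch tier.
GPUS_PER_NODE = 8   # one MI300X per rail
NIC_GBPS = 400      # one Pensando Pollara 400 NIC per GPU

def rail_for_gpu(gpu_index: int) -> int:
    return gpu_index % GPUS_PER_NODE  # fixed GPU-to-NIC pairing

aggregate_tbps = GPUS_PER_NODE * NIC_GBPS / 1000
print(f"Per-node aggregate: {aggregate_tbps} Tbps")  # 8 x 400 Gbps = 3.2 Tbps
```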

Zyphra's technical report gets deep into the weeds, but the major high-level takeaway is that AMD's platform is able to compete with leading open models across reasoning, mathematics, and coding benchmarks.

"Efficiency has always been a core guiding principle at Zyphra. It shapes how we design model architectures, develop algorithms for training and inference, and choose the hardware with the best price-performance to deliver frontier intelligence to our customers," said Krithik Puthalath, CEO of Zyphra.

"ZAYA1 reflects this philosophy and we are thrilled to be the first company to demonstrate large-scale training on an AMD platform. Our results highlight the power of co-designing model architectures with silicon and systems, and we’re excited to deepen our collaboration with AMD and IBM as we build the next generation of advanced multimodal foundation models," Puthalath added.

Outside of the raw performance data, AMD says Zyphra's work confirms that its GPUs, networking, and software stack are both mature and robust enough to be a viable, competitive option for large-scale LLM pretraining.
Paul Lilly

Paul is a seasoned geek who cut his teeth on the Commodore 64. When he's not geeking out to tech, he's out riding his Harley and collecting stray cats.