Intel's Powerhouse BMG-G31 GPU For Arc Pro B70 Breaks Cover In Official Document
If you can't read the tiny image, here are the relevant parts: "G31 validation has been added in this release," followed by "performance is measured on a non-golden setup B70 system". This clearly indicates that BMG-G31 and B70 are one and the same. What's interesting is the next line, though: "compare with G21: 1.49x geomean under SLA constraints and 1.13x geomean at fixed batch size."
To break that down, the Intel engineer who wrote this is saying that the new GPU delivers roughly 49% higher performance than BMG-G21 (as a geometric mean across the tested workloads) when you actually enforce a Service-Level Agreement, which usually comes with strict latency and throughput requirements. The gain over BMG-G21 shrinks to about 13% when both cards simply run the same fixed batch size with latency taken out of the equation. That suggests scaling isn't linear at a fixed batch size, likely due to communication or memory bottlenecks.
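To make the two figures concrete, the minimal Python sketch below shows one common way such numbers are produced. For each workload, it finds the highest throughput each GPU reaches while its p99 latency stays within the SLA, compares that to throughput at one fixed batch size, and then summarizes the per-workload speedup ratios with a geometric mean. The changelog doesn't publish the underlying sweeps, so every workload name, batch size, throughput, and latency value here is hypothetical. The same goes for the 100 ms SLA and for p99 latency as the metric.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# One measurement: (batch size, throughput in tokens/s, p99 latency in ms).
Point = Tuple[int, float, float]


def best_under_sla(points: List[Point], sla_ms: float) -> Optional[float]:
    """Highest throughput among configurations whose p99 latency meets the SLA."""
    ok = [tput for _, tput, lat in points if lat <= sla_ms]
    return max(ok) if ok else None


def throughput_at_batch(points: List[Point], batch: int) -> float:
    """Throughput at one fixed batch size, ignoring latency entirely."""
    for b, tput, _ in points:
        if b == batch:
            return tput
    raise KeyError(f"no measurement at batch size {batch}")


def geomean(ratios: Sequence[float]) -> float:
    """Geometric mean: n-th root of the product, computed in log space."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("geomean needs a non-empty list of positive ratios")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical sweeps for two made-up workloads: (older GPU, newer GPU).
# These numbers are placeholders, NOT Intel's measurements.
WORKLOADS: Dict[str, Tuple[List[Point], List[Point]]] = {
    "model_a": (
        [(1, 100.0, 40.0), (8, 600.0, 90.0), (16, 900.0, 160.0)],
        [(1, 110.0, 36.0), (8, 700.0, 70.0), (16, 1000.0, 95.0)],
    ),
    "model_b": (
        [(1, 50.0, 60.0), (8, 300.0, 95.0), (16, 450.0, 180.0)],
        [(1, 55.0, 55.0), (8, 345.0, 80.0), (16, 520.0, 120.0)],
    ),
}
SLA_MS = 100.0
FIXED_BATCH = 8

sla_ratios = []
fixed_ratios = []
for old, new in WORKLOADS.values():
    # Speedup when each GPU runs its best SLA-compliant configuration.
    sla_ratios.append(best_under_sla(new, SLA_MS) / best_under_sla(old, SLA_MS))
    # Speedup when both GPUs are pinned to the same batch size.
    fixed_ratios.append(
        throughput_at_batch(new, FIXED_BATCH) / throughput_at_batch(old, FIXED_BATCH)
    )

print(f"geomean under {SLA_MS:.0f} ms SLA: {geomean(sla_ratios):.2f}x")
print(f"geomean at batch {FIXED_BATCH}: {geomean(fixed_ratios):.2f}x")
```

In this toy data, the newer card's lower latency lets the hypothetical model_a step up to batch 16 without breaking the 100 ms target, while the older card stays at batch 8. That alone pulls the SLA-bound geomean well above the fixed-batch one. This is just one way such a gap can arise, not a claim about what Intel measured. The geometric mean is the usual choice for averaging speedups because a 2x gain and a 2x loss cancel out: geomean([2.0, 0.5]) works out to 1.0, whereas an arithmetic average would report 1.25.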
The most likely culprit is PCI Express overhead, because the changelog also notes "limited perf for allreduce with small message size." Allreduce is a collective communication operation used in multi-GPU workloads: every GPU contributes a buffer, the buffers are combined (usually summed), and every GPU ends up with the result. In tensor-parallel inference it's how GPUs merge their partial results, and those messages tend to be small, especially during token-by-token generation. Given that note, Intel is clearly working on multi-GPU performance, which makes sense, as clustering multiple GPUs is the only way Intel is going to compete with NVIDIA or AMD in raw AI performance given the hardware stack it has on hand.
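To make "allreduce" concrete, the minimal Python simulation below sums one buffer per simulated GPU using a ring allreduce. Ring allreduce is a widely used algorithm for this collective (NVIDIA's NCCL, for instance, implements it), but nothing in Intel's changelog says which algorithm its software uses on these systems. Treat this as an illustration of the operation, not of Intel's implementation. Each "GPU" here is just a Python list, and the buffer length is assumed to divide evenly by the number of GPUs.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum-allreduce over len(buffers) ranks arranged in a ring.

    Each rank's buffer is split into n equal chunks (length must divide by n).
    Phase 1 (reduce-scatter) leaves each rank owning one fully summed chunk;
    phase 2 (allgather) circulates those chunks so every rank has all of them.
    """
    n = len(buffers)
    length = len(buffers[0])
    if any(len(b) != length for b in buffers) or length % n != 0:
        raise ValueError("buffers must be equal length and divisible by rank count")
    size = length // n
    data = [list(b) for b in buffers]  # copy so the caller's lists are untouched

    def chunk(c: int) -> range:
        return range(c * size, (c + 1) * size)

    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) mod n to
    # its right neighbour, which adds it into its own copy of that chunk.
    for step in range(n - 1):
        # Snapshot all outgoing chunks first so sends in a step are simultaneous.
        outgoing = [((r - step) % n, [data[r][i] for i in chunk((r - step) % n)])
                    for r in range(n)]
        for r, (c, payload) in enumerate(outgoing):
            dst = (r + 1) % n
            for i, v in zip(chunk(c), payload):
                data[dst][i] += v

    # Now rank r holds the fully reduced chunk (r + 1) mod n.
    # Phase 2: allgather. In step s, rank r forwards chunk (r + 1 - s) mod n,
    # and the right neighbour overwrites its stale copy with it.
    for step in range(n - 1):
        outgoing = [((r + 1 - step) % n, [data[r][i] for i in chunk((r + 1 - step) % n)])
                    for r in range(n)]
        for r, (c, payload) in enumerate(outgoing):
            dst = (r + 1) % n
            for i, v in zip(chunk(c), payload):
                data[dst][i] = v

    return data


print(ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]]))
# prints [[111, 222, 333], [111, 222, 333], [111, 222, 333]]
```

After 2(n-1) neighbour-to-neighbour exchanges, every simulated GPU holds the same summed buffer. The ring layout keeps each link busy with only a 1/n-sized chunk per step, which is why it's popular for large buffers.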
PCI Express is fast, but it's not nearly as fast as the dedicated GPU-to-GPU interconnects (such as NVIDIA's NVLink) used for synchronizing GPUs in other systems. Multi-GPU scaling overhead probably isn't fully optimized on this test system yet, and we can infer that from the engineer's remark that "throughput should be better on system with golden BKC setup." BKC in this case is "best-known configuration," or Intel's recommended setup, so the "non-golden" label on the B70 results suggests they weren't measured on that reference configuration.
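Why would small messages be the weak spot? The standard "alpha-beta" cost model for a ring allreduce charges 2(n-1) steps, each costing a fixed per-message latency alpha plus the time to move a 1/n-sized chunk at link bandwidth B, for a total of roughly 2(n-1) * (alpha + M / (n * B)) for an M-byte buffer. The sketch below plugs in placeholder values for alpha and B (hypothetical, not measured PCIe or Arc figures) purely to show that tiny messages spend almost all their time on fixed latency. That is where a faster interconnect or a better-tuned software path helps most.

```python
def ring_allreduce_time(n_gpus: int, message_bytes: float,
                        alpha_s: float, bandwidth_bytes_per_s: float) -> float:
    """Alpha-beta estimate: 2*(n-1) steps, each paying latency alpha plus
    the transfer time of one 1/n-sized chunk at the given bandwidth."""
    if n_gpus < 2:
        return 0.0  # a single GPU has nothing to exchange
    chunk_bytes = message_bytes / n_gpus
    return 2 * (n_gpus - 1) * (alpha_s + chunk_bytes / bandwidth_bytes_per_s)


def latency_share(n_gpus: int, message_bytes: float,
                  alpha_s: float, bandwidth_bytes_per_s: float) -> float:
    """Fraction of the estimated time spent on fixed per-step latency."""
    total = ring_allreduce_time(n_gpus, message_bytes, alpha_s, bandwidth_bytes_per_s)
    if total == 0.0:
        return 0.0
    return 2 * (n_gpus - 1) * alpha_s / total


# Placeholder link parameters (hypothetical, not measurements of any system).
ALPHA_S = 10e-6               # 10 microseconds of fixed cost per step
BANDWIDTH_BYTES_PER_S = 25e9  # 25 GB/s effective per-link bandwidth
N_GPUS = 8                    # e.g. an eight-card workstation

for size in (4 * 1024, 1024 * 1024, 256 * 1024 * 1024):
    t = ring_allreduce_time(N_GPUS, size, ALPHA_S, BANDWIDTH_BYTES_PER_S)
    share = latency_share(N_GPUS, size, ALPHA_S, BANDWIDTH_BYTES_PER_S)
    print(f"{size:>11} bytes: {t * 1e6:10.1f} us, {share:.0%} of it fixed latency")
```

With these made-up parameters, a 4 KiB allreduce is almost entirely latency, while a 256 MiB one is almost entirely bandwidth. Real numbers depend on the hardware, topology, and communication library, so the takeaway is the shape of the curve, not the values.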
What does all this mean? Well, Intel is testing real multi-GPU inference scaling, and the folks doing it care about SLA-bound serving performance, which means this effort is aimed at production inference, not just hobbyist anime waifu generation. We can also infer that inter-GPU communication is still a bottleneck, but that the company is aware of this and working on it.
None of this is a big surprise considering the existence of Intel's Project Battlematrix system (pictured top) with up to eight Arc Pro B60 GPUs hooked to a single Xeon 6 processor. If Intel wants to move these Arc Pro GPUs for AI workloads, it's going to have to make sure its inter-GPU communication is silky smooth to extract the maximum performance out of the octo-barreled workstations. It also follows that a larger GPU is a quick path to increasing performance, especially when latency constraints are in place.
Really, the most important part of today's news is simply the confirmation direct from Intel that yes, BMG-G31 is coming, and yes, it's going to be the Arc Pro B70. We're still hoping against hope that some of those BMG-G31 GPUs make their way to the Arc gaming segment, but we're not holding our breath.
Top image: an Intel Battlematrix system with four Arc Pro B60 GPUs. Image: Intel

