You Can Pair Intel Arc And NVIDIA GPUs For A 70% Speed Boost, But There's A Catch

Every application presents a given workload, and different discrete GPUs will chew through that workload at different rates. With that understanding, why not put multiple GPUs to work on the same application? The answer, of course, is that you can; it just depends on the specific application that you're running.

In the case of FluidX3D, a GPU-accelerated computational fluid dynamics simulator, you can put every single CPU and GPU in your system to work. This is thanks in part to FluidX3D's foundation in the OpenCL API. Processor manufacturers produce system-level drivers that expose their accelerators to OpenCL, and applications that target the API can send standard operations and data types to those chips, whatever they may be.
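To give a flavor of how a single workload can be shared among mismatched devices, here's a minimal Python sketch of proportional domain splitting: the simulation grid's rows are divided among devices according to their relative throughput, so a faster card gets a bigger slab. This is an illustration of the general idea only, not FluidX3D's actual code, and the device names and throughput figures below are made up.

```python
# Illustrative sketch (not FluidX3D's actual code): divide a simulation
# domain's rows among heterogeneous devices in proportion to throughput.

def split_domain(total_rows, devices):
    """devices: list of (name, relative_throughput) tuples.
    Returns (name, start_row, end_row) slices covering the whole domain."""
    total_throughput = sum(t for _, t in devices)
    slices, start = [], 0
    for i, (name, throughput) in enumerate(devices):
        if i == len(devices) - 1:
            end = total_rows  # last device takes the remainder
        else:
            end = start + round(total_rows * throughput / total_throughput)
        slices.append((name, start, end))
        start = end
    return slices

# Hypothetical devices with made-up relative throughput numbers:
devices = [("Arc A770", 55), ("Titan Xp", 45)]
print(split_domain(1000, devices))
# [('Arc A770', 0, 550), ('Titan Xp', 550, 1000)]
```

Each device then only simulates its own slab, exchanging a thin boundary layer with its neighbor each timestep, which is why this kind of decomposition scales well for fluid simulation.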

[Image: FluidX3D's system information readout, showing both graphics cards installed.]

This isn't exactly a novel concept; OpenCL was actually created specifically to take advantage of heterogeneous processing. However, the results from FluidX3D are pretty phenomenal. By adding an NVIDIA Titan Xp 12GB card to his machine based around an Intel Arc A770 16GB GPU, developer Moritz Lehmann (@ProjectPhysx on Mastodon and /u/ProjectPhysX on Reddit) was able to accelerate the operation by about 70% over what either card could do alone.

Now, there's no real trick here. If you're doing computational fluid dynamics, slap as many GPUs into your system as you can, because OpenCL will gladly use them all. The "catch" we mentioned in the headline is that this approach doesn't work for video games. But why not? After all, just like computational fluid dynamics, games present a finite, well-defined workload; why not spread it across multiple GPUs?

[Image: Hitman: Absolution frame pacing chart. This looks really bad in practice, but it was commonplace with multi-GPU setups.]

The answer comes down mainly to two things. First of all, video games are highly latency-sensitive, which means the delivery of frames needs to be both rapid and consistent. This is difficult enough when you're working with two identical GPUs from the same vendor, never mind when you're juggling separate driver stacks and distinct GPU architectures.

The more pressing problem is that preparing a frame for output involves many pipeline stages, and GPUs are designed to execute those stages more or less in order, so it's not practical to have two different GPUs working on the same frame at the same time. The multi-GPU solutions of yore, starting with the ATI Rage Fury MAXX (which we reviewed in 2001!), worked around this problem with "Alternate Frame Rendering," where one graphics processor handled the even-numbered frames while the other took the odd-numbered ones.
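The even/odd assignment at the heart of Alternate Frame Rendering is simple enough to sketch in a few lines of Python. This is purely illustrative; real AFR drivers also have to manage queueing, synchronization, and frame pacing, which is where the trouble described above comes from.

```python
# Illustrative sketch of Alternate Frame Rendering (AFR):
# frames are dealt out round-robin, so with two GPUs one gets
# the even-numbered frames and the other gets the odd ones.
def afr_assign(frame_index, gpu_count=2):
    return frame_index % gpu_count

assignments = [afr_assign(f) for f in range(6)]
print(assignments)  # [0, 1, 0, 1, 0, 1]
```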

[Image: Slide from AMD explaining the potential (and unrealized) benefits of DX12 EMA mode.]

This is a decent solution, but you still run into the synchronization issue we noted before. Microsoft attempted to address the multi-GPU problem with "Explicit Multi-Adapter" (EMA) mode in DirectX 12, but because the number of people with systems sporting dual GPUs of similar strength effectively rounds to zero relative to the PC games market as a whole, very few developers even experimented with this method.

Ultimately, the thing to take away from Dr. Lehmann's results is that while it may have died out in the world of gaming, multi-GPU is definitely still alive and well in the domain of GPU compute. It's not just physics simulation, either. Workloads like large-scale video processing, offline rendering, protein folding, and of course, deep learning can all make use of as much GPU power as you can throw at them.