NVIDIA Sheds Light On Lack Of PhysX CPU Optimizations


NVIDIA's Response

We spoke to NVIDIA regarding the state of its PhysX SDK and why Kanter's evaluation shows so little vectorization. If you don't want to dig through all the details, the screenshot below from The Incredibles summarizes NVIDIA's response quite well.


We're not happy, Dave. Not. Happy.

For those of you who want a more detailed explanation, keep reading:

PhysX Evolution
In 2004, Ageia acquired a physics middleware company named NovodeX. Back then, what we now call PhysX was a software-only solution, similar to Havok. Ageia's next step was to build a PPU (Physics Processing Unit) that could accelerate PhysX in hardware. This hardware-accelerated version of the SDK was labeled Version 2, but while it added PPU acceleration, the underlying engine was still using NovodeX code. According to the former Ageia employees still on staff at NVIDIA, NovodeX had begun building the original SDK as far back as 2002-2003.

By the time NVIDIA bought Ageia in 2008, the company had already ported PhysX to platforms like the Xbox 360 and the PS3. NVIDIA's first goal was to port PhysX over to the GPU, and it logically focused its development in that area. According to NVIDIA, it's done some work to improve the SDK's multithreading capabilities and general performance, but there's a limit to how much it can do to optimize an eight-year-old engine without breaking backwards compatibility.

Why The Timeline Matters:
If we accept NVIDIA's version of events, the limitations Kanter noted make more sense. Back in 2002-2003, Intel was still talking about 10GHz Pentium 4s, multi-core processors were a dim shadow on the horizon, and a significant chunk of gamers and developers owned processors that didn't support SSE and/or SSE2.
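
To put the vectorization question in concrete terms: legacy x87-style code processes one float at a time, while SSE handles four per instruction. The snippet below is a minimal illustrative sketch of that contrast; the scale functions and data layout are our own invention, not code from the PhysX SDK.

    // Illustrative only -- not PhysX SDK code.
    #include <xmmintrin.h>  // SSE intrinsics
    #include <cstddef>

    // Scalar version: one multiply per iteration, roughly the pattern
    // that x87-targeted code generation produces.
    void scale_scalar(float* v, float s, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            v[i] *= s;
    }

    // SSE version: four floats per instruction. For brevity, assumes n
    // is a multiple of 4 and v is 16-byte aligned.
    void scale_sse(float* v, float s, std::size_t n) {
        __m128 sv = _mm_set1_ps(s);                  // broadcast s into all four lanes
        for (std::size_t i = 0; i < n; i += 4) {
            __m128 x = _mm_load_ps(v + i);           // load four floats
            _mm_store_ps(v + i, _mm_mul_ps(x, sv));  // multiply and store four at once
        }
    }

On the math-heavy loops a physics engine runs constantly, the four-wide path can approach a 4x throughput advantage, which is the kind of speedup Kanter found the SDK leaving on the table.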

One thing NVIDIA admitted to us when we talked to the company's PhysX team is that it has spent significantly more time optimizing PhysX to run on the Xbox 360's Xenon and the PS3's Cell processor than on the x86 platform. As far as Cell is concerned, there are good technological reasons to do so. If you hand the Cell code that's been properly tuned and tweaked, it can blow past the fastest x86 processors by an order of magnitude. If those optimizations aren't performed, however, the Broadband Engine's throughput might make you wish for a 486.


In theory, properly optimized PhysX could make the image on the left look much more like the GPU-PhysX image created on the right.

Other factors include the fact that the majority of game development is done with consoles in mind, and the simple reality that NVIDIA wants PC users to buy GPUs because of PhysX, which leaves the company less interested in optimizing CPU PhysX.

Modernized SDK Under Development:
It'll be a while, but we'll eventually find out whether NVIDIA is purposefully maintaining deprecated standards or whether the problem has more to do with the age of the company's development API. NV isn't giving out any release dates, but the company is hard at work on a new version of the PhysX SDK. Rather than trying to continually patch new capabilities into an old code base, the PhysX team is "rearchitecting" the entire development platform.

In theory, this revamp will address all of the issues that have been raised regarding x86 performance, though it may still be the developer's responsibility to use and optimize certain capabilities, multithreading among them (a generic sketch of what that can look like follows below). Even after version 3.xx is available, we'll have to wait for games that make full use of it, but if NVIDIA's been sincere, we'll see a difference in how modern CPUs perform.
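
As an aside on what "the developer's responsibility" can mean in practice, here's a generic sketch of opting physics work into multiple CPU threads. This is not the PhysX API; Body, integrate, and step_parallel are hypothetical names used purely for illustration.

    // Generic illustration -- not the PhysX API.
    #include <algorithm>
    #include <cstddef>
    #include <thread>
    #include <vector>

    struct Body { float x, y, z, vx, vy, vz; };

    // Hypothetical per-body integration step.
    void integrate(Body& b, float dt) {
        b.x += b.vx * dt;
        b.y += b.vy * dt;
        b.z += b.vz * dt;
    }

    // The engine can expose parallel work, but someone still has to
    // decide how to carve it up across cores.
    void step_parallel(std::vector<Body>& bodies, float dt) {
        unsigned n = std::thread::hardware_concurrency();
        if (n == 0) n = 2;  // fall back if the core count is unknown
        std::size_t chunk = (bodies.size() + n - 1) / n;
        std::vector<std::thread> workers;
        for (unsigned t = 0; t < n; ++t) {
            std::size_t begin = t * chunk;
            std::size_t end = std::min(begin + chunk, bodies.size());
            if (begin >= end) break;
            workers.emplace_back([&bodies, begin, end, dt] {
                for (std::size_t i = begin; i < end; ++i)
                    integrate(bodies[i], dt);
            });
        }
        for (auto& w : workers) w.join();
    }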
