NVIDIA Sheds Light On Lack Of PhysX CPU Optimizations


About four months ago, we covered the latest round of shin-kicking between ATI and NVIDIA, with ATI claiming that NVIDIA purposefully crippled CPU performance when running PhysX code and coerced developers into using GPU-accelerated PhysX. NVIDIA denied all such claims, particularly those that implied it used its "The Way It's Meant To Be Played" program as a bludgeon to force hardware PhysX on developers or gamers.

A new report from David Kanter at Real World Technologies digs into how PhysX is executed on a standard x86 CPU, and his analysis confirms some of ATI's earlier claims. In many cases, the PhysX code that runs in a given title is both single-threaded and decidedly unoptimized. And instead of taking advantage of the SSE/SSE2 vectorization capabilities built into every x86 processor sold since roughly 2005, PhysX performs its calculations with ancient x87 instructions.

When in doubt, blame the PPU.

Before the introduction of SIMD instruction sets like SSE and SSE2, floating-point calculations on an x86 processor were done with the x87 instruction set. In the past 11 years, however, Intel, AMD, and VIA have all adopted SSE and SSE2. Both allow much higher throughput than classic x87 code, and given how universally they are supported across the PC market, it's hard to see why NVIDIA hasn't mandated their use.
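To make the x87-versus-SSE distinction concrete, here is a minimal C sketch. It is not PhysX code; the function names `integrate_scalar` and `integrate_sse` and the toy position update are hypothetical, chosen only to illustrate the technique. The scalar loop handles one float per operation, and whether a compiler emits x87 or scalar SSE instructions for it depends on the target (GCC defaults to `-mfpmath=387` on 32-bit x86 and to SSE on x86-64). The SSE loop uses the standard `<xmmintrin.h>` intrinsics to update four single-precision values per instruction.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_add_ps, _mm_mul_ps, ... */

/* Hypothetical toy update: pos += vel * dt, one float per operation.
   On 32-bit builds using x87 math, this loop compiles to x87 stack code. */
void integrate_scalar(float *pos, const float *vel, float dt, size_t n) {
    for (size_t i = 0; i < n; i++) {
        pos[i] += vel[i] * dt;
    }
}

/* Same update using packed SSE: four floats per multiply and per add. */
void integrate_sse(float *pos, const float *vel, float dt, size_t n) {
    __m128 vdt = _mm_set1_ps(dt);            /* broadcast dt into all 4 lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 p = _mm_loadu_ps(pos + i);    /* unaligned load of 4 positions */
        __m128 v = _mm_loadu_ps(vel + i);    /* unaligned load of 4 velocities */
        p = _mm_add_ps(p, _mm_mul_ps(v, vdt));
        _mm_storeu_ps(pos + i, p);           /* write 4 results back */
    }
    for (; i < n; i++) {                     /* scalar tail for leftover elements */
        pos[i] += vel[i] * dt;
    }
}
```

The trailing scalar loop covers element counts that aren't a multiple of four, and the unaligned load/store intrinsics avoid requiring 16-byte-aligned arrays.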

As RWT's analysis shows, however, virtually all of the applicable uops in both Cryostasis and Soft Body Physics use x87; SSE accounts for just a tiny percentage of the whole. Toss in the fact that CPU PhysX is typically single-threaded while GPU PhysX absolutely isn't, and Kanter's data suggests that NVIDIA has consciously chosen to avoid CPU optimizations and, in doing so, has artificially widened the gap between CPU and GPU performance. If that allegation sounds familiar, it's because we discussed it just a few weeks ago, after Intel presented a whitepaper claiming that many of the test cases NVIDIA used to demonstrate huge GPU performance advantages were unfairly optimized in the GPU's favor.
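The single-threading point is the other half of Kanter's argument. As a rough illustration only, and not a description of how PhysX is structured, the same toy position update can be split across POSIX threads, since each particle's update here is independent of the others. The `slice_t` struct, the `integrate_parallel` function, and the 16-thread cap are all hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical work item: one thread updates particles [begin, end). */
typedef struct {
    float *pos;
    const float *vel;
    float dt;
    size_t begin, end;
} slice_t;

static void *integrate_slice(void *arg) {
    slice_t *s = (slice_t *)arg;
    for (size_t i = s->begin; i < s->end; i++) {
        s->pos[i] += s->vel[i] * s->dt;
    }
    return NULL;
}

/* Splits n particles into up to nthreads contiguous slices, one per thread.
   Slices never overlap, so no locking is needed.
   Returns 0 on success, -1 on bad arguments or thread-creation failure. */
int integrate_parallel(float *pos, const float *vel, float dt,
                       size_t n, int nthreads) {
    enum { MAX_THREADS = 16 };
    pthread_t tid[MAX_THREADS];
    slice_t work[MAX_THREADS];
    int started = 0, failed = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS) return -1;
    size_t chunk = (n + (size_t)nthreads - 1) / (size_t)nthreads;

    for (int t = 0; t < nthreads; t++) {
        size_t begin = (size_t)t * chunk;
        if (begin >= n) break;                  /* fewer slices than threads */
        size_t end = (begin + chunk < n) ? begin + chunk : n;
        work[t] = (slice_t){pos, vel, dt, begin, end};
        if (pthread_create(&tid[t], NULL, integrate_slice, &work[t]) != 0) {
            failed = 1;                         /* stop launching new threads */
            break;
        }
        started++;
    }
    for (int t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);             /* wait for every launched slice */
    }
    return failed ? -1 : 0;
}
```

Build with `-pthread`. Real physics engines must also handle collisions and constraints that couple bodies together, so splitting the work is rarely this clean; the sketch only shows why a purely single-threaded CPU path leaves the other cores of a multi-core processor unused.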
