NVIDIA Sheds Light On Lack Of PhysX CPU Optimizations
Introduction
A new report from David Kanter at Real World Technologies digs into how PhysX executes on a standard x86 CPU, and his analysis confirms some of AMD's earlier statements. In many cases, the PhysX code that runs in a given title is both single-threaded and decidedly unoptimized. Instead of taking advantage of the SSE/SSE2 vectorization capabilities at the heart of every x86 processor sold since ~2005, PhysX performs its calculations using ancient x87 instructions.
When in doubt, blame the PPU.
As RWT's analysis shows, however, virtually all of the applicable micro-ops (uops) in both Cryostasis and Soft Body Physics are x87; SSE accounts for just a tiny percentage of the whole. Toss in the fact that CPU PhysX is typically single-threaded while GPU PhysX absolutely isn't, and Kanter's data suggests that NVIDIA has consciously chosen to avoid any CPU optimizations and, in so doing, has artificially widened the gap between CPU and GPU performance. If that allegation sounds familiar, it's because we talked about it just a few weeks back, after Intel presented a whitepaper arguing that many of the test cases NVIDIA has used to claim huge GPU performance advantages were unfairly optimized in the GPU's favor.
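To make the x87-versus-SSE distinction concrete, here is a minimal sketch in C. It is not PhysX code; the function names and the position-update loop are hypothetical, chosen only to illustrate the difference. The scalar routine handles one float per iteration, which is the only shape of work x87 can do, while the SSE routine processes four floats per instruction using the packed-float intrinsics from `<xmmintrin.h>`. The SSE path is guarded so the example still builds on compilers that do not define `__SSE__`.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE packed single-precision intrinsics */
#endif

/* Hypothetical example: advance positions by velocity * dt, one float at a time.
   This is the kind of loop a scalar (x87-style) code path executes. */
static void integrate_scalar(float *pos, const float *vel, float dt, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        pos[i] += vel[i] * dt;
    }
}

/* Same update, but four floats per iteration where SSE is available. */
static void integrate_simd(float *pos, const float *vel, float dt, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    const __m128 vdt = _mm_set1_ps(dt);          /* broadcast dt to all 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 p = _mm_loadu_ps(pos + i);        /* unaligned load of 4 positions */
        __m128 v = _mm_loadu_ps(vel + i);        /* unaligned load of 4 velocities */
        p = _mm_add_ps(p, _mm_mul_ps(v, vdt));   /* p += v * dt across 4 lanes */
        _mm_storeu_ps(pos + i, p);
    }
#endif
    /* Scalar tail: leftover elements, or everything if SSE is unavailable. */
    for (; i < n; ++i) {
        pos[i] += vel[i] * dt;
    }
}
```

Both routines compute the same result; the packed version simply issues roughly a quarter as many arithmetic instructions for the bulk of the array. How much real-world speedup that yields depends on the workload and memory behavior, so treat this as an illustration of the instruction-level difference rather than a performance claim.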