Ex-Intel Engineer Slams Misguided And Flawed Apple M1 Benchmarking Practices

Apple MacBook Air

Apple has been garnering a lot of praise for its M1 system-on-chip (SoC), the company's first custom Apple Silicon design specifically for its MacBook laptops and Mac Mini desktops. The M1 also represents the beginning of Apple's two-year transition plan to move away from Intel CPUs. This has led to added interest in benchmark performance, particularly in how the Arm-based M1 compares to traditional x86 Intel silicon. It also happens that this has prompted a former distinguished Intel engineer to highlight flaws in common benchmark practices that could paint a more favorable picture of the M1, and specifically its IPC (instructions per clock/cycle) performance, than it perhaps rightfully deserves.

Comparing IPC performance between different chips can be tricky. IPC is sometimes mistaken for single-threaded performance, but the two are not the same thing, since single-threaded performance also depends on clock speed. Simply put, IPC is a measure of how much work a processor can accomplish in a single clock cycle. To compare IPC between two different processors, the workloads need to be identical and, of course, so do the clock speeds.
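To make the distinction concrete, here is a minimal Python sketch of the arithmetic. The counter readings and clock speeds are made-up numbers for two hypothetical chips, not measurements of the M1 or of any Intel part, and the function names are ours.

```python
def ipc(instructions_retired: int, cycles: int) -> float:
    """Average number of instructions completed per clock cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions_retired / cycles


def runtime_seconds(instructions_retired: int, ipc_value: float, clock_hz: float) -> float:
    """Time to finish a workload: instructions / (IPC * clock frequency)."""
    return instructions_retired / (ipc_value * clock_hz)


# Hypothetical counter readings for the same workload on two chips.
workload = 8_000_000_000                   # instructions retired (made up)
ipc_a = ipc(workload, 2_000_000_000)       # chip A needs 2 billion cycles
ipc_b = ipc(workload, 4_000_000_000)       # chip B needs 4 billion cycles

# Chip A has twice the IPC, but chip B runs at twice the clock speed,
# so both finish the workload in the same amount of time.
time_a = runtime_seconds(workload, ipc_a, 2.5e9)
time_b = runtime_seconds(workload, ipc_b, 5.0e9)

print(f"chip A: IPC {ipc_a:.1f}, {time_a:.2f} s")
print(f"chip B: IPC {ipc_b:.1f}, {time_b:.2f} s")
```

In this toy case one chip has double the IPC of the other, yet single-threaded run time is identical, which is why the two terms should not be used interchangeably.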

Processor architectures are complex, though, in part because of the different levels of cache. This is precisely where François Piednoël, who spent two decades at Intel as a high-level engineer, says the M1 can have an unfair advantage in certain canned benchmarks, which can potentially lead to misleading conclusions about IPC.

In a 5-minute video posted to YouTube, Piednoël points out that the M1 has a very large L1 cache, which can artificially inflate "measured IPC." How so? Some benchmarks can fit completely in cache, and in those cases, modern processors with large caches can show higher scores that are not necessarily indicative of true IPC performance.
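One way to see the effect is the textbook stall model, in which cycles lost to cache misses are added on top of the core's own cycles per instruction. The Python sketch below is only an illustration under that simplified model: the core IPC, miss rates, and miss penalty are hypothetical values we picked, not figures from Piednoël's video, and real out-of-order cores hide part of the miss latency.

```python
def measured_ipc(core_ipc: float,
                 mem_refs_per_instr: float,
                 l1_miss_rate: float,
                 miss_penalty_cycles: float) -> float:
    """IPC as a benchmark would report it, using a simple blocking-cache model.

    measured CPI = core CPI + (memory refs per instruction
                               * L1 miss rate * miss penalty in cycles)
    """
    core_cpi = 1.0 / core_ipc
    stall_cpi = mem_refs_per_instr * l1_miss_rate * miss_penalty_cycles
    return 1.0 / (core_cpi + stall_cpi)


# Two hypothetical chips with the SAME core IPC of 4.0.
# The only difference is whether the benchmark fits in L1.
fits_in_l1 = measured_ipc(core_ipc=4.0, mem_refs_per_instr=0.25,
                          l1_miss_rate=0.0, miss_penalty_cycles=12)
spills_out = measured_ipc(core_ipc=4.0, mem_refs_per_instr=0.25,
                          l1_miss_rate=0.05, miss_penalty_cycles=12)

print(f"benchmark fits in L1:   measured IPC {fits_in_l1:.2f}")
print(f"benchmark spills to L2: measured IPC {spills_out:.2f}")
```

Under these assumed numbers, two cores with identical execution resources report very different IPC, purely because one of them keeps the whole benchmark in L1.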

"Usually when you try to estimate the instruction per clock, what you do is you try to insulate the instruction per clock from the memory subsystem. But there, the overall performance of the system is better because you have a larger L1," Piednoël explains.

According to Piednoël, this leads people to "conclude the wrong thing for the wrong reasons." The L1 cache advantage that the M1 has over many x86 processors means that certain benchmarks are run fully in cache on the former, and only partially in cache on the latter.
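The "fully in cache versus partially in cache" point can be illustrated with a toy cache simulator. This Python sketch models a fully associative LRU cache with 64-byte lines and sweeps repeatedly over a fixed working set. The cache sizes and the working set size are hypothetical, chosen only to show the effect, and real L1 caches are set associative with prefetchers, so actual hit rates will differ.

```python
from collections import OrderedDict


def hit_rate(cache_bytes: int, working_set_bytes: int,
             passes: int = 4, line_bytes: int = 64) -> float:
    """Hit rate of a fully associative LRU cache over repeated sequential sweeps."""
    capacity = cache_bytes // line_bytes       # number of lines the cache holds
    lines = working_set_bytes // line_bytes    # number of lines the benchmark touches
    cache = OrderedDict()                      # keys ordered from oldest to newest
    hits = accesses = 0
    for _ in range(passes):
        for line in range(lines):
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)        # mark as most recently used
            else:
                cache[line] = None
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict the least recently used line
    return hits / accesses


KIB = 1024
working_set = 96 * KIB                         # hypothetical benchmark footprint

large_l1 = hit_rate(128 * KIB, working_set)    # footprint fits entirely
small_l1 = hit_rate(32 * KIB, working_set)     # footprint does not fit

print(f"large L1 hit rate: {large_l1:.2f}")
print(f"small L1 hit rate: {small_l1:.2f}")
```

With the larger cache, only the first sweep misses. With the smaller one, the sequential sweep is a worst case for LRU and every access misses, so the same benchmark ends up exercising the memory subsystem rather than the core.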

"So you are not really measuring IPC, you are measuring the performance of the system," he says.

Piednoël also claims that part of the problem is that some benchmark algorithms have been in use for a decade or longer (SPEC CPU2006, POV-Ray, and even Geekbench have all been around a long time) and are outdated. They do not stress modern CPU architectures the way real-world, modern application workloads do.

This is the reason why Intel has been pushing RUG (Representative Usage Guide) benchmarks as of late. They consist of pre-scripted and packaged benchmark test suites that Intel prepares for the press and analyst community, and are intended to represent real-world use cases built around popular software packages like Office 365 and Adobe Premiere and Photoshop. You may recall we included RUGs (along with other benchmarks) in our early look at a pre-production Tiger Lake laptop back in September.

Making accurate IPC comparisons even harder is that we don’t have a lot of real-world apps that can easily be benchmarked across platforms with identical workloads. And even then, if an app has to run through Rosetta translation on the M1, the result is still real-world and what a user would experience, but it’s obviously not a level playing field compared to an app compiled natively for the M1.

In short, not all canned benchmarks show the M1 in the proper light versus x86 architectures when compared against real application performance. To be clear, none of this means that M1 systems are being misrepresented in their actual performance capabilities. As Piednoël notes, "Apple has a very fast chip because it has a very large L1, but don't conclude the IPC is outstanding because of this, because the benchmarks are all getting tricked right now."
