Intel Core i9-7900X And Core i7-7740X CPU Review: Skylake-X and Kaby Lake-X Debut


Intel Skylake-X And Kaby Lake-X Architectures

The new Core X series of processors is based on two different micro-architectures. The entry-level Core i5 and Core i7 models are based on Kaby Lake-X, while the bigger and beefier Core i7s and Core i9s are based on Skylake-X.

Despite the X designations, at their core, these micro-architectures are similar to the Skylake and Kaby Lake designs that have previously appeared in Intel’s mainstream CPU line-up, so we won't be going too in-depth again here. For deeper coverage of Skylake and Kaby Lake, we strongly suggest checking out our launch pieces. Skylake-X, however, has received some significant updates worth mentioning.

Before we dive into the big changes, here’s a quick take on what Skylake brings to the table versus the previous-gen Broadwell and Haswell micro-architectures. Skylake has gotten “wider and deeper”, in that it can handle more in-flight loads and stores and more scheduler entries, and it has larger register files and allocation queues as well. The branch predictor and prefetcher have been updated too, and it has an improved divider. All told, the changes in Skylake help improve IPC and extract more instruction-level parallelism, to ultimately improve performance across an array of workloads.

Intel has completely restructured the cache hierarchy in Skylake-X, though. Intel quadrupled the size of the L2 cache, bringing it up to 1MB per core, but reduced the size of the L3. In its entirety, however, there is still roughly the same amount of total cache. Not only have the cache sizes changed, but how they are utilized has been tweaked as well. Previous-gen processors used an inclusive cache structure, in which everything held in L2 is also duplicated in L3. With such a large L2 cache, however, an inclusive structure didn’t make sense anymore, because duplicating that much L2 data in a smaller L3 would effectively reduce the amount of L3 available for anything else. To keep the L3 cache larger, Intel would have had to sacrifice core count, so the decision was made to move to a non-inclusive cache structure. This change results in a better hit rate in the larger, lower-latency L2 cache, and a slightly lower hit rate on the smaller L3, though the difference is negligible. As you'll see in the benchmarks, the 7900X's performance scales as you would expect, so restructuring the cache doesn't seem to have completely changed Skylake-X's performance profile.
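
To put rough numbers on that trade-off, here is a minimal Python sketch comparing the 10-core Core i9-7900X with the previous-gen 10-core Core i7-6950X. The cache sizes come from Intel's published spec sheets rather than from this article, the function and field names are purely illustrative, and the "unique capacity" figure is a simplification: it assumes an inclusive L3 duplicates every L2 line, and it treats the non-inclusive total as an upper bound, since a non-inclusive L3 may still hold some lines that are also in L2.

```python
# Cache sizes in KB, taken from Intel's public spec sheets (assumption:
# these figures are not stated in the article itself).
CHIPS = {
    "Core i7-6950X": {
        "cores": 10,
        "l2_per_core_kb": 256,
        "l3_total_kb": 25 * 1024,   # 25MB shared L3
        "inclusive_l3": True,
    },
    "Core i9-7900X": {
        "cores": 10,
        "l2_per_core_kb": 1024,     # 1MB per core, 4x the previous gen
        "l3_total_kb": 14080,       # 13.75MB shared L3
        "inclusive_l3": False,
    },
}


def cache_summary(chip):
    """Return total L2, raw L2+L3, and approximate unique capacity in MB."""
    l2_total_kb = chip["cores"] * chip["l2_per_core_kb"]
    raw_kb = l2_total_kb + chip["l3_total_kb"]
    # Simplified model: an inclusive L3 mirrors the L2 contents, so the
    # unique data held on chip is capped at the L3 size. A non-inclusive
    # L3 does not have to mirror L2, so L2 + L3 is the upper bound.
    unique_kb = chip["l3_total_kb"] if chip["inclusive_l3"] else raw_kb
    return {
        "l2_total_mb": l2_total_kb / 1024,
        "raw_total_mb": raw_kb / 1024,
        "unique_mb": unique_kb / 1024,
    }


for name, chip in CHIPS.items():
    s = cache_summary(chip)
    print(f"{name}: L2 {s['l2_total_mb']}MB, "
          f"L2+L3 {s['raw_total_mb']}MB, "
          f"unique (approx.) {s['unique_mb']}MB")
```

Under this simple model the two chips end up within a couple of megabytes of each other in usable capacity, even though the 7900X's L3 is a little over half the size of the 6950X's.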

The Intel Core i7-7820X and all Core i9 series processors, including the Core i9-7900X featured here, will support an updated Turbo Boost Max 3.0 implementation that allows specific "Best Core" boost clocks for up to two processor cores, improving both single- and dual-core performance. The first implementation of Turbo Boost Max 3.0, which arrived with the Core i7-6950X, was only optimized for a single core.

Skylake-X also features support for AVX-512. AVX-512 is a set of new instructions that can accelerate performance for workloads like scientific simulations, financial analytics, artificial intelligence (AI)/deep learning, 3D modeling and analysis, image and audio/video processing, cryptography and data compression. A 512-bit AVX-512 vector can hold eight double-precision (DP) or sixteen single-precision (SP) floating-point numbers, or alternatively eight 64-bit or sixteen 32-bit integers. Compared to AVX2, this lets a workload get more done per CPU cycle (the data registers are twice as wide) and helps to minimize latency and overhead (there are twice as many registers). Applications must be coded to support AVX-512, however.
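
The lane counts quoted above fall straight out of the register width. As a quick illustration, the Python sketch below does that arithmetic for AVX2 and AVX-512. The helper and table names are made up for this example, and the register counts (16 for AVX2, 32 for AVX-512) are the architectural vector register counts in 64-bit mode, which is where the "double the number of registers" comparison comes from.

```python
# Vector width in bits and architectural vector registers (64-bit mode).
ISA = {
    "AVX2":    {"vector_bits": 256, "registers": 16},
    "AVX-512": {"vector_bits": 512, "registers": 32},
}

# Element widths in bits for the data types mentioned in the text.
ELEMENT_BITS = {
    "double-precision float": 64,
    "single-precision float": 32,
    "64-bit integer": 64,
    "32-bit integer": 32,
}


def lanes(vector_bits, element_bits):
    """Number of elements of a given width that fit in one vector."""
    if vector_bits % element_bits != 0:
        raise ValueError("element width must divide the vector width")
    return vector_bits // element_bits


def register_file_bytes(isa):
    """Total architectural vector register storage, in bytes."""
    return isa["registers"] * isa["vector_bits"] // 8


for name, isa in ISA.items():
    per_type = ", ".join(
        f"{lanes(isa['vector_bits'], bits)} x {label}"
        for label, bits in ELEMENT_BITS.items()
    )
    print(f"{name}: {per_type}; "
          f"{register_file_bytes(isa)} bytes of vector registers")
```

Twice the width and twice the register count means four times the architectural vector register storage, but only code compiled or hand-written for AVX-512 can make use of it.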

Versus previous-gen processors, Skylake-X also features improvements to Intel’s SpeedShift Technology. SpeedShift allows Skylake to switch P states (power states) much faster than previous-gen products. Skylake can control P states fully in hardware, whereas older processors required OS control. As a result, it can switch P states in as little as 1ms, where older processors take roughly 30ms. Though it doesn’t affect the peak performance of the processor, SpeedShift can enhance responsiveness and efficiency, because the processor is able to ramp up and down more quickly.

Linking all of the cores, cache, and I/O in Skylake-X is the new mesh architecture we have covered previously. In previous-generation, many-core Xeon processors, Intel used a ring interconnect architecture to link the CPU cores, cache, memory, and various I/O controllers on the chips. As the number of cores in the processors has increased, along with memory and I/O bandwidth, it has become increasingly difficult to achieve peak efficiency with a ring interconnect, because a ring architecture can require data to be sent across long stretches (relatively speaking) of the ring to reach its intended destination. The new mesh architecture addresses this limitation by interconnecting on-chip elements in a more pervasive way, to ultimately increase the number of pathways and improve efficiency.

Intel Mesh Interconnect Architecture

Above is a visual representation of the new mesh architecture. In the diagram, processor cores, on-chip cache banks, memory controllers, and I/O controllers are organized in rows and columns. Wires and switches connect the various on-chip elements and provide more direct paths than the prior ring interconnect architecture. The nature of a mesh also allows for many more pathways to be implemented, which further minimizes bottlenecks and allows Intel to operate the mesh at a lower frequency and voltage while still delivering high bandwidth and low latency.
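
To get a feel for why a mesh scales better than a ring, the toy Python model below compares average and worst-case hop counts between stops on a bidirectional ring and on a 2D grid. This is only a sketch under stated assumptions: the 24-stop count and the 6x4 grid are hypothetical and do not reflect Intel's actual die layout, every hop is treated as equal cost, and mesh traffic is assumed to take a shortest row-then-column path.

```python
from itertools import product


def ring_hops(n):
    """(average, worst-case) hops between distinct stops on a bidirectional ring."""
    # Going either direction, the shortest route to a stop d positions
    # away is min(d, n - d) hops.
    distances = [min(d, n - d) for d in range(1, n)]
    return sum(distances) / len(distances), max(distances)


def mesh_hops(rows, cols):
    """(average, worst-case) hops between distinct stops on a rows x cols grid."""
    nodes = list(product(range(rows), range(cols)))
    distances = [
        abs(r1 - r2) + abs(c1 - c2)      # shortest path on a grid
        for (r1, c1) in nodes
        for (r2, c2) in nodes
        if (r1, c1) != (r2, c2)
    ]
    return sum(distances) / len(distances), max(distances)


# Hypothetical example: 24 stops arranged as a ring vs. a 6x4 mesh.
ring_avg, ring_max = ring_hops(24)
mesh_avg, mesh_max = mesh_hops(6, 4)
print(f"ring: avg {ring_avg:.2f} hops, worst case {ring_max}")
print(f"mesh: avg {mesh_avg:.2f} hops, worst case {mesh_max}")
```

In this simplified model the 24-stop ring averages a little over six hops per trip while the 6x4 mesh averages about 3.3, and the gap widens as more stops are added, because ring distance grows with the stop count while mesh distance grows with the grid's side lengths.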

As for Kaby Lake-X, which will be employed in the Core i7-7740X and Core i5-7640X, it is essentially unchanged from the existing Kaby Lake chips. The GPU has been disabled and it employs new packaging so it can be used in the LGA 2066 socket, but underneath it all, it's the Kaby Lake we already know. With that said, the lack of on-processor graphics and the beefed-up power delivery on the platform may aid stability when overclocking. We certainly realized a hefty overclock on our chip, as you'll see a little later.
