45nm, Cache, SSE4, Tick-Tock Cadence
As we've already mentioned, Penryn is not just a simple die shrink of Conroe. Yorkfield, the code name of the quad-core desktop variant of the Penryn core, is different from Kentsfield in a number of ways...
45nm Manufacturing Process: A major issue that becomes more significant as manufacturing processes shrink is current leakage. Leakage occurs through multiple parts of a semiconductor, but one of the most problematic paths is unwanted current flowing through the gate dielectric of a transistor. Ideally, the gate dielectric would act as a perfect insulator. But because it is made thinner with each process generation as die geometries continue to shrink, current leaks through it. In Intel's 65nm process, the gate dielectric is only about 5 atomic layers thick. The resulting leakage wastes power, so the transistor consumes more than it should.
With their 45nm process, however, Intel has been able to develop and successfully implement a high-k (high dielectric constant) and metal gate transistor that significantly reduces leakage current. According to Intel, combining the 45nm manufacturing process with this high-k and metal gate transistor breakthrough offers a number of key benefits:
- ~2x improvement in transistor density, for either smaller chip size or increased transistor count
- ~30% reduction in transistor switching power
- >20% improvement in transistor switching speed or >5x reduction in source-drain leakage power
- >10x reduction in gate oxide leakage power
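Why does a high-k dielectric help? A simple way to see it is to treat the gate as a parallel-plate capacitor, where capacitance per unit area is C/A = k × ε0 / t. A material with a higher k can be made physically thicker while still delivering the same capacitance as a thin SiO2 layer, and because tunneling leakage falls off steeply as the physical layer gets thicker, that extra thickness cuts gate leakage. The sketch below is a textbook model, not Intel's process data: k = 3.9 for SiO2 is a standard value, while the 1.2 nm oxide, the k = 20 high-k layer, and the function names are hypothetical.

```python
# Textbook parallel-plate model of a gate dielectric (illustrative only).
# Assumptions: C/A = k * eps0 / t; k = 3.9 for SiO2; the high-k value (20)
# and the 1.2 nm reference oxide are hypothetical, not Intel figures.
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
K_SIO2 = 3.9             # relative permittivity of silicon dioxide

def capacitance_per_area(k, thickness_nm):
    """Gate capacitance per unit area (F/m^2) for a dielectric of the given k."""
    return k * EPS0 / (thickness_nm * 1e-9)

def equivalent_oxide_thickness(k, thickness_nm):
    """Thickness of SiO2 (nm) that would give the same capacitance per area."""
    return thickness_nm * K_SIO2 / k

def high_k_thickness_for(target_eot_nm, k):
    """Physical thickness (nm) a dielectric with this k needs to match a given SiO2 thickness."""
    return target_eot_nm * k / K_SIO2

sio2_nm = 1.2  # hypothetical SiO2 gate oxide thickness
k_high = 20.0  # hypothetical high-k relative permittivity
t_high = high_k_thickness_for(sio2_nm, k_high)
print(f"high-k layer: {t_high:.2f} nm physical, EOT {equivalent_oxide_thickness(k_high, t_high):.2f} nm")
print(f"C/A: SiO2 {capacitance_per_area(K_SIO2, sio2_nm):.4e} F/m^2, "
      f"high-k {capacitance_per_area(k_high, t_high):.4e} F/m^2")
```

In this model the hypothetical high-k layer is roughly five times thicker than the oxide it replaces while matching its capacitance, which is the intuition behind the gate-leakage reduction Intel claims.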
The Core 2 Extreme QX9650 is built using Intel's 45nm process. Like Kentsfield, the CPU consists of two dual-core dies on a single package. Each die on the QX9650 has approximately 410M transistors and measures about 107 mm². If you're keeping count, that means Yorkfield has 820M transistors and about 214 mm² of total die area.
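The package totals follow directly from the per-die figures, and the same numbers give a rough transistor density. A quick Python check, using only the approximate values quoted above (the function name is illustrative):

```python
# Back-of-the-envelope totals for a two-die package, using the approximate
# per-die figures quoted in the text.
TRANSISTORS_PER_DIE = 410e6
DIE_AREA_MM2 = 107.0
DIES_PER_PACKAGE = 2  # Yorkfield: two dual-core dies on one package

def package_totals(transistors_per_die, die_area_mm2, dies):
    """Return (total transistors, total die area in mm^2, transistors per mm^2)."""
    total_t = transistors_per_die * dies
    total_area = die_area_mm2 * dies
    return total_t, total_area, total_t / total_area

total_t, total_area, density = package_totals(TRANSISTORS_PER_DIE, DIE_AREA_MM2, DIES_PER_PACKAGE)
print(f"{total_t / 1e6:.0f}M transistors, {total_area:.0f} mm^2, {density / 1e6:.2f}M transistors/mm^2")
# prints: 820M transistors, 214 mm^2, 3.83M transistors/mm^2
```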
Larger Cache: Because Intel's 45nm process increases transistor density, the Core 2 Extreme QX9650 also features more L2 cache than its predecessors. Each dual-core die on the QX9650 has 6MB of L2 cache, for a total of 12MB, compared with 4MB per die and a total of 8MB on Kentsfield. In addition to being larger, the L2 cache on Penryn derivatives like Yorkfield is 24-way set associative, up from 16-way on the previous generation. Higher set associativity, together with the larger capacity, should mean fewer cache misses on Yorkfield. That should reduce the number of times the CPU has to access main memory because of a cache miss, which in turn should improve performance.
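Set associativity matters most for conflict misses, where many hot lines happen to map to the same cache set. The toy Python model below makes that concrete. It assumes 64-byte lines and true LRU replacement (real hardware uses its own replacement policy), and the class and function names are hypothetical. With 64-byte lines, both a 4MB/16-way and a 6MB/24-way cache work out to 4096 sets, so the extra capacity shows up as extra ways per set.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Toy set-associative cache with true-LRU replacement (a simplifying assumption)."""

    def __init__(self, size_bytes, ways, line_bytes=64):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, addr):
        """Look up one byte address; return True on a hit, False on a miss."""
        line = addr // self.line_bytes
        index = line % self.num_sets   # which set the line maps to
        tag = line // self.num_sets    # identifies the line within that set
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)     # mark as most recently used
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = None
        return False

def conflict_misses(cache, lines_in_one_set, passes):
    """Cycle repeatedly through `lines_in_one_set` lines that all map to set 0."""
    stride = cache.num_sets * cache.line_bytes  # addresses this far apart share a set
    for _ in range(passes):
        for k in range(lines_in_one_set):
            cache.access(k * stride)
    return cache.misses

MB = 1024 * 1024
kentsfield_die = SetAssociativeCache(4 * MB, ways=16)
yorkfield_die = SetAssociativeCache(6 * MB, ways=24)
print(kentsfield_die.num_sets, yorkfield_die.num_sets)
print(conflict_misses(kentsfield_die, 20, 10), conflict_misses(yorkfield_die, 20, 10))
```

Cycling through 20 lines that share one set thrashes the 16-way model (every access misses under LRU), while the 24-way model takes only the 20 compulsory misses. Real workloads are far less tidy, but this is the mechanism behind the "fewer cache misses" expectation.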
SSE4 Instructions: Penryn derivatives like the Yorkfield core used in the Core 2 Extreme QX9650 also feature new SSE4 instructions. SSE4 should improve the performance of media codecs that take advantage of the technology. This is accomplished through new instructions and a new Super Shuffle Engine that improves performance for SSE2, SSE3 and SSE4 instructions with shuffle-like operations such as pack, unpack and wider packed shifts. In a recent article, we published benchmark scores from SSE4-optimized video encoding applications that showed huge performance increases:
The 45nm CPU listed on the left supports SSE4, while the CPU on the right does not. As you can see, SSE4 has a major impact on performance when it is used.
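As an example of the pack-style operations mentioned above, PACKUSDW is one of the instructions added with Penryn's SSE4 (strictly, the SSE4.1 subset). It narrows two vectors of four signed 32-bit integers into one vector of eight unsigned 16-bit integers, clamping out-of-range values. The Python function below is a scalar model of what the instruction computes; it does not capture how the Super Shuffle Engine speeds it up, and the function names are illustrative.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

def saturate_u16(x):
    """Clamp a signed integer into the unsigned 16-bit range [0, 65535]."""
    return max(0, min(0xFFFF, x))

def packusdw(a, b):
    """Scalar model of PACKUSDW: two vectors of four signed 32-bit integers
    become one vector of eight unsigned 16-bit integers with saturation.
    Elements of `a` fill the low four lanes, elements of `b` the high four."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("each operand holds exactly four 32-bit lanes")
    for x in a + b:
        if not INT32_MIN <= x <= INT32_MAX:
            raise ValueError("lane value out of signed 32-bit range")
    return [saturate_u16(x) for x in a] + [saturate_u16(x) for x in b]

print(packusdw([-5, 0, 70000, 1234], [65535, 65536, -1, 42]))
# prints: [0, 0, 65535, 1234, 65535, 65535, 0, 42]
```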
Clock-for-Clock Improvements: The Core 2 Extreme QX9650 is also built on an enhanced Core microarchitecture designed to offer greater performance at a given frequency, while also being able to operate at higher frequencies. Intel disclosed that Penryn features a divider that produces 4 bits per cycle, which the company claims will offer 4X the performance of current processors for square root operations, as well as faster computation of transcendental functions. Intel has dubbed this new feature its Fast Radix-16 Divider.
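The name "radix-16" reflects the 4-bits-per-cycle figure: each iteration of a digit-recurrence divider retires one base-16 digit, which is 4 quotient bits, so a 64-bit quotient takes 16 iterations instead of the 64 a bit-at-a-time (radix-2) divider needs. The Python sketch below is a plain long-division model under that assumption, not Intel's circuit. Real dividers (SRT designs, for example) choose each digit from a few leading bits, often via a lookup table; exact integer division stands in for that step here, and the function name is hypothetical.

```python
def digit_recurrence_divide(dividend, divisor, bits_per_step, width=64):
    """Unsigned long division producing `bits_per_step` quotient bits per
    iteration (radix 2**bits_per_step). Returns (quotient, remainder, steps)."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend must fit in `width` bits")
    if width % bits_per_step:
        raise ValueError("width must be a multiple of bits_per_step")
    digit_mask = (1 << bits_per_step) - 1
    quotient = remainder = steps = 0
    for shift in range(width - bits_per_step, -1, -bits_per_step):
        # Bring down the next radix digit of the dividend.
        remainder = (remainder << bits_per_step) | ((dividend >> shift) & digit_mask)
        # Select the quotient digit; exact division stands in for the
        # hardware's digit-selection logic. The digit is always < 2**bits_per_step.
        digit = remainder // divisor
        remainder -= digit * divisor
        quotient = (quotient << bits_per_step) | digit
        steps += 1
    return quotient, remainder, steps

for bits in (1, 2, 4):  # radix-2, radix-4, radix-16
    print(f"radix-{1 << bits}:", digit_recurrence_divide(1_000_003, 7, bits))
```

All three runs return the same quotient and remainder; only the iteration count changes (64, 32 and 16), which is where a higher-radix divider gets its speed.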
Tick-Tock: Ever since the introduction of Conroe, Intel has talked about their new Tick-Tock strategy as it relates to processor development.
A 'Tick' is a shrink of an existing microarchitecture to a new manufacturing process, often with some enhancements, while a 'Tock' is a new microarchitecture built on the established process. In this case, Penryn is the 45nm 'Tick' that follows Conroe's 'Tock' (the Intel Core microarchitecture). Nehalem, the next new microarchitecture, is the following 'Tock'.