Introduction and Penryn Details
       
    
Intel held a briefing today to disclose further details regarding their upcoming 45nm High-K Penryn and Nehalem processor cores. Roughly two years ago, Intel outlined their proposed "tick-tock" product strategy, which alternates a shift to a new process technology with an enhanced or entirely new microarchitecture, with one step arriving approximately every year.
Today we have more details regarding 2007's "tick", the Penryn core, and next year's "tock", the Nehalem core, which also ushers in significant changes to Intel's platform architecture as a whole.
A Range of Products -- Six Penryn family processors, including dual and quad-core desktop processors and a dual core mobile processor, are all under the Intel Core processor brand name as well as new dual and quad-core server processors under the Intel Xeon processor brand name. A processor for higher-end server multiprocessing systems is also under development. As previously noted, Intel already has a total of 15 45nm products scheduled.
Technical Marvel -- 45nm next-generation Intel Core 2 quad-core processors will have 820 million transistors. Thanks to our high-k metal transistor invention, think of 820 million more power efficient light bulbs going on and off at light-speeds. The dual-core version has a die size of 107mm2, which is 25 percent smaller than Intel's current 65nm products -- and a quarter of the size of the average postage stamp -- and operates at the same or lower power than Intel's current dual core processors.
Deep Power Down for Energy Savings, Improved Battery Life -- The mobile Penryn processor has a new advanced power management state called Deep Power Down Technology that significantly reduces the power of the processor during idle periods such that internal transistor power leakage is no longer a factor. This helps extend battery life in laptops. This is a major advancement over previous generation industry leading Intel mobile processors.
Intel Dynamic Acceleration Technology Enhanced Performance for Single Threaded Apps -- For the mobile Penryn processor, Intel has enhanced the Intel Dynamic Acceleration Technology available in current Intel Core 2 processors. This feature uses the power headroom freed up when a core is made inactive to boost the performance of another still active core. Imagine a shower with two powerful water shower heads, when one shower head is turned off, the other has increased water pressure (performance).
Speeding Up Video, Photo Imaging, and High Performance Software -- Penryn includes Intel Streaming SIMD Extensions 4 (SSE4) instructions, the largest unique instruction set addition since the original SSE Instruction Set Architecture (ISA). This extends the Intel 64 instruction set architecture to expand the performance and capabilities of the Intel Architecture.
Microarchitecture Optimizations -- Increases the overall performance and energy efficiency of the already leading Intel Core microarchitecture to deliver more instruction executions per clock cycle, which results in more performance and quicker PC responsiveness.
Enhanced Intel Virtualization Technology -- Penryn speeds up virtual machine transition (entry/exit) times by an average of 25 to 75 percent. This is all done through microarchitecture improvements and requires no virtual machine software changes. Virtualization partitions or compartmentalizes a single computer so that it can run separate operating systems and software, which can better leverage multicore processing power, increase efficiency and cut costs by letting a single machine act as many virtual "mini" computers.
Higher Frequencies -- The Penryn family of products will deliver higher overall clock frequencies within existing power and thermal envelopes to further increase performance. Desktop and server products will introduce speeds at greater than 3GHz.
Fast Division of Numbers -- Penryn-based processors provide fast divider performance, roughly doubling the divider speed over previous generations for computations used in nearly all applications through the inclusion of a new, faster divide technique called Radix 16. The ability to divide instructions and commands faster increases a computer's performance.
Larger Caches -- Penryn processors include up to a 50 percent larger L2 cache with a higher degree of associativity to further improve the hit rate and maximize its utilization. Dual-core Penryn processors will feature up to a 6MB L2 cache and quad-core processors up to a 12MB L2 cache. Cache is a memory reservoir where frequently accessed data can be stored for more rapid access. Larger and faster cache sizes speed a computer's performance and response time.
Unique Super Shuffle Engine -- By implementing a full-width, single-pass shuffle unit that is 128-bits wide, Penryn processors can perform full-width shuffles in a single cycle. This significantly improves performance for SSE2, SSE3 and SSE4 instructions that have shuffle-like operations such as pack, unpack and wider packed shifts. This feature will increase performance for content creation, imaging, video and high-performance computing.
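A couple of the figures in Intel's list can be sanity-checked with simple arithmetic. The short Python sketch below back-calculates what the quoted percentages imply about the current 65nm parts. The variable names are ours, and the results are implied values rather than figures Intel supplied at the briefing.

```python
# Back-of-the-envelope check of figures quoted in Intel's feature list.
# The inputs come from the list above; the outputs are only what those
# inputs imply, not numbers Intel published in this briefing.

penryn_dual_core_die_mm2 = 107.0   # quoted 45nm dual-core die size
die_shrink_fraction = 0.25         # "25 percent smaller"

# If 107mm2 is 25% smaller than the 65nm die, the 65nm die is 107 / 0.75.
implied_65nm_die_mm2 = penryn_dual_core_die_mm2 / (1.0 - die_shrink_fraction)

penryn_l2_mb = {"dual-core": 6.0, "quad-core": 12.0}  # quoted maximum L2 sizes
l2_growth_fraction = 0.50                              # "up to a 50 percent larger"

# A cache 50% larger than its predecessor implies predecessor = new / 1.5.
implied_65nm_l2_mb = {
    name: size / (1.0 + l2_growth_fraction) for name, size in penryn_l2_mb.items()
}

print(f"Implied 65nm dual-core die: {implied_65nm_die_mm2:.1f} mm2")
for name, size in implied_65nm_l2_mb.items():
    print(f"Implied 65nm {name} L2: {size:.0f} MB")
```

The numbers work out to roughly 143mm2 for a 65nm dual-core die and 4MB and 8MB of L2 for dual and quad-core parts, which is in line with the current Core 2 lineup.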
As we mentioned in our coverage of their 45nm High-K and metal gate transistor re-announcement in January, Penryn is the lead vehicle for Intel's 45nm manufacturing process. Penryn will offer a number of enhancements over current Conroe and Kentsfield-based Core 2 processors.
Penryn will be the first core to benefit from the 45nm High-K and metal gate transistor technology and will be the foundation of future processors that span each product segment (mobile, desktop, and server) and power envelope. Penryn, however, is not just a die shrink of Conroe. Penryn is built upon an enhanced Core microarchitecture designed to offer greater performance at a given frequency, while at the same time operating at even higher frequencies. Penryn also ushers in new SSE4 instructions for media, gaming, and graphics developers, new levels of energy efficiency, improved virtualization performance, larger caches, and faster buses.
Today Intel disclosed that Penryn will feature a 4-bit-per-cycle divider that the company claims will offer 4X the performance of current processors for square root operations, as well as increased performance when computing transcendental functions. Intel has dubbed this new feature their Fast Radix-16 Divider. In addition to this, Penryn will also feature new Deep Power Down Technology. Deep Power Down Technology is essentially a new CPU power state that turns off the clocks and caches and significantly lowers voltage while idle. With Deep Power Down Technology, in addition to the benefits inherent to 45nm High-K and metal gate transistors, Penryn should offer huge improvements in idle leakage power consumption. Intel did point out, however, that it takes longer for the CPU to get into and out of the Deep Power Down state than other C states.
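To see why retiring 4 bits per cycle matters, consider a simplified model of digit-by-digit division. The Python sketch below is our own illustration, not Intel's design: a real divider picks each quotient digit with dedicated selection hardware, whereas this model simply computes it. The function name and the 2-bits-per-step (radix-4) comparison point are assumptions made for the example.

```python
def digit_recurrence_divide(dividend, divisor, bits_per_step, width=32):
    """Unsigned division that produces `bits_per_step` quotient bits per step.

    Simplified model: each quotient digit is computed directly instead of
    being chosen by the selection logic a hardware divider would use.
    Returns (quotient, remainder, steps).
    """
    if divisor <= 0:
        raise ZeroDivisionError("divisor must be a positive integer")
    if width % bits_per_step != 0:
        raise ValueError("width must be a multiple of bits_per_step")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")

    digit_mask = (1 << bits_per_step) - 1
    quotient = 0
    remainder = 0
    steps = 0
    # Walk the dividend from its most significant digit to its least.
    for shift in range(width - bits_per_step, -1, -bits_per_step):
        next_digit = (dividend >> shift) & digit_mask
        remainder = (remainder << bits_per_step) | next_digit
        quotient_digit = remainder // divisor      # always below 2**bits_per_step
        remainder -= quotient_digit * divisor
        quotient = (quotient << bits_per_step) | quotient_digit
        steps += 1
    return quotient, remainder, steps


# Radix 16 retires 4 bits per step; radix 4 (assumed baseline) retires 2.
radix16 = digit_recurrence_divide(1_000_000_007, 12345, bits_per_step=4)
radix4 = digit_recurrence_divide(1_000_000_007, 12345, bits_per_step=2)
print("radix-16 steps:", radix16[2])
print("radix-4 steps:", radix4[2])
```

For a 32-bit operand the radix-16 version needs 8 steps where the radix-4 version needs 16, which is consistent with Intel's claim of roughly doubling divider speed if each step maps to one cycle.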
Another new feature incorporated into Penryn is called Enhanced Intel Dynamic Acceleration Technology. This feature allows one core within the processor to take advantage of the power budget of a second core, when that second core is not being fully utilized. This feature is designed to enhance the performance of single-threaded workloads. Enhanced Intel Dynamic Acceleration Technology is basically dynamic optimization of the power budget, which enhances the efficiency of the processor with certain workloads.
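The idea of lending an idle core's power budget to an active one can be pictured with a toy model. The sketch below is purely illustrative: the function name, the even split among active cores, and the 30W figure are all hypothetical, and Intel did not disclose how the actual power management logic allocates its budget.

```python
def per_core_power_limits(package_budget_w, core_is_active):
    """Toy model: divide a fixed package power budget among active cores only.

    Idle cores are assumed to draw nothing, which is a simplification.
    """
    active_count = sum(core_is_active)
    if active_count == 0:
        return [0.0] * len(core_is_active)
    share_w = package_budget_w / active_count
    return [share_w if active else 0.0 for active in core_is_active]


# Hypothetical 30W budget on a dual-core part.
both_busy = per_core_power_limits(30.0, [True, True])
one_idle = per_core_power_limits(30.0, [True, False])
print("both cores busy:", both_busy)
print("second core idle:", one_idle)
```

In this model the single active core sees its limit double when its neighbor goes idle. In practice, that extra headroom would be spent on a higher clock for the busy core rather than handed over one-for-one.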
Along with the architectural details, Intel also shared some preliminary performance data regarding Penryn. In the mobile and desktop arenas, a Penryn derivative running at 3.2GHz offered performance roughly 20% higher than today's most powerful Merom or Conroe-based systems while running existing software.
Taking Penryn's higher clocks into consideration as well, SSE4 should offer performance gains of greater than 40% in media codecs that take advantage of the technology. This is accomplished through new instructions and a new Super Shuffle Engine that improves performance for SSE2, SSE3 and SSE4 instructions that have shuffle-like operations such as pack, unpack and wider packed shifts.
In servers, Intel says performance improvements of greater than 45% are possible in workloads that are bandwidth and floating point intensive. We should point out that these improvements represent the performance deltas between the fastest Penryn derivatives and the fastest processors of today. In the server space, the comparison was made between a 2.67GHz Clovertown CPU and a >3GHz Penryn riding on a 1600MHz FSB. An interesting side note to this performance data is that Intel ran the entire presentation on a 3.33GHz Penryn with a 1333MHz FSB, to further demonstrate that the technology is working and on schedule for production this year. Penryn will appear in the server space first, followed by the desktop and mobile spaces.
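For readers unfamiliar with shuffle-like operations, the Python sketch below models what an unpack does to a 128-bit register, treated here as a list of 16 byte values with element 0 as the lowest byte. It only illustrates the data movement. The function name is ours, and the model says nothing about how Penryn's shuffle unit is implemented.

```python
def unpack_low_bytes(a, b):
    """Interleave the low 8 bytes of two 16-byte vectors.

    Models the data movement of an unpack-style shuffle: the result is
    a[0], b[0], a[1], b[1], ... a[7], b[7].
    """
    if len(a) != 16 or len(b) != 16:
        raise ValueError("expected two 16-byte vectors")
    result = []
    for i in range(8):
        result.append(a[i])
        result.append(b[i])
    return result


first = list(range(16))          # bytes 0..15
second = list(range(100, 116))   # bytes 100..115
interleaved = unpack_low_bytes(first, second)
print(interleaved)
```

Every output byte comes from a different position in the inputs, which is why a shuffle unit that can route all 128 bits in one pass helps this whole class of instructions.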
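It is worth separating how much of the server claim could come from clock speed alone. The sketch below is a rough estimate under stated assumptions: Intel only said ">3GHz" and ">45%", so we plug in exactly 3.0GHz and 1.45x as floor values, and we assume performance scales linearly with frequency, which real workloads rarely do.

```python
# Rough split of Intel's server claim into clock and non-clock components.
# Assumptions: Penryn at exactly 3.0GHz (Intel said ">3GHz"), a 1.45x
# speedup (Intel said ">45%"), and perfectly linear frequency scaling.

clovertown_ghz = 2.67
penryn_ghz = 3.0
claimed_speedup = 1.45

# Speedup attributable to frequency alone under linear scaling.
clock_ratio = penryn_ghz / clovertown_ghz

# Whatever is left must come from the faster FSB, larger caches,
# and microarchitecture changes.
remaining_ratio = claimed_speedup / clock_ratio

print(f"Clock contribution: about {(clock_ratio - 1) * 100:.0f}%")
print(f"Everything else:    about {(remaining_ratio - 1) * 100:.0f}%")
```

Under these assumptions, frequency accounts for roughly 12% of the gain, leaving roughly 29% to the 1600MHz FSB and Penryn's architectural changes. A higher actual Penryn clock would shift that balance toward frequency.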