|
|
| Intro, Specifications and Related Info | |||||||
It seems like Intel started talking about the Penryn core as soon as the Conroe core launched in the form of the first Core 2 Duo and Core 2 Extreme processors. Penryn was to be the next evolution in Intel’s Core microarchitecture and would be the foundation of a new class of mobile, desktop, and server processors built using the company’s advanced 45nm manufacturing process. Penryn wouldn’t be a straight die-shrink of Conroe, however. With Penryn, Intel planned to introduce new SSE4 instructions, increase the amount of L2 cache per core, reduce power consumption, and generally enhance overall performance, clock for clock: all things that sound good to a PC enthusiast. After many months of Intel trickling out information regarding Penryn and its 45nm manufacturing process, we’re finally able to offer you some firsthand information regarding Yorkfield, Intel’s quad-core, desktop Penryn derivative. We recently got our hands on a new Core 2 Extreme QX9650 processor and were able to run it through a host of benchmarks and overclock it as well. Read on to see how the QX9650 performed and whether or not Intel’s 45nm manufacturing process is all the company has claimed it to be...
We've published a number of articles relating to Intel's Core microarchitecture, Core 2 Duo and Extreme family of processors, Penryn, and Intel's 45nm manufacturing process in the past here at HotHardware.com. For more detail or a refresher on the technologies employed by the new Core 2 Extreme QX9650 and Intel's platform as a whole, we suggest taking a look at the following related articles. These articles contain detailed explanations of some of the features common to Intel's legacy products, compatible chipsets, and the processes used to build these new processors:
At the very least, we suggest you read our Intel 45nm fab process preview and the Intel Penryn and Nehalem details articles to get familiar with Intel's advanced 45nm manufacturing process and the new technologies employed in the Yorkfield core, which is at the heart of the new QX9650. Those two articles in particular will lay the foundation for what we're going to show you on the pages ahead. |
| 45nm, Cache, SSE4, Tick-Tock Cadence |
As we've already mentioned, Penryn is not just a simple die shrink of Conroe. Yorkfield, the code name of the quad-core desktop variant of the Penryn core, is different from Kentsfield in a number of ways...
45nm Manufacturing Process: A major issue that becomes more significant as manufacturing processes get smaller is current leakage. Leakage occurs through multiple parts of a semiconductor, but one of the most problematic situations occurs when unwanted current flows through the gate dielectric in a transistor. Ideally, the gate dielectric would act as a perfect insulator. But because it is made ever thinner as manufacturing processes advance and die geometries continue to shrink, current leaks through it. In Intel's 65nm process, the gate dielectric is only 5 atomic layers thick. As a result, the transistor consumes more power than it should. With their 45nm process, however, Intel has been able to develop and successfully implement a high-k (high dielectric constant) gate dielectric and metal gate transistor design that significantly reduces leakage current. According to Intel, manufacturing processors on their 45nm process, in conjunction with the high-k and metal gate transistor breakthrough, will offer a number of key benefits:
The Core 2 Extreme QX9650 is built using Intel's 45nm process. The CPU consists of two dual-core dies on a single package, similar to Kentsfield. Each die on the QX9650 contains approximately 410M transistors and measures about 107mm². If you're keeping count, that means Yorkfield comprises roughly 820M transistors and about 214mm² of total die area.
Larger Cache: Because Intel is able to increase transistor density with their 45nm process, the Core 2 Extreme QX9650 also features more L2 cache than its predecessors. Each dual-core die on the QX9650 is outfitted with 6MB of L2 cache, for a total of 12MB, as opposed to 4MB per die and a total of 8MB on Kentsfield. In addition to having more L2 cache, Penryn derivatives like Yorkfield also have a 24-way set associative cache, as opposed to the 16-way set associative cache on the previous generation. Having a higher set associativity, in addition to the larger cache, means there should be fewer cache misses with Yorkfield. This should decrease the number of times the CPU has to access main memory due to a cache miss, which in turn should increase performance.
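To make the cache numbers above a little more concrete, here is a minimal Python sketch that works out how many sets each L2 cache would contain. The 64-byte cache line size and the helper name `cache_sets` are assumptions made for illustration; they are not figures or names taken from Intel's disclosures in this article.

```python
MIB = 1024 * 1024


def cache_sets(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Return the number of sets in a set-associative cache.

    line_bytes defaults to 64, which is an assumption for this sketch.
    """
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("cache size must be a multiple of ways * line size")
    return sets


# Per-die L2 figures quoted in the article.
kentsfield_sets = cache_sets(4 * MIB, ways=16)  # 4MB, 16-way
yorkfield_sets = cache_sets(6 * MIB, ways=24)   # 6MB, 24-way

print("Kentsfield sets per die:", kentsfield_sets)
print("Yorkfield sets per die: ", yorkfield_sets)
```

Under that line-size assumption, both caches end up with the same number of sets, so the extra 2MB per die would come entirely from the eight additional ways in each set. That is one way to read the claim that the larger, more associative cache should produce fewer misses: each set can hold more lines before something has to be evicted.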
SSE4 Instructions: Penryn derivatives like the Yorkfield core used in the Core 2 Extreme QX9650 will also feature new SSE4 instructions. SSE4 should offer performance enhancements to media codecs that take advantage of the technology. This is accomplished through new instructions and a new Super Shuffle Engine that improves performance for SSE2, SSE3 and SSE4 instructions that have shuffle-like operations such as pack, unpack and wider packed shifts. In a recent article, we published some benchmark scores that used SSE4 optimized video encoding applications that showed huge performance increases:
Clock For Clock Improvements: The Core 2 Extreme QX9650 is also built upon an enhanced Core microarchitecture designed to offer greater performance at a given frequency, while at the same time being able to operate at even higher frequencies. Intel disclosed that Penryn will feature a 4-bit per cycle divider, which the company claims will offer 4X the performance of current processors for square root operations, along with increased performance when computing transcendentals. Intel has dubbed this new feature their Fast Radix-16 Divider.
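The idea behind a higher-radix divider can be illustrated with a small software model. The Python sketch below performs plain integer long division while retiring a configurable number of quotient bits per step, and counts the steps. It is only a toy model under stated assumptions: the function name `divide_radix`, the 64-bit operand width, and the 2-bit-per-step comparison point are all hypothetical, and real hardware selects quotient digits with dedicated logic rather than the `//` shortcut used here.

```python
def divide_radix(dividend: int, divisor: int, bits_per_step: int, width: int = 64):
    """Long division that produces bits_per_step quotient bits per iteration.

    Returns (quotient, remainder, iterations). A toy model, not Intel's design.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")
    if width % bits_per_step:
        raise ValueError("width must be a multiple of bits_per_step")

    radix = 1 << bits_per_step
    quotient = 0
    remainder = 0
    iterations = 0
    for shift in range(width - bits_per_step, -1, -bits_per_step):
        # Bring down the next radix-sized digit of the dividend.
        digit_in = (dividend >> shift) & (radix - 1)
        remainder = remainder * radix + digit_in
        # Pick the quotient digit; it is always in the range 0..radix-1.
        digit_out = remainder // divisor
        remainder -= digit_out * divisor
        quotient = quotient * radix + digit_out
        iterations += 1
    return quotient, remainder, iterations


# Same division at 2 bits per step (radix-4) and 4 bits per step (radix-16).
for bits in (2, 4):
    q, r, steps = divide_radix(1_000_003, 97, bits_per_step=bits)
    print(f"{bits} bits/step: quotient={q} remainder={r} iterations={steps}")
```

Both runs return the same quotient and remainder, but the 4-bit-per-step version needs half as many iterations for the same operand width. Iteration counts in a model like this are not cycle counts, so this should not be read as a reproduction of Intel's performance claims.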
Tick-Tock: Ever since the introduction of Conroe, Intel has talked about their new Tick-Tock strategy as it relates to processor development.
The 'Tick' refers to a shrink of the existing microarchitecture onto a new manufacturing process, usually with some enhancements, while the 'Tock' signifies a new microarchitecture. In this case, Penryn is the 45nm 'Tick' that follows the 'Tock' of Conroe (Intel Core microarchitecture). Nehalem is the next 'Tock'. |
| Test Systems and SiSoft SANDRA | ||||||||||||||||
|
How we configured our test systems: When configuring our test systems for this article, we first entered their respective system BIOSes and set each board to its "Optimized" or "High Performance Defaults". We then saved the settings, re-entered the BIOS and set memory timings for either DDR2-800 with 4,4,4,12 timings or DDR3-1333 with 5,5,5,15 timings. The hard drives were then formatted, and Windows XP Professional (SP2) was installed. When the Windows installation was complete, we installed the drivers necessary for our components, and removed Windows Messenger from the system. Auto-Updating and System Restore were then disabled and we set up a 1024MB permanent page file on the same partition as the Windows installation. Lastly, we set Windows XP's Visual Effects to "best performance," installed all of our benchmarking software, defragged the hard drives, and ran all of the tests.
We began our testing with SiSoftware's SANDRA XII, the System ANalyzer, Diagnostic and Reporting Assistant. We ran six of the built-in subsystem tests that partially comprise the SANDRA XII suite with the Core 2 Extreme QX9650 (CPU Arithmetic, Multimedia, Multi-Core Efficiency, Memory, Cache, and Memory Latency). All of the scores reported below were taken with the processor running at its default clock speed of 3.0GHz.
The SANDRA processor arithmetic and multimedia benchmarks show the new Core 2 Extreme QX9650 finishing just ahead of similarly clocked Core microarchitecture based Core 2 Extreme and Xeon processors, which is just what we had expected. The memory bandwidth benchmark shows the QX9650 / P35 / DDR3 combination with a peak of 7.4GB/s of available memory bandwidth and the memory latency benchmark has the platform outpacing all of the similar reference points in the SANDRA database.
The really interesting results here are the 'cache and memory' and multi-core efficiency tests, however. In the cache and memory test, the new Core 2 Extreme QX9650 has a large combined cache/memory bandwidth advantage as well as a lower (better) speed factor rating. And in the multi-core efficiency benchmark, once we get past the 32k block size mark, the QX9650 has vastly improved inter-core bandwidth in comparison to the other Core 2 Quad processors listed. This should pay dividends in multi-threaded applications where data must be shared between the processor cores. |
| PCMark05: CPU and Memory | |||||
|
For our next round of synthetic benchmarks, we ran the CPU and Memory performance modules built into Futuremark's PCMark05 suite. The following tests are synthetic benchmarks designed to show relative performance metrics, but may or may not equate to "real-world" performance.
The new Yorkfield-based Core 2 Extreme QX9650 showed a slight improvement over Intel's previous flagship quad-core desktop processor, the QX6850, in PCMark05's CPU performance module. The difference of 108 points in this test equates to an approximate 1.1% advantage for the QX9650, however, which is not significant in a benchmark like this one.
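For readers who want to sanity-check figures like these, the short Python sketch below backs out the baseline score implied by a point difference and a percentage. It assumes the percentage is expressed relative to the QX6850's score, and the helper names are hypothetical. The output is an estimate derived from the two quoted numbers, not a measured result.

```python
def implied_baseline(delta: float, percent_gain: float) -> float:
    """Baseline score implied by an absolute gain and a percentage gain."""
    return delta / (percent_gain / 100.0)


def baseline_range(delta: float, percent_gain: float, rounding: float = 0.05):
    """Range of baselines consistent with a percentage rounded to one decimal."""
    low = delta / ((percent_gain + rounding) / 100.0)
    high = delta / ((percent_gain - rounding) / 100.0)
    return low, high


# Figures quoted above: a 108 point difference, described as about 1.1%.
estimate = implied_baseline(108, 1.1)
low, high = baseline_range(108, 1.1)
print(f"Implied QX6850 score: about {estimate:.0f} ({low:.0f} to {high:.0f})")
```

Because the percentage is rounded, the implied baseline is only good to within a few hundred points, but it is enough to confirm that a 108 point gap is small relative to scores of this size.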
"The Memory test suite is a collection of tests that isolate the performance of the memory subsystem. The memory subsystem consists of various devices on the PC. This includes the main memory, the CPU internal cache (known as the L1 cache) and the external cache (known as the L2 cache). As it is difficult to find applications that only stress the memory, we explicitly developed a set of tests geared for this purpose. The tests are written in C++ and assembly. They include: Reading data blocks from memory, Writing data blocks to memory performing copy operations on data blocks, random access to data items and latency testing." - Courtesy FutureMark Corp.
PCMark05's memory performance module is affected not only by system memory bandwidth and latency, but by L2 cache performance as well. As such, the new Yorkfield-based Intel Core 2 Extreme QX9650 with its larger, 24-way set associative cache puts up a measurably better score than the similarly clocked QX6850. The QX9650's 174 point edge equates to a 2.7% increase in performance according to this test. |
| Office XP and Photoshop | ||||
|
PC World Magazine's Worldbench 5.0 is a Business and Professional application benchmark. The tests consist of a number of performance modules that each utilize one or more popular desktop applications to gauge performance.
|
| LAME MT and Sony Vegas | ||||||||
|
In our custom LAME MT MP3 encoding test, we convert a large WAV file to the MP3 format, which is a popular scenario that many end users work with on a day-to-day basis to provide portability and storage of their digital audio content. LAME is an open-source mid to high bit-rate and VBR (variable bit rate) MP3 audio encoder that is used widely around the world in a multitude of third party applications.
Although LAME MT is not optimized for Yorkfield and does not make use of its new SSE4 instructions, the QX9650's performance is significantly improved in this test. As you can see, at similar clock speeds, the Core 2 Extreme QX9650 is between 3 and 4 seconds faster than the older QX6850 - that's an improvement of roughly 10%.
Sony's Vegas DV editing software is heavily multi-threaded as it processes and mixes both audio and video streams. This is a new breed of digital video editing software that takes full advantage of current dual and multi-core processor architectures. Like the LAME MT results above, the Sony Vegas video rendering benchmark also showed a marked improvement for the new Core 2 Extreme QX9650. In this test, the new Yorkfield-based processor finished the video rendering process about 21 seconds faster than the similarly clocked Core 2 Extreme QX6850. |
| Cinebench R9.5 and 3DMark06 | ||||||||
|
Cinebench 9.5 is an OpenGL 3D rendering performance test based on Cinema 4D. Cinema 4D from Maxon is a 3D rendering and animation tool suite used by 3D animation houses and producers like Sony Animation and many others. It's very demanding of system processor resources and is an excellent gauge of pure computational throughput.
As we've seen with a couple of our previous benchmarks, the new Core 2 Extreme QX9650 has a measurable clock-for-clock advantage over the QX6850 in the Cinebench R9.5 rendering benchmark. In the single-threaded test, the QX9650 finished four seconds faster than the QX6850, and in the multi-threaded test it was a full two seconds faster.
3DMark06's built-in CPU test is a multi-threaded DirectX gaming metric that's useful for comparing relative performance between similarly equipped systems. This test consists of two different 3D scenes that are processed with a software renderer that is dependent on the host CPU's performance. Calculations that are normally reserved for your 3D accelerator are instead sent to the CPU for processing and rendering. The frame-rate generated in each test is used to determine the final score. Once again, the new Core 2 Extreme QX9650 shows a marked improvement over the similarly clocked Core 2 Extreme QX6850 in the 3DMark06 CPU benchmark. The Core 2 Extreme QX9650 put up a score exactly 300 points higher than the QX6850, which equates to a difference of 6.4%. |
| Quake 4 and F.E.A.R. | ||||
|
The same performance trend we've seen on the previous pages held true in our in-game tests. Here, the new Core 2 Extreme QX9650 was 9.1 frames per second faster than the similarly clocked Core 2 Extreme QX6850 in our custom Quake 4 benchmark and 2 frames per second faster in the F.E.A.R. benchmark. The F.E.A.R. result is essentially a wash, but the Quake 4 test shows the QX9650 with a 5.4% advantage. |
| Power Consumption | ||||
Before we bring this analysis to a close, we wanted to give you an idea of how much power each of the system configurations we tested used, both while idling and while under a workload.
|
| Our Summary and Conclusion | ||||
|
Performance Summary: Throughout our entire benchmark suite, the new Yorkfield-based Core 2 Extreme QX9650 outperformed a similarly clocked Kentsfield-based Core 2 Extreme QX6850, while at the same time using much less power. In some of the synthetic and less taxing real-world application benchmarks, the QX9650 performed on par with or slightly better than the QX6850. In a few of the more taxing audio encoding and 3D or video encoding benchmarks, like LAME MT and Cinebench, the new QX9650 showed significant clock-for-clock performance gains, sometimes larger than 10%.
We can't help but think the new Core 2 Extreme QX9650 is but a glimpse of what Intel has in store for us in the future. Intel has been talking about their 45nm process technology for what seems like an eternity. When a major company like Intel is this open and talkative about a new technology or product years before its release, it usually means one of two things: either the technology is not all it's cracked up to be and the PR machine is running full force to play up its strengths, or the technology is the real deal and the company wants everyone to know it. After experimenting with the QX9650 and seeing multiple products built using Intel's 45nm process technology first hand over the past few months, we can't help but think it is the real deal.
|