Intel Core 2 Extreme QX9650 - Yorkfield Has Landed
Date: Oct 28, 2007
Section:Processors
Author: Marco Chiappetta
Intro, Specifications and Related Info



It seems like Intel started talking about the Penryn core as soon as the Conroe core launched in the form of the first Core 2 Duo and Core 2 Extreme processors.  Penryn was to be the next evolution in Intel’s Core microarchitecture and would be the foundation of a new class of mobile, desktop, and server processor built using the company’s advanced 45nm manufacturing process.

Penryn wouldn’t be a straight die-shrink of Conroe, however.  With Penryn, Intel planned to introduce new SSE4 instructions, increase the amount of L2 cache per core, reduce power consumption, and generally enhance overall performance, clock for clock.  All things that sound good to a PC enthusiast.

After many months of trickling out information regarding Penryn and Intel’s 45nm manufacturing process, we’re finally able to offer you some firsthand information regarding Yorkfield, Intel’s quad-core, desktop Penryn derivative.  We recently got our hands on a new Core 2 Extreme QX9650 processor and were able to run it through a host of benchmarks and overclock it as well.  Read on to see how the QX9650 performed and whether or not Intel’s 45nm manufacturing process is all the company has claimed it to be...
 

 
A Penny For Your Thoughts...

Intel Core 2 Extreme QX9650 Processor
Specifications & Features

  • Core Frequency - 3.0GHz
  • System Bus Frequency - 1333MHz
  • TDP (Thermal Design Power) - 130W
     
  • Stepping -  6
  • Number of CPU Cores - 4
  • L2 Cache - 12MB (2 x 6MB)
  • Max processor input voltage (VID) - 1.360v
  • .045-micron manufacturing process
     
  • Shared Smart Cache Technology
  • PECI Enabled
  • Enhanced Intel SpeedStep Technology (EIST)
  • Extended HALT State (C1E) Enabled
  • Execute Disable Bit (XD) Enabled
  • Intel 64 Technology
  • Intel Virtualization Technology (VT)
  • Packaging -  Flip Chip LGA775
  • Total Die Size: Approximately 214mm² (2 x 107mm²)
  • Approximately 820M Transistors
  • MSRP - $TBA
 
Penryn Die (Yorkfield = 2X)


  
Intel Core 2 Extreme QX9650: Top and Bottom


We've published a number of articles relating to Intel's Core microarchitecture, Core 2 Duo and Extreme family of processors, Penryn, and Intel's 45nm manufacturing process in the past here at HotHardware.com.  For more detail or a refresher on the technologies employed by the new Core 2 Extreme QX9650 and Intel's platform as a whole, we suggest taking a look at the following related articles.  These articles contain detailed explanations of some of the features common to Intel's legacy products, compatible chipsets, and the processes used to build these new processors:

At the very least, we suggest you read our Intel 45nm fab process preview and the Intel Penryn and Nehalem details articles to get familiar with the new technologies employed in the Yorkfield core which is at the heart of the new QX9650 and Intel's advanced 45nm manufacturing process.  Those two articles in particular will lay the foundation for what we're going to show you on the pages ahead.

45nm, Cache, SSE4, Tick-Tock Cadence



As we've already mentioned, Penryn is not just a simple die shrink of Conroe.  Yorkfield, the code name of the quad-core desktop variant of the Penryn core, is different from Kentsfield in a number of ways...

45nm Manufacturing Process: A major issue that becomes more significant as manufacturing processes get smaller is current leakage. Leakage occurs through multiple parts of a semiconductor, but one of the most problematic situations occurs when unwanted current flows through the gate dielectric in a transistor. Ideally, the gate dielectric would act as a perfect insulator. But because it is made ever thinner as manufacturing processes advance and die geometries continue to shrink, current leaks through the gate dielectric. In Intel's 65nm process, it is only 5 atomic layers thick.  This leads to undesirable results and the transistor consumes more power than it should.

With their 45nm process, however, Intel has been able to develop and successfully implement a high-k (high dielectric constant) gate dielectric and metal gate transistor design that significantly reduces leakage current. According to Intel, manufacturing processors on their 45nm process, in conjunction with the high-k and metal gate transistor breakthrough, will offer a number of key benefits:

  • ~2x improvement in transistor density, for either smaller chip size or increased transistor count
  • ~30% reduction in transistor switching power
  • >20% improvement in transistor switching speed or >5x reduction in source-drain leakage power
  • >10x reduction in gate oxide leakage power

The Core 2 Extreme QX9650 is built using Intel's 45nm process.  The CPU is comprised of two dual-core dies on a single package, similar to Kentsfield.  Each die on the QX9650 is comprised of approximately 410M transistors and measures about 107 square millimeters.  If you're keeping count, that means Yorkfield is comprised of roughly 820M transistors and measures about 214 square millimeters.
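
The package totals quoted above are simply the per-die figures doubled, and the same numbers give a rough transistor density.  Here is a minimal sketch of that arithmetic, using only the approximate figures from the text; the function and variable names are ours, purely for illustration:

```python
# Approximate figures quoted in the article for one dual-core 45nm die.
TRANSISTORS_PER_DIE = 410e6   # ~410M transistors
AREA_PER_DIE_MM2 = 107.0      # ~107 square millimeters
DIES_PER_PACKAGE = 2          # Yorkfield = two dual-core dies on one package


def package_totals(transistors_per_die, area_per_die_mm2, dies):
    """Return (total transistors, total area in mm^2, transistors per mm^2)."""
    total_transistors = transistors_per_die * dies
    total_area = area_per_die_mm2 * dies
    density = total_transistors / total_area
    return total_transistors, total_area, density


total, area, density = package_totals(
    TRANSISTORS_PER_DIE, AREA_PER_DIE_MM2, DIES_PER_PACKAGE
)
print(f"{total / 1e6:.0f}M transistors over {area:.0f} mm^2 "
      f"(~{density / 1e6:.2f}M transistors per mm^2)")
```

Because the inputs are rounded, the density (a little over 3.8 million transistors per square millimeter) should be read as a ballpark figure, not a precise specification.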

 

Larger Cache: Because Intel is able to increase transistor density with their 45nm process, the Core 2 Extreme QX9650 also features more L2 cache than its predecessors.  Each dual-core die on the QX9650 is outfitted with 6MB of L2 cache, for a total of 12MB, as opposed to 4MB per die and a total of 8MB on Kentsfield.  In addition to having more L2 cache, Penryn derivatives like Yorkfield also have a 24-way set associative cache, as opposed to the 16-way set associative cache on the previous generation.  Having a higher set associativity, in addition to the larger cache, means there should be fewer cache misses with Yorkfield.  This should decrease the number of times the CPU has to access main memory due to a cache miss, which in turn should increase performance.
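
To see how the larger size and the higher associativity relate, it helps to work out the number of sets in each cache.  The sketch below assumes a 64-byte cache line, which is our assumption and is not stated in this article; the helper name is also ours:

```python
MB = 1024 * 1024


def cache_sets(size_bytes, ways, line_bytes=64):
    """Number of sets in a set-associative cache.

    line_bytes=64 is an assumption for illustration, not a figure
    taken from the article.
    """
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("cache size must be a multiple of ways * line size")
    return sets


kentsfield_sets = cache_sets(4 * MB, ways=16)  # 4MB, 16-way per die
yorkfield_sets = cache_sets(6 * MB, ways=24)   # 6MB, 24-way per die
print(kentsfield_sets, yorkfield_sets)
```

Under that assumption both caches work out to the same number of sets, which would mean the extra 2MB per die comes entirely from adding ways to each set.  More ways per set gives each memory address more places to live in the cache, which is why fewer conflict misses are expected.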

 

SSE4 Instructions: Penryn derivatives like the Yorkfield core used in the Core 2 Extreme QX9650 will also feature new SSE4 instructions. SSE4 should offer performance enhancements to media codecs that take advantage of the technology. This is accomplished through new instructions and a new Super Shuffle Engine that improves performance for SSE2, SSE3 and SSE4 instructions that have shuffle-like operations such as pack, unpack and wider packed shifts.  In a recent article, we published some benchmark scores that used SSE4 optimized video encoding applications that showed huge performance increases:


perf2.png


The 45nm CPU listed on the left supports SSE4, while the CPU on the right does not.  As you can see, SSE4 has a major impact on performance when it is used.

 

Clock For Clock Improvements: The Core 2 Extreme QX9650 is also built upon an enhanced Core microarchitecture designed to offer greater performance at a given frequency, while at the same time being able to operate at even higher frequencies.  Intel disclosed that Penryn will feature a 4-bit per cycle divider that the company claims will offer 4X the performance of current processors for square root operations and increased performance computing transcendentals. Intel has dubbed this new feature their Fast Radix-16 Divider.
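
The "Radix-16" name and the "4-bit per cycle" figure describe the same thing: a radix-16 divider produces one base-16 digit, or four bits of the result, per iteration.  The sketch below counts iterations for a simple digit-at-a-time divider.  The 64-bit result width and the radix-4 comparison point are our assumptions for illustration only, and the model ignores setup and rounding cycles:

```python
import math


def bits_per_iteration(radix):
    """Result bits produced per iteration by a radix-r divider (r a power of two)."""
    if radix < 2 or radix & (radix - 1):
        raise ValueError("radix must be a power of two")
    return radix.bit_length() - 1


def divider_iterations(result_bits, radix):
    """Iterations needed to produce result_bits bits, one digit per iteration."""
    return math.ceil(result_bits / bits_per_iteration(radix))


RESULT_BITS = 64  # illustrative width, not a figure from the article

radix16_iters = divider_iterations(RESULT_BITS, 16)
radix4_iters = divider_iterations(RESULT_BITS, 4)   # hypothetical comparison point
print(radix16_iters, radix4_iters)
```

In this simplified model, doubling the bits retired per iteration halves the iteration count.  It does not by itself account for the 4X square root figure Intel quotes, so treat it only as an illustration of what the radix means.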

 

Tick-Tock: Ever since the introduction of Conroe, Intel has talked about their new Tick-Tock strategy as it relates to processor development.


 

The 'Tock' refers to a new microarchitecture, while the 'Tick' signifies a shrink of that design to a new manufacturing process, along with enhancements to the original design.  In this case, Penryn is the 'Tick' that follows Conroe's (Intel Core microarchitecture) 'Tock'.  Nehalem, the next 'Tock', is up next.

Vital Signs and Overclocking


Because they use the same packaging and have the same integrated heat spreader design, the new Core 2 Extreme QX9650 looks just like the QX6850 it is supplanting at the top of Intel's desktop quad-core processor line-up.  If you flip it over, however, there are some things that differentiate the newer 45nm processor from its 65nm counterpart.


  

Although they may use the same socket and thus have the same 775 pads on their underside, the surface mounted components on the Core 2 Extreme QX9650 (right) are arranged in a different configuration.  This isn't an earth shattering discovery by any stretch of the imagination, but we thought you'd like to see the differences first-hand nonetheless. 


     
Core 2 Extreme QX9650: CPU-Z Core and Cache Information

The information provided by CPU-Z also shows some of the main differences between the new Core 2 Extreme QX9650 and its older 65nm counterparts.  The core name is properly listed as Yorkfield (Wolfdale is the dual-core desktop variant), the stepping is listed as '6', process is correctly listed as 45nm, and SSE4 instructions are listed as well.  The 6MB L2 cache on each of the processor's two dual-core dies is also correctly identified.  It's on the Cache tab that another of Yorkfield's interesting changes is listed.  As we've already mentioned, in addition to increasing the size of the L2 cache, Intel has tweaked the configuration to be a 24-way set associative cache - Conroe and Kentsfield are 16-way.  The L1 cache remains 8-way set associative on Yorkfield.

Overclocking The QX9650 To 3.9+GHz
Quad-Core Flat-Out

We're sure many of you are wondering just how much clock speed headroom the Core 2 Extreme QX9650 has left under the hood, so we spent some time overclocking our sample as well.


Core 2 Extreme QX9650: Overclocked to 3.9GHz

Using a stock Intel cooler and an Asus Blitz Extreme motherboard based on the P35 chipset, we set out overclocking the QX9650 by first increasing its voltage to 1.4v.  Then we increased its multiplier and found that 11 was the sweet spot; with the multiplier increased to 12, which resulted in a CPU clock speed of 4GHz, the system wasn't completely stable.  Finally, we slowly increased the FSB frequency until we found our particular processor's peak stable overclocked speed. Ultimately, the CPU hit an impressive 3.9GHz.
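
The numbers in this overclocking run follow from one relationship: core clock equals multiplier times the base bus clock, where the 1333MHz system bus is quad-pumped from a base clock of roughly 333MHz.  A minimal sketch of that arithmetic follows.  The base clock needed for 3.9GHz is derived here and is not a setting reported in the text, and the function names are ours:

```python
# The rated 1333MHz system bus transfers data four times per base clock cycle.
TRANSFERS_PER_CLOCK = 4
BASE_CLOCK_MHZ = 1333.33 / TRANSFERS_PER_CLOCK  # ~333MHz


def core_clock_mhz(multiplier, base_clock_mhz=BASE_CLOCK_MHZ):
    """Core frequency for a given multiplier and base bus clock."""
    return multiplier * base_clock_mhz


def base_clock_for(target_mhz, multiplier):
    """Base bus clock needed to reach target_mhz at a given multiplier."""
    return target_mhz / multiplier


print(f"stock (9x):  {core_clock_mhz(9):.0f} MHz")
print(f"11x:         {core_clock_mhz(11):.0f} MHz")
print(f"12x:         {core_clock_mhz(12):.0f} MHz")
print(f"base clock for 3.9GHz at 11x: {base_clock_for(3900, 11):.1f} MHz")
```

At the stock bus speed an 11x multiplier lands at about 3.67GHz, so reaching 3.9GHz at 11x implies a base clock somewhere around 355MHz, assuming the multiplier stayed at 11 for the final result.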

At that speed, the processor idled at only 37ºC and under load it never broke the 60ºC mark. We also noticed that the CPU's temperature dropped rapidly when entering the idle state.  It's clear that even at this relatively early stage, Intel's 45nm manufacturing process is healthy and ready for prime time.

Test Systems and SiSoft SANDRA


How we configured our test systems: When configuring our test systems for this article, we first entered their respective system BIOSes and set each board to its "Optimized" or "High performance Defaults". We then saved the settings, re-entered the BIOS and set memory timings for either DDR2-800 with 4,4,4,12 timings or DDR3-1333 with 5,5,5,15 timings. The hard drives were then formatted, and Windows XP Professional (SP2) was installed. When the Windows installation was complete, we installed the drivers necessary for our components, and removed Windows Messenger from the system. Auto-Updating and System Restore were then disabled and we set up a 1024MB permanent page file on the same partition as the Windows installation. Lastly, we set Windows XP's Visual Effects to "best performance," installed all of our benchmarking software, defragged the hard drives, and ran all of the tests.

 

 HotHardware's Test Systems
 Intel and AMD - Head To Head 

System 1:
Core 2 Extreme QX9650
(3.0GHz - Quad-Core)
Core 2 Extreme QX6850
(3.0GHz - Quad-Core)
Core 2 Extreme QX6800
(2.93GHz - Quad-Core)
Core 2 Duo E6750
(2.66GHz - Dual-Core)

Asus Blitz Extreme
(P35 Chipset)

2x1GB Corsair PC3-14400
CL 5-5-5-15 - DDR3-1333

GeForce 8800 GTX
On-Board Ethernet
On-board Audio

WD740 "Raptor" HD
10,000 RPM SATA

Windows XP Pro SP2
Intel INF 8.0.3.1013
NVIDIA Forceware v158.22
DirectX 9.0c (June 2007)
 

System 2:
AMD Athlon X2 6000+
(3.0GHz)

Asus CrossHair
(NVIDIA nForce 590 SLI)

2x1GB Corsair PC-6400
CL 4-4-4-12 - DDR2-800

GeForce 8800 GTX
On-Board Ethernet
On-board Audio

WD740 "Raptor" HD
10,000 RPM SATA

Windows XP Pro SP2
nForce Drivers v9.35
NVIDIA Forceware v158.22
DirectX 9.0c

 Preliminary Testing with SiSoft SANDRA XII
 Synthetic Benchmarks

We began our testing with SiSoftware's SANDRA XII, the System ANalyzer, Diagnostic and Reporting Assistant. We ran six of the built-in subsystem tests that partially comprise the SANDRA XII suite with the Core 2 Extreme QX9650 (CPU Arithmetic, Multimedia, Multi-Core Efficiency, Memory, Cache, and Memory Latency).  All of the scores reported below were taken with the processor running at its default clock speed of 3.0GHz.


 
C2E QX9650 @ 3.0GHz - CPU Arithmetic
C2E QX9650 @ 3.0GHz - Multimedia
C2E QX9650 @ 3.0GHz - Multi-Core Efficiency
C2E QX9650 @ 3.0GHz - Memory Bandwidth
C2E QX9650 @ 3.0GHz - Cache and Memory
C2E QX9650 @ 3.0GHz - Memory Latency



The SANDRA processor arithmetic and multimedia benchmarks show the new Core 2 Extreme QX9650 finishing just ahead of similarly clocked Core micro-architecture based Core 2 Extreme and Xeon processors, which is just what we had expected.  The memory bandwidth benchmark shows the QX9650 / P35 / DDR3 combination with a peak of 7.4GB/s of available memory bandwidth and the memory latency benchmark has the platform outpacing all of the similar reference points in the SANDRA database.  The really interesting results here are the 'cache and memory' and multi-core efficiency tests, however.  In the cache and memory test, the new Core 2 Extreme QX9650 has a large combined cache/memory bandwidth advantage as well as a lower (better) speed factor rating.  And in the multi-core efficiency benchmark, once we get past the 32k block size mark, the QX9650 has vastly improved inter-core bandwidth in comparison to the other Core 2 Quad processors listed.  This should pay dividends in multi-threaded applications where data must be shared between the processor cores.

PCMark05: CPU and Memory
 


For our next round of synthetic benchmarks, we ran the CPU and Memory performance modules built into Futuremark's PCMark05 suite.  The following tests are synthetic benchmarks designed to show relative performance metrics, but may or may not equate to "real-world" performance.
  

 Futuremark PCMark05
 More Synthetic CPU and Memory Benchmarks


"The CPU test suite is a collection of tests that are run to isolate the performance of the CPU. The CPU Test Suite also includes multithreading: two of the test scenarios are run multithreaded; the other including two simultaneous tests and the other running four tests simultaneously. The remaining six tests are run single threaded. Operations include, File Compression/Decompression, Encryption/Decryption, Image Decompression, and Audio Compression" - Courtesy FutureMark Corp.

 


The new Yorkfield-based Core 2 Extreme QX9650 showed a slight improvement over Intel's previous flagship quad-core desktop processor, the QX6850, in PCMark05's CPU performance module.  The difference of 108 points in this test equates to an approximate 1.1% advantage for the QX9650, however, which is not significant in a benchmark like this one.


"The Memory test suite is a collection of tests that isolate the performance of the memory subsystem. The memory subsystem consists of various devices on the PC. This includes the main memory, the CPU internal cache (known as the L1 cache) and the external cache (known as the L2 cache). As it is difficult to find applications that only stress the memory, we explicitly developed a set of tests geared for this purpose. The tests are written in C++ and assembly. They include: Reading data blocks from memory, Writing data blocks to memory performing copy operations on data blocks, random access to data items and latency testing."  - Courtesy FutureMark Corp. 
 

PCMark05's memory performance module is affected not only by system memory bandwidth and latency, but by L2 cache performance as well.  As such, the new Yorkfield-based Intel Core 2 Extreme QX9650 with its larger, 24-way set associative cache puts up a measurably better score than the similarly clocked QX6850.  The QX9650's 174 point edge equates to a 2.7% increase in performance according to this test.
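
Since the charts are not reproduced in the text, the point deltas and percentages quoted for the two PCMark05 modules can be used to back out roughly what the QX6850 scored.  This is a sketch of that arithmetic with helper names of our own; because the percentages are rounded, the implied baselines are approximate:

```python
def percent_gain(new_score, old_score):
    """Percentage improvement of new_score over old_score."""
    return (new_score - old_score) / old_score * 100.0


def implied_baseline(delta, gain_percent):
    """Baseline score implied by an absolute gain and its (rounded) percentage."""
    return delta / (gain_percent / 100.0)


# Deltas and percentages as quoted in the text for QX9650 vs. QX6850.
cpu_baseline = implied_baseline(108, 1.1)
memory_baseline = implied_baseline(174, 2.7)
print(f"implied QX6850 CPU score:    ~{cpu_baseline:.0f}")
print(f"implied QX6850 memory score: ~{memory_baseline:.0f}")
```

The same two helpers apply to any of the other deltas quoted in this article, as long as the score is one where higher is better.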

Office XP and Photoshop


PC World Magazine's Worldbench 5.0 is a Business and Professional application benchmark.  The tests consist of a number of performance modules that each use one, or a group of, popular desktop applications to gauge performance.
 

Worldbench 5.0: Office XP SP2 and Photoshop 7 Modules
Real-World Application Performance


Below we have the results from Worldbench 5.0's Office XP SP2 and Photoshop 7 performance modules, recorded in seconds.  Lower times indicate better performance here, so the shorter the bar the better.


 

 



 


Neither Office XP nor Photoshop 7 can truly exploit the resources available in today's high-end quad-core processors, hence the similar performance between the three quad-core processors represented here.  Technically speaking, however, the new Core 2 Extreme QX9650 did put up the best scores in both tests, albeit by just a few seconds, no doubt aided by its larger cache, and high CPU and front side bus frequencies.

LAME MT and Sony Vegas


In our custom LAME MT MP3 encoding test, we convert a large WAV file to the MP3 format, which is a popular scenario that many end users work with on a day-to-day basis to provide portability and storage of their digital audio content.  LAME is an open-source mid to high bit-rate and VBR (variable bit rate) MP3 audio encoder that is used widely around the world in a multitude of third party applications.
 

 LAME MT MP3 Encoding Test
 Converting a Large WAV To MP3


In this test, we created our own 223MB WAV file (a hallucinogenically-induced Grateful Dead jam) and converted it to the MP3 format using the multi-thread capable LAME MT application in single and multi-thread modes. Processing times are recorded below, listed in seconds. Once again, shorter times equate to better performance.



  

Although LAME MT is not optimized for Yorkfield and does not make use of its new SSE4 instructions, the QX9650's performance is significantly improved in this test.  As you can see, at similar clock speeds, the Core 2 Extreme QX9650 is between 3 and 4 seconds faster than the older QX6850 - that's an improvement of roughly 10%.


Sony Vegas Digital Video Rendering Test
Video Rendering Performance

Sony's Vegas DV editing software is heavily multi-threaded as it processes and mixes both audio and video streams. This is a new breed of digital video editing software that takes full advantage of current dual and multi-core processor architectures.


  

Like the LAME MT results above, the Sony Vegas video rendering benchmark also showed a marked improvement for the new Core 2 Extreme QX9650.  In this test, the new Yorkfield-based processor finished the video rendering process about 21 seconds faster than the similarly clocked Core 2 Extreme QX6850.

Cinebench R9.5 and 3DMark06


Cinebench 9.5 is an OpenGL 3D rendering performance test based on Cinema 4D. Cinema 4D from Maxon is a 3D rendering and animation tool suite used by 3D animation houses and producers like Sony Animation and many others.  It's very demanding of system processor resources and is an excellent gauge of pure computational throughput.


 Cinebench 9.5 Performance Tests
 3D Modeling & Rendering Tests


This is a multi-threaded, multi-processor aware benchmark that renders a single 3D scene and tracks the length of the entire process. The time it took each test system to render the entire scene is represented in the graph below, listed in seconds.


  

As we've seen with a couple of our previous benchmarks, the new Core 2 Extreme QX9650 has a measurable clock-for-clock advantage over the QX6850 in the Cinebench R9.5 rendering benchmark. In the single-threaded test, the QX9650 finished four seconds faster than the QX6850, and in the multi-threaded test it was a full two seconds faster.


 Futuremark 3DMark06 - CPU Test
 Simulated DirectX Gaming Performance

3DMark06's built-in CPU test is a multi-threaded DirectX gaming metric that's useful for comparing relative performance between similarly equipped systems.  This test consists of two different 3D scenes that are processed with a software renderer that is dependent on the host CPU's performance.  Calculations that are normally reserved for your 3D accelerator are instead sent to the CPU for processing and rendering.  The frame-rate generated in each test is used to determine the final score.


  

Once again, the new Core 2 Extreme QX9650 shows a marked improvement over the similarly clocked Core 2 Extreme QX6850 in the 3DMark06 CPU benchmark. The Core 2 Extreme QX9650 put up a score exactly 300 points higher than the QX6850, which equates to a difference of 6.4%.

Quake 4 and F.E.A.R.


For our last set of tests, we moved on to some in-game benchmarking with Quake 4 and F.E.A.R. When testing processors and motherboards with Q4 or F.E.A.R, we drop the screen resolution and reduce all of the in-game graphical options to their minimum values, to isolate CPU and memory performance as much as possible.  However, the in-game effects and the level of detail for processing workloads such as physics calculations and particle systems, are left at their maximum values, since these actually do place some load on the CPU rather than GPU.
 

 Benchmarks with Quake 4 and F.E.A.R. v1.08
 DirectX 9 and OpenGL Gaming Performance
 


 
 


The same performance trend we've seen on the previous pages held true in our in-game tests. Here, the new Core 2 Extreme QX9650 was 9.1 frames per second faster than the similarly clocked Core 2 Extreme QX6850 in our custom Quake 4 benchmark and 2 frames per second faster in the F.E.A.R. benchmark.  The F.E.A.R. result is essentially a wash, but the Quake 4 test shows the QX9650 with a 5.4% advantage.

Power Consumption


Before we bring this analysis to a close we wanted to give you an idea of how much power each of the system configurations we tested used, while idling and while under a workload.

 Power Characteristics
 Processors and Platforms


Please keep in mind that we are looking at total system power consumption here at the electrical outlet, not just the power being drawn by the processors alone.  In this test, we're showing you a ramp-up of power from idle on the desktop to 100% processor load.  We tested with a combination of Cinebench 9.5 and SANDRA XII running on the CPU.
 




While idling, all of Intel's quad-core processors consumed a similar amount of power, with the new Core 2 Extreme QX9650 falling right in between the QX6800 and QX6850.  Under load, however, the 45nm Yorkfield-based QX9650 consumed significantly less power than its Core 2 Extreme branded counterparts.  The most direct comparison, between the QX9650 and QX6850, has the newer QX9650 using 49 fewer watts under load, which is a huge improvement.  As you can see, the QX9650 even consumed less power than the dual-core Athlon 64 X2 6000+ and only a few watts more than the E6750, which is a testament to the power-friendly design of Penryn and Intel's 45nm manufacturing process.

Our Summary and Conclusion
 


Performance Summary:  Throughout our entire benchmark suite, the new Yorkfield-based Core 2 Extreme QX9650 outperformed a similarly clocked Kentsfield-based Core 2 Extreme QX6850, while at the same time using much less power. In some of the synthetic and less taxing real-world application benchmarks, the QX9650 performed on par with or slightly better than the QX6850.  In a few of the more taxing audio encoding, 3D rendering, and video rendering benchmarks, like LAME MT and Cinebench, the new QX9650 showed significant clock-for-clock performance gains, sometimes larger than 10%.

 

 
 

We can't help but think the new Core 2 Extreme QX9650 is but a glimpse of what Intel has in store for us in the future.  Intel has been talking about their 45nm process technology for what seems like an eternity.  When a major company is as open and talkative about a new technology or product years before its release as Intel has been, it usually means one of two things: either the technology is not all it's cracked up to be and the PR machine is running full force to play up its strengths, or the technology is the real deal and the company wants everyone to know it.  After experimenting with the QX9650 and seeing multiple products built using Intel's 45nm process technology firsthand over the past few months, we can't help but think it is the real deal.

The Yorkfield-based Core 2 Extreme QX9650 is a success in every sense of the word.  The processor is faster, has new features, uses much less power, is less expensive to produce, and has more overclocking headroom than its predecessor.  What more is there to say?  Sure, the new features like SSE4 won't be fully exploited until applications are programmed to use them, but that is already happening and we suspect adoption will be relatively quick considering the available performance increases.  And the fact that chips built using Intel's 45nm process will be cheaper to produce doesn't just mean more profits for Intel.  It means the company can keep the price pressure on AMD while maintaining their bottom line, so expect aggressive pricing with future mainstream and mid-level dual and quad-core 45nm processors.

Intel hasn't disclosed pricing information just yet, but you can bet that the Core 2 Extreme QX9650 will be priced in-line with previous Extreme Edition processors, which is to say this chip is going to be expensive - think in the $1200 range.  Intel will be releasing final pricing information in mid-November when the processor ships alongside a few new Xeons.  Is that a heck of a lot of money to pay for a desktop processor?  Yes, it is.  But if money is no object and you want the fastest, most capable desktop processor on the planet currently, the Core 2 Extreme QX9650 is it.


  • Wicked Fast
  • SSE4 Support
  • Big Power Savings 
  • Plenty of Overclocking Headroom
  • It's going to be Pricey
  • Same Clock Speed as QX6850


Content Property of HotHardware.com