NVIDIA nForce 4 SLI Intel Edition

A Closer Look at the Memory Controller

Excluding the custom chipset used in the Xbox, the NVIDIA nForce 4 SLI Intel Edition is the first nForce core logic designed for use with Intel's processors. As we mentioned earlier, because Intel's processors don't have an on-die memory controller like the Athlon 64's, NVIDIA also had to design a new memory controller for the nForce 4 SLI Intel Edition. The memory controller NVIDIA's architects came up with builds on some existing NVIDIA IP and includes a 128-bit wide DualDDR2 memory architecture, support for high-speed DDR2 memory, an updated version of NVIDIA's proprietary DASP technology, and the company's new QuickSync technology.

The NVIDIA nForce 4 SLI Chipset: The Memory Controller
This version has one!

The DualDDR2 memory controller in the nForce 4 SLI Intel Edition's SPP interleaves the two memory channels so that CPU accesses can be sent to both channels simultaneously. The type of interleaving used, however, depends on whether the two channels are populated symmetrically or asymmetrically. When the channels are populated symmetrically, with identical memory modules, the DualDDR2 memory controller uses a finer-grain interleaving mode. If the channels are populated asymmetrically, the controller drops down to a coarser-grain interleaving mode. For the highest performance, it's best to populate both channels with matched memory, because system performance will be somewhat lower when the two channels are coarsely interleaved. We should mention, however, that the memory controller in the nForce 4 SLI Intel Edition always operates in 128-bit mode, regardless of whether the two channels are populated with matched DIMMs; some competing solutions drop down to 64-bit mode when the channels are not populated symmetrically.
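To illustrate why finer-grain interleaving helps, here is a minimal toy model of how a controller might map physical addresses to channels. NVIDIA doesn't disclose the actual interleave granularities, so the 64-byte and 4KB values, and the function names, are hypothetical assumptions chosen only to show the effect.

```python
# Hypothetical sketch of fine- vs coarse-grain channel interleaving.
# NVIDIA does not publish the actual granularities; both values are assumptions.
FINE_GRAIN_BYTES = 64        # assumed: switch channels every 64-byte cache line
COARSE_GRAIN_BYTES = 4096    # assumed: switch channels every 4 KB block


def channel_for_address(addr: int, symmetric: bool) -> int:
    """Return the channel (0 or 1) that serves physical address `addr`."""
    granularity = FINE_GRAIN_BYTES if symmetric else COARSE_GRAIN_BYTES
    return (addr // granularity) % 2


def channels_touched(start: int, length: int, symmetric: bool) -> set:
    """Channels hit by a sequential burst; assumes `start` is 64-byte aligned."""
    return {
        channel_for_address(addr, symmetric)
        for addr in range(start, start + length, FINE_GRAIN_BYTES)
    }


if __name__ == "__main__":
    # A 256-byte sequential read starting at address 0.
    print("matched DIMMs:  ", channels_touched(0, 256, symmetric=True))
    print("unmatched DIMMs:", channels_touched(0, 256, symmetric=False))
```

In this model, a 256-byte read with matched DIMMs alternates between channels 0 and 1 every cache line, so both channels work in parallel, while the coarse mode keeps the entire burst on channel 0. Real asymmetric configurations are more involved; the sketch only shows why a finer interleave spreads small sequential accesses across both channels.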

NVIDIA's memory controller can run the memory interface at a 667MHz data rate (a peak of 10.6GB/s across both channels) and higher, and it includes other features designed to improve performance. One enhancement is a dedicated address and command bus (referred to simply as the address bus) for each DIMM. Giving each DIMM its own address bus, rather than sharing one bus across multiple DIMMs, should improve performance for two reasons: there are fewer shared resources, and the memory controller can operate with a 1T command rate, which reduces overall memory latency.
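The peak bandwidth figure follows from simple arithmetic: DDR2-667 performs 667 million transfers per second, and a 128-bit interface moves 16 bytes per transfer. The helper below is a hypothetical illustration of that calculation, not anything from NVIDIA; it uses decimal gigabytes (10^9 bytes).

```python
def peak_bandwidth_gb_s(data_rate_mt_s: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s (10**9 bytes/s): transfers per second x bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    per_channel = peak_bandwidth_gb_s(667, 64)   # one 64-bit DDR2-667 channel
    combined = peak_bandwidth_gb_s(667, 128)     # both channels, 128 bits wide
    print(f"{per_channel:.2f} GB/s per channel, {combined:.2f} GB/s combined")
```

This works out to roughly 5.3GB/s per channel and about 10.7GB/s for the full 128-bit interface, which lines up with the 10.6GB/s peak quoted above, give or take rounding.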

   
Figure 1                                                           Figure 2

QuickSync
As many Athlon 64 tweakers know, running a system with a 1T command rate is typically faster than running it with a 2T command rate, because 2T timing is equivalent to adding a full clock cycle to the CAS latency (Figure 1). The NVIDIA nForce 4 SLI Intel Edition is also designed to allow asynchronous FSB and memory speeds with negligible impact on system performance. Its memory controller features NVIDIA's new QuickSync synchronization technology, which transfers memory requests and data between the FSB and memory clock domains in the shortest possible time. QuickSync does this by speeding up the internal paths between the FSB clock domain and the memory clock domain as the FSB speed and/or the memory bus speed increases. QuickSync ensures that the memory controller has the shortest latency between receiving and placing CPU requests, and between receiving data from memory and sending it to the CPU, at all FSB and memory speeds (Figure 2).
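The cost of a 2T command rate can be put in nanoseconds using the simplification above, where 2T behaves like one extra memory clock of CAS latency. This is a minimal sketch; the function name is hypothetical, and the DDR2-667 CAS 5 timing in the example is an assumed configuration, not a figure from NVIDIA.

```python
def cas_latency_ns(cas_cycles: int, command_rate: int, data_rate_mt_s: float) -> float:
    """Approximate command-to-data latency in nanoseconds.

    Uses the simplification from the text: a 2T command rate behaves like one
    extra memory clock of CAS latency, while 1T adds nothing.
    """
    if command_rate not in (1, 2):
        raise ValueError("command rate must be 1 (1T) or 2 (2T)")
    cycle_ns = 2000.0 / data_rate_mt_s  # DDR moves two transfers per clock
    return (cas_cycles + command_rate - 1) * cycle_ns


if __name__ == "__main__":
    # Assumed example: DDR2-667 at CAS 5 (memory clock of about 333MHz).
    for rate in (1, 2):
        print(f"{rate}T: {cas_latency_ns(5, rate, 667):.1f} ns")
```

At DDR2-667 one memory clock lasts about 3ns, so in this model moving from 1T to 2T stretches a roughly 15ns CAS delay to roughly 18ns on every access that issues a new command.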

DASP 3.0
NVIDIA's DASP (Dynamic Adaptive Speculative Preprocessor) is also back in the nForce 4 SLI Intel Edition, this time with more sophisticated data prefetch algorithms. DASP 3.0's preprocessors track each thread and attempt to prefetch the appropriate data, much like the prefetch logic in a CPU. Each DASP preprocessor can select the most effective prediction algorithm for its assigned thread. The preprocessors are also adaptive: as a thread executes, they can tweak the prediction algorithm on the fly, choose a different algorithm, or even create a hybrid algorithm that combines multiple prefetch algorithms.
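NVIDIA hasn't disclosed DASP's actual algorithms, but the idea of a per-thread preprocessor picking its most effective predictor can be sketched with a simple scoring scheme. The example below is a hypothetical illustration, not DASP itself: it pits two textbook predictors (next-line and stride) against each other for each thread, credits whichever guessed correctly, and prefetches using the current leader. The 64-byte line size and all names are assumptions, and the hybrid-algorithm step is not modeled.

```python
from collections import defaultdict, deque

LINE_BYTES = 64  # assumed cache-line size


def next_line(history):
    """Predict the cache line immediately after the last access."""
    return history[-1] + LINE_BYTES if history else None


def stride(history):
    """Predict that the most recent address delta repeats."""
    if len(history) < 2:
        return None
    return history[-1] + (history[-1] - history[-2])


PREDICTORS = {"next-line": next_line, "stride": stride}


class AdaptivePrefetcher:
    """Hypothetical per-thread prefetcher that scores each predictor and issues
    the guess of whichever has been right most often for that thread."""

    def __init__(self, history_len: int = 8):
        self.history = defaultdict(lambda: deque(maxlen=history_len))
        self.pending = defaultdict(dict)  # thread -> {predictor name: last guess}
        self.hits = defaultdict(lambda: dict.fromkeys(PREDICTORS, 0))  # thread -> {name: hits}

    def access(self, thread: int, addr: int):
        """Record a demand access and return (predictor name, prefetch address)."""
        # Credit every predictor whose previous guess for this thread was correct.
        for name, guess in self.pending[thread].items():
            if guess == addr:
                self.hits[thread][name] += 1
        history = self.history[thread]
        history.append(addr)
        # Ask every predictor for its next guess so all of them keep being scored.
        self.pending[thread] = {name: fn(history) for name, fn in PREDICTORS.items()}
        # Ties go to the first predictor listed (next-line).
        best = max(PREDICTORS, key=lambda name: self.hits[thread][name])
        return best, self.pending[thread][best]


if __name__ == "__main__":
    pf = AdaptivePrefetcher()
    for addr in (0, 256, 512, 768):      # thread 0: 256-byte strided walk
        print("thread 0:", pf.access(0, addr))
    for addr in (4096, 4160, 4224):      # thread 1: sequential cache lines
        print("thread 1:", pf.access(1, addr))
```

Because scores are kept per thread, the strided thread settles on the stride predictor while the sequential thread keeps using next-line, which mirrors the per-thread selection the article describes.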

