NVIDIA GeForce GTX Titan X Review: Efficient, Powerful
Introduction and Specifications
Today though, we can give you the full scoop. We’ve had a GeForce GTX Titan X in house for a little while now and have taken it for a spin, alongside some of NVIDIA’s other high-end cards—AMD’s too. We’ll have plenty of juicy info to share on the pages ahead, but first up we present the GeForce GTX Titan X’s main features and specifications, followed by some details regarding the GM200 GPU at the heart of the card. Check them out, and then strap yourself in as we take NVIDIA’s most powerful GPU to date for a ride...
NVIDIA GeForce GTX Titan X
|Graphics Processing Clusters||6|
|CUDA Cores (single precision)||3072|
|Memory Clock (Data Rate)||3505MHz (Effective Speed - ~7Gbps)|
|L2 Cache Size||3072KB|
|Total Video Memory||12288MB GDDR5|
|Total Memory Bandwidth||336.5 GB/s|
|Texture Filtering Rate (Bilinear)||192 GigaTexels/sec|
|Fabrication Process||28 nm|
|Transistor Count||8 Billion|
|Connectors||3 x DisplayPort|
|Form Factor||Dual Slot|
|Power Connectors||One 6-Pin, One 8-Pin|
|Recommended Power Supply||600 Watts|
|Thermal Design Power (TDP)||250 Watts|
|Price||$999 MSRP|
The new GeForce GTX Titan X looks much like previous-gen GeForce GTX-branded graphics cards that feature NVIDIA’s in-house reference cooler. We’ll talk more about the card itself on the next page—for now, let’s talk a bit about the massive GM200 GPU at the heart of the card.
The fully-loaded GeForce GTX Titan X has a base clock of 1000MHz and a Boost clock of 1075MHz. The GPU packs 3072 CUDA cores, 192 texture units, and 96 ROPs. GeForce GTX Titan X cards feature a whopping 12GB of fast 7GHz (effective GDDR5 data rate) memory, linked to the GPU via a wide 384-bit interface. At its reference clocks, the Titan X offers up a peak textured fillrate of 192 GTexels/s and 336.5 GB/s of memory bandwidth. Those numbers are significantly higher than the GeForce GTX 980’s, and though they might seem modest next to the GeForce GTX 780 Ti’s 210 GTexels/s and 336 GB/s, NVIDIA’s Maxwell architecture has other advantages that aid performance and efficient utilization of resources.
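The quoted peak rates fall straight out of the reference specs. A quick back-of-the-envelope sketch (the formulas are the standard ones, not anything NVIDIA-specific):

```python
# Sanity-checking the Titan X's quoted peak rates from the specs above.
base_clock_mhz = 1000      # reference base clock, MHz
texture_units = 192
mem_clock_mhz = 3505       # GDDR5 command clock; data moves at 2x (DDR)
bus_width_bits = 384

# Bilinear texture fillrate: one texel per texture unit per clock.
fillrate_gtexels = texture_units * base_clock_mhz / 1000
print(f"Texture fillrate: {fillrate_gtexels:.0f} GTexels/s")  # 192

# Memory bandwidth: effective data rate (2x DDR) times bus width in bytes.
bandwidth_gbs = (mem_clock_mhz * 2 * 1e6) * (bus_width_bits / 8) / 1e9
print(f"Memory bandwidth: {bandwidth_gbs:.1f} GB/s")          # 336.5
```

Note the fillrate figure uses the base clock; with Boost active the card can exceed it.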
While Maxwell is a newer GPU architecture for NVIDIA, the GM200 GPU does not leverage a new manufacturing process. The 8 billion transistor GM200 is still built on TSMC’s 28nm process. NVIDIA was able to optimize power efficiency, however, without moving to a new process, by tweaking virtually every part of the GPU. NVIDIA took what it learned with Kepler and its Tegra SoCs and put much of that knowledge into Maxwell. Maxwell is designed to boost efficiency through better GPU utilization, and ultimately improve performance per watt and per die area. NVIDIA claims that Maxwell SMs (Streaming Multiprocessors) offer double the performance of Kepler’s and double the perf per watt as well.
Maxwell’s Streaming Multiprocessors, or SMs, are also somewhat different from Kepler’s. With Maxwell, NVIDIA has made improvements to the control logic partitions for better workload balancing, along with finer-grained clock-gating and better compiler-based scheduling. Maxwell can also issue more instructions per clock cycle, all of which allows the Maxwell SM (also called an SMM in some NVIDIA docs) to exceed Kepler’s SMX in terms of efficiency. NVIDIA claims that Maxwell’s new SM architecture can deliver 40% more performance per CUDA core on shader-limited workloads than Kepler, with up to double the performance per watt, despite using the same 28nm manufacturing process.
The GM200 GPU contains six GPCs, up to 24 Maxwell Streaming Multiprocessors (SM), and six 64-bit memory controller partitions (384-bit total). Each SM is partitioned into four separate processing blocks, each with its own instruction buffer, scheduler and 32 CUDA cores. With Kepler, the control logic had to route and schedule traffic to 192 CUDA cores, which were harder to keep fully utilized. This partitioning simplifies the design and scheduling logic, saving area and power, and reduces computation latency. The compute L1 cache function has now also been combined with the texture cache function, and shared memory is a separate unit shared across all four blocks.
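That partitioning is also where the card’s headline core count comes from. A simple arithmetic check of the hierarchy described above:

```python
# How GM200's CUDA-core total falls out of its SM partitioning.
# (Just arithmetic on the published figures, not NVIDIA tooling.)
gpcs = 6               # Graphics Processing Clusters
sms_per_gpc = 4        # 24 SMs total across the six GPCs
blocks_per_sm = 4      # each SM is split into four processing blocks
cores_per_block = 32   # each block has its own 32 CUDA cores

total_sms = gpcs * sms_per_gpc
total_cores = total_sms * blocks_per_sm * cores_per_block
print(total_sms, total_cores)  # 24 3072
```

The same 4 x 32 split is why scheduling is simpler than on Kepler, where each SMX's control logic had to keep 192 cores fed at once.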
There is more to Maxwell than just tech specs, however. Since we have covered many of the new features in NVIDIA’s “big” Maxwell GPU, we won’t do it again here, but will point you to our GeForce GTX 980 launch piece, in which we cover NVIDIA’s new memory compression technology, VXGI, Dynamic Super Resolution, MFAA, and VR Direct-related features like Asynchronous Warp.