NVIDIA GeForce GTX Titan X Review: Efficient, Powerful

Introduction and Specifications

A couple of weeks back at GDC, in a bit of a surprise move considering the proximity of NVIDIA’s own GPU Technology Conference (GTC), the GeForce GTX Titan X was unveiled by company CEO Jen-Hsun Huang. The unveiling, which took place during one of Epic’s talks, was somewhat casual and only a couple of details were disclosed: Jen-Hsun said that GeForce GTX Titan X cards would feature 12GB of memory and a GPU packing roughly 8 billion transistors. Beyond whatever we could discern from a few quick pictures, no other details were given.

Today though, we can give you the full scoop. We’ve had a GeForce GTX Titan X in house for a little while now and have taken it for a spin alongside some of NVIDIA’s other high-end cards, and AMD’s too. We’ll have plenty of salacious info to share on the pages ahead, but first up we present the GeForce GTX Titan X’s main features and specifications, followed by some details on the GM200 GPU at the heart of the card. Check them out, and then strap yourself in as we take NVIDIA’s most powerful GPU to date for a ride...
NVIDIA GeForce GTX Titan X
Specifications & Features

NVIDIA GeForce GTX Titan X

Graphics Processing Clusters 6
Streaming Multiprocessors 24
CUDA Cores (single precision) 3072
Texture Units 192
ROP Units 96
Base Clock 1000MHz
Boost Clock 1075MHz
Memory Clock (Data rate) 3505MHz (Effective Speed - ~7Gbps)
L2 Cache Size 3072KB
Total Video Memory 12288 MB GDDR5
Memory Interface 384-Bit
Total Memory Bandwidth 336.5 GB/s
Texture Filtering Rate (Bilinear) 192 GigaTexels/sec
Fabrication Process 28 nm
Transistor Count 8 Billion

Display Outputs 3 x DisplayPort, 1 x Dual-Link DVI, 1 x HDMI

Form Factor Dual Slot
Power Connectors One 6-Pin, One 8-Pin
Recommended Power Supply 600 Watts
Thermal Design Power (TDP) 250 Watts
Thermal Threshold 91°C
Price $999 (MSRP)

The new GeForce GTX Titan X looks much like previous-gen GeForce GTX-branded graphics cards that feature NVIDIA’s in-house reference cooler. We’ll talk more about the card itself on the next page—for now, let’s talk a bit about the massive GM200 GPU at the heart of the card.

The fully loaded GeForce GTX Titan X has a base clock of 1000MHz and a boost clock of 1075MHz. The GPU packs 3072 CUDA cores, 192 texture units, and 96 ROPs. GeForce GTX Titan X cards feature a whopping 12GB of fast 7Gbps (effective GDDR5 data rate) memory, which links to the GPU via a wide 384-bit interface. At its reference clocks, the Titan X offers up a peak bilinear texture fillrate of 192 GTexels/s and 336.5 GB/s of memory bandwidth. Those numbers are significantly higher than the GeForce GTX 980’s, and though they might seem modest next to the GeForce GTX 780 Ti’s 210 GTexels/s and 336 GB/s of memory bandwidth, NVIDIA’s Maxwell architecture has other advantages that aid performance and the efficient utilization of resources.
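Those peak figures fall straight out of the clocks and unit counts in the spec table above. A quick back-of-the-envelope sketch (in Python, purely illustrative; the variable names are ours) shows the arithmetic:

```python
# Back-of-the-envelope check of the Titan X's quoted peak figures,
# derived from the reference specs in the table above.

texture_units = 192
base_clock_ghz = 1.000        # 1000MHz base clock
mem_clock_mhz = 3505          # quoted GDDR5 memory clock
bus_width_bits = 384

# Bilinear texture fillrate: each texture unit filters one texel per clock.
fillrate_gtexels = texture_units * base_clock_ghz

# GDDR5 transfers data on both clock edges, so the effective data rate
# is double the quoted memory clock (~7Gbps per pin).
effective_rate_gbps = mem_clock_mhz * 2 / 1000

# Total bandwidth: effective per-pin rate times bus width, bits to bytes.
bandwidth_gbs = effective_rate_gbps * bus_width_bits / 8

print(f"Fillrate:  {fillrate_gtexels:.0f} GTexels/s")   # 192 GTexels/s
print(f"Bandwidth: {bandwidth_gbs:.1f} GB/s")           # ~336.5 GB/s
```

The same math explains why the 780 Ti’s numbers land so close: it also pairs a 7Gbps data rate with a 384-bit bus.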
NVIDIA GM200 GPU Block Diagram

While Maxwell is a newer GPU architecture for NVIDIA, the GM200 GPU does not leverage a new manufacturing process. The 8 billion transistor GM200 is still built on TSMC’s 28nm process. NVIDIA was able to optimize power efficiency, however, without moving to a new process, by tweaking virtually every part of the GPU. NVIDIA took what it learned with Kepler and its Tegra SoCs and put much of that knowledge into Maxwell. Maxwell is designed to boost efficiency through better GPU utilization, and ultimately improve performance per watt and per die area. NVIDIA claims that Maxwell SMs (Streaming Multiprocessors) offer double the performance of Kepler’s and double the perf per watt as well.

Maxwell’s Streaming Multiprocessors, or SMs, are also somewhat different from Kepler’s. With Maxwell, NVIDIA has improved the control logic partitions for better workload balancing, and added finer-grained clock-gating and better compiler-based scheduling. Maxwell can also issue more instructions per clock cycle, all of which allow the Maxwell SM (also called an SMM in some NVIDIA docs) to exceed Kepler’s SMX in terms of efficiency. NVIDIA claims that Maxwell’s new SM architecture delivers 40% more performance per CUDA core on shader-limited workloads than Kepler’s, with up to double the performance per watt, despite using the same 28nm manufacturing process.

The GM200 GPU contains six GPCs, up to 24 Maxwell Streaming Multiprocessors (SMs), and six 64-bit memory controller partitions (384-bit total). Each SM is partitioned into four separate processing blocks, each with its own instruction buffer, scheduler, and 32 CUDA cores. With Kepler, each SMX’s control logic had to route and schedule traffic across 192 CUDA cores, which were harder to keep fully utilized. Maxwell’s partitioning simplifies the design and scheduling logic, saving area and power, and reduces computation latency. The compute L1 cache function has also been combined with the texture cache function, and shared memory is now a separate unit shared across all four blocks.
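The hierarchy described above multiplies out to the headline totals in the spec table. Here’s a quick illustrative tally (the unit counts are from this article; the variable names are ours):

```python
# How GM200's headline numbers fall out of its hierarchy,
# using the unit counts described above.

gpcs = 6                   # Graphics Processing Clusters
sms_per_gpc = 4            # 24 SMs total, spread across 6 GPCs
blocks_per_sm = 4          # each SM is split into 4 processing blocks
cores_per_block = 32       # each block: own instruction buffer,
                           # scheduler, and 32 CUDA cores
mem_partitions = 6         # 64-bit memory controller partitions
partition_width_bits = 64

total_sms = gpcs * sms_per_gpc
total_cores = total_sms * blocks_per_sm * cores_per_block
bus_width = mem_partitions * partition_width_bits

print(total_sms, total_cores, bus_width)  # 24 3072 384
```

Note that 4 blocks x 32 cores gives each Maxwell SM 128 CUDA cores, versus the 192 a Kepler SMX had to keep fed from a single control partition.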

There is more to Maxwell than just tech specs, however. Since we have covered many of the new features in NVIDIA’s “big” Maxwell GPU, we won’t do it again here, but will point you to our GeForce GTX 980 launch piece, in which we cover NVIDIA’s new memory compression technology, VXGI, Dynamic Super Resolution, MFAA, and VR Direct-related features like Asynchronous Warp.
