Intel Tremont CPU Microarchitecture: Power Efficient, High-Performance x86

by Marco Chiappetta — Thursday, October 24, 2019, 01:30 PM EDT

Late last year, at its Architecture Day event, Intel revealed a new low-power microarchitecture, codenamed Tremont, that would power an array of processors and SoCs targeting products across the client, data center, 5G networking, and Internet of Things markets. While Intel did disclose the codename and show off a Foveros-based SoC featuring Tremont -- codenamed Lakefield -- it did not dive deep into the microarchitecture or discuss its inner workings.

Today, however, at the Linley Fall Processor Conference that’s currently underway, Intel discussed Tremont in depth and revealed its main features, microarchitectural enhancements, new instructions, and expected performance levels.

Intel's Tremont Architecture Is A Significant Departure From Its Predecessors

Tremont is a low-power, 10nm x86 microarchitecture that is the successor to Goldmont Plus, which is used in current Atom, Pentium Silver, and some Celeron series processors. Tremont is destined for compact, low-power packages. It incorporates a number of updates to the ISA, enhanced security features, and more advanced power management, and it delivers significant generation-over-generation IPC (instructions per cycle) improvements versus Intel’s current low-power x86 architectures.

Tremont is also a significant departure from Goldmont Plus and its predecessors. It features an Intel Core-class branch predictor and a 6-wide, out-of-order instruction decoder on the front end, with 4-wide allocation, 10 execution ports on the back end, and dual load and store pipelines. Tremont is designed with up to four cores in mind, with up to 4.5MB of L2 cache, though the actual cache configuration will depend on the specific product design.

Branch prediction in Tremont supports long branch histories and is 32-byte based. The L1 predictor incurs no branch penalty, and the L2 predictor is larger than in previous-generation products. The fetcher features a 32KB instruction cache (32 bytes per cycle) that can handle up to 8 outstanding misses while still allowing the processor to continue executing instructions.

The 6-wide x86 instruction decoder in Tremont is split into dual, 3-wide clusters. As mentioned, it is an out-of-order design that supports wide decode without spending die area on a uOP cache. The design can also be scaled back to a single, 3-wide cluster, depending on the target product’s design.
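
As a rough, back-of-envelope illustration of how these widths relate, the sketch below computes the peak decode width for the dual-cluster and single-cluster configurations and caps it at the 4-wide allocation stage mentioned earlier. The function and constant names are hypothetical, and the model assumes an ideal cycle with no stalls and one allocation slot per decoded instruction, so it describes a ceiling rather than real-world throughput.

```python
# Back-of-envelope sketch using the widths quoted in the article.
# Names are illustrative only; this is not an Intel model or tool.

CLUSTER_WIDTH = 3     # each decode cluster is 3-wide (from the article)
ALLOCATION_WIDTH = 4  # allocation stage is 4-wide (from the article)


def peak_decode_width(clusters: int) -> int:
    """Peak x86 instructions decoded per cycle for a given cluster count."""
    return clusters * CLUSTER_WIDTH


def peak_allocated_per_cycle(clusters: int) -> int:
    """Upper bound on instructions handed to the back end per cycle.

    Assumption: an ideal cycle with no stalls and one allocation slot per
    decoded instruction, so the narrower stage sets the ceiling.
    """
    return min(peak_decode_width(clusters), ALLOCATION_WIDTH)


if __name__ == "__main__":
    for clusters in (2, 1):
        print(
            f"{clusters} cluster(s): decode up to {peak_decode_width(clusters)}, "
            f"allocate up to {peak_allocated_per_cycle(clusters)} per cycle"
        )
```

Under these assumptions, the dual-cluster front end can decode more instructions per cycle than the allocation stage accepts, while the single-cluster configuration is limited by decode instead.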

Tremont has an out-of-order window of more than 200 entries (208, specifically), with 6 parallel reservation stations. It features 3 ALUs, 2 AGUs, 1 jump port, and 1 store data port.
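
To see how the ports listed here fit into the 10 execution ports quoted earlier, a quick tally helps. The sketch below simply sums the figures from the article. The dictionary and variable names are hypothetical, and the article does not explicitly say which units use the ports left over, so the remainder is reported without a label.

```python
# Tally of the execution ports named in the paragraph above.
# All figures come from the article; the names are illustrative only.

LISTED_PORTS = {"ALU": 3, "AGU": 2, "jump": 1, "store data": 1}
TOTAL_PORTS = 10  # total back-end execution ports quoted in the article

listed_total = sum(LISTED_PORTS.values())
remaining_ports = TOTAL_PORTS - listed_total

if __name__ == "__main__":
    print(f"Ports listed in this paragraph: {listed_total}")
    print(f"Ports not accounted for here:   {remaining_ports}")
```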

There are also dual 128-bit AES units present in Tremont, and a single SHA256 instruction for cryptographic workloads can be handled in only 4 cycles. Galois Field New Instructions (GFNI) support is present as well. There are two parallel reservation stations in the design, with three execution ports.
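
For readers who want to check whether a given machine exposes these instruction set extensions, the following is a minimal sketch that assumes a Linux system, where the kernel reports CPU feature flags in /proc/cpuinfo under the names `aes`, `sha_ni`, and `gfni`. The helper names (`parse_flags`, `detect_features`) are hypothetical, and the script only reports what the kernel advertises; it says nothing about unit counts or latencies.

```python
from pathlib import Path

# Linux /proc/cpuinfo flag names for the features discussed above.
FEATURE_FLAGS = {
    "AES-NI": "aes",
    "SHA extensions": "sha_ni",
    "Galois Field New Instructions": "gfni",
}


def parse_flags(cpuinfo_text: str) -> set:
    """Return the flag set from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def detect_features(cpuinfo_text: str) -> dict:
    """Map each feature name to True if its flag is advertised."""
    flags = parse_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in FEATURE_FLAGS.items()}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux-only assumption
    if cpuinfo.exists():
        for name, present in detect_features(cpuinfo.read_text()).items():
            print(f"{name}: {'yes' if present else 'no'}")
    else:
        print("No /proc/cpuinfo found; this check only works on Linux.")
```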

Tremont features dual load/store pipelines, with a 32KB, 3-cycle data cache and a 1024-entry second-level translation lookaside buffer.
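
To put the TLB size in perspective, the sketch below estimates its reach, meaning the amount of memory it can map at once. The entry count and cache size come from the article. The 4 KiB page size is an assumption (the standard small x86 page), as is the simplification that every entry maps exactly one such page, and the names are hypothetical.

```python
# Rough estimate of second-level TLB reach from the article's figures.
# Names are illustrative; the page size is an assumption, not from the article.

KIB = 1024
MIB = 1024 * KIB

L2_TLB_ENTRIES = 1024         # from the article
L1D_SIZE_BYTES = 32 * KIB     # from the article
ASSUMED_PAGE_BYTES = 4 * KIB  # assumption: standard 4 KiB x86 pages


def tlb_reach_bytes(entries: int, page_bytes: int) -> int:
    """Memory covered when every TLB entry maps one page of page_bytes."""
    return entries * page_bytes


reach = tlb_reach_bytes(L2_TLB_ENTRIES, ASSUMED_PAGE_BYTES)

if __name__ == "__main__":
    print(f"Second-level TLB reach: {reach // MIB} MiB")
    print(f"Reach relative to the L1 data cache: {reach // L1D_SIZE_BYTES}x")
```

With larger pages the reach would grow proportionally, which is why the page-size assumption matters for this estimate.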

Tremont's L2 cache is shared across the cores and can scale from 1.5MB up to 4.5MB. There is also last-level cache (LLC) support built in, which is a first for Intel’s low-power designs, though it will not be implemented in every design. Intel Resource Director Technology incorporates QoS for the L2 and LLC to optimize performance and use of bandwidth.
