ARM Cortex-A73 Taps 10nm FinFET And Burly Mali-G71 GPU For Smartphone VR Revolution

by Brandon Hill — Sunday, May 29, 2016, 11:00 PM EDT

ARM Shrinks To TSMC 10nm FinFET Process For Cortex-A73

Earlier this month, we told you a bit about the next-generation 64-bit ARMv8-A mobile processor core, codenamed Artemis. Artemis is built using TSMC’s 10nm FinFET process technology and promises sizable performance and efficiency gains over the Cortex-A72, which is built using a 16nm FinFET process.

Today, we’re able to disclose the official names and a number of details for ARM's next-gen mobile processor core and its new GPU: Cortex-A73 and Mali-G71. ARM is billing the Cortex-A73, with its 11-stage pipeline, as the world’s most efficient premium mobile CPU, as it offers up to 30 percent greater performance than the outgoing Cortex-A72 while operating within a similar or lower power envelope. ARM is really driving home the point that it has designed Cortex-A73 to offer greater sustained performance than its predecessor (and twice that of the Cortex-A57), while at the same time exhibiting a smaller gap between sustained and peak performance (see slide below).

As you can see, the Cortex-A73 offers 2.1x and 1.3x the sustained performance of the Cortex-A57 and Cortex-A72, respectively. And as with previous architectures, Cortex-A73 can be used in big.LITTLE configurations (paired with either a Cortex-A53 or Cortex-A35, depending on the hardware application). With Cortex-A73 big.LITTLE configurations, ARM is also using Energy Aware Scheduling (EAS), which now includes a generic energy model for task scheduling in mainline Linux builds.
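Those two ratios also imply how the Cortex-A72 compares to the Cortex-A57 on sustained performance. The quick calculation below is only a sketch: it assumes both of ARM's figures come from the same workload and can be divided directly, which the slide does not confirm, and the function name is purely illustrative.

```python
def implied_ratio(new_vs_old, new_vs_mid):
    """Given new/old and new/mid speedups, return the implied mid/old speedup."""
    return new_vs_old / new_vs_mid

# ARM's quoted sustained-performance figures for Cortex-A73
a73_vs_a57 = 2.1
a73_vs_a72 = 1.3

# Assumption: both ratios share a common baseline workload
a72_vs_a57 = implied_ratio(a73_vs_a57, a73_vs_a72)
print(f"Implied Cortex-A72 vs Cortex-A57: {a72_vs_a57:.2f}x")
```

Under that assumption, the Cortex-A72 lands at roughly 1.6x the sustained performance of the Cortex-A57.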

ARM has made a wealth of enhancements to the Cortex-A73’s microarchitecture in the never-ending quest to improve both performance and efficiency. Optimizations include improved prefetching (with a 64KB, 4-way I-cache and a 64B cache line size) and a power-optimized RAM organization, while splitting instructions into micro-ops early enables higher instructions per clock (IPC).
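The I-cache figures quoted above (64KB, 4-way, 64B lines) are enough to work out the cache's basic geometry. The snippet below is a minimal sketch using the standard set-associative formula; the helper name is hypothetical, and it says nothing about how ARM actually organizes the RAM arrays.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes size, ways, and line size are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = (line_bytes - 1).bit_length()  # bits selecting a byte in a line
    index_bits = (sets - 1).bit_length()         # bits selecting a set
    return sets, offset_bits, index_bits

# Cortex-A73 I-cache figures from the article: 64KB, 4-way, 64B lines
sets, offset_bits, index_bits = cache_geometry(64 * 1024, 4, 64)
print(f"{sets} sets, {offset_bits} offset bits, {index_bits} index bits")
```

That works out to 256 sets, with 6 address bits selecting the byte within a line and 8 bits selecting the set, so each way covers 16KB.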

A more efficient branch predictor includes a larger BTAC structure, optimized SRAM organization, 64-entry Micro-BTAC, and an on-demand 2-way x 256 entry indirect predictor. You’ll also find a more efficient 2-wide superscalar engine with out-of-order branch capabilities, improved AArch64 and AArch32 resource sharing, and an improved issue-queue load-balancing algorithm.

The memory system supports full out-of-order, dual-issue loads and stores, and it features a VIPT (Virtually Indexed, Physically Tagged) data cache and enhanced L1/L2 auto-prefetching. L2 cache performance benefits from improved interleave access arbitration, an enhanced smart cache replacement policy, and the ability to sustain parallel data streams without taking a performance hit (which is critical for improving overall multi-core scaling performance).

So, how do all of these optimizations stack up in the real world? ARM says that Cortex-A73 provides performance improvements of up to 10 percent, 5 percent, and 15 percent, respectively, in BBench, NEON, and JMC Stream Copy (a memory throughput test). As for power efficiency, Cortex-A73 delivers roughly 20 percent power savings over Cortex-A72 in real-world integer, floating point, and L2 cache copy operations.
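To put the power figure in perspective, here is a rough performance-per-watt estimate. It assumes the roughly 20 percent savings is measured at equal performance, which ARM's numbers as quoted here do not spell out, so treat the result as a back-of-the-envelope figure rather than an ARM claim. The function name is illustrative only.

```python
def perf_per_watt_gain(power_savings, perf_gain=0.0):
    """Relative perf/watt versus a baseline.

    power_savings: fractional power reduction (0.20 means 20 percent less power)
    perf_gain: fractional performance increase (0.0 means equal performance)
    """
    return (1.0 + perf_gain) / (1.0 - power_savings)

# Assumption: 20 percent less power at the same performance as Cortex-A72
gain = perf_per_watt_gain(0.20)
print(f"Estimated perf/watt vs Cortex-A72: {gain:.2f}x")
```

Under that assumption, a 20 percent power reduction translates to about 1.25x the performance per watt of the Cortex-A72.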