Meteor Lake Architecture Revealed: AI, Tiles And The Future Of Intel Core CPUs
Intel Meteor Lake Architecture Deep Dive: CPU Core Details And Thread Director Changes
The Compute Tile And New Core uArch
The actual processor cores are usually the stars of these architectural deep dives, but Meteor Lake brings such a shakeup to the platform as a whole that they have nearly been sidelined. Nevertheless, there are significant changes here as well.
The Compute Tile uses a hybrid design with P-Cores and E-Cores, as was introduced with Alder Lake. However, these are now joined by the two Low Power E-Cores residing on the SOC tile to create three tiers of compute power, not wholly unlike what Arm has adopted with its DynamIQ clusters.
The Redwood Cove P-Cores are the heavy lifters of the complex, designed for performance-first workloads. Redwood Cove offers a similar IPC to Golden Cove (Alder Lake and Raptor Lake) but features a larger L2 cache and increased per-core bandwidth that should help performance in many cases.
The new Crestmont E-Cores are positioned for efficient multithreaded throughput. Intel indicates modest IPC gains of 4-6% relative to Gracemont but, like the P-Cores, they bring a few other improvements. Most notably, Crestmont adds significant VNNI and ISA improvements that will help these processors better tackle AI workloads. Both the P-Cores and E-Cores in the Compute Tile should also see power and frequency gains as a result of moving to the Intel 4 manufacturing process.
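For readers wondering why VNNI matters for AI workloads, here is a rough Python sketch of what a single VNNI-style dot-product instruction does in one 32-bit lane. It is modeled on the existing VPDPBUSD instruction, which multiplies four unsigned 8-bit values by four signed 8-bit values and adds the sum of the products to a 32-bit accumulator. Intel did not tell us exactly which instructions Crestmont adds, so treat this purely as background; the function name `dot4_accumulate` is our own.

```python
def dot4_accumulate(acc, a_u8, b_s8):
    """Model one 32-bit lane of a VPDPBUSD-style operation.

    acc  : signed 32-bit accumulator
    a_u8 : four unsigned 8-bit values (0..255)
    b_s8 : four signed 8-bit values (-128..127)
    """
    if len(a_u8) != 4 or len(b_s8) != 4:
        raise ValueError("each lane consumes exactly four byte pairs")
    if not all(0 <= x <= 255 for x in a_u8):
        raise ValueError("a_u8 values must fit in an unsigned byte")
    if not all(-128 <= y <= 127 for y in b_s8):
        raise ValueError("b_s8 values must fit in a signed byte")

    # Multiply pairwise, sum the four products, and add to the accumulator.
    total = acc + sum(x * y for x, y in zip(a_u8, b_s8))

    # The non-saturating form wraps around like ordinary 32-bit arithmetic.
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total >= (1 << 31) else total


result = dot4_accumulate(10, [1, 2, 3, 4], [1, -1, 2, -2])
print(result)
```

Doing four multiplies and the accumulation in one operation per lane is what makes this style of instruction attractive for the low-precision integer math common in inference.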
Then we have the LP (or Low Power Island) E-Cores. While not part of the Compute Tile, we feel it makes more sense to detail them here. We’re told these have the same Crestmont architecture as those found on the Compute Tile, but the SOC tile lacks a last-level cache and a high-speed ring bus. That’s likely to be of little consequence for their purpose, which is almost strictly low-usage background work.
Intel Thread Director’s New Priorities
Thread Director was introduced in Alder Lake to help the operating system schedule tasks more appropriately between the now disparate kinds of cores. Thread Director does not dictate where tasks are placed, mind you, but rather provides hints to the OS scheduler. Tasks with a higher QoS need are allocated to P-Cores, while lower QoS tasks can be handled by E-Cores. Of course, tasks can be dynamically reclassified and moved if necessary.
Meteor Lake prioritizes tasks differently. Whenever possible, it tries to contain the active processes on the LP E-Cores so that the Compute Tile can be shut down to save power. If the workload does spill over, it then powers up the Compute Tile and migrates all threads to it. Even then, it starts by using just the Compute Tile E-Cores and only engages the P-Cores if there are processes that demand the higher performance.
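To make that escalation order concrete, here is a minimal Python sketch of the placement logic as we understand it. This is a toy model, not Intel's or any operating system's actual scheduler: the `Thread` class, the `needs_high_perf` flag, and the `place` function are hypothetical names of our own, and the only figure taken from Intel's description is the pair of LP E-Cores on the SOC tile.

```python
from dataclasses import dataclass

LP_E_CORES = 2  # the two Low Power E-Cores on the SOC tile


@dataclass
class Thread:
    name: str
    needs_high_perf: bool = False  # hypothetical stand-in for a high-QoS class


def place(threads):
    """Return {thread name: tier} using the LP-first escalation order."""
    any_demanding = any(t.needs_high_perf for t in threads)

    # Tier 1: everything is light and fits on the LP E-Cores,
    # so the Compute Tile can stay powered down.
    if not any_demanding and len(threads) <= LP_E_CORES:
        return {t.name: "LP E-Core" for t in threads}

    # The workload spilled over: all threads move to the Compute Tile.
    # Tier 2 is its E-Cores; tier 3 (P-Cores) is only for demanding threads.
    return {
        t.name: "P-Core" if t.needs_high_perf else "Compute E-Core"
        for t in threads
    }


print(place([Thread("mail"), Thread("sync")]))
print(place([Thread("mail"), Thread("sync"), Thread("indexer")]))
print(place([Thread("mail"), Thread("game", needs_high_perf=True)]))
```

Note that the sketch deliberately ignores core counts on the Compute Tile and anything resembling real load measurement; it only captures the order in which the three tiers are engaged.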
It may seem like this could induce latency, but Thread Director is also receiving feedback from the P-Cores and E-Cores about how different processes are running. In effect, Thread Director maintains a Feedback Table where each process is scored by its Energy Efficiency and Performance demands, which are continuously and dynamically reclassified using data provided by the cores. The OS scheduler reads from the Feedback Table and uses it to influence, but not outright dictate, its task scheduling on the cores.
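As a rough illustration of that feedback loop, the sketch below keeps a per-thread pair of scores that is refreshed whenever a core reports in, and exposes a hint that a scheduler is free to ignore. Intel did not describe the real table layout to us, so everything here is an assumption made for illustration: the 0-to-1 score scale, the smoothing factor, the threshold, and the `FeedbackTable`, `report`, and `hint` names.

```python
from dataclasses import dataclass


@dataclass
class FeedbackEntry:
    perf: float        # hypothetical 0..1 score: benefit from a faster core
    efficiency: float  # hypothetical 0..1 score: how well it runs on an efficient core


class FeedbackTable:
    def __init__(self, smoothing=0.5):
        self.smoothing = smoothing  # weight given to the newest report
        self.entries = {}

    def report(self, thread_id, perf, efficiency):
        """Blend a fresh report from a core into the thread's running scores."""
        entry = self.entries.setdefault(thread_id, FeedbackEntry(perf, efficiency))
        a = self.smoothing
        entry.perf = a * perf + (1 - a) * entry.perf
        entry.efficiency = a * efficiency + (1 - a) * entry.efficiency

    def hint(self, thread_id, perf_threshold=0.7):
        """Return advice only; the OS scheduler makes the final decision."""
        entry = self.entries.get(thread_id)
        if entry is None:
            return "no hint"
        return "prefer P-Core" if entry.perf >= perf_threshold else "prefer E-Core"


table = FeedbackTable()
table.report("render", perf=0.9, efficiency=0.2)
print(table.hint("render"))
table.report("render", perf=0.1, efficiency=0.8)  # the thread has gone mostly idle
print(table.hint("render"))
```

The point of the smoothing is that a thread's classification can change over its lifetime, which is how a task that starts out demanding can later be steered back toward the efficient cores.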
The Compute Tile can power on and off near-instantaneously thanks to Meteor Lake’s modular power management approach. That said, Thread Director also tries to account for how long it expects a task to run. If it anticipates that the workload will complete before a migration could finish, for example, the scheduler can simply elect not to move it.
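That trade-off boils down to a simple comparison, sketched below in Python. The function name, the millisecond units, and the example figures are all made up for illustration; Intel did not share actual migration costs or how run time is estimated.

```python
def should_migrate(expected_remaining_ms, migration_cost_ms):
    """Move a task only if it is expected to outlive the migration itself."""
    if expected_remaining_ms < 0 or migration_cost_ms < 0:
        raise ValueError("durations must be non-negative")
    return expected_remaining_ms > migration_cost_ms


# A long-running task is worth moving; a nearly finished one is not.
print(should_migrate(expected_remaining_ms=50.0, migration_cost_ms=0.5))
print(should_migrate(expected_remaining_ms=0.2, migration_cost_ms=0.5))
```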
Intel walked us through two scheduling scenarios to help us better understand how this all operates. In the first, the system is running a high-utilization foreground app across four threads on P-Cores, and a new low-utilization app starts up on two threads on the E-Cores, all on the Compute Tile. The high-utilization app finishes its work, but the low-utilization app is still ticking, so the scheduler migrates it to the LP E-Cores and shuts down the Compute Tile to conserve power.
In the second example, the system starts with just a low-utilization app running across two threads on LP E-Cores. A new high-utilization app starts up, occupying four threads on P-Cores. Because the Compute Tile is now active, Thread Director is updated and notifies the OS, which then transfers the low-utilization app from the LP E-Cores to the Compute Tile E-Cores.
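Both scenarios can be replayed with a few lines of Python. This sketch assumes, as a simplification of Intel's examples, that the Compute Tile is powered whenever at least one high-utilization app is running and that low-utilization apps follow the tile's power state. The `place_apps` function and the app names are hypothetical.

```python
def place_apps(apps):
    """apps maps app name -> "high" or "low" utilization.

    Returns (compute_tile_on, {app name: where it runs}).
    """
    tile_on = any(level == "high" for level in apps.values())
    tiers = {}
    for name, level in apps.items():
        if level == "high":
            tiers[name] = "P-Cores"
        elif tile_on:
            # The tile is already awake, so light work rides along on its E-Cores.
            tiers[name] = "Compute Tile E-Cores"
        else:
            tiers[name] = "LP E-Cores"
    return tile_on, tiers


# Scenario 1: the foreground app finishes, leaving only the background app.
running = {"foreground": "high", "background": "low"}
print(place_apps(running))
del running["foreground"]
print(place_apps(running))

# Scenario 2: a background app is running alone, then a foreground app starts.
running = {"background": "low"}
print(place_apps(running))
running["foreground"] = "high"
print(place_apps(running))
```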
We are very curious to see how this all shakes out in practice. If threads can be successfully corralled onto the LP E-Cores without impacting the user experience too often, this approach stands to save a lot of power. Intel has shown that for scenarios like video playback, the LP E-Cores in conjunction with the Media and Display Engines can happily keep the system running without spilling over onto the Compute Tile.