



Your CPU core isn't a single unit. Even setting aside the L1 and L2 caches, and the registers the core loads values into and stores values from, a modern processor core has numerous functional units that do the actual work. These include Load/Store Units, Arithmetic Logic Units that perform simple math operations very quickly, and Floating Point Units that perform more complex math. The last of these have grown very large in modern CPUs with the introduction of 256-bit- and 512-bit-wide vector operations.





Visualization of Hyper-Threading on a Nehalem CPU. Source: NASA

This is pretty rare on today's multi-core processors, though, because operating systems are smart enough to schedule busy threads on separate cores altogether. Windows is fully aware of the core topology of the chips it's running on, and it will try to avoid scheduling two demanding threads on the same CPU core. Spreading threads out this way does deliver the best performance, but as we noted above, it isn't great for efficiency.
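To make that placement policy concrete, here is a minimal sketch of a topology-aware scheduler that fills every idle physical core before doubling up on a hyper-thread. The `Core` class and the `pick_logical_cpu` and `schedule` functions are hypothetical names for illustration; this is a deliberate simplification, not how the Windows scheduler is actually implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Core:
    """One physical core exposing one or more logical CPUs (SMT siblings)."""
    core_id: int
    logical_cpus: list[int]
    busy: set[int] = field(default_factory=set)  # logical CPUs running demanding threads


def pick_logical_cpu(cores: list[Core]) -> int | None:
    """Return a logical CPU for a new demanding thread, or None if all are busy.

    Hypothetical policy (not Windows' real algorithm):
    1. Prefer a core that has no busy threads at all.
    2. Only then fall back to an idle SMT sibling on an already-busy core.
    """
    # Pass 1: a completely idle physical core gives the thread the whole core.
    for core in cores:
        if not core.busy:
            return core.logical_cpus[0]
    # Pass 2: share a core with another thread via its idle hyper-thread.
    for core in cores:
        for cpu in core.logical_cpus:
            if cpu not in core.busy:
                return cpu
    return None


def schedule(cores: list[Core], n_threads: int) -> list[int | None]:
    """Place n_threads demanding threads one at a time, marking CPUs busy."""
    placements: list[int | None] = []
    for _ in range(n_threads):
        cpu = pick_logical_cpu(cores)
        placements.append(cpu)
        if cpu is not None:
            for core in cores:
                if cpu in core.logical_cpus:
                    core.busy.add(cpu)
    return placements
```

With two cores exposing logical CPUs `[0, 1]` and `[2, 3]`, five demanding threads are placed on 0, 2, 1, 3, and then nowhere: each core's second hyper-thread is only used once every core already has one busy thread.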

Annotated Raptor Lake die shot by JmsDoug and Fritzchens Fritz.

The presence of Hyper-Threading complicates this, though. There are times when a hyper-thread (the second thread on a core) is the best place to schedule an application thread, but these circumstances are fairly rare, and most of the time these extra "logical cores" go entirely unused on a CPU like the Core i9-13900K. Despite that, they still occupy ports on the processor's "ring bus". Every extra port on the ring bus adds latency and complexity to the CPU. Yet without Hyper-Threading, much of each P-core will sit idle most of the time. How can Intel resolve this?





Intel's new solution might be the technique outlined in a patent called "Methods and apparatus to schedule parallel instructions using hybrid cores." The patent was published back in June, but it's only recently getting some attention around the web. It makes for fascinating reading. As is usual for patent applications, it's a fairly high-level overview of the technique, but it's enough to get the gist of what Intel intends.





Current method on the left, new on the right. T=thread; numbers top and bottom are seconds.

The patent talks at length about the method, describing a self-tuning algorithm built around the processor's own "Streamed Threading Circuitry" (described as a "Renting Unit" in leaks and likely an evolution of Intel's current Thread Director). This circuitry logs the amount of time each partition takes to execute. If the estimated execution time was wrong, the processor will begin to schedule similar partitions on the appropriate core type: E-cores if execution completed very quickly, or P-cores if it was very slow.
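The feedback loop described above can be sketched roughly as follows. This is a hypothetical illustration, not Intel's circuitry: the partition "kind" keys, the `PartitionTuner` class, and the 0.5x and 2x thresholds standing in for "very quickly" and "very slow" are all assumptions, since the patent doesn't spell out such details.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class CoreType(Enum):
    P_CORE = "P"
    E_CORE = "E"


@dataclass
class PartitionTuner:
    """Hypothetical self-tuning loop: steer partition kinds by observed run time.

    fast_ratio and slow_ratio are assumed thresholds, not values from the patent.
    """
    fast_ratio: float = 0.5  # finished in under half the estimate -> "very quickly"
    slow_ratio: float = 2.0  # took more than twice the estimate -> "very slow"
    placement: dict[str, CoreType] = field(default_factory=dict)

    def choose_core(self, kind: str, default: CoreType = CoreType.P_CORE) -> CoreType:
        # Partition kinds with no recorded misprediction keep the default placement.
        return self.placement.get(kind, default)

    def record(self, kind: str, estimated_s: float, actual_s: float) -> None:
        """Log one partition's execution time and retune if the estimate was off."""
        if estimated_s <= 0:
            raise ValueError("estimated_s must be positive")
        ratio = actual_s / estimated_s
        if ratio < self.fast_ratio:
            # Far faster than expected: similar partitions can run on E-cores.
            self.placement[kind] = CoreType.E_CORE
        elif ratio > self.slow_ratio:
            # Far slower than expected: similar partitions need P-cores.
            self.placement[kind] = CoreType.P_CORE
        # Within the thresholds, the estimate was close enough; leave placement alone.
```

For example, a partition kind that finishes in 0.2 s against a 1 s estimate is steered to E-cores next time, one that takes 3 s is steered to P-cores, and one that lands near its estimate keeps whatever placement it already had.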









The proposal reminds us somewhat of the fascinating VISC concept that Soft Machines showed off back in 2014. We didn't cover it at the time because it sounded like pie-in-the-sky from an unknown startup, but Intel actually purchased the fledgling CPU design firm in 2016, about two years after it first showed VISC. The idea was to improve single-threaded performance by splitting the work across multiple cores at the instruction level. Sounds pretty familiar, doesn't it?



