Intel Details A Silicon Photonics Processor With 8 Cores And A Whopping 528 Threads

Some workloads are a single tangled-up thread with various data types and crunchy compute; for these, our modern CPU cores are perfectly suited. Other workloads (like, say, graphics) are giant piles of a single data type that can be chomped through in big parallel bites; this kind of thing is exactly what our modern GPUs are made for.

Types of graph analytics workloads. Image: DARPA

There are other types of workloads that our CPUs and GPUs aren't particularly well suited for, though. One of them is graph analytics. Although this is a massively parallel workload, GPUs aren't a good fit for it because there's very little actual computation being done; it's almost entirely latency-bound load and store operations. CPUs can handle it at a decent rate thanks to their high clock speeds, but the overwhelming majority of the hardware in a modern CPU goes unused for this kind of workload, wasting lots of power.
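To make "very little actual compute" concrete, here's a minimal breadth-first search sketch over a graph stored in compressed sparse row (CSR) form, a common layout for graph analytics. It's purely illustrative and not anything from Intel or DARPA; the function name and the tiny example graph are our own. Notice that the inner loop does almost nothing but index into arrays: fetch a neighbor ID, check whether it has been visited, and append it to the frontier.

```python
from collections import deque


def bfs_levels(offsets: list[int], neighbors: list[int], source: int) -> list[int]:
    """Breadth-first search over a CSR graph (illustrative sketch).

    The neighbors of vertex v are neighbors[offsets[v]:offsets[v + 1]].
    Returns each vertex's hop distance from `source`, or -1 if unreachable.
    """
    n = len(offsets) - 1
    level = [-1] * n
    level[source] = 0
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        # Two loads find v's edge range; each edge then costs a load of the
        # neighbor ID plus a load of its visited state. Almost no arithmetic.
        for i in range(offsets[v], offsets[v + 1]):
            u = neighbors[i]
            if level[u] == -1:
                level[u] = level[v] + 1
                frontier.append(u)
    return level


# Hypothetical example graph with edges 0->1, 0->2, 1->3, 2->3, 3->4
offsets = [0, 2, 3, 4, 5, 5]
neighbors = [1, 2, 3, 3, 4]
print(bfs_levels(offsets, neighbors, 0))  # [0, 1, 1, 2, 3]
```

On a real graph with billions of edges, those neighbor and visited lookups land all over memory, which is why the loop spends its time waiting on loads rather than computing.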


If there's any company in the world that can create a bespoke CPU for a specific type of workload, it's probably Intel. That's exactly what the company has done in response to the HIVE program run by DARPA, the US Defense Advanced Research Projects Agency. HIVE stands for "Hierarchical Identify Verify Exploit", and it's essentially a DARPA program aimed at building a processor specifically for graph analytics. The program's goals include a 1,000x performance uplift at lower power consumption compared to existing CPUs.

This isn't strictly a new announcement; Intel has been talking about this design for at least four years now. It was originally called PUMA and more recently PIUMA (perhaps to distance it from AMD's Puma processors), which stands for Programmable Integrated Unified Memory Architecture. It's an entirely new processor architecture, unrelated to any of Intel's existing hardware.


This chip has eight "CPU cores" in a technical sense, but it can handle 528 program threads simultaneously. That's because each "core" has four "multi-threaded pipelines" (MTPs) that handle 16 threads each, plus two single-threaded pipelines that offer eight times the performance of an MTP (on a single thread, obviously). That works out to 66 threads per core. In that sense, it reminds us somewhat of IBM's Cell processor, famously used in the PlayStation 3.
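The 528-thread figure falls straight out of that pipeline layout. Here's the arithmetic as a quick sketch, using only the numbers quoted above (the constant and function names are ours):

```python
CORES = 8
MTPS_PER_CORE = 4      # multi-threaded pipelines per core
THREADS_PER_MTP = 16   # hardware threads per multi-threaded pipeline
STPS_PER_CORE = 2      # single-threaded pipelines per core, one thread each


def threads_per_core() -> int:
    # 4 MTPs x 16 threads, plus 1 thread for each of the 2 STPs
    return MTPS_PER_CORE * THREADS_PER_MTP + STPS_PER_CORE


def total_threads(cores: int = CORES) -> int:
    return cores * threads_per_core()


print(threads_per_core())  # 66
print(total_threads())     # 528
```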


Why is it designed this way? Because most of the time, individual program threads will simply be waiting on memory accesses. If that's the case, why not let the core work on another thread in the meantime? In other words, accesses to the machine's DDR5 memory take so long relative to the actual work that each core can keep 66 different threads in flight. Intel says that the average number of instructions between memory accesses is just two.
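A simple, idealized way to see why so many threads are needed: if each thread issues a couple of instructions and then waits out a memory access, a single thread can keep a pipeline busy only a small fraction of the time, and additional threads fill in the gaps. The sketch below is our own back-of-the-envelope model, not Intel's. It assumes a single-issue pipeline with perfect, free thread switching, and the latency values in the examples are hypothetical, not PIUMA specifications.

```python
import math


def pipeline_utilization(threads: int, run_length: float, latency_cycles: float) -> float:
    """Idealized fraction of issue slots a single-issue pipeline can fill.

    Assumes each thread runs `run_length` instructions, then stalls for
    `latency_cycles` waiting on memory, and that the pipeline can switch
    to any ready thread at no cost (a best-case upper bound).
    """
    per_thread = run_length / (run_length + latency_cycles)
    return min(1.0, threads * per_thread)


def threads_to_saturate(run_length: float, latency_cycles: float) -> int:
    """Smallest thread count that can, in this model, keep the pipeline busy."""
    return math.ceil((run_length + latency_cycles) / run_length)


# Two instructions per memory access (Intel's figure); 30 and 100 cycles
# are hypothetical stall lengths chosen only for illustration.
print(pipeline_utilization(1, 2, 30))   # 0.0625
print(pipeline_utilization(16, 2, 30))  # 1.0
print(threads_to_saturate(2, 100))      # 51
```

In this toy model, a hypothetical 30-cycle stall with two instructions per access is exactly what 16 threads per pipeline can cover; real memory latency, issue width, and scheduling policy would all shift the numbers.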


Because these chips are designed to operate at an absolutely massive scale, the main "core" dice are connected via EMIB to silicon photonics engines, four of them on each package. That totals 32 optical I/O ports operating at 16 GB per second per direction each, for up to half a terabyte per second of optical bandwidth.
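The half-terabyte figure checks out as simple multiplication. Here's a quick sketch using the port count and per-port rate quoted above (the names are ours). Note that 32 x 16 GB/s counts one direction only; counting both directions would double it.

```python
PORTS_PER_PACKAGE = 32
GB_S_PER_PORT_PER_DIRECTION = 16


def package_optical_bandwidth_gb_s(
    ports: int = PORTS_PER_PACKAGE,
    per_port: int = GB_S_PER_PORT_PER_DIRECTION,
) -> int:
    """Aggregate optical bandwidth of one package, in GB/s per direction."""
    return ports * per_port


bw = package_optical_bandwidth_gb_s()
print(bw)         # 512 (GB/s per direction)
print(bw / 1024)  # 0.5 (TB/s, taking 1 TB = 1024 GB)
```

Multiplying by the 16 packages in a sled gives 8,192 GB/s, which lines up with the roughly 8TB/second sled figure quoted below.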


Sixteen of these packages fit into an Open Compute Project sled, giving it 8TB/second of optical bandwidth and 512GB of DDR5-4400 DRAM. That memory is attached to the chips' custom controllers, which support accesses at 8-byte granularity. Using a "HyperX" optical network topology, Intel says that these processors can connect up to two million cores while keeping the maximum node access latency to 400 nanoseconds, which is impressive.
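Why does 8-byte access granularity matter? Graph workloads tend to touch small, scattered values, such as a single neighbor ID or a visited flag. A conventional memory system that moves a full 64-byte cache line per access (the line size on mainstream x86 CPUs) wastes most of each transfer when only 8 bytes are needed. Here's a rough sketch assuming aligned accesses with no spatial locality; the function name is hypothetical.

```python
import math


def useful_fraction(access_bytes: int, fetch_granularity_bytes: int) -> float:
    """Fraction of transferred bytes actually used by the program.

    Assumes every access is aligned and hits an unrelated location, so no
    neighboring bytes in the same fetch unit are ever reused.
    """
    # Each access moves a whole number of fetch units, even if it needs less.
    units = math.ceil(access_bytes / fetch_granularity_bytes)
    return access_bytes / (units * fetch_granularity_bytes)


print(useful_fraction(8, 64))  # 0.125 -> 7/8 of each 64-byte line is wasted
print(useful_fraction(8, 8))   # 1.0   -> every byte moved is used
```

Under those assumptions, fetching at 8-byte granularity lets the same raw memory bandwidth deliver up to eight times as many useful random reads.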


Each CPU draws around 75W "at nominal voltage and workloads", and Intel says that 59% of that power goes to the silicon photonics; the cores draw only about 16W. That makes sense, considering that almost all of what they're doing is simply accessing memory. Intel has observed linear performance scaling so far, and it predicts that this scaling will continue out to nearly 10,000 cores.
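Those figures also leave some power unaccounted for. Here's a quick sketch of the split using only the numbers above (the names are ours). The article doesn't say what the leftover watts go to, so the code simply labels them as a remainder.

```python
TOTAL_PACKAGE_W = 75.0  # "around 75W at nominal voltage and workloads"
PHOTONICS_SHARE = 0.59  # share of package power used by the silicon photonics
CORES_W = 16.0          # "about 16W" drawn by the cores


def power_breakdown(
    total: float = TOTAL_PACKAGE_W,
    photonics_share: float = PHOTONICS_SHARE,
    cores: float = CORES_W,
) -> dict[str, float]:
    """Split package power into photonics, cores, and an unexplained remainder."""
    photonics = total * photonics_share
    remainder = total - photonics - cores  # not broken down in the source
    return {"photonics_w": photonics, "cores_w": cores, "remainder_w": remainder}


for name, watts in power_breakdown().items():
    print(f"{name}: {watts:.2f} W")
```

That works out to roughly 44W for photonics, 16W for the cores, and about 15W for everything else on the package.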


Of course, these processors are purpose-built for this task and don't represent something Intel is likely to sell on the mass market anytime soon. Even if you bought one, it's not as if you could run your usual software on it; it uses a custom RISC architecture rather than Intel's usual x86. It's still a fascinating project, though, and we expect that Intel could find plenty of customers for a processor like this.

Images in this post from Intel via ServeTheHome.