Intel, NC State Researchers Supercharge Chip-to-Chip Communication Performance By Up To 12x

In a world where multi-core processors are now the norm, not the exception, the focus has largely (though not entirely) shifted from raw clock speeds to architectural enhancements in order to keep delivering performance boosts with each new CPU generation. With that in mind, researchers from North Carolina State University and Intel have come up with a solution that could have a significant impact on the speed at which cores communicate with one another.

Today's multi-core processors coordinate workflow by sending and receiving software commands between cores. The individual cores have to read and execute those commands, which can be time-consuming. What the researchers found is that they could speed things up by switching to a hardware solution for core-to-core communication. Instead of relying on software, built-in hardware coordinates the communication between cores.


"This approach, called the core-to-core communication acceleration framework (CAF), improves communication performance by two to 12 times," says Yan Solihin, a professor of electrical and computer engineering at NC State and co-author of a paper on the work. "In other words, the execution times – from start to finish – are twice as fast or faster."

The hardware solution consists of what's called a queue management device (QMD). It's a small device that attaches to the processor network on a CPU. The QMD has the ability to perform basic computational functions and keep track of communication requests between multiple cores.

The QMD essentially bypasses the use of shared memory space between cores. The researchers point out that, by its nature, shared memory communication is susceptible to coherence invalidations and cache misses that lead to large performance overheads and a high amount of network traffic.
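To make the shared-memory cost a little more concrete, here is a minimal sketch of the kind of software queue two cores might share. It is a toy illustration only, not code from the researchers' paper: the class name SpscRing and its methods are hypothetical, and Python is used purely for readability. The comments mark where, on real multi-core hardware, each side has to read a variable the other side writes, which is where coherence invalidations and cache misses would come from.

```python
class SpscRing:
    """Toy single-producer/single-consumer ring buffer (hypothetical example)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0  # next slot to read; written only by the consumer
        self.tail = 0  # next slot to write; written only by the producer

    def enqueue(self, item):
        # The producer must read `head`, which the consumer keeps updating.
        # On real hardware that read can miss in the producer's cache.
        if self.tail - self.head == self.capacity:
            return False  # queue is full
        self.slots[self.tail % self.capacity] = item
        # Updating `tail` invalidates the consumer's cached copy of it.
        self.tail += 1
        return True

    def dequeue(self):
        # The consumer must read `tail`, which the producer keeps updating.
        if self.head == self.tail:
            return None  # queue is empty (so None cannot be a payload here)
        item = self.slots[self.head % self.capacity]
        # Updating `head` invalidates the producer's cached copy of it.
        self.head += 1
        return item


ring = SpscRing(capacity=2)
ring.enqueue("pkt-a")
ring.enqueue("pkt-b")
print(ring.enqueue("pkt-c"))  # False: the ring is full
print(ring.dequeue())         # pkt-a
```

Every enqueue and dequeue touches state owned by the other side. That back-and-forth is the overhead a hardware queue manager would take off the cores.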

"Many important workloads incur significant core-to-core communication and are affected significantly by the costs, including pipelined packet processing which is widely used in software-based networking solutions. In these workloads, threads run on different cores and pass packets from one core to another for different stages of processing using software queues," the researchers state.

CAF offloads a large portion of the communication to the QMD, which in turn reduces the queue-induced communication overhead. This in and of itself isn't necessarily a game-changing technology, but it is one of several on-chip solutions the researchers and Intel are looking at to accelerate multi-core computations.
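For readers who want a picture of the pipelined, queue-based workloads the researchers describe, the sketch below passes packets from one processing stage to the next through a software queue. It is an assumption-laden illustration rather than anything from the paper: the stage names, the packet format, and the max_length filter are all made up, and Python threads stand in for threads pinned to separate cores.

```python
import queue
import threading

STOP = object()  # sentinel telling the next stage there are no more packets


def parse_stage(raw_packets, out_q):
    # Stage 1 (hypothetical): wrap each raw packet with its length, then hand it on.
    for raw in raw_packets:
        out_q.put({"payload": raw, "length": len(raw)})
    out_q.put(STOP)


def filter_stage(in_q, results, max_length):
    # Stage 2 (hypothetical): keep only packets at or under max_length bytes.
    while True:
        pkt = in_q.get()
        if pkt is STOP:
            break
        if pkt["length"] <= max_length:
            results.append(pkt["payload"])


def run_pipeline(raw_packets, max_length=4):
    # The software queue is the hand-off point between the two stages.
    q = queue.Queue(maxsize=8)
    results = []
    producer = threading.Thread(target=parse_stage, args=(raw_packets, q))
    consumer = threading.Thread(target=filter_stage, args=(q, results, max_length))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


print(run_pipeline([b"ab", b"abcdef", b"xyz"]))  # [b'ab', b'xyz']
```

Every packet crosses the queue once per stage boundary, so in a real packet-processing pipeline the cost of each hand-off is multiplied by the packet rate. That is the overhead CAF aims to move into hardware.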