Intel's Nervana Neural Network Processor Aims To Beat The Teen Spirit Out Of NVIDIA

Today, Intel announced something it hopes will help in its fight against NVIDIA in the AI market. Intel's new processor is called the Nervana Neural Network Processor, and it previously went by the codename "Lake Crest." Intel says it has been working on the Nervana processor for three years and is now ready to reveal some details about it.

Nervana is a purpose-built architecture for deep learning. The hardware team's primary goal was to provide the flexibility needed to support deep learning primitives while making the core hardware as efficient as possible. Intel says it designed Nervana to cut the chains holding back existing hardware that was not specifically designed for AI.

That is a bit of a jab at NVIDIA, which has led the AI and machine learning market with its GPU-based hardware. Intel's Naveen Rao says that by shedding the GPU heritage, Intel was able to optimize the Nervana chip specifically for AI workloads, with optimizations not possible on other hardware.

Intel's new chip is designed with high-speed on- and off-chip interconnects that enable bidirectional data transfers. During the design of Nervana, Intel had a stated goal of achieving true model parallelism, in which multiple chips act as one large virtual processor, accommodating bigger models so users can get more insight from their data.
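To make the model-parallelism idea concrete, here is a toy NumPy sketch (purely conceptual, not Intel's actual programming model) in which a single layer's weight matrix is split across two hypothetical "chips," each computes its slice, and the interconnect gathers the partial outputs so the pair behaves like one larger processor:

```python
import numpy as np

# Toy model parallelism: split one layer's weights column-wise across
# two "chips". Each chip multiplies the input by its slice; the
# concatenated result equals what a single big chip would compute.
rng = np.random.default_rng(0)
x = rng.standard_normal(8)            # layer input
W = rng.standard_normal((8, 64))      # full weight matrix

W_chip0, W_chip1 = np.split(W, 2, axis=1)   # each chip holds half the model
y_chip0 = x @ W_chip0                        # computed on "chip 0"
y_chip1 = x @ W_chip1                        # computed on "chip 1"
y = np.concatenate([y_chip0, y_chip1])       # interconnect gathers results

assert np.allclose(y, x @ W)                 # identical to the unsplit layer
```

Because the weights (not just the training batches) are partitioned, a model too large for one device's memory can still be trained, which is the benefit the article attributes to Nervana's high-speed interconnects.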

Rao told Fast Company, "In neural networks…you know ahead of time where the data’s coming from, what operation you’re going to apply to that data, and where the output is going to."

Nervana lacks a conventional cache hierarchy; instead, its on-chip memory is managed directly by software. That improved memory management lets the chip achieve high utilization of the massive amount of compute on each die. Intel says, "This translates to achieving faster training time for Deep Learning models."

Intel also invented a new numeric format for Nervana called Flexpoint. "Flexpoint allows scalar computations to be implemented as fixed-point multiplications and additions while allowing for large dynamic range using a shared exponent. Since each circuit is smaller, this results in a vast increase in parallelism on a die while simultaneously decreasing power per computation," wrote Rao.
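The article does not spell out Flexpoint's exact bit layout, but the shared-exponent idea Rao describes can be sketched in NumPy. In this illustrative toy (the function names and the 16-bit mantissa width are assumptions, not Intel's specification), one exponent is shared by an entire tensor, so each element stores only an integer mantissa and element-wise math reduces to fixed-point operations:

```python
import numpy as np

def flexpoint_encode(x, mantissa_bits=16):
    # One exponent is shared by the whole tensor; each element keeps
    # only an integer mantissa, so multiplies/adds become fixed-point.
    max_mag = float(np.max(np.abs(x)))
    if max_mag == 0.0:
        return np.zeros_like(x, dtype=np.int32), 0
    # Choose the exponent so the largest value still fits the mantissa.
    exp = int(np.floor(np.log2(max_mag))) - mantissa_bits + 2
    mantissas = np.round(x / 2.0 ** exp).astype(np.int32)
    return mantissas, exp

def flexpoint_decode(mantissas, exp):
    # Rescale integer mantissas back to real values.
    return mantissas.astype(np.float64) * 2.0 ** exp

x = np.array([0.5, -1.25, 3.0, 0.001])
m, e = flexpoint_encode(x)        # m = [4096, -10240, 24576, 8], e = -13
x_hat = flexpoint_decode(m, e)    # quantization error is at most 2**(e-1)
```

The trade-off is visible in the example: values near the tensor's maximum are represented almost exactly, while very small values (0.001 here) lose precision to the shared exponent, and the payoff is that the per-element multiply-add circuits are smaller than floating-point ones, which is the parallelism and power advantage Rao cites.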

Intel is pushing hard toward a goal of a 100x increase in deep learning training performance by 2020, and the chipmaker says it is on track to exceed that goal. Intel also notes that it is working with Facebook to gain technical insights as it brings the Nervana chip, and the new generation of AI hardware it represents, to market.