Many-core processors are apparently the new black for 2011. Intel continues to work on both its Single-chip Cloud Computer and Knights Corner, Tilera made headlines earlier this year, and now a new company, Adapteva, has announced its own entry into the field.
Epiphany Block Diagram - Source: Adapteva
The Epiphany architecture is an array of simple, RISC-based microprocessors. Each processor node contains an ALU, an FPU, 32KB of SRAM, and a router.
Epiphany vs ARM and Vivante Architectures - Source: Adapteva
The nodes communicate with each other via mesh networking; the implementation is capable of scaling up to a 64x64 array (4,096 processors). Adapteva claims that Epiphany is capable of delivering unprecedented performance per watt, with a 16-core array offering up to 19Gflops at 270mW on a 28nm process.
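The headline figures can be sanity-checked with simple arithmetic. The Python sketch below uses only the numbers quoted above; the function and variable names are illustrative and do not come from Adapteva.

```python
# Figures Adapteva quotes for a 16-core array on a 28nm process.
PEAK_GFLOPS = 19.0    # claimed peak throughput
POWER_WATTS = 0.270   # claimed power draw (270mW)
CORES = 16


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Return efficiency in Gflops per watt."""
    return gflops / watts


efficiency = gflops_per_watt(PEAK_GFLOPS, POWER_WATTS)
per_core_gflops = PEAK_GFLOPS / CORES
max_cores = 64 * 64  # largest array the mesh is said to scale to

print(f"{efficiency:.1f} Gflops/W")
print(f"{per_core_gflops:.2f} Gflops per core")
print(f"{max_cores} cores in a 64x64 array")
```

The efficiency works out to roughly 70 Gflops/W, which is in line with the 71,000Mflops/W figure the whitepaper cites later in this article.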
There are certain similarities between Tilera and Epiphany. In both designs, a core can search the SRAM of the cores around it in the event of a local cache miss, though the latency of the search will vary considerably depending on how far away the data sits. Unlike other designs, however, Epiphany is designed to be a floating-point co-processor, not an independent chip. By focusing on one particular area of work, Adapteva believes it can deliver best-in-class floating-point performance.
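A toy model of a 2D mesh shows why that latency can vary so much. The sketch assumes dimension-ordered (XY) routing and a fixed cost per hop. Both are assumptions made purely for illustration, not details confirmed by Adapteva or Tilera, and `CYCLES_PER_HOP` is a hypothetical value.

```python
from typing import Tuple

Node = Tuple[int, int]  # (row, column) position in the mesh

CYCLES_PER_HOP = 1  # hypothetical per-router cost, not a published figure


def hops(src: Node, dst: Node) -> int:
    """Router-to-router hops under assumed XY routing (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def remote_access_cycles(src: Node, dst: Node) -> int:
    """Toy estimate of the network cost of reaching another node's SRAM."""
    return hops(src, dst) * CYCLES_PER_HOP


origin = (0, 0)
print(remote_access_cycles(origin, (0, 1)))    # adjacent node
print(remote_access_cycles(origin, (3, 3)))    # far corner of a 4x4 array
print(remote_access_cycles(origin, (63, 63)))  # far corner of a 64x64 array
```

In this model an adjacent node is one hop away, while the far corner of a 64x64 array is 126 hops away. Real latencies would depend on router design and traffic, which the model ignores.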
You Might Think This Sounds Familiar
If the idea of using an array of simple processors working in parallel sounds familiar, give yourself a gold star--you've been paying attention. Adapteva's Epiphany whitepaper (written by the Linley Group) posits that an Epiphany co-processor is capable of delivering better media performance than either the ARM Cortex-A9 or the Vivante GC2000 GPU used in Freescale Semiconductor's chips. The paper claims the Epiphany is capable of 71,000 Mflops/W, compared to 14,000 Mflops/W for the Cortex-A9 and 37,000 Mflops/W for the Vivante GC2000. It continues: "At 71Gflops/W, it is twice as efficient as the Vivante GPU and five times better than Cortex-A9. Vivante’s GPU is designed for graphics, not pure FP performance. Cortex-A9 is hampered by its complex CPU design and its use of cache memory instead of SRAM. Cortex-A9 implements a 64-bit Neon unit; the newer Cortex-A15 includes a 128-bit Neon unit that will double FP performance, albeit at somewhat higher power."
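The "twice" and "five times" claims can be checked directly against the quoted efficiency figures. This is a minimal Python sketch; the dictionary and the `advantage` helper are illustrative names, and the only data used is what the whitepaper reports.

```python
# Efficiency figures as quoted from the whitepaper, in Mflops per watt.
MFLOPS_PER_WATT = {
    "Epiphany": 71_000,
    "Vivante GC2000": 37_000,
    "Cortex-A9": 14_000,
}


def advantage(baseline: str, reference: str = "Epiphany") -> float:
    """Return how many times more efficient `reference` is than `baseline`."""
    return MFLOPS_PER_WATT[reference] / MFLOPS_PER_WATT[baseline]


for name in ("Vivante GC2000", "Cortex-A9"):
    print(f"Epiphany vs {name}: {advantage(name):.2f}x")
```

The ratios come out to about 1.92x and 5.07x, so the paper's rounded claims are consistent with its own numbers.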
Nvidia, of course, is eyeing precisely such workloads for its Tegra hardware, and the whitepaper notes that one advantage OpenCL-compatible devices have is that they're easier to program for and guarantee cross compatibility. The markets for Tegra 3 and Epiphany aren't identical, however. Adapteva is positioning Epiphany as a part that can snuggle up to virtually any other mobile solution while providing a marked benefit. Tegra, of course, is Nvidia's top-to-bottom solution. In theory, a company could actually deploy Epiphany alongside Tegra, if it thought there was a market for an extremely capable multimedia device that could leverage the FP capabilities of both solutions.
Adapteva will debut Epiphany chips later this year, built on a 28nm low-power process. The Linley Group is cautious about the design's long-term prospects, writing: "Before committing extra silicon cost and design time to Epiphany, we expect mobile vendors will wait until the usage model for FP becomes better established. But if visual computing becomes an integral part of the user interface and application software, an FP accelerator could become as common as today’s video accelerators."