AMD, Penguin Computing Deploy First Server APUs With Curious Results

AMD's John Fruehe posted a blog entry yesterday detailing the company's first efforts to deploy Fusion-style APUs in a server environment. Penguin Computing has built and installed a cluster of 104 InfiniBand-linked servers, each powered by a Llano A8-3850 processor. Penguin claims that the single-socket, 2U server offers unique advantages thanks to the Fusion architecture's onboard GPU.

“With the Altus 2A00, Penguin is the first to bring AMD’s unique APU capabilities to the HPC community,” says Phil Pokorny, CTO of Penguin Computing. “We are extremely proud of our successful deployment of this platform on such a large scale. We believe that the high level of integration and the resulting benefits for HPC users will further accelerate the adoption of the GPU processing model in HPC. The APU architecture has the potential to become a key component of future exascale systems.”


Suddenly, servers

“As we progress closer to the exascale era, it’s clear that the traditional paradigms of supercomputing are continuing to evolve and require new technologies to keep pace with the rapid levels of innovation that HPC customers demand,” said John Byrne, corporate vice president and general manager for Americas Mega Region, AMD. “With industry-leading CPU and GPU technology, AMD has the pieces to assemble a wide range of solutions for HPC deployments, and now HPC customers can leverage the power and efficiency of AMD’s APU technology with Penguin Computing’s Altus 2A00 server.”

Draining An Ocean With 100 Straws

AMD has said virtually nothing about its long-term plans for server APU products, and the company had no additional comment to make when we asked for more information. The Altus 2A00 server makes some sense as a proof-of-concept product, but that's about all. For all Penguin's talk of leveraging the additional capabilities of the APU, Llano's integrated graphics may be best-in-class for a consumer part, but they don't compare all that well to the compute capabilities of discrete GPUs. Llano's integrated GPU doesn't support double-precision floating point, and the performance of the entire cluster is rated at 59.6 TFLOPs.

That is, to be sure, quite a lot of FLOPs, but it also takes 104 servers to get there, each of which uses a boxed processor with a 100W TDP and a retail price of $139. By the time one factors in the cost of the entire server, even assuming a substantial CPU discount, it's not at all clear that a cluster of 104 linked servers using AMD's last-generation GPU architecture with a lopsided interconnect is a better financial investment than a much smaller deployment of AMD's high-end FireStream commercial GPUs or even a cluster of Radeon 6970s.
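For a rough sense of scale, the figures quoted above can be broken down per node. The sketch below is back-of-the-envelope arithmetic only: it uses the 104-node count, the 59.6 TFLOPs cluster rating, the 100W TDP, and the $139 boxed price from this article, and the function and variable names are our own for illustration. It covers the processors alone and says nothing about chassis, memory, interconnect, or actual power draw under load.

```python
# Back-of-the-envelope breakdown using only figures quoted in the article.
# All names here are illustrative; nothing below is measured data.

NODES = 104              # servers in the Penguin cluster
CLUSTER_TFLOPS = 59.6    # rated performance of the entire cluster
CPU_TDP_WATTS = 100      # TDP of one boxed A8-3850
CPU_RETAIL_USD = 139     # retail price of one boxed A8-3850


def per_node_gflops(cluster_tflops: float, nodes: int) -> float:
    """Rated cluster throughput divided evenly across nodes, in GFLOPs."""
    return cluster_tflops * 1000 / nodes


def cluster_summary(nodes: int = NODES,
                    cluster_tflops: float = CLUSTER_TFLOPS,
                    tdp_watts: int = CPU_TDP_WATTS,
                    price_usd: int = CPU_RETAIL_USD) -> dict:
    """Aggregate processor-only numbers for the whole cluster."""
    return {
        "gflops_per_node": per_node_gflops(cluster_tflops, nodes),
        "total_cpu_tdp_kw": nodes * tdp_watts / 1000,
        "total_cpu_retail_usd": nodes * price_usd,
    }


if __name__ == "__main__":
    summary = cluster_summary()
    print(f"Rated throughput per node: {summary['gflops_per_node']:.1f} GFLOPs")
    print(f"Combined CPU TDP:          {summary['total_cpu_tdp_kw']:.1f} kW")
    print(f"Combined CPU retail price: ${summary['total_cpu_retail_usd']:,}")
```

By this arithmetic each node is rated at a little over 570 GFLOPs, and the processors alone account for roughly 10 kW of TDP and about $14,500 at retail before any other server hardware is counted.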

Long term, AMD's plans for its Fusion products mean that the GPU will become an extremely potent compute node that can be leveraged for precisely the sort of workload the company claims to be targeting today. At the moment, this seems more like a tech demo than a major design win, and the company's refusal to discuss any server product plans it might have around Brazos or future Trinity parts seems an indirect confirmation of our suspicion. Such parts are coming—but not anytime soon.