Amazon Offers New Cloud-Based High Performance Computing

Amazon has long touted its EC2 (Elastic Compute Cloud) as a flexible service for companies that need a certain amount of server time to test programs or features, but don't want to invest in the hardware themselves. Now, the company has added HPC (High Performance Computing) capabilities of the sort typically targeted at large-scale enterprise or university buildouts. Those are precisely the organizations that can usually afford to invest the time and money, but Amazon is targeting potential customers constrained either by a lack of available CPU time or by an inability to install new or different hardware.

According to Amazon CTO Werner Vogels, EC2 customers have used Amazon's service for HPC applications since launch day; the new announcement delivers several new capabilities and performance improvements. From Vogels' blog:
"Amazon EC2 and Elastic Map Reduce have been successful in freeing some HPC customers with highly parallelized workloads from the typical challenges of HPC infrastructure...[but] there were several classes of HPC workloads for which the existing instance types of Amazon EC2 have not been the right solution. In particular this has been true for applications based on algorithms - often MPI-based - that depend on frequent low-latency communication and/or require significant cross sectional bandwidth. Additionally, many high-end HPC applications take advantage of knowing their in-house hardware platforms to achieve major speedup by exploiting the specific processor architecture. There has been no easy way for developers to do this in Amazon EC2... until today."
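To see why latency matters so much to the MPI-based codes Vogels mentions, consider a classic ping-pong benchmark: two processes bounce a tiny message back and forth, so the measured time is almost pure network latency rather than bandwidth. Here's a minimal sketch in C; the iteration count, build command, and file name are our own illustrative assumptions, not anything from Amazon's announcement.

```c
/* pingpong.c - minimal MPI latency sketch (illustrative, not Amazon's code).
 * Build (assumes an MPI toolchain is installed): mpicc -O2 pingpong.c -o pingpong
 * Run on two ranks:                              mpirun -np 2 ./pingpong
 */
#include <mpi.h>
#include <stdio.h>

#define ITERS 1000

int main(int argc, char **argv) {
    int rank;
    char byte = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double start = MPI_Wtime();

    /* Bounce a 1-byte message back and forth; with a payload this small,
     * the round-trip time is dominated by network latency, which is
     * exactly what Cluster Compute Instances are engineered to reduce. */
    for (int i = 0; i < ITERS; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double elapsed = MPI_Wtime() - start;
    if (rank == 0)
        printf("avg one-way latency: %.2f us\n", elapsed / (2.0 * ITERS) * 1e6);

    MPI_Finalize();
    return 0;
}
```

An application that makes thousands of such small exchanges per timestep slows to a crawl on a high-latency network no matter how much raw bandwidth is available, which is why the generic EC2 instance types weren't a fit for this class of workload.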
Under the new plan, customers will be able to rent Cluster Compute Instances (CCIs) that have been specifically engineered for high-bandwidth, low-latency communication. Each instance within the cluster has access to up to 10Gb/s of bandwidth when communicating with other instances. This sort of capability could come in very handy for clients who want to model the performance delta of changing certain processes on a small scale before attempting to deploy the changes across an entire HPC cluster. While standard operating procedure for years has been to keep a very small test cluster around for precisely that purpose, Amazon may be able to offer a mid-sized cluster that's useful for catching scaling problems that don't occur at a very small level but appear later on. Depending on the terms and conditions of Amazon's rental policy, it might also be cheaper to use Amazon's service to expand an HPC system nearly full-time, or to keep more system resources running while overhauling part of the network.


I wonder if anyone has ever bought an EC2 server and put it to work on Folding@Home?

The company lists two particular reasons to use its Cluster Compute program rather than the standard EC2 platform. The first is the low-latency, high-bandwidth networking we've discussed; the second is: "Cluster Compute instances include the specific processor architecture in their definition to allow developers to tune their applications by compiling applications for that specific processor architecture in order to achieve optimal performance."
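As a rough illustration of what that kind of architecture-specific tuning looks like, the sketch below uses an SSE4.2 intrinsic that the Xeon X5570 ("Nehalem") parts in these instances support in hardware; the file name and build flags are our own assumptions, not anything Amazon specifies.

```c
/* crc32_sse42.c - illustrative sketch of architecture-specific tuning
 * (our example, not Amazon's). The Xeon X5570 supports SSE4.2, which
 * includes a hardware CRC32 instruction; code compiled to use it runs
 * far faster than a portable table-driven CRC.
 * Build for the known target CPU (assumed gcc):
 *   gcc -O2 -msse4.2 crc32_sse42.c -o crc32_sse42
 */
#include <nmmintrin.h>  /* SSE4.2 intrinsics */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hardware-accelerated CRC32C over a byte buffer. */
static uint32_t crc32c(const unsigned char *buf, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = _mm_crc32_u8(crc, buf[i]);
    return crc ^ 0xFFFFFFFFu;
}

int main(void) {
    const char *msg = "cluster compute instances";
    printf("crc32c = 0x%08x\n", crc32c((const unsigned char *)msg, strlen(msg)));
    return 0;
}
```

On a generic cloud instance you can't safely compile this way, because you don't know what CPU you'll land on; fixing the processor architecture in the instance definition is what makes flags like -msse4.2 (or gcc's -march=native, run on the instance itself) viable.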

Major HPC buildouts are where you often find a high degree of system optimization in use, as it can be worth the time and effort to code to the metal (that is, to code to the particular strengths and weaknesses of a given HPC cluster). As with EC2, Amazon will offer a variety of price points and server configurations. The largest will feature 23GB of RAM, 33.5 EC2 Compute Units (2x Intel Xeon X5570), 1,607 GB of storage, and 64-bit support. We figure it'll run Crysis pretty well.