Purdue Software Throttles Servers as Temperatures Rise, Averting System Failure

What if your future GPU or CPU could throttle itself back when it got too hot, rather than overheating and causing errors or a shutdown? That may be possible someday, but for now it is a reality in the server realm. Overheating doesn't happen nearly as often these days in consumer computing, but add overclocking, serial jobs or multiple linked computers, and things can get messy once temperatures rise above a certain level. Many servers in particular are designed to shut off completely if temperatures soar past a set point, in order to save the machine from irreparable damage.

But shutting down a machine to save it from melting, so to speak, isn't exactly the best option. Long-running tasks that take months to finish can be ruined and have to be started over. That can cost companies and universities weeks, if not months, in research time. New software developed by Patrick Finnegan, a systems administrator at Purdue University, lets servers sense when temperatures are about to rise above a certain point. Instead of shutting down and waving a white flag, they throttle back heavily until the cooling equipment catches up. This cuts the rate of work by about 70% to 80%, but users do see power savings and, more importantly, no work is lost. It's better to have a task slowed than to lose it forever, or at least that's the prevailing logic.
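
For the curious, the throttle-instead-of-shutdown idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Finnegan's code: the thresholds and function names are hypothetical, and it assumes a Linux machine that exposes a temperature reading under /sys/class/thermal and a frequency cap through the kernel's cpufreq sysfs files. It uses two thresholds rather than one so the clock does not flap back and forth while the room is still cooling.

```python
from pathlib import Path

# Hypothetical thresholds, chosen for illustration only.
HOT_C = 85.0   # throttle at or above this temperature
COOL_C = 70.0  # restore full speed at or below this temperature


def choose_cap_khz(temp_c, current_cap_khz, min_khz, max_khz,
                   hot_c=HOT_C, cool_c=COOL_C):
    """Return the frequency cap (kHz) to apply for the given temperature."""
    if temp_c >= hot_c:
        return min_khz          # too hot: drop to the lowest supported clock
    if temp_c <= cool_c:
        return max_khz          # cooled off: allow full speed again
    return current_cap_khz      # in between: keep the current cap (hysteresis)


def read_temp_c(zone=Path("/sys/class/thermal/thermal_zone0/temp")):
    """Read a Linux thermal zone, which reports millidegrees Celsius.

    Assumption: zone 0 is the sensor of interest; that varies by machine.
    """
    return int(zone.read_text().strip()) / 1000.0


def apply_cap_khz(cap_khz,
                  cpufreq=Path("/sys/devices/system/cpu/cpu0/cpufreq")):
    """Write the cap for cpu0 to the cpufreq sysfs interface (needs root)."""
    (cpufreq / "scaling_max_freq").write_text(str(cap_khz))
```

A real deployment would poll read_temp_c on a timer, apply the cap to every core rather than just cpu0, and coordinate with the job scheduler. None of that is shown here.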


The software is now being sold for $250. Finnegan designed the software using a "clock frequency scaling driver available for the Linux kernel, which can control both Intel and AMD chipsets with frequency scaling capabilities." It "also relies on Altair job scheduling software as well as a set of cluster management tools from the U.S. Department of Energy's Oak Ridge National Laboratory." Purdue itself has already used the software twice to avert system failures, and it worked well both times. Here's hoping for even more fail-safe options for our own computers of the future.