What if your future GPU could throttle itself back when it got too hot,
rather than overheating and causing errors or a shutdown? That may be
possible someday, but for now, it is already a reality in the server
realm.
Overheating doesn't happen nearly as often these days in consumer
computing, but once you add in overclocking, serial jobs or multiple
linked computers, things can get messy when temperatures rise above a
certain level. Servers in particular are often designed to shut off
completely if temperatures soar beyond a certain point, in order to save
the machine from irreparable damage.
But shutting down a machine in order to save it from melting, so to
speak, isn't exactly the best option. Long-running tasks that take
months to finish can be ruined and have to be started again. That can
cost companies and universities weeks, if not months, of research time.
New software developed by Patrick Finnegan, a systems administrator at
Purdue University, enables servers to sense when temperatures are about
to rise above a certain point. Instead of shutting down and waving a
white flag, they throttle back sharply until the cooling systems catch
up. This slows work by about 70% to 80%, but users do see power savings
and, more importantly, no work is lost. It's better to have a task
slowed than to lose it forever, or at least that's the prevailing logic.
The software is now being sold for $250. Finnegan built it on a clock
frequency scaling driver available for the Linux kernel, which can
control both Intel and AMD chipsets with frequency scaling capabilities.
It "also relies on Altair job scheduling software as well as a set of
cluster management tools from the U.S. Department of Energy's Oak Ridge
National Laboratory." Purdue itself has already used the software twice
to avert system failures, and it worked well both times. Here's hoping
for even more fail-safe options for our own computers of the future.
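
For readers curious what "throttle back instead of shutting down" might
look like in practice, here is a minimal Python sketch. It is not
Finnegan's software. It only assumes a Linux machine that exposes the
kernel's standard thermal and cpufreq sysfs files, and the 70 C and 85 C
thresholds, the function names and the linear ramp between them are all
made up for illustration.

```python
from pathlib import Path

# Standard Linux sysfs locations; whether they exist depends on the machine.
THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")
CPU_ROOT = Path("/sys/devices/system/cpu")


def pick_max_freq_khz(temp_c, min_khz, max_khz, warn_c=70.0, crit_c=85.0):
    """Choose a CPU frequency cap for the given temperature.

    Hypothetical policy: full speed below warn_c, minimum speed at or
    above crit_c, and a linear ramp in between.
    """
    if temp_c <= warn_c:
        return max_khz
    if temp_c >= crit_c:
        return min_khz
    frac = (temp_c - warn_c) / (crit_c - warn_c)
    return int(max_khz - frac * (max_khz - min_khz))


def read_temp_c(zone=THERMAL_ZONE):
    """Read a thermal zone; the kernel reports millidegrees Celsius."""
    return int(zone.read_text().strip()) / 1000.0


def apply_cap(khz, cpu_root=CPU_ROOT):
    """Write the cap to every CPU's scaling_max_freq (needs root)."""
    for path in cpu_root.glob("cpu[0-9]*/cpufreq/scaling_max_freq"):
        path.write_text(str(khz))


def throttle_once(cpu_root=CPU_ROOT):
    """One pass: read the temperature, compute a cap, apply it."""
    cpufreq = cpu_root / "cpu0" / "cpufreq"
    min_khz = int((cpufreq / "cpuinfo_min_freq").read_text())
    max_khz = int((cpufreq / "cpuinfo_max_freq").read_text())
    cap = pick_max_freq_khz(read_temp_c(), min_khz, max_khz)
    apply_cap(cap)
    return cap
```

A real deployment would call something like `throttle_once()` on a timer
and coordinate with the job scheduler, which is the part the Altair and
Oak Ridge tools handle in Finnegan's setup. The sketch leaves all of
that out.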