FreeBSD Programmers Report Ryzen SMT Bug That Hangs Or Resets Machines

IMG 7353
It's starting to look like there's an inherent bug with AMD's Zen-based chips that is causing issues on Unix-based operating systems, with both Linux and FreeBSD confirmed. The bug doesn't just affect Ryzen desktop chips, but also AMD's enterprise EPYC chips. It seems safe to assume that Threadripper will bundle it in, as well.

It's not entirely clear what is causing the issue, but it's related to the CPU being maxed out in operations, thus causing data to get shifted around in memory, ultimately resulting in unstable software. If the bug is exercised a certain way, it can even cause machines to reset.

The revelation about the issue on FreeBSD was posted to the official repository, where the issue is said to happen when threads can lock up, and then cause the system to become unstable. Getting rid of the issue seems as simple as disabling SMT, but that would then negate the benefits provided by having so many threads at-the-ready.

PTS Segfault
segfaults appearing while running Phoronix Test Suite benchmarks

On the Linux side of the Unix fence, Phoronix reports on similar issues, where stressing Zen chips with intensive benchmarks can cause one segmentation fault after another. The issue is so profound, that Phoronix Test Suite developer Michael Larabel introduced a special test that can be run to act as a bit of a proof-of-concept. To test another way, PTS can be run with this command:

PTS_CONCURRENT_TEST_RUNS=4 TOTAL_LOOP_TIME=60 phoronix-test-suite stress-run build-linux-kernel build-php build-apache build-imagemagick

Running this command will compile four different software projects at once, over and over, for an hour. Before long, segfaults should begin to appear (as seen in the shot above).

It's not entirely clear if both sets of issues here are related, but seeing as both involve stressing the CPU to its limit, it seems likely. Whether or not this could be patched on a kernel or EFI level is something yet to be seen.