Researchers Warn Trojan Source Attack Infects Open Source Code With Ghost Bugs

by Zak Killian — Tuesday, November 02, 2021, 03:09 PM EDT

Remember the old days of code pages and customized OSes for specific languages, like DOS/V? Unicode has more or less solved the biggest issue with displaying non-ASCII glyphs on computers, although it's still up to the operating system to implement support, of course.

Did you know you can write source code in Unicode, though? We reckon that's real handy for coders in regions where English isn't the first language. Useful as it might be, it's also the attack vector for the latest terrifying security flaw: "Trojan Source." Revealed by two researchers at the University of Cambridge, Trojan Source is a way to hide malicious code inside application source using Unicode's text direction features.

It works like this: Unicode has a system called "bidi" (short for "bi-directional") that allows the use of invisible control codes to force a change in text direction. This is important when mixing languages that read from left to right (like English and Russian) with languages that read from right to left (like Hebrew and Arabic). These control codes can be used in source, including in comments and strings.
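To make those control codes a little less abstract, here is a minimal Python sketch that lists the nine explicit bidi formatting characters along with their Unicode names. The constant `BIDI_CONTROLS` and the function `describe_bidi_controls` are names made up for this illustration; only the standard `unicodedata` module is assumed.

```python
import unicodedata

# The nine explicit bidi formatting characters, written as escapes so
# that this listing contains no invisible characters itself.
BIDI_CONTROLS = "\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069"


def describe_bidi_controls():
    """Return (code point, Unicode name, bidi class) for each control."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))
        for ch in BIDI_CONTROLS
    ]


if __name__ == "__main__":
    for code_point, name, bidi_class in describe_bidi_controls():
        print(f"{code_point}  {bidi_class:<3}  {name}")
```

None of these characters draws a glyph of its own, which is why they are so easy to miss in an editor.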

The thing is, while comments and strings generally have mechanisms to indicate their start and end in any given language, bidi overrides almost never respect those bounds. An override that starts inside a comment or string can change how the text after it is displayed, even past the closing delimiter, while the compiler simply reads the characters in the order they are stored. That means that by placing these override characters exclusively in comments and strings, it's possible to create code that looks like it does one thing, but actually says something completely different to the compiler.
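One way to see what the compiler sees is to swap each invisible control code for a visible tag. The snippet below is a rough sketch of that idea, not a tool from the researchers; `reveal` and the `suspicious` line are hypothetical, invented here purely for illustration.

```python
import unicodedata

# Explicit bidi formatting characters, written as escapes.
BIDI_CONTROLS = frozenset("\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069")


def reveal(text):
    """Replace each bidi control with a visible tag such as <RLO>."""
    return "".join(
        f"<{unicodedata.bidirectional(ch)}>" if ch in BIDI_CONTROLS else ch
        for ch in text
    )


# Made-up line: a docstring with a right-to-left isolate tucked in just
# before its closing quotes, followed by a real `return` statement.
suspicious = "''' Subtract funds, then \u2067''' ;return"

if __name__ == "__main__":
    print(reveal(suspicious))
```

With the tags visible, it becomes clear that the `return` sits outside the docstring as far as the interpreter is concerned, however a bidi-aware editor chooses to draw the line.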

It's rather insidious: the resulting code will look flawless to any human code reviewer, yet it could contain hand-picked backdoors just waiting to be exploited. As one of the authors explains, "If the change in logic is subtle enough to go undetected in subsequent testing, an adversary could introduce targeted vulnerabilities without being detected."

That's bad enough, but even more distressing is that most modern operating systems and editors will preserve bidi codes through copy-and-paste operations. It's common for coders to joke about their reliance on Stack Overflow and similar sites, where helpful programmers post "example" code for askers who then simply copy and paste it into their applications. It's quite possible that a bad actor could carefully embed bidi codes to make their "example" code look innocuous while it actually delivers a nasty payload.
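Since the control codes survive a trip through the clipboard, a quick check on pasted code before it lands in a project seems prudent. Here is a minimal sketch of such a check, assuming you simply want to flag every bidi control by line and column; `find_bidi_controls`, `strip_bidi_controls`, and the `pasted` sample are all hypothetical names for this example.

```python
# Explicit bidi formatting characters, written as escapes.
BIDI_CONTROLS = frozenset("\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069")


def find_bidi_controls(source):
    """Return (line, column, code point) for every bidi control found."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits


def strip_bidi_controls(source):
    """Return the source with every bidi control removed."""
    return "".join(ch for ch in source if ch not in BIDI_CONTROLS)


if __name__ == "__main__":
    # Made-up pasted snippet with controls hidden in a comment.
    pasted = "total = 0\nlimit = 10  # safe \u202E\u2066 default\u2069\n"
    for line_no, col, code_point in find_bidi_controls(pasted):
        print(f"line {line_no}, column {col}: {code_point}")
```

Flagging is probably the safer default. Stripping changes the text, and legitimate right-to-left strings may genuinely need these characters.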

The researchers gave a 99-day embargo period to allow tool authors to patch their software, but apparently only nine of the nineteen vendors that they contacted have actually committed to releasing a patch. Considering that the flaw has already been demonstrated in JavaScript, Java, Rust, Go, Python, C, C++, and C#, let's hope that this public disclosure gets some of the other vendors' butts in gear.

The one bright spot in this story is that the researchers were unable to find a single case of this vulnerability being exploited in the wild. Of course, that doesn't mean it hasn't been exploited, nor that it won't be, but simply that no one has noticed it yet. For now, you might want to go ahead and manually re-type that "example" code you're implementing.

Zak Killian

A 30-year PC building veteran, Zak is a modern-day Renaissance man who may not be an expert on anything, but knows just a little about nearly everything.