US Air Force Finds AI Brittle And Not Great At Tactical Targeting, For Now

The United States' armed services have been trying to improve their tactical gear using artificial intelligence (AI) for a while now, not so much in the form of adorable robotic companions as in deep learning systems that optimize functions like object tracking and target recognition.

It seems like those efforts aren't going that well, at least judging by the latest statement on the matter. The US Air Force's Maj. Gen. Daniel Simpson said yesterday that an experimental target recognition program performed well in perfect conditions, but completely fell apart as soon as it was asked to do its job in a new context. The AI in question was trained to look for a single surface-to-air missile from an oblique angle, and then asked to look for multiple missiles at a near-vertical angle.

Simpson noted that the algorithm's low accuracy wasn't surprising, and it wasn't the most concerning part of the exercise either. Despite being correct only about 25% of the time, the algorithm reported roughly 90% confidence in its choices. In the Maj. Gen.'s words, "it was confidently wrong." That's a real concern for a system that people's lives could depend on, but it doesn't mean the algorithm itself is bad; it was simply poorly trained. There's a big difference between training and inference for AI algorithms: at inference time the AI was trying to identify its targets, but it needed more and better training data to do so effectively.
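To make the "confidently wrong" failure mode concrete, here's a minimal sketch on entirely synthetic data (a toy two-class problem, not the Air Force system): a classifier trained on one "viewpoint" and evaluated on shifted data keeps reporting near-certain confidence even as its accuracy collapses to roughly chance.

```python
# Toy illustration with synthetic data (not the Air Force system): a model
# trained on one data distribution stays highly confident on a shifted
# distribution it has never seen, even though its accuracy collapses.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# "Training viewpoint": two well-separated 2D clusters, one per class.
X_train = np.vstack([rng.normal([0.0, 0.0], 0.5, (500, 2)),
                     rng.normal([3.0, 3.0], 0.5, (500, 2))])
y_train = np.array([0] * 500 + [1] * 500)

# "New viewpoint": the same two classes, but the whole scene is shifted, so
# the class-0 cluster now sits on the wrong side of the learned boundary.
shift = np.array([4.0, 4.0])
X_test = np.vstack([rng.normal([0.0, 0.0], 0.5, (500, 2)),
                    rng.normal([3.0, 3.0], 0.5, (500, 2))]) + shift
y_test = np.array([0] * 500 + [1] * 500)

model = LogisticRegression().fit(X_train, y_train)

probs = model.predict_proba(X_test)
preds = probs.argmax(axis=1)

print(f"accuracy on shifted data: {(preds == y_test).mean():.0%}")  # roughly chance
print(f"mean confidence:          {probs.max(axis=1).mean():.0%}")  # near 100%: confidently wrong
```

The gap between those two numbers is exactly what calibration research measures, and it's why a model's raw confidence score can't be treated as a stand-in for correctness.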

Really, it's no surprise that the poorly trained AI couldn't perform to expectations. As Simpson said, "It actually was accurate maybe about 25% of the time." This is a textbook demonstration of the "brittle AI" problem, where a neural network trained on a narrow dataset is unable to perform its task on inputs that fall even slightly outside that dataset.

A USAF C-17 "Clears The Path" - Credit: US Air Force

More academically, an AI is "brittle" when it "cannot generalize or adapt to conditions outside of a narrow set of assumptions." That definition comes from researcher and former Navy aviator Missy Cummings. Essentially, when you train an AI, you need to use as broad a dataset as possible, because that's what allows the algorithm to generalize and keep performing its function under varying conditions.
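One cheap (if limited) way to broaden a narrow image dataset is aggressive augmentation at training time, so the model at least sees varied scales, orientations, and lighting. The sketch below uses standard torchvision transforms on a hypothetical dataset; note that augmentation only stretches the data you already have, and can't conjure genuinely new viewpoints, which is the harder problem Simpson describes.

```python
# A minimal augmentation sketch (hypothetical dataset, not the Air Force
# program): randomize scale, perspective, rotation, and lighting so the
# model doesn't overfit to a single vantage point.
import torchvision.transforms as T

train_transforms = T.Compose([
    T.RandomResizedCrop(224, scale=(0.5, 1.0)),         # vary apparent distance
    T.RandomPerspective(distortion_scale=0.5, p=0.7),   # approximate different viewing angles
    T.RandomRotation(degrees=30),                       # sensor/platform orientation changes
    T.ColorJitter(brightness=0.4, contrast=0.4),        # lighting and atmospheric variation
    T.ToTensor(),
])

# Usage with a folder-per-class dataset ("sam_imagery/" is a placeholder path):
# from torchvision.datasets import ImageFolder
# train_dataset = ImageFolder("sam_imagery/", transform=train_transforms)
```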

In this specific case, the targeting system was trained on sensor data from one vantage point and then asked to operate on data from another. Unsurprisingly, it didn't work well. That's a recurring problem for the military, because it's hard to assemble a broad dataset for many of the things it wants AI to be good at.

For example, when training a self-driving car, it's relatively easy to gather lots of training data from many angles, locales, and conditions, because there are few limitations on collecting it. Getting pictures of Chinese or Russian surface-to-air missiles, on the other hand, is considerably harder.

A possible solution to this problem is synthetic training data, similar to what NVIDIA's Omniverse Replicator generates. The idea is that researchers can create a close-to-life facsimile of the object they want the AI to detect, then train the model on that virtual version in scenarios where real-life data isn't available.
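As a rough illustration of that workflow (and deliberately not the Omniverse Replicator API, which builds full 3D scenes with physically based rendering and automatic annotations), a bare-bones version of the idea can be as simple as compositing a rendered target onto varied backgrounds and recording the label. All file paths and names here are placeholders.

```python
# Deliberately simplified synthetic-data sketch: paste a rendered target
# (with transparent background) onto a real or rendered backdrop at a
# random scale, rotation, and position, and return the composite plus its
# bounding box for training a detector. Paths are placeholders.
import random
from PIL import Image

def make_synthetic_sample(target_path: str, background_path: str):
    background = Image.open(background_path).convert("RGB")
    target = Image.open(target_path).convert("RGBA")

    # Randomize apparent size and orientation to mimic different viewpoints.
    scale = random.uniform(0.2, 0.6)
    target = target.resize((int(target.width * scale), int(target.height * scale)))
    target = target.rotate(random.uniform(0, 360), expand=True)

    # Random placement (assumes the background is larger than the scaled target).
    x = random.randint(0, background.width - target.width)
    y = random.randint(0, background.height - target.height)
    background.paste(target, (x, y), target)  # alpha channel acts as the mask

    bbox = (x, y, x + target.width, y + target.height)
    return background, bbox

# sample, bbox = make_synthetic_sample("renders/sam_launcher.png",
#                                      "backgrounds/desert_overhead.jpg")
```

Generating thousands of such composites across viewpoints, backgrounds, and lighting conditions is far cheaper than collecting real imagery of an adversary's hardware, which is the whole appeal of the synthetic approach.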