Qualcomm Demos Stable Diffusion On Android Phones For Ultra-Low Power AI Image Generation
AI is everywhere. Mobile SoCs and desktop processors alike have included AI acceleration for a couple of generations already, but what can an individual do, locally on their own device, with AI? Until relatively recently, not much, at least not directly. Within the last year, though, a number of open-source projects have emerged that make working with AI locally a much more accessible proposition.
Ultimately, the result is what you see in the demo video below: real AI image generation without cloud acceleration, running directly on a low-power handset. And yes, in case that isn't clear, it works without an internet connection. This is an impressive accomplishment, and absolutely worthy of note.
Of those, none are more infamous than the scripts that let gamers run the Stable Diffusion generative AI on their powerful graphics cards. The explosion in highly-accessible AI art generated by anyone with a GeForce card caused immense discussion in certain communities. Well, thanks to Qualcomm, Stable Diffusion just got even more accessible, because now you can run it on your Android smartphone.
Well, you can't run it, exactly—at least, not yet, because the files simply aren't available. Still, Qualcomm published a brief video (below) and a blog post boasting about the achievement. The video demonstrates a Snapdragon 8 Gen 2-powered device using a simple Stable Diffusion interface to generate a single 512x512 image over 20 steps in just under 15 seconds, which is incredibly impressive. To put that in perspective, the Snapdragon 8 Gen 2 processor in the demonstration device draws less than ten watts of power.
For comparison's sake, a GeForce RTX 2060 card can draw as much as 200 watts to do the same task in only about half the time. It's fair to mention that the GeForce card can generate batches of a few images with little loss of performance, but again, we're talking about a card that draws twenty times the power—and still needs the whole rest of a PC for support. In that context, Qualcomm's achievement here is nothing short of astonishing.
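Put in terms of energy per image, the comparison is even starker. A quick back-of-envelope calculation, using the rough figures quoted above (these are approximate power draws, not measured numbers):

```python
# Rough energy-per-image comparison from the figures in this article.
# Assumed: phone SoC ~10 W for ~15 s; RTX 2060 ~200 W for ~7.5 s.
phone_watts, phone_seconds = 10, 15
gpu_watts, gpu_seconds = 200, 7.5

phone_joules = phone_watts * phone_seconds  # energy per image on the phone
gpu_joules = gpu_watts * gpu_seconds        # energy per image on the GPU

print(f"Phone: {phone_joules} J/image, GPU: {gpu_joules:.0f} J/image, "
      f"ratio: {gpu_joules / phone_joules:.0f}x")
```

Even though the GPU finishes twice as fast, the phone uses roughly a tenth of the energy per image under these assumptions.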
We'll grant the mobile chip company its speed record, although the claim that this is the first time anyone's done this on Android needs qualification. Developer Ivon Huang got Stable Diffusion working on a Snapdragon 865 a while back, but that hacky setup took an hour to make a single image. Most likely it was running purely on the Snapdragon's 64-bit ARM CPUs, and not the newer SoC's accelerators. It's technically possible to run Stable Diffusion on almost any system using nothing but OpenCL, but it clearly won't be efficient, and that's where Qualcomm's AI stack comes into play.
Indeed, as Qualcomm itself explains, this is only possible through "full-stack" optimizations. You see, image generation using Stable Diffusion is a many-step process involving more than one AI model, and Qualcomm's AI Research division had to optimize the process for its Snapdragon SoCs along every part of the path. The most important change was quantizing the open-source Stable Diffusion 1.5 model from the FP32 datatype (favored by GPUs) to the lower-precision INT8 datatype.
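To illustrate the core idea, here's a minimal sketch of symmetric post-training quantization in NumPy. This is the general FP32-to-INT8 technique, not Qualcomm's actual implementation; tools like AIMET use considerably more sophisticated schemes to preserve accuracy:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map FP32 weights onto the INT8 range [-127, 127] with a single scale."""
    scale = np.abs(weights).max() / 127.0   # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 values from INT8 codes."""
    return q.astype(np.float32) * scale

# INT8 storage is a quarter the size of FP32, which also cuts the
# memory bandwidth needed to stream weights during inference.
w = np.random.randn(1024).astype(np.float32)
q, scale = quantize_int8(w)
error = np.abs(w - dequantize(q, scale)).max()  # bounded by half the scale
```

The 4x reduction in weight size is what drives both the memory-bandwidth savings and much of the speedup on hardware with dedicated INT8 math.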
Normally, this would have a deleterious effect on the results you'd get from the model, but Qualcomm says that its AIMET ("AI Model Efficiency Toolkit") quantization tool lets it increase performance and save power by reducing the memory bandwidth required for inference, with minimal impact on accuracy. In addition to the Stable Diffusion foundational model, the other models used in image generation, including the text encoder and the variational autoencoder, were also converted. This was necessary so that the software would fit on the target device to begin with.
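The multi-model process described above can be sketched in terms of data flow. The function bodies below are placeholders (real versions are large neural networks), but the shapes follow Stable Diffusion 1.5 conventions for a 512x512 output, and each of the three stages is a separate model that Qualcomm had to quantize:

```python
import numpy as np

def text_encoder(prompt: str) -> np.ndarray:
    # CLIP-style encoder: prompt -> (77, 768) token embeddings (SD 1.5 shapes)
    return np.zeros((77, 768), dtype=np.float32)

def unet_step(latents: np.ndarray, embedding: np.ndarray, t: int) -> np.ndarray:
    # The denoising UNet, run once per step; identity placeholder here
    return latents

def vae_decode(latents: np.ndarray) -> np.ndarray:
    # VAE decoder: (4, 64, 64) latents -> (512, 512, 3) RGB image
    return np.zeros((512, 512, 3), dtype=np.uint8)

def generate(prompt: str, steps: int = 20) -> np.ndarray:
    embedding = text_encoder(prompt)                       # model 1
    latents = np.random.randn(4, 64, 64).astype(np.float32)
    for t in reversed(range(steps)):                       # 20 steps, as in the demo
        latents = unet_step(latents, embedding, t)         # model 2, run repeatedly
    return vae_decode(latents)                             # model 3

image = generate("a photo of a cat")
```

Because the UNet runs once per denoising step, it dominates the runtime, which is why the 20-step count in Qualcomm's demo matters so much to the sub-15-second figure.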