NVIDIA Boosts DGX Spark Performance And Pushes New Developer Tools at CES 2026
DGX Spark Performance Increases
Performance was the biggest hole we observed in our DGX Spark review at launch, so NVIDIA has spent the majority of its engineering time since then working on an uplift. Getting thumped in our review by an M4 Max Mac Studio that shipped months earlier probably didn't feel too good, but NVIDIA didn't sit back and let Apple march all over it. In fact, NVIDIA showed at least one image generation workflow that was 8x faster on DGX Spark than on an M4 Max MacBook Pro. Curiously, the slide appears to use a placeholder, since it doesn't show the actual output for a subjective quality comparison; the same image is repeated throughout. At any rate, the company is touting "up to" 2.5x performance since launch back in September, a figure that includes updates released in November as well as what's being announced today.
Before we get too excited about huge performance optimizations, it appears that the majority of the big increases come from models that have been quantized from FP8 down to NVIDIA's proprietary NVFP4 data type. For example, the 2.5x performance gain is specifically for Qwen-235B, and that is mostly because the model has been converted for NVIDIA's latest TensorRT-LLM (often abbreviated TRT-LLM) release, available as open-source software on GitHub.
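For the curious, here's a minimal sketch of what running one of these quantized checkpoints can look like with TensorRT-LLM's high-level Python LLM API. This is our own illustration, not one of NVIDIA's playbooks: the checkpoint path is a placeholder rather than an official model identifier, and the sampling settings are arbitrary.

```python
# Minimal sketch (ours, not an NVIDIA playbook): generating text from a
# pre-quantized NVFP4 checkpoint with TensorRT-LLM's high-level LLM API.
from tensorrt_llm import LLM, SamplingParams

# Placeholder path to a locally downloaded NVFP4 checkpoint.
llm = LLM(model="/models/qwen-235b-nvfp4")

sampling = SamplingParams(max_tokens=256, temperature=0.7)

outputs = llm.generate(
    ["Explain why 4-bit weights matter on a 128 GB desktop machine."],
    sampling,
)

for output in outputs:
    print(output.outputs[0].text)
```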
The NVFP4 conversion is great in a couple of ways: the obvious one is the performance increase, but it also cuts memory consumption. The DGX Spark has 128 GB of memory, of course, but the real wins are for multi-model agents that need to cram even more data into memory. With memory shortages coming fast and furious, and the vast majority of wafer allocations earmarked for high-margin datacenter hardware, conserving memory is going to be critical if AI developers intend to improve both performance and accuracy. For example, the open Nemotron models that NVIDIA released last month lean heavily on NVFP4.
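To put some rough numbers behind that, here's a back-of-the-envelope calculation for a hypothetical 120-billion-parameter dense model, assuming NVFP4 works out to roughly 4.5 bits per weight once per-block scaling factors are counted, and ignoring the KV cache, activations, and any layers kept at higher precision.

```python
# Back-of-the-envelope weight-memory estimate for a hypothetical 120B dense
# model. Assumptions are ours: NVFP4 ~4.5 bits/weight including block scales;
# KV cache, activations, and unquantized layers are ignored.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for fmt, bits in [("FP16", 16.0), ("FP8", 8.0), ("NVFP4 (approx.)", 4.5)]:
    print(f"120B weights @ {fmt:>15}: ~{weight_gb(120, bits):6.1f} GB")

# Roughly 240 GB at FP16, 120 GB at FP8, and 67.5 GB at NVFP4 -- the
# difference between not fitting at all and leaving room for a second model,
# agent state, and KV cache on a 128 GB DGX Spark.
```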
There are performance increases across the board, however, and NVIDIA says we can expect to see them with other frameworks, including bare CUDA, llama.cpp, and PyTorch, both with and without NVFP4. Speculative decoding isn't a new concept, but it's seeing wider use here: Eagle3 "pre-bakes" draft responses that then get handed off to Qwen for verification, and that's how NVIDIA achieved its claimed 2.5x gain over launch. What's less clear is exactly what the chart above is measuring with that 2.5x figure, but we're working on the assumption that it refers to the time to a completed response.

Our own DGX Spark testing showed a lot of promise (credit: HotHardware)
For those unfamiliar, speculative decoding lets the system attack a key performance indicator: how long it takes to generate a complete response. If you've experimented with LM Studio LLMs on your PC, you know the most agonizing stretch is the time the model spends thinking, and speculative decoding shortens it. The basic concept is to have a smaller "draft" model that proposes output tokens quickly (Eagle3-120B quantized to NVFP4 in this case), while the larger "target" model verifies and completes the work. It wasn't really possible to fit a 235B-parameter model into 128 GB of system memory alongside a smaller draft model for speculative decoding until the larger Qwen 235B was quantized to NVFP4.
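Here's a rough sketch of the accept/reject loop at the heart of speculative decoding. The `draft_model` and `target_model` objects and their methods are hypothetical stand-ins, not NVIDIA's or TensorRT-LLM's actual API, and real implementations verify all of the drafted tokens in a single batched pass on the target model.

```python
# Conceptual sketch of speculative decoding. `draft_model` and `target_model`
# are hypothetical stand-ins (think an Eagle3-style drafter and Qwen-235B);
# this is not TensorRT-LLM's actual interface.
def speculative_decode(draft_model, target_model, prompt_tokens, max_new=256, k=4):
    tokens = list(prompt_tokens)
    while len(tokens) - len(prompt_tokens) < max_new:
        # 1. The small draft model cheaply proposes the next k tokens.
        proposed = draft_model.propose(tokens, k)

        # 2. The large target model checks all k proposals in one forward
        #    pass, far cheaper than generating k tokens one at a time.
        accepted, correction = target_model.verify(tokens, proposed)

        # 3. Keep the prefix the target agrees with, then append the target's
        #    own next token where they diverge so every loop makes progress.
        tokens.extend(accepted)
        tokens.append(correction)
    return tokens
```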
Explore AI Workloads With More Tools
One of the most helpful resources NVIDIA provided during our DGX Spark review period was a healthy list of what the company calls playbooks, which provide step-by-step instructions for setting up, running, and measuring the performance of a variety of AI workloads. While some may wonder how NVIDIA got into such a dominant position in the AI hardware race, one only has to look at the vast array of APIs, finely-tuned libraries, developer documentation, and supporting software to find the answer. Several of those playbooks have seen significant updates, including the onboarding process, Open WebUI with Ollama LLMs, and playbooks built around common development tools like Visual Studio Code, retrieval-augmented generation (RAG) applications that can search the web in addition to generating text, and multi-DGX Spark setups using NVIDIA's Collective Communications Library (NCCL).
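For a flavor of what the Open WebUI with Ollama playbook covers, here's a minimal sketch that queries a locally running Ollama server over its REST API; the model name is a placeholder for whatever you've actually pulled onto the Spark.

```python
# Minimal sketch: querying a locally running Ollama server, the kind of setup
# the Open WebUI + Ollama playbook walks through. The model name below is a
# placeholder for whatever has been pulled locally.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",  # placeholder; substitute your pulled model
        "prompt": "Give me a one-paragraph status report on this workload.",
        "stream": False,      # ask for a single JSON response, not a stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```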
Additionally, there are a total of seven brand-new playbooks covering a variety of use cases: TRT-LLM speculative decoding (as discussed above), multi-modal inference, and a really interesting single-cell RNA sequencing workload. If you have a Spark (or even if you don't, but you have an NVIDIA GPU with gobs of memory, like a GeForce RTX 5090), definitely check those out.
The company also has some new game creation tools to assist with generating code and assets. Rather than producing fully AI-created "slop," however, NVIDIA wants to use AI to augment existing textures and objects, upgrading environments with custom materials and enhanced lighting.
Aspiring developers in non-game fields might be more interested in using Nsight CUDA Copilot AI within Visual Studio Code to get some vibe coding done while keeping all data and intellectual property local to the machine rather than in the cloud. Nsight CUDA Copilot will be exclusive to the DGX Spark and partner machines built on NVIDIA's GB10 Superchip.
Lastly, NVIDIA is bringing support for its Enterprise AI platform to the DGX Spark later this month. Enterprise AI is a bit of an umbrella term that encompasses cloud-native tools meant to help businesses build and deploy production-ready AI applications. This could be anything from robot-assisted manufacturing in factories to retail loss-prevention and medical point-of-care applications. A lot of those apps run either in datacenters or cloud providers, or on edge devices like the Jetson family of small, power-efficient single-board computers.