OpenAI And NVIDIA Collaborate On gpt-oss Open Source Reasoning Model And It Runs On GeForce
Built on a mixture-of-experts architecture and trained on NVIDIA H100 GPUs, the new gpt-oss-120b and gpt-oss-20b models are designed for complex, multi-step reasoning tasks like code generation, document analysis, and tool use, including web search when that function is enabled.
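A mixture-of-experts model activates only a small subset of its parameters for each token, which is part of what lets models this large stay responsive at inference time. Below is a minimal, generic sketch of top-k expert routing in Python with NumPy; it's only an illustration of the idea, not the actual gpt-oss routing code, and every name in it is made up for the example.

```python
import numpy as np

def moe_forward(x, experts, router_weights, top_k=2):
    """Route one token embedding x to its top_k experts and blend their outputs."""
    scores = router_weights @ x                       # one routing score per expert
    top = np.argsort(scores)[-top_k:]                 # indices of the highest-scoring experts
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax weights over chosen experts
    return sum(g * experts[i](x) for g, i in zip(gates, top))

# Toy setup: 4 "experts", each a random linear map over an 8-dim embedding.
rng = np.random.default_rng(0)
dim, n_experts = 8, 4
experts = [(lambda x, W=rng.normal(size=(dim, dim)): W @ x) for _ in range(n_experts)]
router_weights = rng.normal(size=(n_experts, dim))

token = rng.normal(size=dim)
print(moe_forward(token, experts, router_weights))    # only 2 of the 4 experts actually run
```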

The announcement is part of a broader push by both OpenAI and NVIDIA to make advanced AI more accessible to developers, researchers, and enthusiasts. It also underscores NVIDIA's ongoing strategy of tightly integrating its hardware and software ecosystem into the rapidly evolving open-source AI landscape. The company worked with OpenAI to optimize the new models for everything from multi-rack datacenter deployments to local inference on high-end PCs.
At the cloud scale, NVIDIA reports that its Blackwell GB200 NVL72 system can push inference performance to 1.5 million tokens per second with the gpt-oss-120b model, a number aimed squarely at organizations deploying large-scale AI services. Blackwell's native NVFP4 4-bit precision isn't used here; instead, the models ship in the MXFP4 format, a 4-bit quantization that keeps power and memory use in check on hardware NVIDIA says is built to serve trillion-parameter models in real time.
Perhaps the most noteworthy part of this release is what it means for local inference. Developers can now run the very same models on GeForce RTX and RTX PRO GPUs, with performance purportedly scaling up to 256 tokens per second on the GeForce RTX 5090. That's fast enough to support snappy interactions in local chat UIs, and the models' support for context windows of up to 131,072 tokens opens the door to deep, document-level reasoning, something typically reserved for server-grade systems.
Fortunately, setup is also more streamlined than in the past. The Ollama app now includes official support for the gpt-oss models, allowing users to load them, chat with them, and tinker with settings right on their own systems. File attachments, context-length customization, and even multimodal support are all built into the app, though the multimodal features don't apply to these new text-only models. For developers, there's also CLI and SDK access, plus support across other frameworks like llama.cpp and Microsoft AI Foundry Local.
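If you'd rather poke at the models from a script than the chat UI, Ollama's Python SDK is one route. Here's a minimal sketch assuming the SDK is installed (pip install ollama) and the model has already been pulled locally (for example, with "ollama pull gpt-oss:20b"); the exact model tag is an assumption, so check Ollama's model library for the current name.

```python
# Minimal sketch: chat with a locally pulled gpt-oss model through the Ollama Python SDK.
# Assumes "pip install ollama" and that the model tag below matches what you pulled.
import ollama

response = ollama.chat(
    model="gpt-oss:20b",  # assumed tag for the smaller 20B variant
    messages=[
        {"role": "user", "content": "Explain what a mixture-of-experts model is in two sentences."},
    ],
)

print(response["message"]["content"])  # the model's reply text
```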
It's a notable shift: powerful reasoning models are no longer just something you access through an API. With the right hardware and a bit of setup, they can now run locally and still be fast enough to be useful. To get started with Ollama and try these models on an RTX GPU with 16GB or more of VRAM, you can follow the instructions on NVIDIA's official blog, and you can try gpt-oss on NVIDIA's hosted platform as well.