Inside Intel Tech Tour 2025: Panther Lake And Clearwater Forest Built On 18A
Panther Lake NPU 5 Details
The neural processing unit in Panther Lake is dubbed NPU 5. In terms of peak TOPS, it doesn’t look like a massive upgrade over NPU 4 in Lunar Lake: NPU 5 will offer up to 50 TOPS, versus NPU 4’s 48 TOPS. NPU 5, however, is optimized for efficiency and has been re-tuned to better handle the latest workloads.
Intel rearchitected the way the Multiply-Accumulate array works and reconfigured and balanced its design. With NPU 4, there are two MAC arrays per Neural Compute Engine slice, with two copies of the backend units. In NPU 5, however, Intel doubled the size of the MACs per Neural Compute Engine and halved the number of backend units.
Intel also enhanced the low-level pipelines to improve performance. Intel removed the need for channel padding, which results in higher occupancy and better utilization of the MAC array, so it can do more work in parallel. NPU 5 also supports FP8 datatypes natively, and it now has a native FP32 post-processing pipeline. NPU 5 supports programmable activation functions as well, and can natively handle sigmoids and tangents, which were previously emulated on a DSP.
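To see why dropping channel padding raises occupancy, here is a minimal sketch. It assumes a hypothetical MAC array that consumes channels in fixed groups of 16; that lane width and the function name are illustrative, not Intel's published figures. With padding, a layer whose channel count is not a multiple of the lane width leaves some lanes multiplying zeros.

```python
import math


def mac_occupancy(channels: int, lane_width: int = 16, padded: bool = True) -> float:
    """Fraction of MAC lanes doing useful work for one layer.

    lane_width is an assumed, hypothetical group size, not an Intel spec.
    With padded=False the model is idealized: every lane is treated as busy.
    """
    if channels <= 0 or lane_width <= 0:
        raise ValueError("channels and lane_width must be positive")
    if not padded:
        return 1.0
    # Round the channel count up to the next multiple of the lane width.
    padded_channels = math.ceil(channels / lane_width) * lane_width
    return channels / padded_channels


# A 3-channel RGB input padded to 16 lanes keeps under a fifth of them busy.
print(mac_occupancy(3))
# 24 channels padded to 32 leaves a quarter of the lanes idle.
print(mac_occupancy(24))
```

Under these assumptions, the layers that benefit most are those with small or odd channel counts, such as the first convolution over an RGB image.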
The end results are higher performance and efficiency. Even though NPU 5 is physically smaller than NPU 4, performance is mostly flat with INT8 or FP16 operations but vastly improved in other areas. NPU 5 achieves a roughly 40% improvement in TOPS per area at lower power. When counted in aggregate, the three AI engines in Panther Lake – the CPU, NPU, and GPU – offer up to 180 total platform TOPS.
IPU 7.5 In Panther Lake Highlights
In recent years, video conferencing has become ubiquitous, but image quality from the front-facing cameras in most PCs has left plenty to be desired. To address those concerns, Intel has made a number of upgrades to the Image Processing Unit in Panther Lake, once again with a focus on performance and efficiency.
IPU 7.5 in Panther Lake will support AI-enhanced noise reduction and local tone mapping, hardware-accelerated staggered HDR, and three concurrent cameras, with the ability to capture 16MP stills and 120 FPS slow motion.
IPU 7.5 should increase visual quality across a variety of lighting conditions. The enhanced HDR capabilities result in wider dynamic range, while AI-based noise reduction and tone mapping will help produce cleaner images in low-light settings, with more lifelike contrast and improved depth across a scene. And all of that can be achieved with lower power consumption.
Staggered HDR works by taking two exposures – one high-key and one low-key – to better capture details in highlights and shadows, and then merging the frames for a more balanced look. AI-enhanced noise reduction helps to recover details in poor lighting conditions, smooth skin tones, better reproduce fine textures, and improve overall sharpness. AI-enhanced tone mapping analyzes different segments of a scene independently to better optimize the final output. The end result should be better contrast, with minimal or no halos, no color artifacts, and consistent temporal behavior.
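The two-exposure merge described above can be sketched per pixel. This is a generic textbook-style blend, not Intel's IPU 7.5 pipeline: the function name, the 8-bit range, the `knee` threshold, and the linear weighting are all assumptions chosen for illustration. The brighter (long) exposure is trusted in shadows, and the darker (short) exposure, scaled by the exposure ratio, takes over as the long exposure nears saturation.

```python
def merge_exposures(short_px: float, long_px: float, exposure_ratio: float,
                    max_value: float = 255.0, knee: float = 0.8) -> float:
    """Blend one pixel from a short and a long exposure into a linear HDR value.

    The result is expressed in long-exposure units. All parameters are
    illustrative assumptions, not values from any real camera pipeline.
    """
    threshold = knee * max_value
    if long_px <= threshold:
        w_long = 1.0          # long exposure is well exposed: use it as-is
    elif long_px >= max_value:
        w_long = 0.0          # long exposure is clipped: rely on the short one
    else:
        # Fade linearly from the long to the short exposure near saturation.
        w_long = (max_value - long_px) / (max_value - threshold)
    short_scaled = short_px * exposure_ratio  # bring short exposure to the same scale
    return w_long * long_px + (1.0 - w_long) * short_scaled


# Shadow detail: the long exposure is used directly.
print(merge_exposures(short_px=10, long_px=80, exposure_ratio=8))
# Blown highlight: the scaled short exposure recovers a value above 255.
print(merge_exposures(short_px=100, long_px=255, exposure_ratio=8))
```

The merged values exceed the 8-bit range in highlights, which is why a tone mapping step follows to compress them back into a displayable image.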
Panther Lake And The New Xe3 GPU Architecture
The Xe3 GPU architecture on board Panther Lake arguably changes the most versus previous-gen designs. Note, however, that despite the Xe3 naming convention, the GPUs in Panther Lake will still carry B-series branding and be lumped in with second-gen Arc Battlemage designs. True third-gen Xe3P GPUs based on the graphics architecture codenamed Celestial will be coming later.
If you recall, Intel breaks its Xe GPUs down into slices, which house the Xe cores and ray tracing units. With Xe2, each render slice contained 4 Xe cores and 4 ray tracing units. With Xe3, Intel is increasing the number of Xe cores and ray tracing units per slice to 6 each.
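As a quick sketch of what the per-slice change means for total resources, the snippet below models a render slice with the per-slice counts quoted above. The `RenderSlice` type and the slice counts passed in are hypothetical; the text here does not state how many slices a given Panther Lake GPU configuration uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderSlice:
    """Per-slice resources; a hypothetical helper type for illustration."""
    xe_cores: int
    rt_units: int

    def total_xe_cores(self, slices: int) -> int:
        return self.xe_cores * slices

    def total_rt_units(self, slices: int) -> int:
        return self.rt_units * slices


XE2_SLICE = RenderSlice(xe_cores=4, rt_units=4)  # per-slice counts from the text
XE3_SLICE = RenderSlice(xe_cores=6, rt_units=6)  # per-slice counts from the text

# The slice count is an assumed input, not a figure from the article.
for slices in (1, 2):
    print(slices, XE2_SLICE.total_xe_cores(slices), XE3_SLICE.total_xe_cores(slices))

# Per slice, Xe3 carries 1.5x the Xe cores of Xe2.
print(XE3_SLICE.xe_cores / XE2_SLICE.xe_cores)
```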
In addition to reconfiguring the render slices, Xe3 features a number of architectural improvements as well. First, the size of the L2 cache increases from 8MB to 16MB, which significantly reduces traffic on the chip fabric, to the tune of 17% to 36%. Intel also improved the vector engines, enhanced the back-end fixed-function hardware, and improved ray tracing.
There are 8 x 512-bit vector engines and 8 x 2048-bit XMX engines per Xe3 core. The vector engines can support up to 25% more threads and have a new variable register allocation feature to help improve utilization and efficiency.
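The engine widths above translate into element counts with simple division. The sketch below only computes how many elements of a given datatype fit across each data path; it says nothing about operations per clock or real throughput, and the helper name is illustrative.

```python
def lanes(width_bits: int, element_bits: int) -> int:
    """Number of elements of a given size that fit in one engine's data path."""
    if width_bits % element_bits != 0:
        raise ValueError("element size must divide the engine width")
    return width_bits // element_bits


ENGINES_PER_XE_CORE = 8  # both engine types come in groups of eight per Xe3 core

fp32_per_engine = lanes(512, 32)    # one 512-bit engine holds 16 FP32 values
fp32_per_core = ENGINES_PER_XE_CORE * fp32_per_engine
int8_per_xmx = lanes(2048, 8)       # one 2048-bit XMX engine holds 256 INT8 values

print(fp32_per_engine, fp32_per_core, int8_per_xmx)
```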
Intel also optimized how rays flow through the pipeline through the use of a new dynamic ray tracing management feature. The unit has been rearchitected to slow down dispatches to prevent an excess of queued up rays. And a new Unified Return Buffer (URB) streamlines how data is passed through functional units inside the GPU.
Xe3 also offers up to 2X the anisotropic filtering performance and stencil test rates of Xe2.
Intel claims Panther Lake’s Xe3 GPUs offer more performance at less power, and up to 50% more performance at peak power. The performance improvements stem from multiple things – some from higher IPC per core, some from more power, and some from a roughly 40% improvement in performance per watt.
Intel has made significant strides with its GPU drivers and software as well. Intel’s graphics software for Panther Lake will obviously incorporate support for the new variable register allocation, but it’ll also have a faster scheduler to enable direct preemption, which can swap between contexts without flushing the GPU, and initial support for DirectX cooperative vectors. DirectX cooperative vectors let developers code matrix operations inside shaders, to incorporate AI into games.
Although it’s actually part of the compute tile, Panther Lake also features a new Xe Media Engine. The new media engine builds upon the previous generation by adding support for AVC and a couple of other additional formats.
Although we didn’t get to run any benchmarks, we did get to see Panther Lake in action running a few different games and were provided data that illustrated how much better Xe3 handled gaming workloads thanks to its architectural and software improvements.
The render pre-pass is much faster on Panther Lake, and support for 10 threads per vector unit and variable register allocation helps significantly as well. The larger L2 also helps keep more context closer to the GPU, which improves bandwidth utilization and ultimately increases performance.
With Panther Lake, Intel is also introducing support for XeSS MFG, or multi-frame generation. As with NVIDIA’s RTX 50-series cards, XeSS MFG can generate up to 3 frames in between rasterized frames. That’s probably not something every gamer wants to hear, but the fact is AI-assisted frame generation is here to stay and it will eventually become more pervasive. That said, its shortcomings – like potential visual anomalies and higher latency – will eventually be addressed, but that’s a conversation for another day.
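The frame-rate arithmetic behind multi-frame generation is straightforward, and a small sketch makes the trade-off concrete. It assumes an idealized case in which generating frames costs no rendering time; real results will be lower, and the function names are illustrative only.

```python
def displayed_fps(rendered_fps: float, generated_per_rendered: int) -> float:
    """Idealized output frame rate with N generated frames per rendered frame."""
    if not 0 <= generated_per_rendered <= 3:
        raise ValueError("this sketch covers 0 to 3 generated frames")
    return rendered_fps * (1 + generated_per_rendered)


def generated_share(generated_per_rendered: int) -> float:
    """Fraction of displayed frames that were generated rather than rendered."""
    return generated_per_rendered / (1 + generated_per_rendered)


# A game rendering 30 FPS with 3 generated frames in between shows 120 FPS,
# but three out of every four displayed frames are synthesized.
print(displayed_fps(30, 3), generated_share(3))
```

Input is still sampled at the rendered rate, which is one reason latency remains a concern even as the displayed frame rate climbs.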
We should also mention that Intel is incorporating a shared GPU/NPU override in its drivers, so users can shift memory around between the NPU and GPU. Intel will also be offering pre-compiled shader distribution to optimize launch times and reduce stutter on first launch.
Panther Lake also features optimized power sharing between the CPU and GPU. Intel first released Intelligent Bias Control v2 in Q2, to maximize performance for gaming workloads by dynamically optimizing power distribution between the CPU and GPU. It’s optimized for upscaled games and typically helped improve 0.1% and 1% lows. With Panther Lake, Intelligent Bias Control v3 is coming, which features a new velocity-based algorithm that prioritizes the new E-cores. That frees up a significant amount of power for the GPU to boost overall performance and produces less spiky power distribution to further smooth frame pacing. Intel is claiming about 10% better performance on average with the new algorithm, with up to 30% better lows.