Modder Gets AMD's FSR 4 To Work On Radeon RX 7900 XTX And Runs Benchmarks
RDNA 3 GPUs do, however, support the underlying feature, Wave Matrix Multiply Accumulate (WMMA) instructions, with FP16 data. As a result, it should be possible to run the FP8 model on the FP16 path, and indeed it is: Mesa developers have already implemented a very hacky workaround that, when toggled on, enables FSR4 on RDNA 3 GPUs. Reddit user /u/Virtual-Cobbler-9930 posted his impressions of using FSR4 on a Radeon RX 7900 XTX to the AMD Radeon subreddit.
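The article doesn't describe how Mesa's workaround is implemented, so what follows is an illustration only, not Mesa's code. Assuming the model's weights use the common E4M3 FP8 format (an assumption), every finite FP8 value fits exactly into FP16, which is why running an FP8 model on an FP16 path is numerically plausible at all. In this Python sketch, `widen_fp8_weights` and `fp16_dot` are hypothetical helpers; `fp16_dot` is a scalar stand-in for a matrix multiply-accumulate, not an actual WMMA instruction.

```python
import math
import struct


def decode_e4m3(byte: int) -> float:
    """Decode one FP8 E4M3 (OCP "FN" variant) code: 1 sign, 4 exponent, 3 mantissa bits."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0xF and mantissa == 0x7:
        return math.nan  # E4M3FN has no infinities; only this pattern is NaN
    if exponent == 0:
        return sign * (mantissa / 8.0) * 2.0 ** (1 - 7)  # subnormal, bias 7
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def widen_fp8_weights(raw: bytes) -> list[float]:
    """Hypothetical helper: decode FP8 weights and widen them to FP16 values."""
    return [to_fp16(decode_e4m3(b)) for b in raw]


def fp16_dot(weights: list[float], activations: list[float]) -> float:
    """Scalar stand-in for a multiply-accumulate: FP16 inputs, wider float accumulator."""
    acc = 0.0
    for w, a in zip(weights, activations):
        acc += to_fp16(w) * to_fp16(a)
    return acc


if __name__ == "__main__":
    finite = [b for b in range(256) if not math.isnan(decode_e4m3(b))]
    # Every finite E4M3 value survives the FP8 -> FP16 widening unchanged.
    assert all(to_fp16(decode_e4m3(b)) == decode_e4m3(b) for b in finite)
    print(len(finite), "finite FP8 codes widen exactly to FP16")
```

Running it prints that all 254 finite E4M3 codes round-trip exactly. The widening itself is lossless; the cost shows up in throughput, since the hardware now does the math at FP16 rate instead of a native FP8 rate.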
The results aren't surprising if you understand what's happening here. In essence, an RDNA 3 GPU has to do more than twice the work of an RDNA 4 GPU to run FSR4. As a result, the performance hit from FSR4 upscaling is drastic, even on his powerful 24GB card. However, the Redditor notes that performance in "Quality" mode is still better than running the games he tested at native 4K resolution.
Moreover, the quality benefits are apparently plain to see. FSR3 is definitely a step up from simple spatial upscalers, but it still has the faults it has always had: noticeable temporal smearing and artifacting, particularly on fast-moving objects or at low frame rates. According to the user, FSR4 eliminates these artifacts, and the difference is most striking in Cyberpunk 2077 and Oblivion, both of which suffer significant image-stability issues with FSR3.
In Marvel Rivals, on the other hand, the user doesn't recommend the use of FSR4 because of the performance hit. He notes that while FSR4 chunks his game down to around 50 FPS, he can use FSR3 and have reasonable image quality at over 100 FPS. The temporal resolution (frame rate) is much more valuable than the slightly cleaner image in this kind of competitive online game.
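To put the Marvel Rivals numbers in frame-time terms, here is a small Python sketch of the arithmetic. The inputs are the approximate figures from the Reddit post; treating the whole gap as FSR4's extra cost assumes everything else stayed the same, which the post doesn't state.

```python
def frame_time_ms(fps: float) -> float:
    """Convert a frame rate in frames per second to milliseconds per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def extra_cost_ms(slow_fps: float, fast_fps: float) -> float:
    """Extra per-frame time implied by dropping from fast_fps to slow_fps."""
    return frame_time_ms(slow_fps) - frame_time_ms(fast_fps)


if __name__ == "__main__":
    # Rough figures from the Reddit report: ~50 FPS with FSR4, 100+ FPS with FSR3.
    print(f"FSR4 frame time: {frame_time_ms(50):.1f} ms")
    print(f"FSR3 frame time: {frame_time_ms(100):.1f} ms")
    print(f"Extra cost per frame: at least {extra_cost_ms(50, 100):.1f} ms")
```

At about 20 ms per frame versus under 10 ms, switching to FSR4 adds at least roughly 10 ms to every frame, and in a competitive shooter that latency matters more than a cleaner image.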
Virtual-Cobbler-9930 says "FPS on FSR4 scales awful with lower resolutions," meaning that choosing a lower FSR4 quality preset doesn't gain him much performance. That follows from how upscaling works. The upscaling pass has a roughly fixed cost determined by the output resolution. In essence, the input resolution (set by the upscaling preset, like Performance or Quality) controls how much work the GPU's rendering hardware has to do, while the output resolution, the resolution of your screen, determines how much work the upscaler has to do.
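A quick way to see why the preset barely matters for the upscaler is to count pixels. This Python sketch assumes FSR4 uses the same per-axis scale factors as FSR 2 and 3 (1.5x for Quality, 1.7x for Balanced, 2.0x for Performance, 3.0x for Ultra Performance); that is an assumption, not something the Reddit post states.

```python
# Per-axis scale factors used by FSR 2/3 presets (assumed to carry over to FSR4).
PRESET_SCALE = {
    "Quality": 1.5,
    "Balanced": 1.7,
    "Performance": 2.0,
    "Ultra Performance": 3.0,
}


def render_resolution(out_w: int, out_h: int, scale: float) -> tuple[int, int]:
    """Input (render) resolution for a given output resolution and per-axis scale."""
    return round(out_w / scale), round(out_h / scale)


def pixel_budget(out_w: int, out_h: int, preset: str) -> tuple[int, int]:
    """Return (pixels the game renders, pixels the upscaler must produce)."""
    in_w, in_h = render_resolution(out_w, out_h, PRESET_SCALE[preset])
    return in_w * in_h, out_w * out_h


if __name__ == "__main__":
    for preset in PRESET_SCALE:
        rendered, upscaled = pixel_budget(3840, 2160, preset)
        print(f"{preset:>17}: renders {rendered:>9,} px, upscaler outputs {upscaled:,} px")
```

For a 4K output, Quality renders at 2560x1440 and Performance at 1920x1080, so the rendering work shrinks considerably, yet the upscaler produces 8,294,400 pixels in every case.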
Smart upscaling generally works because GPUs have loads of spare compute that games don't use, since games are typically bottlenecked by raster or memory hardware. All of that horsepower sits idle most of the time, so why not put it to work upscaling? The problem arises when you slap the FSR4 workload onto a GPU that was never designed for it. Suddenly, the upscaler adds a very high fixed cost to every frame, and no matter how much you reduce the rendering workload, the frame rate can never exceed the ceiling set by that fixed cost.
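The frame-rate ceiling can be written as a simple model: frame time equals rendering time plus a fixed upscaling cost. In this Python sketch, the 8 ms upscaler cost is a made-up illustrative number, not a measurement of FSR4 on RDNA 3.

```python
def frame_rate(render_ms: float, upscale_ms: float) -> float:
    """Frame rate when each frame costs render_ms of rendering plus a fixed upscale_ms."""
    return 1000.0 / (render_ms + upscale_ms)


def frame_rate_ceiling(upscale_ms: float) -> float:
    """Upper bound on frame rate as the rendering cost approaches zero."""
    return 1000.0 / upscale_ms


if __name__ == "__main__":
    UPSCALE_MS = 8.0  # hypothetical fixed upscaler cost, not a measured figure
    for render_ms in (16.0, 8.0, 4.0, 2.0, 1.0):
        print(f"render {render_ms:4.1f} ms -> {frame_rate(render_ms, UPSCALE_MS):5.1f} FPS")
    print(f"ceiling: {frame_rate_ceiling(UPSCALE_MS):.1f} FPS")
```

With these hypothetical numbers, cutting render time from 2 ms to 1 ms only lifts the frame rate from 100 FPS to about 111 FPS, and no amount of render-side savings gets past 125 FPS.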
How do you reduce that fixed cost? Lower the output resolution. In other words, FSR4 might actually be quite viable on RDNA 3 GPUs at output resolutions below 4K, though that will depend heavily on the game and its settings. Further software optimizations may also improve performance, since the current implementation is at an early stage.
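As a rough illustration of how much headroom a lower output resolution could buy, this Python sketch scales the upscaler's fixed cost by output pixel count. Both the linear-scaling assumption and the 8 ms starting cost at 4K are hypothetical simplifications, not measured FSR4 behavior.

```python
RESOLUTIONS = {
    "4K": (3840, 2160),
    "1440p": (2560, 1440),
    "1080p": (1920, 1080),
}


def scaled_upscale_cost(cost_at_4k_ms: float, width: int, height: int) -> float:
    """Estimate upscaler cost at another output resolution, assuming it scales
    linearly with output pixel count (a simplifying assumption)."""
    ref_w, ref_h = RESOLUTIONS["4K"]
    return cost_at_4k_ms * (width * height) / (ref_w * ref_h)


if __name__ == "__main__":
    COST_AT_4K_MS = 8.0  # hypothetical, not a measured FSR4 figure
    for name, (w, h) in RESOLUTIONS.items():
        cost = scaled_upscale_cost(COST_AT_4K_MS, w, h)
        print(f"{name:>5}: ~{cost:.2f} ms fixed cost, ceiling ~{1000 / cost:.0f} FPS")
```

Because 4K has 2.25 times as many pixels as 1440p and four times as many as 1080p, the estimated fixed cost drops to about 3.6 ms and 2 ms respectively, raising the frame-rate ceiling accordingly.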
If you'd like to try FSR4 on your RDNA 3 GPU, there's a pretty big roadblock for most of you: it's currently only possible on Linux. If you are on Linux, though, you can check the discussion in the aforementioned Reddit thread, or head over to the relevant CachyOS discussion thread for detailed instructions on enabling FSR4 and using it in your games.