All of the buzz around Intel's graphics efforts right now centers on the company's Odyssey toward its first modern discrete GPU, currently scheduled for release in 2020. That's understandable, but let's not forget that, in terms of market share, Intel's integrated graphics lead the pack. Intel's upcoming Gen11 graphics will carry that line forward, and interestingly, Intel has quietly released a white paper that describes Gen11 in some detail.
Intel already shared a few details about Gen11 at its Architecture Day last December. Gen11 will complement Intel's upcoming Sunny Cove CPU architecture, which itself will form the basis for both Core (consumer) and Xeon (server) processors.
While Intel did not spill all of the beans at the time, it did say that Gen11 bumps the number of enhanced execution units from 24 to 64, and pushes compute performance to over 1 TFLOPS. That's not on par with stronger discrete solutions, but as we saw earlier today in a benchmark leak, Gen11 is shaping up to be much faster than Gen9.
In the newly published white paper, Intel compares the makeup of Gen11 to Gen9. The table above presents the theoretical peak throughput of the compute architecture, aggregated across the entire spectrum. Values stated are "per clock cycle," as the final product clock rates are still being hashed out.
The table reiterates the 1 TFLOPS claim, while adding some other points of comparison, such as half-precision performance (2 TFLOPS).
As for the callout to "slices," Gen11 consists of 8 subslices aggregated into 1 slice. So, a single slice aggregates a total of 64 execution units. Aside from grouping subslices, the slice integrates additional logic for geometry and L3 cache.
"In Gen11 architecture, arrays of EUs are instantiated into a group called a Subslice. For scalability, product architects can choose the number of EUs per subslice. For most Gen11-based products, each subslice contains 8 EUs. Each subslice contains its own local thread dispatcher unit and its own supporting instruction caches. Each Subslice also includes a 3D texture sampler unit, a Media Sampler Unit and a dataport unit," Intel explains.
Communication takes place through a ring interconnect, which is an on-die bus between CPU cores, caches, and the Gen11 graphics in a ring-based topology.
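For a rough sense of where the 1 TFLOPS figure comes from, the slice arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope model, not Intel's own math: the 16 FP32 operations per EU per clock, the doubled FP16 rate, and the 1.0 GHz clock are assumptions picked for illustration, since final clock rates have not been announced.

```python
# Back-of-the-envelope model of Gen11 peak compute throughput.
SUBSLICES_PER_SLICE = 8  # from the article: 8 subslices per slice
EUS_PER_SUBSLICE = 8     # from the white paper quote: 8 EUs per subslice

# Assumption: each EU retires 16 FP32 operations per clock
# (two 4-wide FPUs, with a fused multiply-add counted as 2 operations).
FP32_OPS_PER_EU_PER_CLOCK = 16
# Assumption: half precision runs at twice the FP32 rate.
FP16_OPS_PER_EU_PER_CLOCK = 2 * FP32_OPS_PER_EU_PER_CLOCK

# Assumption: placeholder clock, not an announced product frequency.
ASSUMED_CLOCK_GHZ = 1.0


def total_eus(slices=1):
    """Execution units across the given number of slices."""
    return slices * SUBSLICES_PER_SLICE * EUS_PER_SUBSLICE


def peak_tflops(ops_per_eu_per_clock, clock_ghz, slices=1):
    """Theoretical peak in TFLOPS: operations per clock times clock rate."""
    ops_per_clock = total_eus(slices) * ops_per_eu_per_clock
    return ops_per_clock * clock_ghz / 1000.0  # GFLOPS to TFLOPS


if __name__ == "__main__":
    print("EUs per slice:", total_eus())
    print("FP32 TFLOPS:", peak_tflops(FP32_OPS_PER_EU_PER_CLOCK, ASSUMED_CLOCK_GHZ))
    print("FP16 TFLOPS:", peak_tflops(FP16_OPS_PER_EU_PER_CLOCK, ASSUMED_CLOCK_GHZ))
```

With those assumed numbers, 64 EUs work out to 1,024 FP32 operations per clock, which lines up with the "over 1 TFLOPS" claim at a clock of roughly 1 GHz.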
The paper also discusses a technique called Coarse Pixel Shading. This works by reducing the number of times the pixel shader executes, which in turn saves rendering time. To preserve details along the edges, sample coverage and depth continue to be sampled at the target resolution.
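To put the savings in perspective, here is a minimal sketch of how the pixel shader invocation count shrinks when one invocation covers a block of pixels. The 2x2 coarse pixel size and the 1920x1080 target are hypothetical example values, and the model assumes a single full-screen layer with no overdraw; it says nothing about how Gen11 picks shading rates.

```python
import math


def pixel_shader_invocations(width, height, coarse_w=1, coarse_h=1):
    """Shader invocations when each one covers a coarse_w x coarse_h block.

    Simplification: one full-screen layer, no overdraw. Coverage and depth
    would still be sampled per pixel, which this model does not count.
    """
    return math.ceil(width / coarse_w) * math.ceil(height / coarse_h)


# Hypothetical example: a 1080p target shaded per pixel vs. in 2x2 blocks.
full = pixel_shader_invocations(1920, 1080)
coarse = pixel_shader_invocations(1920, 1080, coarse_w=2, coarse_h=2)
savings = 1 - coarse / full

print(full, coarse, f"{savings:.0%} fewer invocations")
```

In this simplified case, shading in 2x2 blocks cuts pixel shader work to a quarter, while edges stay sharp because coverage and depth are still resolved at the target resolution.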
"CPS allows us to decrease the total amount of work done when rendering portions of the scene where the decrease in shading rate will not be noticed. We can also use this technique to lower the total overall power requirements or hit specific frame rate targets by decreasing the shading resolution while preserving the fidelity of the edges of geometry in the scene," Intel says.
Also found in the white paper are references to Gen11's position-only shading tile-based rendering (PTBR). This consists of two distinct pipelines: a typical render pipe and a new position-only shading (POSH) pipe.
The POSH pipe executes the position shader in parallel with the main application, and has the advantage of generating results much faster. That's because it only shades position attributes and avoids rendering actual pixels.
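The two-pipe idea can be illustrated with a toy software model: a position-only pass decides which triangles survive culling and records one visibility bit per triangle, and the render pass then shades only the survivors. Everything here is hypothetical, including the `Triangle` type, the back-face and off-screen tests, and the integer bitmask standing in for the recorded visibility data. It does not model tiling or Intel's hardware.

```python
from dataclasses import dataclass


@dataclass
class Triangle:
    # 2D vertex positions, assumed to be in normalized device coordinates.
    v0: tuple
    v1: tuple
    v2: tuple
    color: str = "white"  # stands in for every non-position attribute


def signed_area(t):
    """Positive for counter-clockwise winding, negative for clockwise."""
    (x0, y0), (x1, y1), (x2, y2) = t.v0, t.v1, t.v2
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def is_visible(t):
    """Position-only test: drop back-facing, degenerate, and off-screen triangles."""
    if signed_area(t) <= 0:
        return False
    xs = [v[0] for v in (t.v0, t.v1, t.v2)]
    ys = [v[1] for v in (t.v0, t.v1, t.v2)]
    # Culled if the bounding box lies entirely outside the [-1, 1] viewport.
    return not (max(xs) < -1 or min(xs) > 1 or max(ys) < -1 or min(ys) > 1)


def posh_pass(triangles):
    """Run-ahead pass: record one visibility bit per triangle in a bitmask."""
    mask = 0
    for i, t in enumerate(triangles):
        if is_visible(t):
            mask |= 1 << i
    return mask


def render_pass(triangles, mask):
    """Render pass: 'shade' only the triangles whose visibility bit is set."""
    return [t.color for i, t in enumerate(triangles) if (mask >> i) & 1]


scene = [
    Triangle((0, 0), (1, 0), (0, 1), "red"),    # front-facing, on screen
    Triangle((0, 0), (0, 1), (1, 0), "green"),  # back-facing (clockwise)
    Triangle((2, 2), (3, 2), (2, 3), "blue"),   # entirely off screen
]

visibility = posh_pass(scene)
shaded = render_pass(scene, visibility)
print(bin(visibility), shaded)
```

The point of the split is that the first pass touches only positions, so it is cheap, and the expensive pass never sees the triangles that were culled.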
"The POSH pipe runs ahead and uses the shaded position attribute to compute visibility information for triangles to gauge whether they are culled or not. Object visibility recording unit of the POSH pipe calculates the visibility, compresses the information and records it in memory," Intel explains.
It's an interesting read if you're into technical details.