Since 2020, Netflix has been able to serve up 200 gigabits per second (Gbps) of TLS-encrypted video traffic from each of its servers, but there is now a path to doubling that amount to 400Gbps. Andrew Gallatin, a senior software engineer at Netflix, outlined how the streaming service has been able to achieve the feat using a combination of AMD's second-generation EPYC "Rome" processors and various software optimizations.
AMD's EPYC 7502P Rome processors provide the big iron muscle, each with 32 cores and 64 threads running at 2.5GHz (base) to 3.35GHz (boost). Netflix's servers also boast 256GB of DDR4-3200 RAM in an eight-channel configuration, providing around 150GB/s of memory bandwidth (or around 1.2Tb/s in networking units). There are also 128 lanes of PCIe Gen4 at play.
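The article mixes storage-style units (gigabytes per second) with networking-style units (gigabits per second), so the conversion is worth spelling out. The following is a minimal Python sketch of that arithmetic; the function names are illustrative only and do not come from Gallatin's presentation.

```python
BITS_PER_BYTE = 8


def gigabytes_to_gigabits(gbytes_per_s: float) -> float:
    """Convert a rate in GB/s to Gb/s."""
    return gbytes_per_s * BITS_PER_BYTE


def gigabits_to_gigabytes(gbits_per_s: float) -> float:
    """Convert a rate in Gb/s to GB/s."""
    return gbits_per_s / BITS_PER_BYTE


# Memory bandwidth quoted in the article: about 150GB/s.
memory_gbits = gigabytes_to_gigabits(150)
print(f"Memory bandwidth: {memory_gbits / 1000:.1f} Tb/s")

# The 400Gbps serving target expressed in bytes.
target_gbytes = gigabits_to_gigabytes(400)
print(f"400Gbps target: {target_gbytes:.0f} GB/s")
```

Seen this way, the 400Gbps target works out to 50GB/s of payload leaving the server, against roughly 150GB/s of memory bandwidth.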
According to Gallatin, two Mellanox ConnectX-6 Dx network adapters serve up four 100GbE connections. Other specs include 18 WD SN720 2TB NVMe SSDs and, on the software side, FreeBSD. Using this configuration, Netflix sees performance top out at 240Gbps, mostly because its servers are limited by memory bandwidth.
This led Netflix to tool around with NUMA (Non-Uniform Memory Architecture) to squeeze more performance out of its servers. With NUMA, memory controllers and devices sit closer to some CPU cores than to others. There are cross-domain costs, though, such as latency penalties and bandwidth limits imposed by AMD's Infinity Fabric (47GB/s per link, 280GB/s total). So, the idea is to keep as much of the 200GB/s of bulk data as possible off the NUMA fabric.
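To get a feel for why placement matters, here is a rough back-of-the-envelope sketch in Python. The 280GB/s fabric total is taken from the article; the multiplier of four memory passes per byte served is an assumption chosen only because it reproduces the article's 200GB/s figure at 400Gbps, and the remote-access fractions are purely hypothetical.

```python
FABRIC_TOTAL_GBYTES = 280   # Infinity Fabric total, from the article
ASSUMED_MEMORY_PASSES = 4   # assumption: times each served byte crosses memory


def bulk_memory_traffic_gbytes(network_gbps: float,
                               passes: int = ASSUMED_MEMORY_PASSES) -> float:
    """Estimate bulk memory traffic in GB/s for a given network rate in Gbps."""
    return network_gbps / 8 * passes


def fabric_load_gbytes(bulk_gbytes: float, remote_fraction: float) -> float:
    """Bulk traffic that crosses the NUMA fabric if remote_fraction is non-local."""
    return bulk_gbytes * remote_fraction


bulk = bulk_memory_traffic_gbytes(400)

# Hypothetical fractions of accesses that land on a remote NUMA node.
for remote_fraction in (0.75, 0.25, 0.0):
    load = fabric_load_gbytes(bulk, remote_fraction)
    share = load / FABRIC_TOTAL_GBYTES
    print(f"remote={remote_fraction:.0%}: {load:.0f} GB/s on fabric "
          f"({share:.0%} of total)")
```

The numbers are illustrative rather than measured, but they show the shape of the problem: the more bulk data stays local to a NUMA node, the less fabric capacity it consumes.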
"Bulk data congests NUMA fabric and leads to CPU stalls when competing with normal memory accesses," Gallatin explains in a presentation.
A big part of the solution is something called network-centric siloing, which essentially segregates the work so that storage and the various network processes for a given stream are handled on the same NUMA node. By offloading TLS encryption and applying other software optimization wizardry, Gallatin says Netflix was able to achieve 400Gbps on AMD's hardware.
Netflix also tested this on other platforms, including Intel Xeon (Ice Lake) and Ampere's Arm-based Altra Q80-30 configurations, but was not able to match the performance it achieved on AMD's EPYC platform. Interestingly, it was the Arm-based configuration that came the closest, but AMD still held the advantage, at least as configured.
Looking ahead, Gallatin says there is already a prototype capable of 800Gbps "sitting on [the] data center floor" due to a shipping delay. He didn't go into detail about the hardware specifications or what other software optimizations Netflix may have used, but said it is perhaps something the company will discuss in detail next year.