|Introduction and Specifications|
|We've often spoken about the future of SSD technology eventually evolving away from "bridged" interfaces like SATA and SAS, to direct-attached, native interfaces like PCI Express. It just makes sense. With the ultra-fast random access times and high IO bandwidth of solid state storage, it's not the storage media itself that's the limiting factor; rather, non-native interfaces get in the way and become the bottleneck. A SAS or SATA controller still has to have its protocol translated over to PCIe so the host can talk to it, which wastes precious bandwidth and adds latency.
Most of the PCIe SSD cards on the market today, with the exception of products from Fusion-io, still rely on SATA or SAS-based NAND controllers to interface on the backend of the device to the NAND array. PCIe cards from OCZ, Intel, LSI and others use controllers from LSI SandForce or the like. Fusion-io was the first company to introduce a true native PCI Express to NAND Flash controller-processor employed in their products, though Micron has also been cooking up their own native PCIe SSD technology for some time now.
Today we're looking at the Micron P320h, a PCI Express SSD that was introduced to the market well over a year ago and has actually been shipping to OEM customers for some time, but is just now hitting the market for general availability. Micron partnered with IDT, a veteran semiconductor manufacturer out of San Jose that specializes in high speed serial switching and memory interface technology, for co-development of the product. A match made in high bandwidth heaven, between a bellwether memory giant and a cutting-edge high speed logic manufacturer? Perhaps. Read on as we find out.
The primary difference in what Micron and IDT have partnered together to build, versus what Fusion-io has developed, is that the co-developed Micron-IDT solution is a ground-up custom ASIC design, whereas Fusion-io relies on FPGAs (Field Programmable Gate Arrays) to implement their technology. At least in theory, there is a high degree of hand tuning involved in custom ASIC (Application Specific Integrated Circuit) design implementations, versus programmable logic (FPGA) chips. FPGAs are also generally more costly from a volume production standpoint. (Full disclosure: I used to work for IDT as a Field Sales Engineer.)
The IDT ASIC follows the emerging NVMe standard of optimized PCI Express SSD Interfaces. IDT purpose-built this ASIC with Micron for their application but also has a number of similar devices available now on the open market. The 89HF3208 is a 32 channel NAND controller with a X8 PCI Express interface that is both Gen 2 and Gen 3 compatible, though Micron's P320h card is currently only validated for Gen 2 operation. The IDT 89HF3208 is a rather large 1517 pin FCBGA (Flip Chip Ball Grid Array) packaged device, which is understandable with a 32 channel NAND memory controller on board.
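To put the Gen 2 versus Gen 3 distinction in perspective, here is a minimal sketch of the theoretical link ceiling for an x8 slot. The per-lane signaling rates and encoding schemes are the published PCI Express figures rather than anything measured in this review, and the function name is our own, chosen for illustration.

```python
# Per-lane signaling rate (GT/s) and line-encoding efficiency per PCIe generation.
# These are the published PCIe figures, not numbers taken from our testing.
PCIE_GENERATIONS = {
    2: {"gigatransfers_per_s": 5.0, "encoding": 8 / 10},     # 8b/10b encoding
    3: {"gigatransfers_per_s": 8.0, "encoding": 128 / 130},  # 128b/130b encoding
}


def link_bandwidth_gb_per_s(generation: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s (10^9 bytes), before packet overhead."""
    gen = PCIE_GENERATIONS[generation]
    bits_per_s_per_lane = gen["gigatransfers_per_s"] * 1e9 * gen["encoding"]
    return bits_per_s_per_lane * lanes / 8 / 1e9


for generation in (2, 3):
    print(f"Gen {generation} x8: {link_bandwidth_gb_per_s(generation, 8):.2f} GB/s")
```

Under these assumptions, a Gen 2 x8 link tops out at roughly 4 GB/s in each direction before packet overhead, and a Gen 3 x8 link at a little under 8 GB/s.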
Regardless, the net result of what Micron and IDT have pulled together here is a X8 PCI Express SSD that claims monster performance numbers of up to 785K IOPs for reads and 205K IOPs for writes, along with over 3GB/s and 1.9GB/s read/write bandwidth, respectively.
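As a rough sanity check on how those IOPs and bandwidth claims relate, the sketch below converts an IOPs figure into bandwidth. The 4KB transfer size is our assumption; the specification quoted above does not state the transfer size behind the IOPs numbers, and the function name is hypothetical.

```python
def iops_to_gb_per_s(iops: float, transfer_bytes: int = 4096) -> float:
    """Bandwidth in GB/s (10^9 bytes) implied by an IOPs figure at a fixed transfer size."""
    return iops * transfer_bytes / 1e9


# Claimed peak figures from the specification; 4KB per IO is assumed, not stated.
read_gb_per_s = iops_to_gb_per_s(785_000)
write_gb_per_s = iops_to_gb_per_s(205_000)

print(f"785K read IOPs at 4KB  = {read_gb_per_s:.2f} GB/s")
print(f"205K write IOPs at 4KB = {write_gb_per_s:.2f} GB/s")
```

If the 4KB assumption holds, the read IOPs claim lines up with the "over 3GB/s" read bandwidth, while the write IOPs claim works out to well under 1.9GB/s, which suggests the write bandwidth figure is quoted for larger transfers.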
The Micron P320h isn't cheap though. At a one piece price of $6995, this is an SLC (Single Level Cell) NAND solution that is squarely targeted at high availability, high throughput data center and enterprise applications. Let's take a closer look at what makes the Micron P320h tick.
|Micron P320h PCIe SSD Up Close|
|The Micron P320h is a rather elegant half height, half length design that will fit in a number of chassis form factors from full height ATX boxes to 2U servers or 1U servers with a riser card. Micron provides backplane brackets for both half or full height setups (full height seen below).
The card has a rather small heatsink mounted atop the IDT ASIC. Micron notes the card requires minimal airflow, specifically 1.5m/s or 300 LFM. In a server chassis especially, this isn't an issue. The 700GiB card we tested is composed of a base board and two daughter cards, each with 250GiB of Micron 34nm SLC NAND on them. In total the P320h 700GiB card has a full 1TB of memory on board, the excess of which is used for wear-leveling, parity data and maintenance. Since the drive is based on SLC NAND technology, it has a very high lifetime endurance specification of 25PB, which is almost two times that of Intel's MLC-based SSD 910, for example. Unfortunately, SLC NAND, as we noted earlier, also comes at a significant price premium, especially in 34nm manufacturing process technology. In the future, Micron plans to release 25nm SLC NAND memory, which should help mitigate cost significantly.
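The capacity and endurance figures above are easier to weigh with a little arithmetic. This is a back-of-the-envelope sketch that assumes decimal units throughout (1TB raw, 700GB usable, 25PB endurance); the variable names are ours, and the results are derived numbers, not Micron specifications.

```python
RAW_BYTES = 1e12          # "a full 1TB of memory on board" (decimal units assumed)
USABLE_BYTES = 700e9      # user-visible capacity of the card we tested
ENDURANCE_BYTES = 25e15   # 25PB lifetime endurance specification

# Share of the raw NAND held back for wear-leveling, parity data and maintenance.
reserved_fraction = (RAW_BYTES - USABLE_BYTES) / RAW_BYTES

# How many times the full usable capacity could be rewritten within the endurance rating.
full_drive_writes = ENDURANCE_BYTES / USABLE_BYTES

print(f"Reserved NAND: {reserved_fraction:.0%} of raw capacity")
print(f"Full drive writes within 25PB: {full_drive_writes:,.0f}")
```

On these assumptions, roughly 30% of the raw NAND is held in reserve, and the endurance rating covers about 35,700 complete rewrites of the usable capacity.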
Also on board the P320h is a little over 2GB of DDR3-1333 cache memory that is also manufactured by Micron. There are nine 256MB chips in total here allocated for fast map table data look-ups across the NAND array.
The IDT PCI Express Flash Controller on board the P320h configures what is essentially a RAID-5 array that stripes data across all four 250GiB SSD memory modules on the card. Micron calls their custom algorithm "RAIN" which is short for Redundant Array of Independent NAND. RAIN employs a "7+1 RAID 5" architecture where 1 parity element is allocated for each 7 storage elements (blocks and pages). Though the P320h employs a hardware ECC engine on board as well, this is not sufficient for data recovery and device resilience over the life of the SSD. RAIN allows a real-time parity check and in the event a block of data is flagged with an error, the data is recovered and moved into a wear-level algorithm state. This all happens seamlessly to the application or user, in the background. Error data is also gathered and tracked on internal logs of the drive and can be displayed via SMART health monitoring functions.
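Micron does not publish the internals of RAIN, so the following is only a generic sketch of how a "7+1" RAID-5-style parity scheme can recover a lost element: one XOR parity block per seven data blocks. The function names and the 8-byte block size are hypothetical and chosen purely for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the 7 data blocks followed by 1 XOR parity block."""
    if len(data_blocks) != 7:
        raise ValueError("a 7+1 stripe needs exactly 7 data blocks")
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Reconstruct the block at lost_index from the 7 surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# Seven small, distinct data blocks standing in for NAND pages.
data = [bytes(17 * i + j for j in range(8)) for i in range(1, 8)]
stripe = make_stripe(data)

# Pretend block 3 was flagged with an uncorrectable error and rebuild it.
recovered = rebuild(stripe, lost_index=3)
print(recovered == data[3])
```

Because XOR is its own inverse, any single missing element in a stripe, data or parity, can be rebuilt from the other seven. That is the property a scheme like this relies on once ECC alone can no longer correct a block.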
Speaking of health monitoring, Micron also offers a tool suite with the P320h called RealSSD Manager. In addition to SMART (Self Monitoring and Analysis Reporting Technology) monitoring and message status, you can check the drive's active, current performance throughput as well as life time remaining on the drive, and controller temperature. These tools should be particularly useful for Data Center Managers looking to see how a drive is performing under load and if current thermal management solutions are getting the job done keeping the P320h cool. The percent life remaining monitor is a particularly simple yet hugely useful tool that we'd like to see employed on more SSD products in general moving forward. Obviously, it takes the guess work out of SSD health status monitoring.
|Test System, SANDRA Disk and File System Benchmarks|
Our Test Methodologies: Under each test condition, the SSDs tested here were installed as secondary volumes in our testbed, with a standard spinning hard disk for the OS and benchmark software installations. The SSDs were left blank without partitions wherever possible, unless a test required them to be partitioned and formatted, as was the case with our ATTO and CrystalDiskMark benchmark tests, as well as our IOMeter runs. Windows firewall, automatic updates and screen savers were all disabled before testing. In all test runs, we rebooted the system and waited several minutes for drive activity to settle before invoking a test.
** Please also note that the Micron P320h card was configured in high performance mode, for maximum read/write performance, though lower power modes are also available.
For our first set of tests, we used SiSoft SANDRA, the System ANalyzer, Diagnostic and Reporting Assistant. Here, we used the Physical Disk test suite and provided the results from our comparison SSDs. The benchmarks were run without formatting on all drives and read and write performance metrics are detailed below.
SANDRA's purely sequential and rather short read/write workloads in this test favor the Micron P320h and it turns in a first place finish by a comfortable margin. Note that the drive tends to offer higher read throughput and here we're not seeing quite the top-end of Micron's specification in that area. Also note that the Intel SSD 910 requires software OS-level RAID for it to realize its peak performance, and since the SANDRA Physical Disk test module runs on bare, unformatted volumes, the Intel drive is operating on only one of its four 200GiB NAND modules.
SANDRA's File System benchmark exhibits combined Read/Write throughput on blank, formatted volumes.
Here we see a much tighter grouping in the field and though we weren't able to re-test the OCZ Z-Drive R4 since we no longer have it in the lab, the Micron P320h shows best of class performance here as well versus its primary competitors. Sequential Read bandwidth was 2.29GB/s and Sequential Writes came in at 1.84GB/s for a total of 2.19GB/s in SANDRA's "drive score" rating.
|AS SSD Benchmark Tests|
Next up we ran AS SSD, an SSD specific benchmark being developed by Alex Intelligent Software. This test is interesting because it uses a mix of compressible and incompressible data and outputs both Read and Write throughput of the drive, measured in both bandwidth as well as IOPs.
With this test we had fewer reference datapoints to compare for this review, so we picked a couple of the stiffest all around competitors we could find, Fusion-io's ioDrive 160GB card and Intel's SSD 910. Here again we see the P320h fall a bit short of its max read performance number, but it still outpaces Intel and Fusion-io, with the exception of Sequential Reads, where Intel sneaks past. Note the P320h's strong read performance at high queue-depths, as exhibited in the 4K-64thrd test which shows 4K transfers at an IO queue depth of 64.
CrystalDiskMark is a synthetic benchmark that tests both sequential as well as random small and large file transfers. It does a nice job of providing a quick look at best and worst case scenarios with regard to SSD performance, best case being larger sequential transfers and worst case being small, random access and transfers.
CrystalDiskMark does a really nice job of exploiting the strengths and weaknesses of the Micron P320h. Here again we see strong Read performance, especially as queue depths increase. Shallow queue depths and small write transfers tend to be the drive's shortfall, however.
|ATTO Disk Benchmark|
|ATTO is another synthetic disk benchmark that measures transfer speeds across a specific volume length. It measures raw transfer rates for both reads and writes and graphs them out in an easily interpreted chart. We chose 0.5KB through 8192KB transfer sizes and a queue depth of 10 over a total max volume size of 256MB. ATTO's workloads are sequential in nature and measure raw bandwidth, rather than IO response time, access latency etc. This test was performed on blank, formatted drives with default NTFS partitions in Windows 7 x64.
In our ATTO testing, we see an interesting reversal. ATTO presents its workloads with a rather shallow queue depth of 10, which is the highest setting you can choose and what we always test at for SSDs. Here the P320h shows substantially more headroom in Write performance but levels off at 1.5GB/sec for Reads. For Writes, we realized 1.9GB/s on average which did show the drive dropping in below the OCZ Z-Drive R4, but still within striking distance.
|IOMeter Test Results|
As we've noted in our previous SSD coverage, though IOMeter is clearly a well-respected industry standard drive benchmark, we're not completely comfortable with it. The fact of the matter is, though our actual results with IOMeter appear to scale properly, it is debatable whether or not certain access patterns, as they are presented to and measured on an SSD, actually provide a valid example of real-world performance for the average end user or application workload. That said, we do think IOMeter is a gauge for relative available bandwidth with a given storage solution. In addition there are certain higher-end workloads you can invoke on a drive with IOMeter, that you really can't with any other benchmark tool available currently.
In the following tables, we're showing two sets of access patterns: our Workstation pattern, with an 8K transfer size, 80% reads (20% writes) and 80% random (20% sequential) access, and our Database access pattern of 4K transfers, 67% reads (33% writes) and 100% random access.
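For readers who want to see what these two patterns mean in practice, here is a minimal sketch that describes each one as data and generates a matching stream of requests. It does not drive IOMeter or touch a disk; the class, constants and function are hypothetical names of our own, and the write share is simply derived as whatever the reads leave over.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPattern:
    name: str
    transfer_bytes: int
    read_pct: int     # remainder of the requests are writes
    random_pct: int   # remainder of the requests are sequential

    @property
    def write_pct(self) -> int:
        return 100 - self.read_pct


# The two patterns described above.
WORKSTATION = AccessPattern("Workstation", 8 * 1024, read_pct=80, random_pct=80)
DATABASE = AccessPattern("Database", 4 * 1024, read_pct=67, random_pct=100)


def generate_requests(pattern, count, volume_bytes, seed=0):
    """Yield (operation, offset, length) tuples that follow the pattern's mix."""
    rng = random.Random(seed)
    slots = volume_bytes // pattern.transfer_bytes
    next_slot = 0
    for _ in range(count):
        operation = "read" if rng.random() * 100 < pattern.read_pct else "write"
        if rng.random() * 100 < pattern.random_pct:
            next_slot = rng.randrange(slots)  # random access: jump anywhere on the volume
        # Otherwise sequential access: continue right after the previous request.
        yield operation, next_slot * pattern.transfer_bytes, pattern.transfer_bytes
        next_slot = (next_slot + 1) % slots


for request in generate_requests(WORKSTATION, count=5, volume_bytes=1 << 30):
    print(request)
```

A real benchmark layers queue depth, worker threads and timing on top of a request mix like this, which is where most of the differences between drives show up.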
We decided we'd break out our IOMeter tests into a couple of groups, with IOs set to standard sector-aligned tests (default for the benchmark) and then 4K aligned IO requests for the same workloads and queue depths. 4K IO boundaries generally tend to align with NAND program page sizes, so, as you can see, performance scales dramatically. Again, unfortunately we didn't have the Z-Drive R4 around for 4K aligned testing in this scenario. Regardless, in our 8K Workstation data transfer configuration, a mix of mostly read requests at 80%, the P320h shows killer performance, almost doubling Intel's SSD 910 and coming within striking distance of the OCZ Z-Drive R4.
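The alignment effect is easy to illustrate with a short sketch. It assumes a 4KB NAND page and a 512-byte sector purely for illustration (the exact page size of the P320h's NAND is not something we measured), and the function names are ours.

```python
SECTOR_BYTES = 512   # default IOMeter alignment (assumed 512-byte sectors)
PAGE_BYTES = 4096    # assumed NAND page size, for illustration only


def is_aligned(offset: int, boundary: int) -> bool:
    """True when the offset falls exactly on the given boundary."""
    return offset % boundary == 0


def pages_touched(offset: int, length: int, page_bytes: int = PAGE_BYTES) -> int:
    """Number of pages a request of `length` bytes starting at `offset` overlaps."""
    first_page = offset // page_bytes
    last_page = (offset + length - 1) // page_bytes
    return last_page - first_page + 1


# An 8K request that is only sector-aligned versus one that is 4K-aligned.
for offset in (SECTOR_BYTES, PAGE_BYTES):
    print(
        f"offset {offset}: 4K-aligned={is_aligned(offset, PAGE_BYTES)}, "
        f"pages touched by an 8K request={pages_touched(offset, 8192)}"
    )
```

Under these assumptions, an 8K request that starts on an arbitrary sector boundary straddles three pages, while the same request on a 4K boundary touches only two, so the aligned case does less work per IO.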
With our database access pattern, which is a larger mix of 4K write requests, along with 4K reads, the P320h showed a massive performance lead, with over three times the IO throughput of the SSD 910 in our 4K aligned tests. Interestingly, in our sector-aligned tests above this graph, the P320h was still getting its legs at higher queue depths and came close to matching the OCZ Z-Drive R4.
The workload you see represented in the IOMeter graph below has become an "industry standard" configuration as of late, though we'd offer that it still should be taken with a grain of salt. Again, what we're looking at here is a one set access pattern that is concurrently sprayed across the drive volume by IOMeter until the drive reaches its saturation point. In this IOMeter run, we should note that drives were formatted but left blank. Specifically, the P320h was secure erased and pre-conditioned, filling the drive with 128K writes in a raw state, and then formatted before we ran the test.
Here, the Micron P320h falls just short of the Intel SSD 910 in 4K random writes. We will note, however, that the P320h offered in excess of 515K IOPs in 4K random reads under this exact test condition, which is similar to what we saw in our database test workload. Micron's P320h SSD offers significant, industry-leading strength in high queue depth random read performance.
|Performance Summary and Conclusion|
|Performance Summary: Micron's P320h PCI Express SSD is an interesting beast to be sure. Its architecture is inherently simplified and designed for high throughput, blazing fast random access and high workload queue depths. Throughout most of our testing, the P320h offered top-end performance, besting more complex competitive solutions on the market that rely on standard SAS/SATA controllers from companies like LSI SandForce. Especially with high queue-depth Reads, the Micron P320h is very competitive with the fastest PCIe drives on the market and in some cases can blow them right out of the water. We also think the performance picture will improve over time, as Windows drivers mature for the product.
A Note on Stability and Drivers: All was not rosy with the Micron P320h, however, and while the Engineering team at Micron has spent much of its time validating on Linux and Server Chipset-based platforms from Intel, the Windows driver we've been working with over the past few weeks only became available to us recently. In addition, we were only able to get the P320h up and stable in an Intel X79-based test system we had on hand, while X58 and Z77 boards we tested the drive in often failed to get past POST or ended up with a blue screen with the drive falling back to a verify state, rebuilding its internal array after a hard reset. The array rebuild is normal operating functionality after a failure but it was clear Micron needed more time validating the driver and hardware under more chipset platforms in Windows environments. It is possible there is additional performance headroom left untapped in its current state as well.
What Micron and IDT have put together here is nothing short of impressive. We've looked at a lot of PCI Express SSDs in our day and it always appeared that, like Fusion-io had previously proven, there were less complex, more elegant ways to attack the problem and with much less bridging and translation getting in the way of performance. In a former life, I spent many hours selling to Engineers in the back-room labs of big iron storage players like EMC, talking to engineers about their designs. Back then, PCI Express SSDs simply didn't exist, serial interfaces were just starting to gain serious traction across platforms and Intel was just beginning a vigorous push for X86 in embedded designs, which quickly evolved into something called "common platform," a catch phrase that was another way of saying X86 servers were on their way to ubiquity. A lot has changed since then but one mantra that was drilled into our sales grunt heads was that bridging and translation across interfaces = latency and bottlenecks.
It seems now, the native, direct-attached PCI Express SSD has finally arrived. Though Micron may not have been the first, they are the first to have delivered a product with a purpose-built ASIC and custom storage processor, rather than a programmable device. Though clearly, with its $7K price tag, the Micron P320h is a solution targeted at the Data Center and Enterprise (but yes, it is bootable), I can see a time in the not so distant future where PCI Express SSDs are resident in desktops and notebooks as well, sans a SATA or SAS controller getting in the way of IO throughput. Until then, the Micron P320h gives IT pros another weapon in their arsenal of handling more users and transactions in smaller footprints and with the exponentially higher throughput that only a PCI Express SSD can deliver.
Micron P320h 700GB PCI Express SSD