Over the past decade, CPU performance has increased by more than 100 times, while traditional hard disk drives (HDDs) have improved by less than 1.5 times. This uneven development of computing and storage technologies has long constrained the overall performance of IT systems. It was not until solid-state drives (SSDs) delivered a revolutionary leap in performance that the storage bottleneck was finally broken. However, as a new technology, SSDs still have some inherent weaknesses, and how to get the most out of them is a question worth studying. This article discusses the topic in terms of performance, endurance, cost of use, and so on.
【1】How to fully exploit SSD performance
First, let’s look at how traditional HDDs are used:
- They generally connect over SAS or SATA interface protocols.
- The Linux I/O scheduler uses an elevator algorithm to reorder I/O requests and optimize the travel path of the disk head.
- Enterprise storage usually relies on RAID cards for data protection.
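The elevator idea can be sketched in a few lines of Python. This is a toy model for intuition only; real Linux schedulers (mq-deadline, bfq, and the older CFQ) are far more sophisticated, and the request LBAs here are made up:

```python
# Toy model of elevator-style I/O scheduling for HDDs: servicing requests in
# LBA order dramatically reduces total head travel compared with FIFO order.

def head_travel(start, lbas):
    """Total seek distance when requests are serviced in the given order."""
    pos, total = start, 0
    for lba in lbas:
        total += abs(lba - pos)
        pos = lba
    return total

requests = [500, 10, 490, 20, 480]          # pending request LBAs (hypothetical)
fifo = head_travel(0, requests)             # service in arrival order
elevator = head_travel(0, sorted(requests)) # sweep across the disk in one pass

print(fifo, elevator)  # the elevator order travels far less than FIFO
```

For an SSD there is no head to move, which is exactly why this layer becomes pure overhead.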
In terms of interface protocols, NVMe emerged alongside the SSD. Compared with the single-queue design of SAS and SATA, NVMe supports up to 65535 I/O queues and rides on the PCIe interface, eliminating link and protocol bottlenecks.
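The gap in queueing capacity is worth spelling out. Per the respective specifications, SATA NCQ offers a single queue 32 commands deep, while NVMe allows up to 65535 I/O queues, each up to 65536 entries deep; a quick back-of-envelope comparison:

```python
# Outstanding-command capacity, taking the maximum values from the specs.
# Real devices typically expose far fewer queues, so treat this as an upper bound.

sata_slots = 1 * 32            # SATA NCQ: one queue, 32 commands
nvme_slots = 65535 * 65536     # NVMe: 65535 I/O queues x 65536 entries each

print(sata_slots, nvme_slots, nvme_slots // sata_slots)
```

The per-core queue pairs are what let NVMe scale with modern multi-core CPUs instead of serializing on one lock-protected queue.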
In the controller ecosystem, major vendors such as PMC (now part of Microchip), LSI, Marvell, Intel, and SMI have launched their own NVMe controller chips, and the technology is already quite mature.
On the Linux driver and I/O stack side, the NVMe driver can bypass the traditional scheduling layer designed for HDDs, as shown in the figure below, shortening the processing path.
So far, the first two of the three traditional HDD practices above have been addressed in the effort to fully exploit SSD performance, but the enterprise market still lacks a good solution for NVMe-based RAID. The RAID5/RAID6 data protection mechanisms (N+1, N+2) most widely used in traditional enterprises stripe data into fragments, compute redundant parity codes, and store the result across multiple disks. Writing new data usually follows a read-modify-write pattern. That pattern is itself a performance bottleneck, and the extra writes take a toll on SSD lifetime. In addition, because the NVMe protocol moves the controller inside the NVMe disk, where I/O is completed by the drive's internal DMA engine, designing a RAID card for NVMe is even harder. There are currently few such RAID controller cards on the market, and they are not widely used because they fail to exploit NVMe's performance.
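The read-modify-write penalty follows directly from how RAID5 parity works. A minimal sketch (two-byte strips for brevity; real stripes are kilobytes wide):

```python
# RAID5 parity is the byte-wise XOR of the data strips. A small (partial-stripe)
# write cannot recompute parity from scratch without reading the whole stripe,
# so controllers do: read old data + old parity, write new data + new parity.
# That is 2 reads + 2 writes for every small host write, roughly doubling wear.

def parity(strips):
    """XOR all strips together to produce the parity strip."""
    out = bytearray(len(strips[0]))
    for strip in strips:
        for i, b in enumerate(strip):
            out[i] ^= b
    return bytes(out)

def small_write_parity(old_data, old_parity, new_data):
    """new_parity = old_data XOR new_data XOR old_parity."""
    return parity([old_data, old_parity, new_data])

strips = [b'\x01\x02', b'\x04\x08', b'\x10\x20']
p = parity(strips)
new0 = b'\xff\x00'
p2 = small_write_parity(strips[0], p, new0)
# The shortcut must agree with recomputing parity over the updated stripe:
assert p2 == parity([new0, strips[1], strips[2]])
```

Every small write the host issues thus turns into four drive operations, which is both the bottleneck and the endurance cost the article describes.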
Given this situation, many enterprise storage solutions still pair SAS/SATA SSDs with traditional RAID cards. This reintroduces the two problems that had already been solved, and SSD performance again cannot be fully utilized.
However, this is changing. Lightbits Labs' NVMe over TCP (NVMe/TCP) storage cluster solution addresses the problem. It uses a self-developed data acceleration card together with an erasure-code mechanism to deliver more than one million random write IOPS while avoiding the lifetime loss caused by read-modify-write. Lightbits also offers an Elastic RAID mechanism that provides N+1 protection (similar to RAID5). Unlike traditional RAID5, which requires hot spares or replacement of failed drives, Lightbits automatically rebalances around a failed drive to form a new protection state. For example, suppose a node has 10 disks protected as 9+1. If one disk fails, the system automatically shifts to an 8+1 protection state and rebalances the original data into it, greatly improving maintainability and data safety. In addition, the data acceleration card performs compression at 100Gb line rate, significantly increasing usable capacity and thus significantly reducing the cost of the system.
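The capacity arithmetic behind such an elastic N+1 scheme can be sketched as follows. This is a simplification for intuition (the actual Lightbits implementation is proprietary and the rebuild mechanics are more involved):

```python
# In an elastic N+1 layout, one disk's worth of capacity in the group goes to
# parity. When a drive fails, the survivors re-form as (N-1)+1 instead of
# waiting for a hot spare, trading a little usable capacity for restored
# protection.

def usable_fraction(drives):
    """Fraction of raw capacity available to data under N+1 across `drives` disks."""
    return (drives - 1) / drives

print(usable_fraction(10))  # 9+1 layout: 90% of raw capacity is usable
print(usable_fraction(9))   # after one failure, 8+1: about 88.9% usable
```

The key point is that full N+1 protection is restored immediately after the rebalance, with no operator intervention and only a small capacity cost.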
【2】How to improve the endurance of NVMe disks
The most widely used SSDs are built on NAND flash, and one of NAND's inherent problems is limited endurance. As the technology advances, NAND density keeps rising. The latest generation has reached QLC (4 bits per cell), while the number of program/erase cycles each cell can endure keeps falling (around 1K P/E cycles). The trend is shown in the figure below.
In addition, NAND has a large minimum erase unit, as shown in the figure below: data can be written in 4KB pages but only erased in 256KB blocks. (Sizes differ between SSDs, but the principle is the same.) This tends to leave invalid pages scattered across blocks, triggering the SSD's garbage collection (GC) and causing a phenomenon known as write amplification, which further hurts endurance.
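A toy model makes the write-amplification effect concrete. The 4KB page / 256KB block sizes are the ones quoted above; real drives vary and real GC is steady-state rather than worst-case:

```python
# NAND writes in pages but erases in blocks. To reclaim a block that still
# holds valid pages, GC must copy those pages elsewhere first, so the device
# ends up writing more data than the host asked for (write amplification).

PAGE = 4 * 1024
BLOCK = 256 * 1024
PAGES_PER_BLOCK = BLOCK // PAGE  # 64 pages per erase block

def write_amplification(valid_pages_copied, host_pages_written):
    """WAF = total NAND page writes / host page writes."""
    return (host_pages_written + valid_pages_copied) / host_pages_written

# Worst case: overwrite 1 page in a block whose other 63 pages are still valid;
# GC must relocate all 63 before the block can be erased.
print(write_amplification(valid_pages_copied=63, host_pages_written=1))  # 64.0
```

Sequential writes fill blocks end to end, so whole blocks become invalid together and GC has almost nothing to copy; random small writes are what drive the factor up.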
In enterprise storage, RAID5/6 usually relies on the read-modify-write pattern, further multiplying disk writes: in common scenarios it wears the disk about twice as fast as writing directly. Moreover, many RAID5 implementations also enable a journal mechanism, eroding disk lifetime even further.
Finally, with the latest QLC drives there is another factor to consider: the indirection unit (IU). For example, some QLC disks use a 16KB IU; any write smaller than that triggers an internal read-modify-write, which again costs lifetime heavily.
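The IU cost is easy to quantify. Assuming the hypothetical 16KB IU mentioned above, every host write is rounded up internally to whole IUs:

```python
# With a 16 KiB indirection unit, a sub-IU host write forces the drive to read
# the containing IU, merge the new data, and write the full IU back. The bytes
# actually written to NAND are therefore rounded up to whole IUs.

IU = 16 * 1024  # hypothetical indirection unit size

def nand_bytes_written(host_bytes):
    """Bytes physically written to NAND for one host write, rounded up to IUs."""
    ius = -(-host_bytes // IU)  # ceiling division
    return ius * IU

for size in (4096, 16384):
    print(size, nand_bytes_written(size), nand_bytes_written(size) / size)
```

A 4KB write lands as 16KB of NAND traffic, a 4x internal amplification on top of any RAID-level amplification, which is why IU-aligned writes matter so much on QLC.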
As you can see, NAND-based SSDs are still delicate. However, these problems can be avoided if the drives are used correctly. Take a common QLC disk as an example. Judging from the following two pairs of endurance and performance parameters, sequential writes offer 5 times the endurance of random writes and roughly 26 times the throughput:
- Sequential write: 0.9 DWPD; random 4K write: 0.18 DWPD;
- Sequential write: 1600 MB/s; random 4K write: 15K IOPS (60MB/s).
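Checking those two ratios from the quoted datasheet-style numbers:

```python
# Endurance and throughput ratios for the example QLC drive quoted above.

seq_dwpd, rand_dwpd = 0.9, 0.18      # drive writes per day, sequential vs random 4K
seq_mbps, rand_mbps = 1600, 60       # MB/s; 15K IOPS x 4KB is about 60 MB/s

endurance_ratio = seq_dwpd / rand_dwpd   # ~5x the endurance
throughput_ratio = seq_mbps / rand_mbps  # ~26.7x the throughput

print(round(endurance_ratio, 1), round(throughput_ratio, 1))
```

So a workload that reaches the drive as sequential I/O gets both the lifetime and the bandwidth benefit at once.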
The analysis above shows it is essential to keep the disk operating under its optimal conditions. The good news is that some advanced solutions, such as Lightbits' all-NVMe clustered storage, can do exactly that. By converting random I/O into sequential I/O and adopting Elastic RAID, the solution avoids the RAID read-modify-write penalty, greatly improving both disk endurance and random performance.
【3】How to reduce the cost of use
Because SSDs are a newer technology than HDDs, and production capacity has not yet caught up with demand, their price remains higher than that of HDDs. Reducing the cost of using SSDs therefore becomes very important.
The most important lever for reducing SSD cost is to make full use of the drives, in both capacity and performance. For now, however, most NVMe disks are plugged directly into application servers, where much of their capacity and performance goes to waste because only the applications on that server can use them. According to research, SSD utilization under DAS (Direct Attached Storage) is only around 15 to 25 percent.
A better answer is the disaggregated architecture that has gained wide market acceptance in recent years. Once storage is disaggregated, all NVMe disks become one large storage resource pool that application servers draw from as needed; as long as the total is managed, utilization can easily be pushed up to 80%. Moreover, with resources concentrated, more cost-reduction techniques become practical, such as compression. An average application data compression ratio of 2:1, for example, doubles usable capacity and halves the price per GB. Of course, compression has costs of its own, notably the CPU overhead of compressing, and many storage solutions suffer badly when compression is turned on.
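The combined effect of utilization and compression on effective cost is simple arithmetic. The utilization and compression figures below are the ones cited in the text; the raw price is a made-up placeholder:

```python
# Back-of-envelope effective cost per usable GB: disaggregation raises
# utilization, compression multiplies usable capacity, and both divide into
# the raw media price.

def effective_cost_per_gb(raw_price_per_gb, utilization, compression_ratio):
    """Cost per GB of data actually stored and used."""
    return raw_price_per_gb / (utilization * compression_ratio)

# $0.10/GB raw is a placeholder; 20% DAS utilization and 80% pooled
# utilization with 2:1 compression come from the article's figures.
das  = effective_cost_per_gb(0.10, utilization=0.20, compression_ratio=1.0)
pool = effective_cost_per_gb(0.10, utilization=0.80, compression_ratio=2.0)

print(das, pool)  # the pooled, compressed case is 8x cheaper per usable GB
```

Even before considering endurance gains or QLC, the architectural change alone dominates the cost picture.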
Lightbits' NVMe/TCP clustered storage solution tackles the compression problem with its storage acceleration card, which compresses at 100Gb line rate without consuming CPU or adding latency, so compression comes at little additional cost. In addition, as discussed in the endurance section above, the Lightbits solution extends drive lifetime and supports QLC disks, cutting costs significantly over the system's life. In general, SSD costs can be largely contained by raising efficiency through disaggregation, increasing usable capacity through compression, and extending lifetime through optimization or by adopting QLC.
This article has discussed how to use SSDs well in terms of performance, endurance, and cost. As you can see, NVMe SSDs are not easy to use well, so for most users, choosing a good storage solution is crucial. Lightbits, an Israeli startup, created NVMe/TCP and a new generation of all-NVMe clustered storage solutions that make SSDs easy to use.