Over the past decade or so, CPU performance has improved by more than 100 times, while the traditional hard disk drive (HDD) has improved by less than 1.5 times. This uneven development of compute and storage technology has held back the overall performance of IT systems. It was not until the arrival of the solid state drive (SSD), with its disruptive performance, that the storage bottleneck was addressed. However, as a newer technology, the SSD still has some inherent weaknesses, and how to make the most of its advantages is worth studying. This article discusses the topic from three angles: performance, endurance, and cost of use.
- How to get the most out of SSD performance
First, let’s look at how traditional HDDs are used:
- The interface protocol is generally SAS or SATA
- Linux IO scheduling relies on an elevator algorithm that reorders IO to optimize the movement of the disk heads
- Enterprise storage typically uses RAID cards for data protection
In terms of interface protocol, the NVMe protocol emerged alongside the SSD. Compared with the single-queue mechanism of SAS and SATA, NVMe supports up to 65,535 IO queues and attaches directly over PCIe, which removes the link and protocol bottlenecks.
In terms of the controller ecosystem, major manufacturers have launched their own NVMe controller chips, including PMC (now part of Microchip), LSI, Marvell, Intel, SMI and the domestic vendor Delui, and the technology is now very mature.
The Linux driver and IO stack have also been optimized. As shown in the figure below, the NVMe driver can bypass the traditional scheduling layer designed for HDDs, greatly shortening the IO processing path.
So far, the first two of the three HDD-era practices listed above have been addressed, which helps get the most out of SSD performance. NVMe-based RAID, however, has never had a good solution in the enterprise market. The RAID5 / RAID6 data protection mechanisms (N+1, N+2) most widely used in traditional enterprise storage stripe data across drives, calculate redundant parity, and store the data and parity on multiple drives. Writing new data into an existing stripe usually requires a “read-rewrite” (read-modify-write) cycle: the old data and old parity are read, and the new data and new parity are then written. This mechanism can itself become a performance bottleneck, and the extra writes take a heavy toll on the service life of the SSD.
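The cost of a small RAID5 write can be illustrated with a minimal sketch. It assumes single XOR parity and an in-memory stripe; the function names and the IO counter are hypothetical and do not describe any particular RAID card.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(data_blocks: list) -> bytes:
    """Parity of a whole stripe: XOR of all data blocks."""
    return reduce(xor_blocks, data_blocks)


def small_write(stripe: list, parity: bytes, index: int, new_data: bytes):
    """Overwrite one data block and update parity the RAID5 way.

    Returns the new parity and the number of drive IOs the update needs.
    """
    old_data = stripe[index]      # read 1: old data block
    old_parity = parity           # read 2: old parity block
    # new_parity = old_parity XOR old_data XOR new_data
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    stripe[index] = new_data      # write 1: new data block
    # write 2: the new parity block is written back by the caller
    return new_parity, {"reads": 2, "writes": 2}
```

In this model one logical write turns into two reads and two writes on the drives, which is where both the latency and the extra flash wear come from.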
In addition, the NVMe protocol places the controller inside the NVMe drive itself, and IO is completed by the DMA module inside the drive, which makes designing an NVMe-based RAID card considerably more difficult. At present, few such RAID cards are available on the market, and they cannot deliver the full performance of NVMe, so they have not been widely adopted.
As a result, many enterprise storage solutions still use SAS/SATA SSDs with traditional RAID cards, which brings back the two problems that had already been solved (the interface protocol and the IO scheduling layer), so SSD performance is not fully utilized.
However, this situation is changing, and the NVMe over TCP (NVMe/TCP) storage cluster solution invented by Lightbits Labs addresses this problem well. The solution can achieve random write performance of more than 1M IOPS by using an independently developed data acceleration card and an erasure code mechanism, and it avoids the loss of service life caused by “read-rewrite”. In addition, Lightbits proposes the Elastic RAID mechanism, which provides elastic N+1 protection (similar to RAID5). Whereas traditional RAID5 requires a hot spare or replacement of the damaged drive, Elastic RAID can automatically rebalance and form a new protection layout when a drive fails. For example, a node with 10 drives uses 9+1 protection. When one drive fails, the system automatically switches to 8+1 protection and rebalances the existing data into the new layout, which significantly improves maintainability and data safety. In addition, the data acceleration card is capable of 100 GB wire-speed compression, which significantly increases the available capacity and in turn significantly reduces the cost of the system.
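The 9+1 to 8+1 example can be checked with a little arithmetic. The sketch below only models the stripe width before and after a failure; it is an illustration under that assumption, not a description of how Elastic RAID is implemented, and both function names are hypothetical.

```python
def layout_after_failures(total_drives: int, failed: int, parity_drives: int = 1):
    """Return (data_drives, parity_drives) for the remaining healthy drives."""
    healthy = total_drives - failed
    data_drives = healthy - parity_drives
    if data_drives < 1:
        raise ValueError("not enough healthy drives left for N+1 protection")
    return data_drives, parity_drives


def usable_fraction(data_drives: int, parity_drives: int = 1) -> float:
    """Share of raw capacity that holds user data rather than parity."""
    return data_drives / (data_drives + parity_drives)


for failed in (0, 1):
    data, parity = layout_after_failures(10, failed)
    print(f"{failed} failed: {data}+{parity}, usable {usable_fraction(data, parity):.1%}")
```

With 10 drives the layout is 9+1 (90% usable); after one failure it becomes 8+1 (about 88.9% usable of the remaining nine drives), and the data stays protected without waiting for a replacement drive.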
- How can NVMe endurance be improved?
The most widely used SSDs are based on NAND flash, and one of the inherent problems of NAND is endurance. In addition, as the technology develops, NAND density keeps increasing. The latest generation has reached QLC (4 bits per cell), while the number of program/erase cycles each cell can sustain keeps decreasing (about 1K P/E cycles). The development trend is shown in the figure below.
NAND also has the characteristic that its minimum erase unit is relatively large, as shown in the figure below. Data can be written in units of 4KB, but erasing (which is required when existing data is modified) can only be done in units of 256KB (sizes differ between SSDs, but the principle is the same). This easily leaves holes of invalid data and triggers garbage collection, which relocates the remaining valid data inside the SSD. The result is the so-called write amplification phenomenon, which further reduces drive endurance.
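The link between the 4KB write unit, the 256KB erase unit and write amplification can be sketched with a simplified model. It assumes that every block reclaimed by garbage collection still holds the same number of valid 4KB pages, which real drives do not guarantee; the function name is hypothetical.

```python
PAGE_KB = 4
ERASE_BLOCK_KB = 256
PAGES_PER_BLOCK = ERASE_BLOCK_KB // PAGE_KB  # 64 pages per erase block


def write_amplification(valid_pages: int) -> float:
    """NAND pages written per host page when GC reclaims blocks that
    still contain `valid_pages` valid pages.

    Reclaiming one block relocates `valid_pages` pages and frees room
    for (PAGES_PER_BLOCK - valid_pages) new host pages.
    """
    if not 0 <= valid_pages < PAGES_PER_BLOCK:
        raise ValueError("valid_pages must be in [0, PAGES_PER_BLOCK)")
    freed = PAGES_PER_BLOCK - valid_pages
    return PAGES_PER_BLOCK / freed


for valid in (0, 32, 48):
    print(f"{valid} valid pages -> write amplification {write_amplification(valid):.1f}")
```

In this model, a block that is half valid when reclaimed doubles the writes to NAND, and a block that is three-quarters valid quadruples them, which is why scattered small overwrites wear the drive so much faster than writes that invalidate whole blocks.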
In enterprise-class storage, the “read-rewrite” mechanism of RAID5/6 further amplifies the number of write operations on the drive, causing roughly twice the wear of direct writes in typical usage scenarios. In addition, many RAID5 implementations enable a journal mechanism, which wears the drive further.
Finally, for the latest QLC drives there is another factor to consider: the Indirection Unit (IU). For example, some QLC drives use a 16KB IU, and writing an IO smaller than the IU also triggers an internal “read-rewrite”, which shortens service life.
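The effect of the IU can be estimated with a short sketch. It assumes the drive rewrites every indirection unit that a host write touches in full; the function name and the offset/length parameters are hypothetical.

```python
def iu_write_amplification(offset_kb: int, length_kb: int, iu_kb: int = 16) -> float:
    """KB written internally per KB written by the host, assuming each
    touched indirection unit is read, merged and rewritten in full."""
    if length_kb <= 0:
        raise ValueError("length_kb must be positive")
    first_iu = offset_kb // iu_kb
    last_iu = (offset_kb + length_kb - 1) // iu_kb
    touched_units = last_iu - first_iu + 1
    return touched_units * iu_kb / length_kb


print(iu_write_amplification(0, 4))    # aligned 4KB write into a 16KB IU
print(iu_write_amplification(14, 4))   # 4KB write straddling two IUs
print(iu_write_amplification(0, 16))   # IU-aligned 16KB write
```

Under this assumption an aligned 4KB write costs four times its size internally, an unaligned one that straddles two units costs eight times, and only writes that are aligned to and at least as large as the IU avoid the penalty.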
As can be seen, NAND-based SSDs are relatively fragile. However, these problems can be avoided if the drives are used correctly. Taking a common QLC drive as an example, the following two sets of performance and endurance parameters show that sequential writes give 5 times the endurance and about 26 times the throughput of random writes:
- Endurance: 0.9 DWPD for sequential writes, 0.18 DWPD for random 4K writes
- Performance: 1600 MB/s for sequential writes, 15K IOPS (60 MB/s) for random 4K writes
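The two ratios quoted above follow directly from these parameters. The short calculation below uses only the figures given in the text and treats 1 MB as 1000 KB; the variable and function names are illustrative.

```python
SEQ_DWPD = 0.9       # drive writes per day, sequential
RAND_DWPD = 0.18     # drive writes per day, random 4K
SEQ_MB_S = 1600      # sequential write throughput
RAND_IOPS = 15_000   # random 4K write IOPS
IO_SIZE_KB = 4


def iops_to_mb_s(iops: int, io_size_kb: int) -> float:
    """Convert an IOPS figure to MB/s (decimal megabytes)."""
    return iops * io_size_kb / 1000


rand_mb_s = iops_to_mb_s(RAND_IOPS, IO_SIZE_KB)
endurance_ratio = SEQ_DWPD / RAND_DWPD
throughput_ratio = SEQ_MB_S / rand_mb_s

print(f"random 4K throughput: {rand_mb_s:.0f} MB/s")
print(f"endurance ratio: {endurance_ratio:.1f}x")
print(f"throughput ratio: {throughput_ratio:.1f}x")
```

The throughput ratio works out to roughly 26.7, which the text rounds down to 26.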
The above analysis shows how important it is to keep the drive in its optimal working state. The good news is that some advanced solutions, such as Lightbits’ all-NVMe cluster storage solution, can solve this problem. By converting random IO into sequential IO and using its unique Elastic RAID technology, this solution avoids the RAID “read-rewrite” drawback and can greatly improve both drive endurance and random performance.
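One common way to turn random IO into sequential IO is a log-structured layout, in which every write is appended to the end of a log and a mapping table records where each logical block currently lives. The sketch below illustrates that general technique only; the text does not describe Lightbits' internal design, and the class and method names are hypothetical.

```python
class AppendOnlyLog:
    """Toy log-structured store: random logical writes become
    sequential physical appends."""

    def __init__(self):
        self.log = []       # physical media, written strictly in order
        self.mapping = {}   # logical block address -> index in the log

    def write(self, lba: int, data: bytes) -> int:
        """Append data and remap the logical address; returns the
        physical position that was written."""
        position = len(self.log)
        self.log.append(data)
        self.mapping[lba] = position  # any older copy becomes garbage
        return position

    def read(self, lba: int) -> bytes:
        return self.log[self.mapping[lba]]


store = AppendOnlyLog()
positions = [store.write(lba, f"v{i}".encode())
             for i, lba in enumerate([907, 3, 512, 3])]
print(positions)       # physical positions are consecutive
print(store.read(3))   # the latest version of logical block 3
```

The logical addresses arrive in random order, yet the physical positions are strictly increasing, so the drive only ever sees sequential writes. A real system also needs garbage collection for the superseded copies, which this sketch omits.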
- How to reduce the cost of use?
Because SSDs are a newer technology than HDDs, and because production capacity has not kept up with industry demand, SSDs are still expensive compared with HDDs. Reducing the cost of using SSDs is therefore very important.
One of the most important ways to reduce the cost of use is to make full use of SSDs, in terms of both capacity and performance. For now, however, most NVMe drives are plugged directly into the application server, which easily wastes a great deal of capacity and performance because only the applications on that server can use them. According to survey data, SSD utilization is only about 15%-25% with the DAS (Direct Attached Storage) approach.
A better solution to this problem is the “decoupling” (disaggregated) architecture, which separates storage from compute and has been widely accepted in the market in recent years.
After decoupling, all the NVMe drives are turned into one large pool of storage resources, and each application server takes as much as it needs. As long as the total is sufficient, utilization can easily be pushed up to 80%. In addition, because resources are centralized, more tools and methods become available to reduce costs, such as compression. For example, an average data compression ratio of 2:1 is equivalent to doubling the available capacity and halving the price per GB. Of course, compression brings problems of its own, such as its CPU cost, and the performance of many storage solutions drops sharply when compression is turned on.
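The combined effect of utilization and compression on cost can be sketched as follows. The raw price of 1.0 per GB is a placeholder rather than a real price, 20% is taken as the midpoint of the 15%-25% DAS range, and the function name is hypothetical.

```python
def cost_per_stored_gb(raw_price_per_gb: float, utilization: float,
                       compression_ratio: float = 1.0) -> float:
    """Effective price of each GB of user data actually stored.

    utilization: fraction of raw capacity that holds data (0..1]
    compression_ratio: logical bytes per physical byte (2.0 means 2:1)
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return raw_price_per_gb / (utilization * compression_ratio)


RAW_PRICE = 1.0  # placeholder unit price per raw GB

das = cost_per_stored_gb(RAW_PRICE, 0.20)
pooled = cost_per_stored_gb(RAW_PRICE, 0.80)
pooled_compressed = cost_per_stored_gb(RAW_PRICE, 0.80, 2.0)

print(f"DAS at 20% utilization:      {das:.3f}")
print(f"pooled at 80% utilization:   {pooled:.3f}")
print(f"pooled plus 2:1 compression: {pooled_compressed:.3f}")
```

Under these assumptions, pooling alone cuts the effective cost per stored GB to a quarter of the DAS figure, and 2:1 compression halves it again.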
To address the compression issues, Lightbits’ NVMe/TCP cluster storage solution uses a storage acceleration card. The card provides 100 GB wire-speed compression without consuming CPU or adding latency, so with this solution compression comes at little extra cost.
In addition, as mentioned earlier in the discussion of drive lifetime, the Lightbits solution can extend drive life and supports the use of QLC drives, which can also significantly reduce cost over the life cycle. Overall, the cost of using SSDs can be kept well under control by raising utilization through decoupling, increasing available capacity through compression, and extending drive life through optimization or enabling the use of QLC.
The above analysis of how to use SSDs well, from the three aspects of performance, endurance and cost of use, shows that getting the most out of NVMe SSDs is not easy.
Therefore, it is very important for the average user to choose a good storage solution. To this end, Lightbits, an innovative Israeli company, has taken on the mission of unlocking the full value of NVMe drives: it invented the NVMe/TCP protocol and launched a new generation of all-NVMe cluster storage solutions that help users get the most out of their SSDs with ease.