Statement: The author of this article is a staunch supporter of open source. Although this article is not software, it is published in the spirit of open source.
Prologue
Because SSDs differ from traditional mechanical hard disks in many ways, their performance and lifetime depend heavily on how they are used. Buying a really fast SSD is therefore not the end of the story. Unlike a mechanical hard disk, whose performance is fairly stable (it fluctuates within a small range), SSD performance varies widely (it fluctuates within a large range). The best case can be hundreds of times faster than the worst case, so avoiding the worst case matters more than reaching the best case.
Partition alignment
This problem arises when you partition an SSD before using it. Of course, you can avoid it entirely by not partitioning the disk at all. On Linux, using a disk without partitions means formatting the whole /dev/sdx device (for example, mkfs.xfs /dev/sdx) and then mounting it (for example, mount /dev/sdx /mnt/point).
However, given common usage habits, the need for reserved space (see below), and the requirements of installing a boot loader such as GRUB, using a disk without partitions is not a good idea. Fortunately, the partition alignment problem can be solved perfectly.
If the partition points on an SSD are not aligned to integer multiples of the SSD block size, the SSD cannot deliver its full performance and the system may feel sluggish. In addition, misaligned partition points can cause unnecessary extra write/erase operations when many small files are written continuously, which greatly shortens the SSD's life. This is the most important thing to avoid when using SSDs.
Misaligned partitions hurt 4KB random writes the most. Actual tests have shown that, other factors being equal, misalignment can cause performance differences of up to 10 times.
▶ CHS, LBA, block
Before understanding the ins and outs of the matter, we must review a few concepts of the traditional hard disk.
CHS (cylinder-head-sector) was the earliest addressing method for hard disks. Although CHS values no longer correspond to the physical geometry (they are purely logical values), many disk management programs (such as fdisk/cfdisk) still use CHS to describe hard disks.
▶ Side/head
A hard disk is usually composed of one or more circular platters stacked together. Each platter has two sides (surfaces), both of which are used to store data. The surfaces are numbered from top to bottom starting at 0: side 0, side 1, side 2, and so on. Since each surface has its own dedicated read/write head, they are also commonly called head 0, head 1, and so on. A hard disk has at least 2 surfaces and can have up to dozens.
According to the CHS specification, the head uses 8-bit addressing, so it can have up to 256 heads (0-255). However, because some antique programs only support up to 255 heads, the default is still 255 for compatibility purposes in most cases.
▶ Track
If the head stays still while the platter rotates, data is written continuously along a circle. Such a circle is called a track.
While the head stays still, it reads and writes on a single track; when the head moves, it reads and writes on a different track.
According to the CHS addressing specification, tracks are numbered from 0, starting at the outermost track. The bit width used for addressing is not fixed (it depends on the specification), so we can assume that it is large enough.
▶ Cylinder
The tracks with the same track number on all surfaces together form a cylinder; that is, a cylinder is the group of tracks at the same distance from the spindle.
According to the CHS addressing specification, cylinders, like tracks, are numbered from 0 starting at the outside, and the addressing bit width is again large enough.
The cylinder is also the smallest unit for disk partitioning. A partition occupies consecutive tracks and cylinders (that is, each partition is a group of adjacent cylinders).
Here is some fdisk output; note the last line:
# fdisk /dev/sda
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-524, default 1):
▶ Sector
A track can hold “a lot” of data, but the host usually does not need to read and write as much at a time. Therefore, the track is divided into 512 or 4096-byte segments, each called a Sector. The size of a sector is fixed at 512 or 4096 bytes.
The computer reads and writes to the hard disk in sectors. Even if you read only one byte, you must read all 512 or 4096 bytes from the sector in which that byte is located into memory at once.
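As a rough illustration of sector-granular access, the following Python sketch computes which sectors a byte range touches. The function name `sectors_touched` is made up for this example, and sectors are numbered from 0 here (as in LBA) rather than from 1 (as in CHS).

```python
def sectors_touched(offset: int, length: int, sector_size: int = 512) -> range:
    """Return the 0-based sector indices covered by a byte range.

    Hypothetical helper for illustration; it only does the arithmetic and
    does not talk to any real disk.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return range(first, last + 1)


# Reading a single byte at offset 1000 still transfers one whole sector.
touched = sectors_touched(offset=1000, length=1)
print(f"sectors {list(touched)}: {len(touched) * 512} bytes transferred")
```

A request that crosses a sector boundary, such as 4 bytes starting at offset 510, touches two sectors even though it is tiny.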
According to the CHS addressing specification, sectors are always numbered from 1 (not 0). Because 6-bit addressing is used, the maximum value is 63; that is, there is neither a sector 0 nor a sector 64. All mechanical hard drives currently report the maximum value of 63 in logical CHS mode.
▶ Block
A block is the smallest unit of space that a file system allocates. A block holds data from at most one file (multiple files never share a block). A file smaller than a block still occupies a whole block, so the remaining space in that block is wasted. A large file, on the other hand, can occupy many blocks, even tens of millions.
# df
/          (/dev/dsk/c0t3d0s0 ):   573548 blocks   226057 files
/proc      (/proc             ):        0 blocks     3854 files
/var       (/dev/dsk/c0t3d0s1 ):  1897206 blocks   250028 files
/var/run   (swap              ):   611424 blocks    26300 files
/tmp       (swap              ):   611424 blocks    26300 files
▶ Difference between a sector and a block
- A sector is the minimum unit for accessing the disk. Its size is 512 bytes or 4096 bytes.
- A block is the minimum access unit of a file system. Its size can be chosen freely but must be an integer multiple of the sector size. For example, the default block size for ext2 is 4KB.
The block size should be chosen according to the workload: if blocks are too large, space is wasted when storing small files; if blocks are too small, the number of blocks on the disk grows, which lengthens the time needed to look up the blocks an inode points to and makes reading and writing large files inefficient.
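The space side of this trade-off can be sketched with a little arithmetic. The Python snippet below is a simplified model that assumes every non-empty file occupies a whole number of blocks and ignores metadata; the helper names `blocks_needed` and `wasted_bytes` are invented for the example.

```python
def blocks_needed(file_size: int, block_size: int = 4096) -> int:
    """Number of blocks a file of file_size bytes occupies (rounded up)."""
    return (file_size + block_size - 1) // block_size


def wasted_bytes(file_size: int, block_size: int = 4096) -> int:
    """Unused bytes in the last block of the file (the so-called slack)."""
    return blocks_needed(file_size, block_size) * block_size - file_size


# Compare a tiny file and a larger file under two block sizes.
for size in (100, 10_000):
    for block_size in (1024, 4096):
        print(
            f"{size:>6} B file, {block_size} B blocks: "
            f"{blocks_needed(size, block_size)} block(s), "
            f"{wasted_bytes(size, block_size)} B wasted"
        )
```

Larger blocks waste more space on small files, while smaller blocks mean more blocks to track for the same file.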
▶ Disk capacity
For a 500GB hard disk, the output of fdisk -l looks like this:
# fdisk -l /dev/sda
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1              63     8000369     4000153+  83  Linux
/dev/sda2         8000370    95891984    43945807+  83  Linux
/dev/sda3        95891985   427923404   166015710   83  Linux
/dev/sda4       427923405   976768064   274422330   83  Linux
As you can see, total disk capacity = number of heads * number of sectors per track * number of cylinders * sector size (counting whole cylinders only, so the product can be slightly smaller than the total number of sectors reported). Of course, the numbers of heads, sectors, and cylinders here are logical values, not real physical ones. Even the sector size can be a logical value, for example on hard disks with 4KB physical sectors.
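To check the arithmetic against the fdisk output above, here is a small Python sketch. The constants are copied from that output; the variable names are only for illustration.

```python
# Values taken from the "fdisk -l /dev/sda" output shown above.
HEADS = 255
SECTORS_PER_TRACK = 63
CYLINDERS = 60801
SECTOR_SIZE = 512
TOTAL_SECTORS = 976773168  # "total ... sectors" as reported by fdisk

# Capacity according to the logical CHS geometry (whole cylinders only).
chs_sectors = HEADS * SECTORS_PER_TRACK * CYLINDERS
chs_bytes = chs_sectors * SECTOR_SIZE

# Capacity according to the total sector count.
lba_bytes = TOTAL_SECTORS * SECTOR_SIZE

print("CHS capacity:", chs_bytes, "bytes")
print("LBA capacity:", lba_bytes, "bytes")
print("sectors beyond the last whole cylinder:", TOTAL_SECTORS - chs_sectors)
```

The CHS product comes out a few thousand sectors short of the reported total, because the sectors after the last whole logical cylinder are not counted. That leftover is smaller than one cylinder (255 * 63 sectors).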
▶ LBA addressing mechanism
In practice, CHS addressing has long been relegated to the dustbin. What is actually used today is 48-bit Logical Block Addressing (LBA). LBA is a very simple addressing mode: blocks (sectors) are numbered from 0, so the first block is LBA=0, the second is LBA=1, and so on. With 512-byte sectors, the maximum capacity is 128PB. LBA completely hides the physical structure of the hard disk and abstracts it into a simple one-dimensional array, which is very easy for the operating system to handle.
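The relationship between the two addressing schemes can be sketched in Python. The conversion below is the conventional CHS-to-LBA formula, which this article does not spell out, so treat it as background rather than as the author's method. The function name and the default geometry of 255 heads and 63 sectors per track are assumptions chosen to match the examples in this article.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int = 255,
               sectors_per_track: int = 63) -> int:
    """Convert a logical CHS address to an LBA (conventional formula).

    Cylinders and heads are numbered from 0, sectors from 1.
    """
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector must be in 1..sectors_per_track")
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head must be in 0..heads_per_cylinder-1")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


# 48-bit LBA with 512-byte sectors: 2**48 sectors in total.
LBA48_MAX_BYTES = 2**48 * 512
PB = 2**50  # "PB" taken in the binary sense here

print(chs_to_lba(0, 0, 1))    # the very first sector of the disk
print(chs_to_lba(0, 1, 1))    # head 1, sector 1 of cylinder 0
print(LBA48_MAX_BYTES // PB)  # capacity limit in PB
```

With this geometry, cylinder 0, head 1, sector 1 maps to LBA 63, the traditional start of the first partition that comes up again below.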
▶ Analysis of causes
With that in mind, recall that the sector size of mechanical hard disks was long fixed at 512 bytes, and that the newer “advanced format” disks have finally increased the physical sector size to 4KB (4096 bytes). The smallest unit of operation on a mechanical hard disk is the sector: whether you read or write 1 byte, 10 bytes, or 500 bytes, the disk actually transfers 512 bytes. Likewise, on a mechanical hard disk with 4KB sectors, all reads and writes are rounded up to multiples of 4KB.
But SSDs operate differently. Unlike HDDs, which only read and write and use the same unit for both, SSDs have three operations: read, write, and erase. Flash is read and written (programmed) in pages of 4KB or 8KB, whereas it is erased in blocks of 128 or 256 pages.
Traditionally, the first partition on an LBA-mode HDD starts at logical sector 63 (63*512B=31.5KB), which is of course not a problem for HDDs with 512B sectors. But on SSDs and newer HDDs, the first 4KB of user data then lands between 31.5KB and 35.5KB of the logical address space, straddling the 32KB boundary. All subsequent data is shifted in the same way and ends up straddling two physical sectors (pages), which, as we know, are the smallest unit of writing. Writing data that straddles two sectors (pages) requires a read-modify-write operation (read-erase-write in the case of SSDs), resulting in performance degradation.
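The straddling effect is easy to reproduce with arithmetic. The Python sketch below counts how many physical pages a write touches, given the partition's start sector. The function name `pages_touched` is hypothetical, and the model assumes the device maps logical byte offsets directly onto pages, which real SSD controllers may not do so simply.

```python
def pages_touched(partition_start_sector: int, offset_in_partition: int,
                  length: int, sector_size: int = 512,
                  page_size: int = 4096) -> int:
    """Count the physical pages covered by a write inside a partition."""
    start = partition_start_sector * sector_size + offset_in_partition
    first_page = start // page_size
    last_page = (start + length - 1) // page_size
    return last_page - first_page + 1


# The first 4KB write in a partition starting at sector 63 covers
# 31.5KB..35.5KB and therefore crosses the 32KB page boundary.
print(pages_touched(63, 0, 4096))    # misaligned start
print(pages_touched(64, 0, 4096))    # start at exactly 32KB
print(pages_touched(2048, 0, 4096))  # start at 1MB
```

Every later 4KB write in the misaligned partition is shifted by the same 31.5KB, so each one touches two pages instead of one.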
▶ Where do I put the split point?
Simply put, it should be placed on a boundary of the SSD's largest operation unit, the “block”; that is, at an integer multiple of the block size.
For single-channel flash devices, this is simple and easy to understand. But almost no SSD is single-channel, so how much space should be left free in that case?
Multi-channel flash devices also split data into chunks and distribute reads and writes across the channels; the chunk size is the same as the block size of the flash chips used. The situation would become more complicated if a multi-channel SSD split the data further before writing to the channels. However, current SSD controllers do not seem to be that clever: the smallest unit of data for multi-channel writes is still the flash block. Therefore, for now, we can completely disregard the number of channels and simply set the partition point at an integer multiple of the block size. Perhaps in the future the partition point will need to be an integer multiple of “number of channels * block size”?
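In code, "an integer multiple of the block size" boils down to a modulo test and a round-up. The Python sketch below uses hypothetical helper names (`is_aligned`, `align_up`) and a 2MB block size purely as an example value; the real block size depends on the flash chips in the SSD.

```python
MIB = 1024 * 1024


def is_aligned(offset_bytes: int, block_size: int) -> bool:
    """True if the byte offset sits exactly on a block boundary."""
    return offset_bytes % block_size == 0


def align_up(offset_bytes: int, block_size: int) -> int:
    """Round a byte offset up to the next block boundary."""
    return -(-offset_bytes // block_size) * block_size


block_size = 2 * MIB         # example value only
traditional_start = 63 * 512  # sector 63, the traditional first partition start

print(is_aligned(traditional_start, block_size))
aligned = align_up(traditional_start, block_size)
print(aligned, "bytes =", aligned // 512, "sectors")
```

If the partition point ever had to be a multiple of "number of channels * block size", the same helpers would apply with that product passed as `block_size`.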
▶ FDISK
Before presenting the final solution, take a look at the fdisk tool:
# fdisk -h
Usage:
 fdisk [options] <disk>    change partition table
 fdisk [options] -l <disk> list partition table(s)
 fdisk -s <partition>      give partition size(s) in blocks
Options:
 -b <size>                 sector size (512, 1024, 2048 or 4096)
 -c[=<mode>]               compatible mode: 'dos' or 'nondos' (default)
 -h                        print this help text
 -u[=<unit>]               display units: 'cylinders' or 'sectors' (default)
 -v                        print program version
 -C <number>               specify the number of cylinders
 -H <number>               specify the number of heads
 -S <number>               specify the number of sectors per track
We usually don’t use any options, but in order to enforce alignment of split points, we must force the following three parameters:
-b Specifies the sector size, one of 512/1024/2048/4096. Note that using any value greater than 512 (say N) results in fdisk using only 512/N of the disk's true capacity, so it is best not to use this option.
-H Specifies the number of heads. The value must be an integer between 1 and 256.
-S Specifies the number of sectors per track. The value must be an integer between 1 and 63.
The product of these three values is the total size of a logical cylinder, which is the minimum partition unit.
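As a quick sanity check of that product, the Python sketch below computes the logical cylinder size for a given geometry. The helper name `cylinder_bytes` is invented, and the 128-head, 32-sector geometry is just an illustrative choice.

```python
def cylinder_bytes(heads: int, sectors_per_track: int,
                   sector_size: int = 512) -> int:
    """Size of one logical cylinder: sector size * heads * sectors per track."""
    return sector_size * heads * sectors_per_track


# The common default geometry: 255 heads, 63 sectors per track.
default = cylinder_bytes(255, 63)
print(default, "bytes; remainder modulo 4096:", default % 4096)

# A power-of-two geometry yields a power-of-two cylinder size.
custom = cylinder_bytes(128, 32)
print(custom, "bytes; remainder modulo 4096:", custom % 4096)
```

With the default geometry the cylinder size is not even a multiple of 4KB, which is why cylinder-aligned partitions on such a layout are generally not page-aligned.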
▶ Solution
We now know that the partition alignment problem can be solved perfectly by partitioning the disk in units of the SSD “block”, so that every partition boundary falls on a block boundary.
Now suppose we have an SSD with these parameters: each page is 8KB and every 256 pages form a block. We then have to partition it in units of 256*8KB=2048KB=2MB=2097152B. In other words, we can use fdisk command-line options to force each cylinder to be 2MB (assuming 512 bytes per sector):
fdisk -u=cylinders -H 128 -S 32 /dev/sdx
If a 2MB cylinder is not large enough, it can be enlarged to 4MB (assuming 512 bytes per sector):
fdisk -u=cylinders -H 256 -S 32 /dev/sdx
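To see which -H and -S pairs produce a given cylinder size, one can simply enumerate them. The Python sketch below is a hypothetical helper, not part of fdisk; it assumes the limits stated earlier (1 to 256 heads, 1 to 63 sectors per track) and 512-byte sectors.

```python
MIB = 1024 * 1024


def geometries_for(cylinder_size: int, sector_size: int = 512,
                   max_heads: int = 256, max_sectors: int = 63):
    """List (heads, sectors_per_track) pairs whose cylinder is cylinder_size bytes."""
    if cylinder_size % sector_size != 0:
        raise ValueError("cylinder size must be a multiple of the sector size")
    sectors_per_cylinder = cylinder_size // sector_size
    result = []
    for sectors in range(1, max_sectors + 1):
        if sectors_per_cylinder % sectors == 0:
            heads = sectors_per_cylinder // sectors
            if 1 <= heads <= max_heads:
                result.append((heads, sectors))
    return result


for size in (2 * MIB, 4 * MIB, 8 * MIB):
    print(size // MIB, "MB:", geometries_for(size))
```

Under these limits, 2MB can be reached with 128 heads and 32 sectors (as in the command above) or with 256 heads and 16 sectors, 4MB only with 256 heads and 32 sectors, and 8MB cannot be expressed as a single cylinder at all.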
Another detail to note: the first partition must not start on the first cylinder (the default) but on the second cylinder; otherwise the first partition may still be misaligned. As shown below:
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4, default 1):
Using default value 1
First cylinder (1-800, default 1): 2
Last cylinder, +cylinders or +size{K,M,G} (2-800, default 800): 500
Note the First cylinder line: the default value is 1, but do not accept it; manually change it to 2.
The reason, according to a Google search, is that fdisk gives special treatment to partitions starting on cylinder 1 and arbitrarily shifts the starting point of the partition forward. Since newer versions of fdisk have fixed this bug, we don't need to dig any further.
▶ Alignment checks
How do you check that the partitions are indeed aligned? Run “fdisk -u=sectors -l /dev/sdx” and inspect the Start values. For example:
# fdisk -u=sectors -l /dev/sda
Disk /dev/sda: 671 MB, 671088640 bytes
128 heads, 32 sectors/track, 320 cylinders, total 1310720 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x7bf5a16d
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1              63       40959       20448+  83  Linux
/dev/sda2           40960       79871       19456   83  Linux
/dev/sda3           81920      163839       40960   83  Linux
The sector size is 512 bytes (both logical and physical).
Start=63 for the sda1 partition means that the distance from LBA 0 to the start of sda1 is 63*512B=31.5KB, which is obviously unaligned.
Start=40960 for the sda2 partition means that the distance from LBA 0 to the start of sda2 is 40960*512B=20MB=5*2*2MB, which is aligned on both 2MB and 4MB block boundaries. End=79871 means that sda2 ends at (79871+1)*512B=39MB.
Start=81920 for the sda3 partition means that the distance from LBA 0 to the start of sda3 is 81920*512B=40MB=5*4*2MB, which is aligned on both 2MB and 4MB block boundaries, and even on 8MB block boundaries. End=163839 means that sda3 ends at (163839+1)*512B=80MB=5*2*2*2*2MB.
So sda3 is the most perfectly aligned partition, and sda1 is the worst.
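The same check can be automated. The Python sketch below takes the Start values from the fdisk listing above and reports which candidate block sizes each partition start is aligned to. The helper name and the list of candidate sizes are illustrative assumptions.

```python
MIB = 1024 * 1024

# Start sectors copied from the fdisk output above.
starts = {"sda1": 63, "sda2": 40960, "sda3": 81920}


def aligned_block_sizes(start_sector: int, sector_size: int = 512,
                        candidates=(2 * MIB, 4 * MIB, 8 * MIB)):
    """Return the candidate block sizes that evenly divide the start offset."""
    offset = start_sector * sector_size
    return [size for size in candidates if offset % size == 0]


for name, start in starts.items():
    sizes = [f"{size // MIB}MB" for size in aligned_block_sizes(start)]
    print(f"{name}: start at {start * 512} bytes, aligned to {sizes or 'nothing'}")
```

This reproduces the conclusion above: sda1 matches none of the block sizes, sda2 matches 2MB and 4MB, and sda3 matches 2MB, 4MB, and 8MB.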
File system
Now that the SSD is perfectly partitioned, it's time to create a file system. Which file system is best suited to SSDs? Linux has a great many file systems: ext2/ext3/ext4/reiser3/reiser4/JFS/XFS/Btrfs/NILFS2… Even in the age of HDDs, choosing the right file system was always a headache. And so far, I have to say that no file system is a perfect match for SSDs.
******************************To be continued******************************