Prologue
An SSD relies on a flash translation layer (FTL) to convert logical addresses to physical addresses. If an abnormal power failure occurs while the SSD is reading, writing, or deleting data, the mapping table may be lost because it was not updated in time. As a result, the system can no longer identify the SSD.
In addition, SDRAM is used as a cache to improve read/write performance. If an unexpected power failure occurs during a read or write, the data in SDRAM may not be written to NAND flash in time, resulting in loss of data or of the mapping table.
Faults caused by abnormal power failure
An abnormal power failure on an SSD usually shows up as one of the following three symptoms:
① The system can no longer identify the SSD. The mapping table has to be rebuilt, or the drive has to be crudely re-initialized on the production side, before it can be used again.
② After repeated power failures, a large number of new bad blocks appear on the SSD. The mechanism is as follows: when a read, write, or erase on a block fails, the block is marked as bad. These blocks are not genuinely bad; the errors were caused by the abnormal power failure.
③ Data in SDRAM is lost.
Common power failure protection mechanisms
Vendors understand power failure protection differently and serve different users, so their protection mechanisms differ widely. Broadly, there are two approaches:
► Save all data in SDRAM
With this approach, all data in SDRAM must be written to NAND flash after an abnormal power failure. The SDRAM capacity is typically about 1/1000 of the SSD's raw capacity. A small-capacity SSD has little data to flush, and supercapacitors or tantalum capacitors can supply power long enough to finish the write. For a large SSD, for example 8 TB, the amount of SDRAM data to flush becomes very large. If supercapacitors or tantalum capacitors are still used as the backup power supply, three thorny problems arise:
a. More tantalum capacitors are needed. In practice this is a severe constraint: engineers are limited by thickness and standard form-factor dimensions, and there is not enough PCB area.
b. Even with enough capacitors, the SSD cannot start normally on a "restart"; it must stay powered off for a while before it is powered on again, because the SSD can be identified only after all the tantalum capacitors have fully discharged.
c. After several years of use, the tantalum capacitors or supercapacitors age and the energy they can supply falls below the original design target, so users again risk losing data or having the SSD go unidentified after a power failure. Adding redundant capacitors in the initial design leads straight back to problem b.
Fortunately, problems b and c can be solved well; doing so takes ingenuity and engineering experience.
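To make the sizing argument concrete, the following minimal sketch applies the 1/1000 rule of thumb and estimates how long the capacitors would have to keep the drive alive. The flush bandwidth of 400 MB/s and the helper names are assumptions chosen for illustration only; a real drive's sustained write speed under backup power will differ.

```python
# Sketch only: sizes follow the 1/1000 rule of thumb quoted in the text.
# ASSUMED_FLUSH_BW is a hypothetical figure, not a measured value.
MB = 10**6
GB = 10**9
TB = 10**12

ASSUMED_FLUSH_BW = 400 * MB  # bytes per second, assumption


def sdram_bytes(raw_capacity_bytes: int) -> int:
    """SDRAM size if it is 1/1000 of the raw NAND capacity."""
    return raw_capacity_bytes // 1000


def holdup_seconds(raw_capacity_bytes: int,
                   flush_bandwidth: int = ASSUMED_FLUSH_BW) -> float:
    """Worst-case time needed to flush a full SDRAM to NAND flash."""
    return sdram_bytes(raw_capacity_bytes) / flush_bandwidth


for capacity in (256 * GB, 8 * TB):
    print(f"raw={capacity / GB:.0f} GB  "
          f"sdram={sdram_bytes(capacity) / MB:.0f} MB  "
          f"hold-up={holdup_seconds(capacity):.2f} s")
```

Under these assumptions a 256 GB drive has to stay alive for well under a second, while an 8 TB drive has 8 GB of SDRAM and needs about 20 seconds of backup power, which is why the capacitor count becomes a problem.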
► Only user data in SDRAM is saved, not the mapping table
This approach reduces the amount of SDRAM and the number of tantalum capacitors required. "Not saving the mapping table" does not mean the mapping table is lost; only the most recent mapping table updates are left unsaved. After the SSD is powered on again, it finds the data written since the mapping table was last saved and rebuilds the mapping table from it. The drawback is that, if the mechanism is not designed well, rebuilding the mapping table takes a long time and the SSD needs a while to return to normal operation.
For a controller designed without SDRAM, all data is written directly to NAND flash. When an abnormal power failure occurs, data that has not yet been written to NAND flash is returned to the host, and there is no additional data to save. For truly high-reliability applications, therefore, the SDRAM-less design is king. A representative example is a long-established German industrial brand. Its only drawback is lower performance, but many application scenarios do not need the highest performance, only "enough" performance.
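The rebuild step can be sketched as follows. This is a simplified illustration, not any vendor's firmware: it assumes each NAND page carries its logical page number and a monotonically increasing write sequence number in its spare area, and the names `PageMeta` and `rebuild_mapping` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class PageMeta:
    """Hypothetical per-page metadata read back from the spare area."""
    ppn: int  # physical page number
    lpn: int  # logical page number stored alongside the data
    seq: int  # write sequence number, larger means newer


def rebuild_mapping(saved_map: Dict[int, int],
                    saved_seq: int,
                    scanned_pages: Iterable[PageMeta]) -> Dict[int, int]:
    """Start from the last saved table and replay newer writes in order."""
    mapping = dict(saved_map)
    newer = [p for p in scanned_pages if p.seq > saved_seq]
    for page in sorted(newer, key=lambda p: p.seq):
        mapping[page.lpn] = page.ppn  # the latest write to an LPN wins
    return mapping


saved_map = {0: 100, 1: 101, 2: 102}   # table as of sequence number 10
scanned = [
    PageMeta(ppn=205, lpn=1, seq=12),
    PageMeta(ppn=204, lpn=3, seq=11),
    PageMeta(ppn=206, lpn=1, seq=13),
    PageMeta(ppn=150, lpn=2, seq=7),   # older than the saved table, ignored
]
print(rebuild_mapping(saved_map, 10, scanned))
```

The cost of this scheme is the scan itself: the more data written since the table was last saved, the more pages must be read at power-up, which is where the long recovery time comes from.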
Test method and principle
In the actual tests, the SSD must be tested both as a system (master) disk and as a slave disk. The only difference is that testing a master disk requires power-cycling the whole test computer, while testing a slave disk requires power-cycling only the SSD.
⑴ With the SSD empty and with 25%, 50%, 85%, and 100% of its capacity written, run the abnormal power failure test 3000 times at each fill level. The power-off and power-on intervals are 3 seconds each.
Once a certain amount of data has been written to the disk, the SSD starts garbage collection in the background. Garbage collection means relocating data, and relocating data means updating the mapping table. An abnormal power failure at this point is especially likely to cause faults.
⑵ Abnormal power failure while data is being written
In Windows (NTFS), writing a new file to the file system takes the following eight steps:
① Read the boot sector: get the cluster size, the starting position of the MFT, and the size of each MFT entry.
② Read the MFT entry of $MFT: get the locations of the rest of the MFT from its $DATA attribute.
③ Create an MFT entry for the new file: access $BITMAP to find an unallocated entry, allocate the first free entry to the new file, and set its bit in $BITMAP to 1.
④ Initialize the MFT entry: clear the entry's contents, create the standard information attribute $STANDARD_INFORMATION and the file name attribute $FILE_NAME, set the current time, and set the in-use flag in the MFT entry header.
⑤ Allocate cluster space for the file: set the corresponding $BITMAP bits to 1, update the cluster addresses in the $DATA attribute, and update the file's last modification time.
⑥ Add the file name entry: read the index root attribute $INDEX_ROOT, read the index allocation attribute $INDEX_ALLOCATION, and update the directory's last access time.
⑦ Create a new index entry: access the index root attribute $INDEX_ROOT to create a new index entry for the new file, and set the appropriate times and flags.
⑧ Write the system logs: create an entry in the system log and record the change in $UsnJrnl. If quota management is enabled, the new file's size is charged to $Quota.
Writing data is therefore also a process of updating the mapping table, and a power failure during a write can leave the mapping table only partially updated.
⑶ Abnormal power failure occurs when deleting data
In Windows, deleting data also takes eight steps and, like creating a file, requires updating the mapping table.
⑷ Abnormal power failure while the SSD is reading files: run 3000 tests with a 3-second power-off interval.
⑸ Abnormal power failure during a normal operating system shutdown: test 3000 times.
⑹ Abnormal power failure during a normal operating system startup: test 3000 times.
For industrial or military grade SSDs, these tests need to be performed at high and low temperatures.
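The power-cycling procedure in item ⑴ can be sketched as a small test loop. Everything hardware-specific here is an assumption: `power_off`, `power_on`, and `drive_is_identified` are hypothetical callbacks standing in for whatever relay board or programmable power supply the test bench uses, and the sketch reads "3 seconds" as 3 seconds off followed by 3 seconds on.

```python
import time
from typing import Callable

FILL_LEVELS = (0, 25, 50, 85, 100)  # percent of capacity written
CYCLES_PER_LEVEL = 3000
INTERVAL_S = 3.0                    # assumed: 3 s off, then 3 s on


def run_power_cycle_test(power_off: Callable[[], None],
                         power_on: Callable[[], None],
                         drive_is_identified: Callable[[], bool],
                         cycles: int = CYCLES_PER_LEVEL,
                         interval_s: float = INTERVAL_S,
                         sleep: Callable[[float], None] = time.sleep) -> int:
    """Cut and restore power `cycles` times; return how often the drive
    failed to be identified after power came back."""
    failures = 0
    for _ in range(cycles):
        power_off()
        sleep(interval_s)
        power_on()
        sleep(interval_s)
        if not drive_is_identified():
            failures += 1
    return failures


def minimum_duration_hours(levels=FILL_LEVELS,
                           cycles: int = CYCLES_PER_LEVEL,
                           interval_s: float = INTERVAL_S) -> float:
    """Lower bound on wall-clock time, ignoring fill and check time."""
    return len(levels) * cycles * 2 * interval_s / 3600


# Dry run with stub callbacks and no real waiting.
failed = run_power_cycle_test(lambda: None, lambda: None, lambda: True,
                              cycles=5, sleep=lambda s: None)
print("failures:", failed)
print("minimum hours for item (1):", minimum_duration_hours())
```

Under this reading, item ⑴ alone takes at least 25 hours of power cycling per drive and per temperature condition, before counting the time needed to fill the drive or to check it after each cycle.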