Since the 20th century, the rapid development of information and digital technologies such as the Internet and artificial intelligence has driven exponential growth in the amount of information. According to statistics, the global volume of data will increase from 30 zettabytes in 2018 to 163 zettabytes in 2025, and traditional silicon-based storage devices will face a development bottleneck. In recent years, a technology called "DNA storage" has been highly anticipated as the future of sustainable storage.
First, let’s take a look at what makes this technology attractive. By some accounts, a single gram of DNA can store a staggering 215 petabytes (215 million gigabytes) of data. For example, at 10 GB per HD movie, one gram of DNA could store about 21.5 million movies. Hence the claim that a kilogram of DNA could store the world’s data.
In terms of storage time, hard disks, tapes, and other conventional media can retain data for only about 10 years, while DNA can retain information for at least a hundred years. In terms of energy consumption, disk storage draws about 0.04 W per GB of data, while DNA storage can draw less than 10^-10 W. DNA storage therefore has great advantages in storage capacity, data retention time, and energy consumption.
Next, let’s look at what DNA storage is. Conceptually, DNA storage technology uses DNA molecules as the storage medium: information is written into the molecules and read back from them, mimicking the read and write operations of computer memory.
The concept seems very simple, but the implementation principle is very complicated. To review some basic biology: DNA is a double helix built from four bases, adenine (A), thymine (T), cytosine (C), and guanine (G), whose sequence preserves the genetic information of the organism.
In computer systems, data is represented in binary, with each bit being either zero or one. DNA storage, however, does not use the traditional binary model directly; it relies on new encoding algorithms that work over the four bases.
To store information from a computer in DNA is to convert the computer’s binary data stream into a sequence of DNA bases. For example, you can use a binary scheme, with 0 represented by base A or C and 1 by base T or G. Ternary encodings can also be used to reduce error rates, as can other schemes that convert a numeric encoding into a chemical encoding of bases.
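The binary scheme just described can be sketched in a few lines of Python. This is a minimal illustration, not a production codec: the function names are hypothetical, and the rule for choosing between the two candidate bases (pick whichever differs from the previous base, so no base repeats back to back) is an assumption added here to show one possible use of the redundancy.

```python
# Illustrative one-bit-per-base mapping from the text: 0 -> A or C, 1 -> T or G.
ZERO_BASES = "AC"
ONE_BASES = "TG"


def bytes_to_bits(data: bytes) -> str:
    """Turn raw bytes into a string of '0' and '1' characters."""
    return "".join(f"{byte:08b}" for byte in data)


def bits_to_bytes(bits: str) -> bytes:
    """Inverse of bytes_to_bits; expects a multiple of 8 bits."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def bits_to_dna(bits: str) -> str:
    """Encode a bit string as a base sequence, one bit per base."""
    strand = []
    for bit in bits:
        choices = ZERO_BASES if bit == "0" else ONE_BASES
        prev = strand[-1] if strand else None
        # Assumption: use the first candidate unless it would repeat the
        # previous base, in which case use the second candidate.
        strand.append(choices[0] if choices[0] != prev else choices[1])
    return "".join(strand)


def dna_to_bits(strand: str) -> str:
    """Decode a base sequence back to bits (A/C -> 0, T/G -> 1)."""
    return "".join("0" if base in ZERO_BASES else "1" for base in strand)


if __name__ == "__main__":
    encoded = bits_to_dna(bytes_to_bits(b"DNA"))
    print(encoded)
    print(bits_to_bytes(dna_to_bits(encoded)))
```

For instance, `bits_to_dna("0011")` yields `"ACTG"`, and decoding that strand returns `"0011"`. A two-bits-per-base mapping would halve the strand length at the cost of losing this freedom to avoid repeats.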
As with any form of data storage, DNA storage technology involves four steps: information encoding, information storage, information retrieval, and information reading.
Firstly, the information to be written into the DNA is encoded. That is, a computer algorithm maps a sequence of bits to DNA sequences, and the encoded sequences are then synthesized to produce multiple physical copies of each one. DNA sequences can be arranged arbitrarily but are of limited length, so the bit stream is broken down into smaller blocks that can later be reassembled into the original data. This can be done by adding an index to each block, or by storing overlapping blocks of data in the DNA sequences.
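The indexing approach can be illustrated at the bit level, before any mapping to bases. The sketch below is an assumption-laden toy: the function names, the 16-bit payload, and the 8-bit index are all hypothetical choices made for illustration, not values from any real system.

```python
import random


def split_into_indexed_blocks(bits: str, payload_len: int = 16,
                              index_len: int = 8) -> list[str]:
    """Cut a bit string into blocks, each prefixed with a fixed-width index."""
    n_blocks = (len(bits) + payload_len - 1) // payload_len
    if n_blocks > 2 ** index_len:
        raise ValueError("index field too small for this many blocks")
    blocks = []
    for start in range(0, len(bits), payload_len):
        index = format(start // payload_len, f"0{index_len}b")
        blocks.append(index + bits[start:start + payload_len])
    return blocks


def reassemble(blocks: list[str], index_len: int = 8) -> str:
    """Sort blocks by their index prefix and concatenate the payloads."""
    ordered = sorted(blocks, key=lambda block: int(block[:index_len], 2))
    return "".join(block[index_len:] for block in ordered)


if __name__ == "__main__":
    original = "1011001110001111" * 3 + "0101"
    blocks = split_into_indexed_blocks(original)
    random.shuffle(blocks)  # a molecular pool has no inherent order
    print(reassemble(blocks) == original)
```

Because each block carries its own position, the blocks can be recovered in any order, which matters because molecules in a pool are not physically ordered.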
Secondly, the synthetic DNA needs to be stored in a suitable way (in vivo or in vitro).
Thirdly, the corresponding synthetic DNA pool is physically retrieved and sampled; this is the information retrieval step. To avoid reading all of the data in the pool, you need a capability like random access in computer systems, that is, the ability to select specific data items to read from a large data set. This is easy to do in mainstream digital storage media (hard disks, etc.), but difficult in molecular storage because data items in the same molecular pool have no physical organization. Random access in DNA data storage can be achieved through selective processes such as magnetic bead extraction, using probes mapped to data items, or PCR (polymerase chain reaction), using primers associated with data items during the encoding process.
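The idea behind primer-based random access can be modeled in software as a simple prefix lookup. This is only a toy model under stated assumptions: real PCR uses primer pairs and chemical amplification rather than string matching, and the file names, primer sequences, and function names below are hypothetical.

```python
import random

# Hypothetical primers, one per stored data item. Real primers are designed
# to be chemically distinct and are typically around 20 bases long.
PRIMERS = {"photo.jpg": "ACGTAC", "notes.txt": "TGCATG"}


def tag_strands(item: str, payloads: list[str]) -> list[str]:
    """Prepend the item's primer to each payload strand at encoding time."""
    return [PRIMERS[item] + payload for payload in payloads]


def select_by_primer(pool: list[str], item: str) -> list[str]:
    """Return the payloads of strands that start with the item's primer."""
    primer = PRIMERS[item]
    return [s[len(primer):] for s in pool if s.startswith(primer)]


if __name__ == "__main__":
    pool = (tag_strands("photo.jpg", ["AAAA", "CCCC"])
            + tag_strands("notes.txt", ["GGGG"]))
    random.shuffle(pool)  # the pool itself is unordered
    print(select_by_primer(pool, "notes.txt"))
```

The key point the model captures is that the address is attached to each strand when it is written, so one item can be pulled out without reading the rest of the pool.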
Finally, after the DNA sample is selected, it is sequenced to generate a set of sequencing reads, which are then decoded back into the original digital data with high fidelity. This is the information reading step, and its success depends on the sequencing coverage and the error rate across the whole process.
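To see why coverage matters, consider a simple per-position majority vote over several reads of the same strand. This sketch assumes substitution errors only and reads of equal length; real sequencing also produces insertions and deletions, which need alignment and stronger error-correcting codes. The function name is hypothetical.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Per-position majority vote over equal-length reads of one strand."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("this sketch assumes reads of equal length")
    # zip(*reads) yields one tuple of bases per strand position.
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*reads))


if __name__ == "__main__":
    # Three noisy reads of the strand ACGT, each with at most one error.
    print(consensus(["ACGT", "ACGA", "TCGT"]))
```

With three reads, a single substitution at any one position is outvoted by the other two; higher coverage tolerates higher error rates.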
Despite its potential, DNA storage still faces hurdles as far as applications are concerned.
First of all, this technology is prone to errors during information encoding and decoding. Some papers report an error rate of about 1% per base position, and end customers cannot bear the risk of errors at that level. Secondly, the overall write throughput of DNA data storage is on the order of kilobytes per second, which is 6 orders of magnitude lower than mainstream read and write throughput, and sequencing capacity is 2-3 orders of magnitude lower than mainstream read and write throughput. Thirdly, in terms of cost, although actual DNA synthesis costs are not publicly disclosed, industry analysts estimate the cost of array-based DNA synthesis at about 0.0001 US dollars per base, which works out to about 800 million US dollars per TB at one bit per base. Finally, although there is evidence that DNA dating back thousands of years can be read, DNA can degrade much faster than that, depending on the conditions under which it is kept.
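The cost figure can be checked with a short calculation. The sketch below assumes a decimal terabyte (10^12 bytes) and one bit per base, which is the encoding density under which 0.0001 US dollars per base gives roughly 800 million US dollars per TB; the function name and constants are introduced here for illustration.

```python
COST_PER_BASE_USD = 0.0001   # analyst estimate quoted in the text
BYTES_PER_TB = 10 ** 12      # assumption: decimal terabyte
BITS_PER_BYTE = 8


def synthesis_cost_per_tb(cost_per_base: float, bits_per_base: float) -> float:
    """Cost in USD to synthesize enough bases to hold one terabyte."""
    bases_needed = BYTES_PER_TB * BITS_PER_BYTE / bits_per_base
    return bases_needed * cost_per_base


if __name__ == "__main__":
    # One bit per base (assumption matching the quoted per-TB figure).
    print(f"{synthesis_cost_per_tb(COST_PER_BASE_USD, 1):,.0f} USD per TB")
    # A denser two-bits-per-base code would halve the cost.
    print(f"{synthesis_cost_per_tb(COST_PER_BASE_USD, 2):,.0f} USD per TB")
```

Even a denser encoding changes the result by only a small constant factor, so the per-base synthesis price is what must fall by many orders of magnitude.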
However, against the backdrop of exponential growth in global data, applications of DNA storage technology are beginning to be explored in different fields, and countries have gradually recognized both the prospects of DNA as a future storage medium and the importance of developing the related new technologies.
High-throughput DNA synthesis, sequencing, and coding, as the three main technical fields of DNA storage technology, have become the focus of policy planning and technology research and development in various countries.
While challenges remain, the future of synthetic DNA storage systems remains bright and could have a profound impact on fields such as global data management and healthcare. With the joint efforts of academia and industry, it is believed that there will be many ways to build low-cost and practical DNA storage in the foreseeable future.