Three Storage Modes: Direct-Attached Storage (DAS), Storage Area Network (SAN), and Network-Attached Storage (NAS).
Three Storage Types: Block Storage, File Storage, and Object Storage.
Block storage and file storage are the two main storage types we are familiar with. Object-based storage is a newer network storage architecture. A device built on this architecture is called an Object-Based Storage Device (OSD).
In essence the three types are the same: all of them are built on block storage. They differ in the interface they expose and in the service scenarios they are applied to.
Distributed storage application scenarios currently fall into three common types in terms of their storage interface:
- Object storage: key-value storage in the usual sense. Its interface consists of simple GET, PUT, and DEL operations plus other extensions. Examples include Qiniu, Upyun, Swift, and S3.
- Block storage: This interface usually exists as a QEMU driver or a kernel module. It needs to implement the Linux block device interface or the block driver interface provided by QEMU. Examples include Sheepdog, AWS EBS, QingCloud's cloud disks, Alibaba Cloud's Pangu system, and Ceph's RBD (RBD is Ceph's block storage interface).
- File storage: This usually means supporting the POSIX interface, as traditional file systems such as Ext4 do. The difference is that distributed file storage provides parallel access, as in CephFS (the file storage interface of Ceph). Non-POSIX file storage interfaces such as GFS and HDFS are sometimes placed in this category as well.
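To make the difference between the three interfaces concrete, here is a minimal in-memory sketch. The class and method names (`ObjectStore`, `BlockDevice`, `FileStore`, `write_block`, and so on) are hypothetical and chosen only for illustration; they are not the API of S3, QEMU, Ceph, or any other real system.

```python
# Toy in-memory models of the three storage interfaces (all names are hypothetical).

class ObjectStore:
    """Object interface: whole objects addressed by a key (GET / PUT / DEL)."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        del self._objects[key]


class BlockDevice:
    """Block interface: fixed-size blocks addressed by block number."""

    def __init__(self, num_blocks: int, block_size: int = 512):
        self.block_size = block_size
        self._blocks = [bytes(block_size) for _ in range(num_blocks)]

    def write_block(self, lba: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self._blocks[lba] = data

    def read_block(self, lba: int) -> bytes:
        return self._blocks[lba]


class FileStore:
    """File interface: byte streams addressed by a path in a directory tree."""

    def __init__(self):
        self._files = {}

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def read(self, path: str) -> bytes:
        return self._files[path]

    def listdir(self, directory: str) -> list:
        # Return every stored path that lies under the given directory.
        prefix = directory.rstrip("/") + "/"
        return sorted(p for p in self._files if p.startswith(prefix))
```

The point of the sketch is only the addressing model: a key for objects, a block number for block devices, and a path for files.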
The two storage modes listed below are both block storage types:
- Direct-Attached Storage (DAS): The storage device is connected directly to the host server. Each host server has its own independent storage device, and the storage devices of different hosts cannot work together. Data on another host must be accessed through that host, which involves a relatively complex setup. If the host servers run different operating systems, accessing each other's data is even more complex, and some systems cannot access it at all. DAS is generally used in a single network environment with low data exchange volume and low performance requirements. It is a relatively early storage technology.
- Storage Area Network (SAN): A storage system that connects dedicated host servers over a high-speed (fiber) network. It sits at the back end of the host cluster and uses I/O connections such as SCSI, ESCON, and Fibre Channel. SAN uses the SCSI block I/O command set and provides disk-level data access over Fibre Channel, with high-performance random I/O and high data throughput. It is characterized by high cost and high performance, and it suits applications that demand high network speed, high data reliability and security, and high data-sharing performance, such as the most critical data applications in telecommunications and banking. Its high bandwidth and low latency have earned it a place in high-performance computing. However, because SAN systems are expensive and scale poorly, they can no longer meet the needs of systems with tens of thousands of CPUs.
Typical devices: Disk Arrays, Hard Disks
Block storage allocates raw space to hosts and provides its service at the physical layer. The system that uses the space formats it with its own file system, and once one system has claimed the space, that system uses it exclusively.
For example, if a disk array has five disks, you can divide it into N logical disks by means of logical partitioning, RAID, or LVM. A logical disk and a physical disk are two completely different concepts, though. Suppose each physical disk is 100 GB, and the array is divided into five logical drives of 100 GB each. These five logical drives are not the same thing as the original five physical drives. For the first logical drive, for instance, the first 20 GB could come from physical drive 1 and the second 20 GB from physical drive 2. A logical drive is thus a logical construct built from multiple physical drives.
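The mapping from a logical drive to physical drives can be sketched as a small address translation. The layout below is an assumption made for illustration (the text only says that the first 20 GB "could" come from physical drive 1 and the second from physical drive 2); the function name `locate` and the constants are hypothetical, and indices are zero-based.

```python
GB = 1024 ** 3
EXTENT_SIZE = 20 * GB   # assumed: each logical drive is built from 20 GB extents
NUM_DRIVES = 5          # five physical drives of 100 GB, five logical drives


def locate(logical_drive: int, offset: int) -> tuple:
    """Map (logical drive index, byte offset) to (physical drive index, byte offset).

    Assumed layout: extent j of every logical drive lives on physical drive j,
    and logical drive i occupies the i-th 20 GB slice of each physical drive.
    """
    if not 0 <= logical_drive < NUM_DRIVES:
        raise ValueError("no such logical drive")
    if not 0 <= offset < NUM_DRIVES * EXTENT_SIZE:
        raise ValueError("offset outside the 100 GB logical drive")
    extent_index, within_extent = divmod(offset, EXTENT_SIZE)
    physical_drive = extent_index
    physical_offset = logical_drive * EXTENT_SIZE + within_extent
    return physical_drive, physical_offset
```

Under this layout, `locate(0, 25 * GB)` lands 5 GB into the slice that the first logical drive owns on the second physical drive. Real volume managers keep such mappings in extent tables rather than a fixed formula.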
The block storage system then maps these logical drives to the host. The host operating system sees five hard drives, but it cannot distinguish logical drives from physical ones, so it treats them as five physical drives. As far as the operating system is concerned, there is no difference.
In this mode, the operating system also needs to partition and format the mounted raw hard disk, which is the same as the built-in hard disk of a common host.
Advantages of Block Storage
- Data protection is provided by means of RAID and LVM.
- Multiple inexpensive hard drives can be combined into one large-capacity logical drive, which increases the capacity offered to hosts.
- A logical disk is made up of multiple physical disks, so data can be written to several disks at the same time, improving read/write efficiency.
- In most cases, block storage is deployed on a SAN architecture, whose transport speed and encapsulation protocol further improve transfer speed and read/write efficiency.
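The data protection and parallel-write advantages can be illustrated with a simplified single stripe protected by XOR parity. This is a sketch of the general idea behind parity-based RAID levels, not the implementation of any particular RAID controller or of LVM; the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data: bytes, num_data_disks: int, chunk: int):
    """Split one stripe of data into a chunk per data disk and compute a parity chunk."""
    if len(data) > num_data_disks * chunk:
        raise ValueError("data does not fit into a single stripe")
    padded = data.ljust(num_data_disks * chunk, b"\0")
    chunks = [padded[i * chunk:(i + 1) * chunk] for i in range(num_data_disks)]
    return chunks, xor_blocks(chunks)


def rebuild(chunks, parity, lost_index):
    """Recover the chunk of a failed disk from the surviving chunks and the parity."""
    survivors = [c for i, c in enumerate(chunks) if i != lost_index]
    return xor_blocks(survivors + [parity])
```

Each chunk can be written to a different disk at the same time, and any single lost chunk can be recomputed because XOR-ing the parity with the surviving chunks cancels them out.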
Disadvantages of Block Storage
- If a SAN architecture is used, you need to buy Fibre Channel cards for the hosts as well as Fibre Channel switches, which is expensive.
- Data cannot be shared between hosts. When servers are not clustered, each host is allocated a raw block storage disk. Once the host has formatted and started using it, the disk behaves like a local disk belonging to that host.
- Data exchange between hosts running different operating systems is difficult, because operating systems use different file systems, and once a disk is formatted, its data cannot be read through a different file system. For example, Windows 7 uses FAT32 or NTFS while Linux uses ext4, and an ext4 driver cannot read an NTFS volume.
Usage Scenario of Block Storage
Docker container, virtual machine disk storage allocation.
Typically, NAS products are file-level storage.
Network Attached Storage (NAS): A set of network storage devices, usually connected directly to the network, that provide data access services. A NAS installation works like an inexpensive system for data archiving, for example in educational, government, and enterprise data storage applications.
It uses the NFS or CIFS command sets to access data, transfers data at the file level, and uses TCP/IP as its network transport. It has good scalability, low cost, and easy management. The NFS file system, for example, is commonly used in cluster computing, but NAS is not suitable for high-performance clusters because of its high protocol overhead, low bandwidth, and high latency.
Typical devices: FTP and NFS servers
File storage exists to solve the problem that files cannot be shared.
File storage provides its service at the file system level. A client system only needs to reach the file system interface.
File storage is also sold as integrated hardware and software appliances, but in fact an ordinary PC with a suitable operating system and software can run FTP and NFS services. Once those services are set up, the server is itself a form of file storage.
Host A can upload and download files directly to the file store. Unlike block storage, Host A does not need to format file storage as the file management function has been taken over by the file storage itself.
Advantages of File Storage
- Low cost: any machine will do, connected over ordinary Ethernet. No dedicated SAN is required.
- Convenient file sharing.
Disadvantages of File Storage
- Low read/write rate and slow transfer rate: uploads and downloads over Ethernet are slow. In addition, all reads and writes go to the disks of a single server, which is much slower than a disk array that often reads and writes dozens or hundreds of disks at the same time.
Usage Scenario of File Storage
File storage with a directory structure.
Typical device: Distributed server with large-capacity hard disks
The most common object storage solution is to install large-capacity hard drives in multiple servers and then install the object storage software on several additional servers that act as management nodes. The management nodes manage the other servers and grant read/write access.
Object storage takes this form in order to overcome the shortcomings of block storage and file storage while keeping the advantages of each. In short, block storage is fast to read and write but does not lend itself to sharing, while file storage is slow to read and write but easy to share. Can we have storage that is both fast and shareable? This is where object storage comes in.
First, a file consists of properties (its metadata, such as the file's size, modification time, and storage path) and content (its data).
For example, FAT32 is a file system that stores a file's data together with its metadata. The storage process first splits the file according to the file system's minimum block size (for example, a 4 MB file on a file system with 4 KB blocks is split into about 1,000 small blocks) and then writes the blocks to disk. Data and metadata are not separated. Each block gives the address of the next block to read, so a reader follows the blocks in order until it has read the entire file. Reading and writing this way is very slow: even if there are 100 disk arms available, only one arm can work at a time, because the location of the next block is unknown until the current block has been read.
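The sequential dependency in such a chained layout can be sketched as follows. This is a simplified model with hypothetical names (`read_chain`, `END`), not the on-disk FAT32 format; in real FAT32 the links are kept in the file allocation table rather than inside the data blocks, but the reader still has to follow the chain one step at a time.

```python
BLOCK_SIZE = 4 * 1024                  # 4 KB blocks
FILE_SIZE = 4 * 1024 * 1024            # a 4 MB file
NUM_BLOCKS = FILE_SIZE // BLOCK_SIZE   # 1024 blocks, i.e. roughly 1,000

END = -1  # hypothetical end-of-chain marker


def read_chain(disk: dict, next_block: dict, first: int) -> bytes:
    """Read a file by following the chain of block addresses one block at a time."""
    data = bytearray()
    current = first
    while current != END:
        data += disk[current]            # read the current block
        current = next_block[current]    # only now is the next address known
    return bytes(data)
```

Because each iteration depends on the result of the previous one, the loop cannot be spread across several disks or arms, which is the bottleneck the paragraph above describes.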
Object storage separates out the metadata. The control node, called the metadata server (a server running the object storage management software), is mainly responsible for storing object attributes, chiefly the information about which distributed servers hold each object's scattered data. The distributed servers that store the data are called OSDs and are mainly responsible for the data part of the file. When a user accesses an object, the user first contacts the metadata server, which returns only the list of OSDs where the object is stored. Suppose file A is stored on three OSD nodes (B, C, and D): the user then accesses all three OSD servers directly to read the data.
The three OSD nodes then transmit data at the same time, so the transfer is faster. The more OSD servers there are, the more the read/write speed improves. This is how object storage achieves fast reads and writes.
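The access path described above, one lookup on the metadata server followed by parallel reads from the OSDs, can be sketched with toy in-memory classes. All class and method names here are hypothetical, and threads stand in for network requests to separate servers.

```python
from concurrent.futures import ThreadPoolExecutor


class OSD:
    """Toy object storage device holding object fragments in memory."""

    def __init__(self):
        self._fragments = {}

    def store(self, object_id: str, index: int, data: bytes) -> None:
        self._fragments[(object_id, index)] = data

    def fetch(self, object_id: str, index: int) -> bytes:
        return self._fragments[(object_id, index)]


class MetadataServer:
    """Records which OSD holds each fragment; it never stores the data itself."""

    def __init__(self):
        self._layout = {}

    def register(self, object_id: str, osd_names: list) -> None:
        self._layout[object_id] = list(osd_names)

    def locate(self, object_id: str) -> list:
        return self._layout[object_id]


def read_object(mds: MetadataServer, osds: dict, object_id: str) -> bytes:
    """Ask the metadata server for the layout, then fetch all fragments in parallel."""
    placement = mds.locate(object_id)  # e.g. ["B", "C", "D"]
    with ThreadPoolExecutor(max_workers=len(placement)) as pool:
        # pool.map returns results in input order, so fragments stay in sequence.
        parts = pool.map(
            lambda item: osds[item[1]].fetch(object_id, item[0]),
            enumerate(placement),
        )
        return b"".join(parts)
```

With file A split across OSDs B, C, and D, `read_object` contacts the metadata server once and then reads from all three OSDs concurrently.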
On the other hand, the object storage software has its own file system, so from the outside an OSD behaves like a file server. Sharing is therefore not a problem, which solves the file sharing issue as well. Object storage thus combines the advantages of block storage and file storage.
Key Technologies of Object Storage
What are the key technologies for object storage file systems?
- Distributed metadata.
- Concurrent data access. The object storage architecture defines a new, more intelligent disk interface: the OSD.
What is OSD?
Storage Area Network (SAN) and Network Attached Storage (NAS) are the two main network storage architectures that we are familiar with. Object storage is a new network storage architecture. A device based on object storage technology is an object storage device (OSD for short).
How are objects accessed on an object storage device?
On the storage device, every object has an object ID. Objects are accessed through OSD commands that take the object ID.
What are the main functions of the OSD?
- Data storage. The OSD manages the object data and stores it on a standard disk system. The OSD does not provide access to the block interface. Clients use object IDs and offsets to read and write data when requested.
- Intelligent distribution. The OSD uses its own CPU and memory to optimize data distribution, and it supports intelligent object prefetching to optimize disk performance.
- Metadata management for each object. The OSD manages the metadata of objects stored on it, which is similar to traditional inode metadata, which typically includes the object’s data block and the object’s length.
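The data storage and metadata functions listed above can be sketched with a toy device that is addressed by object ID and offset and that tracks each object's length itself. The class and its methods are hypothetical and do not follow the actual OSD command set.

```python
class ObjectStorageDevice:
    """Toy OSD: objects are addressed by ID, and per-object metadata stays on the device."""

    def __init__(self):
        self._data = {}   # object ID -> bytearray holding the object's bytes
        self._meta = {}   # object ID -> metadata dict (here only the length)

    def write(self, object_id: int, offset: int, payload: bytes) -> None:
        buf = self._data.setdefault(object_id, bytearray())
        if offset > len(buf):
            buf.extend(bytes(offset - len(buf)))   # zero-fill any gap before the write
        buf[offset:offset + len(payload)] = payload
        self._meta[object_id] = {"length": len(buf)}

    def read(self, object_id: int, offset: int, length: int) -> bytes:
        return bytes(self._data[object_id][offset:offset + length])

    def stat(self, object_id: int) -> dict:
        # Return a copy so callers cannot modify the device's own metadata.
        return dict(self._meta[object_id])
```

A client never sees block addresses here: it supplies an object ID and an offset, and the device decides how the bytes are laid out.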
Advantages of Object Storage
- It offers the high read/write speed of block storage.
- It offers the sharing features of file storage.
Usage Scenario of Object Storage (suited to data that is rarely modified)
Why use file and block storage when object storage offers the benefits of file and block storage?
- Some applications, such as databases, require direct raw disk mapping. Because the database lays out and formats the raw disk in its own way, it cannot use storage that has already been formatted with another file system. Block storage is better suited to such applications.
- Object storage costs more than ordinary file storage, since it requires dedicated object storage software and large-capacity hard drives. If the volume of data is not large and the goal is simply file sharing, file storage is the more cost-effective choice.
Differences Among Three Storage Types
| | Block Storage | File Storage | Object Storage |
|---|---|---|---|
| Concept | A storage system in which dedicated host servers are connected by a high-speed (optical fiber) network | Uses a file system with a directory tree structure | Treats data and metadata together as an object |
| Speed | Low latency (about 10 ms); suited to hot data | Varies by technology | 100 ms to 1 s; suited to cold data |
| Distributability | Remote (cross-site) deployment is impractical | Distributed, but with bottlenecks | Highly distributed and concurrent |
| File Size | Any size; suited to hot data | Suitable for large files | Suitable for all sizes |
| Interface | Driver, kernel module | POSIX | RESTful API |
| Typical Technology | SAN | HDFS, GFS | Swift, Amazon S3 |
| Usage Scenario | Banks | Data centers | Network media file storage |
Differences Among Three Storage Modes
| NAS | SAN | DAS |
|---|---|---|
| IP-based network | Fibre Channel-based | IP-based network |
| Transfers files | Transfers blocks | Transfers files |
| Provides multiple network functions, such as user rights and capacity quota management | No network functions | Network functions depend on the storage server |
| Low available bandwidth | High available bandwidth | Low available bandwidth |
| System applications and storage functions are separate and do not affect each other | System applications and storage functions are separate and do not affect each other | The same server is responsible for both system applications and storage functions |
| NAS has a built-in sharing function; shared access among multiple application servers works automatically | Sharing software must be installed before multiple application servers can share and access the storage devices | Multiple application servers can access each other's data through the network sharing function of the DAS storage server |
| Suitable for systems of all scales. The more application servers there are, the greater the gain in simplicity and convenience | Suitable for systems of all scales. The more application servers there are, the higher the share of network equipment cost | Generally suitable only for systems with one or two servers |