Two years ago, Amazon launched its Elastic Compute Cloud (EC2), a cloud product designed to provide users with expanded storage and computing capacity as an Internet service. Today, CDNetworks, a content delivery network provider, and Nirvanix, a leading cloud storage platform provider, announced a strategic partnership to offer what they call the industry's only integrated platform for cloud storage and content delivery services. Six months ago, Microsoft launched the beta version of Windows Live SkyDrive, which provides an online hard drive service. Recently, EMC announced that it has joined the Dori Trusted Infrastructure Project, a global research collaboration on trust and reliability assurance in cloud computing environments, and IBM has made cloud computing standards part of a $300 million program to expand its global backup centers.
Cloud storage is becoming more and more popular. Everyone talks about the "cloud", and everyone has their own argument and point of view, so what exactly is cloud storage?
1. What’s Cloud Storage?
Cloud storage is a new concept that extends and develops the concept of cloud computing. Cloud computing grew out of distributed computing, parallel computing, and grid computing: it automatically divides a huge computing task into many smaller sub-tasks over a network, hands them to a large system made up of many servers for computation and analysis, and then returns the results to the user. With cloud computing technology, network service providers can process tens of millions or even billions of pieces of information within seconds, delivering network services as powerful as those of a "supercomputer".
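To make the split, compute, and merge idea concrete, here is a minimal Python sketch. It assumes that worker threads in a single process stand in for the many servers and that the "huge computing task" is simply summing the squares of a list; the names `split`, `sub_task`, and `cloud_compute` are hypothetical and do not come from any cloud platform. (For CPU-bound work in CPython, threads illustrate the structure rather than speed it up.)

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_parts):
    """Divide one large job into roughly equal sub-tasks."""
    if not data:
        return []
    size = (len(data) + n_parts - 1) // n_parts  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sub_task(chunk):
    """Work done by one 'server' (here: a thread) on its share of the data."""
    return sum(x * x for x in chunk)


def cloud_compute(data, n_servers=4):
    """Hand the sub-tasks to a pool of workers, then merge their partial results."""
    chunks = split(data, n_servers)
    with ThreadPoolExecutor(max_workers=n_servers) as pool:
        partials = list(pool.map(sub_task, chunks))
    return sum(partials)  # the merged result returned to the user


print(cloud_compute(list(range(1000))))  # prints 332833500
```

In a real cloud each sub-task would travel over the network to a different machine, but the shape is the same: divide, compute in parallel, merge.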
If this is still hard to understand, we can borrow the structure of WANs and the Internet to explain cloud storage.
Cloud-Like Network Structure
I believe we all know LANs, WANs, and the Internet very well. To make good use of a typical LAN, users generally need to know the model and configuration of every piece of hardware and software on the network: what type of switch is used and how many ports it has, which routers and firewalls are used and how each is configured, how many servers are in the system and what operating systems and software they run, what type of cable connects each device, and what IP address and subnet mask each device is assigned.
But when we use a WAN or the Internet, we only need to know the type of access network and our user name and password. We don't need to know how many switches, routers, firewalls, and servers the WAN or the Internet contains, which route the data takes to reach our computer, or what software is installed on the servers in the network.
WANs and the Internet are completely transparent to individual users, and we often represent them with a cloud-shaped diagram.
Although this cloud contains a great many switches, routers, firewalls, and servers, WAN and Internet users do not need to know about any of them. The cloud symbol stands for the interconnected network services that WANs and the Internet provide. Wherever we are, we can connect with an access cable plus a user name and password and enjoy the services the network brings.
Borrowing this cloud-like network structure, we can build a storage system with a similar cloud-like structure. Such a system consists of many storage devices that work together through clustering, distributed file systems, grid computing, or similar functions, and that provide particular types of storage and access services to users through application software or application interfaces.
When we use a standalone storage device, we must know exactly what model it is and what interface and transmission protocol it uses; how many disks it contains, and their models and capacities; and what cable connects the storage device to the server. To ensure data security and business continuity, we also need to build corresponding data backup and disaster recovery systems. In addition, the storage devices require regular status monitoring, maintenance, and hardware and software updates and upgrades.
With cloud storage, the user needs none of this. Every device in the cloud storage system is completely transparent to the user, and any authorized user, anywhere, can connect to and access cloud storage through a single access cable.
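This transparency can be sketched as a single front-end object that hides the devices behind it. The sketch below is a minimal illustration under stated assumptions: the class names `StorageDevice` and `CloudStorage` are hypothetical, each "device" is just an in-memory dictionary, and placement uses a simple hash-modulo rule.

```python
import hashlib


class StorageDevice:
    """One physical device inside the cloud (modelled here as an in-memory dict)."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}


class CloudStorage:
    """Single entry point: callers never learn which device holds their data."""

    def __init__(self, device_names):
        self._devices = [StorageDevice(n) for n in device_names]

    def _pick(self, key):
        # Deterministic placement: hash the key and map it onto one device.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._devices)
        return self._devices[index]

    def put(self, key, data):
        self._pick(key).blobs[key] = data

    def get(self, key):
        return self._pick(key).blobs[key]


cloud = CloudStorage(["array-1", "array-2", "nas-3"])
cloud.put("photos/2008/beach.jpg", b"...jpeg bytes...")
print(cloud.get("photos/2008/beach.jpg"))  # b'...jpeg bytes...'
```

The caller only ever sees `put` and `get`. Hash-modulo placement is used here for brevity; it moves most keys whenever a device is added, which is why large systems usually prefer consistent hashing or a separate metadata service.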
Cloud storage is not storage, it's a service
Like the cloud-like WANs and the Internet, cloud storage is not a specific device from the user's point of view but a collection of a great many storage devices and servers. When users use cloud storage, they are not using any particular storage device but the data access service provided by the entire cloud storage system. So, strictly speaking, cloud storage is not storage but a service.
The core of cloud storage is the combination of application software and storage devices: the application software turns storage devices into storage services.
2. Architectural Model of Cloud Storage
Compared with traditional storage devices, cloud storage is not just hardware but a complex system made up of network devices, storage devices, servers, application software, common access interfaces, access networks, client programs, and more. Each part is built around the storage devices, and together they provide data storage and business access services to the outside world through application software. The architectural model of a cloud storage system consists of four layers, from bottom to top: the storage layer, the basic management layer, the application interface layer, and the access layer.
A. Storage Layer
The storage layer is the most fundamental part of cloud storage. Storage devices can be FC (Fibre Channel) storage devices, IP storage devices such as NAS and iSCSI, or DAS storage devices such as SCSI or SAS. Cloud storage typically contains a large number of storage devices spread across many different locations, connected to one another through WANs, the Internet, or FC networks.
B. Basic Management Layer
The basic management layer is the core of cloud storage and the hardest part to implement. It uses technologies such as clustering, distributed file systems, and grid computing to make the many storage devices in cloud storage work together, so that they provide a single service to the outside world with greater capacity, stronger capabilities, and better data access performance.
CDN content distribution systems and data encryption technology ensure that data in cloud storage cannot be accessed by unauthorized users, while a variety of data backup and disaster recovery technologies and measures ensure that the data is not lost, guaranteeing the security and stability of cloud storage itself.
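One way the basic management layer can make several devices act as one service while keeping data from being lost is replication. The sketch below assumes a hypothetical `ReplicatedStore` that writes each object to two devices and reads from whichever copy is still online; the placement rule and the failure simulation are illustrative only, and real systems must also handle re-replication, consistency, and failures during writes.

```python
import hashlib


class ReplicatedStore:
    """Every object is written to `replicas` different devices."""

    def __init__(self, n_devices, replicas=2):
        if not 1 <= replicas <= n_devices:
            raise ValueError("replicas must be between 1 and n_devices")
        self.devices = [{} for _ in range(n_devices)]
        self.online = [True] * n_devices
        self.replicas = replicas

    def _targets(self, key):
        # Hypothetical placement rule: a hash-chosen first device plus its neighbours.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        first = int.from_bytes(digest[:8], "big") % len(self.devices)
        return [(first + i) % len(self.devices) for i in range(self.replicas)]

    def put(self, key, data):
        # Simplification: assumes every target device is reachable at write time.
        for i in self._targets(key):
            self.devices[i][key] = data

    def get(self, key):
        # Any surviving replica can answer, so one failed device loses no data.
        for i in self._targets(key):
            if self.online[i] and key in self.devices[i]:
                return self.devices[i][key]
        raise KeyError(key)


store = ReplicatedStore(n_devices=4, replicas=2)
store.put("report.doc", b"quarterly figures")
store.online[store._targets("report.doc")[0]] = False  # simulate a device failure
print(store.get("report.doc"))  # b'quarterly figures'
```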
C. Application Interface Layer
The application interface layer is the most flexible and varied part of cloud storage. Different cloud storage operators can develop different application service interfaces and offer different application services according to their actual lines of business, for example a video surveillance platform, an IPTV and video-on-demand platform, a network hard disk application platform, or a remote data backup platform.
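The idea that different operators build different services on the same storage can be sketched as thin application interfaces over one shared back end. All class names here (`ObjectStore`, `DictStore`, `RemoteBackupService`, `VideoOnDemandService`) and their key-naming schemes are hypothetical; `DictStore` simply stands in for the storage and basic management layers.

```python
from typing import Dict, Protocol


class ObjectStore(Protocol):
    """What the application interface layer needs from the layers below."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class DictStore:
    """In-memory stand-in for the storage and basic management layers."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class RemoteBackupService:
    """Hypothetical backup interface: keeps numbered versions of each file."""

    def __init__(self, store: ObjectStore) -> None:
        self.store = store
        self.latest: Dict[str, int] = {}

    def backup(self, path: str, data: bytes) -> int:
        version = self.latest.get(path, 0) + 1
        self.store.put(f"backup/{path}/v{version}", data)
        self.latest[path] = version
        return version

    def restore(self, path: str, version: int) -> bytes:
        return self.store.get(f"backup/{path}/v{version}")


class VideoOnDemandService:
    """Hypothetical video-on-demand interface: serves byte ranges of a stored video."""

    def __init__(self, store: ObjectStore) -> None:
        self.store = store

    def upload(self, title: str, data: bytes) -> None:
        self.store.put(f"vod/{title}", data)

    def stream(self, title: str, start: int, length: int) -> bytes:
        return self.store.get(f"vod/{title}")[start:start + length]


shared = DictStore()
backup = RemoteBackupService(shared)
vod = VideoOnDemandService(shared)
backup.backup("docs/plan.txt", b"draft 1")
backup.backup("docs/plan.txt", b"draft 2")
vod.upload("lecture-01", b"0123456789")
print(backup.restore("docs/plan.txt", 1))  # b'draft 1'
print(vod.stream("lecture-01", 2, 3))      # b'234'
```

Both services share one back end; only the interface each presents to its users differs.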
D. Access Layer
Any authorized user can log in to the cloud storage system through a standard public application interface and use cloud storage services. The types and means of access vary from one cloud storage operator to another.
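A common way for an access layer to check that a request comes from an authorized user is a signed token issued at login. The sketch below uses Python's standard `hmac` module; the secret, the `user:signature` token format, and the function names are assumptions for illustration, and it omits password checking, token expiry, and transport security, all of which a real service would need.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical secret held only by the access layer; a real service would load it securely.
SERVER_SECRET = b"example-secret-do-not-use"


def _sign(user: str) -> str:
    return hmac.new(SERVER_SECRET, user.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_token(user: str) -> str:
    """Called after a successful login: bind the user name to a signature."""
    return f"{user}:{_sign(user)}"


def authorize(token: str) -> Optional[str]:
    """Return the user name if the token's signature is valid, otherwise None."""
    user, _, sig = token.rpartition(":")
    if not user:
        return None
    # compare_digest avoids leaking how many characters matched via timing.
    return user if hmac.compare_digest(sig, _sign(user)) else None


token = issue_token("alice")
print(authorize(token))                # alice
print(authorize("alice:" + "0" * 64))  # None
```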
3. Technical Prerequisites for Cloud Storage
From the architectural model above, it is clear that a cloud storage system is a collection of many devices, applications, and services working together, and that implementing it depends on the development of several technologies.
A. Development of broadband networks
A real cloud storage system will be a huge public system distributed across multiple regions, across a country, or even around the world. Users will connect to it through ADSL, DDN, and other broadband access technologies, rather than connecting directly to a standalone, private storage device via FC, SCSI, or Ethernet cables. Only when broadband networks are sufficiently developed will users have enough bandwidth to transfer large volumes of data and truly enjoy cloud storage services; otherwise, cloud storage is just empty talk!
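A quick calculation shows why bandwidth matters so much. The function below computes an idealised transfer time; the 10 GB data volume and the two link speeds are assumed values chosen for illustration, not measurements of any particular ADSL or DDN service.

```python
def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Idealised transfer time in hours.

    size_gb   -- data volume in decimal gigabytes (10**9 bytes)
    link_mbps -- link speed in megabits per second (10**6 bits/s)
    Ignores protocol overhead, latency, and line contention.
    """
    bits = size_gb * 8 * 10**9
    seconds = bits / (link_mbps * 10**6)
    return seconds / 3600


# Assumed link speeds, chosen only for illustration.
for label, mbps in [("2 Mbit/s access line", 2), ("100 Mbit/s access line", 100)]:
    print(f"10 GB over a {label}: {transfer_hours(10, mbps):.1f} h")
```

Under these assumptions the same 10 GB takes about 11.1 hours on the slower line and about 13 minutes on the faster one, before any protocol overhead.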
B. Web 2.0 Technology
The core of Web 2.0 technology is sharing. Only through Web 2.0 technology can cloud storage users centrally store and share data, documents, images, and audio and video content from a variety of devices such as PCs, cell phones, and mobile multimedia devices. The development of Web 2.0 has made the applications and services available to users more flexible and diverse.
C. Application Storage Development
Cloud storage is not just storage; it is even more about applications. Application storage is a type of storage device that integrates application software functions, so it provides not only data storage but also application functions and can be regarded as a combination of server and storage device. The development of application storage technology can greatly reduce the number of servers in cloud storage, which lowers construction costs, reduces the single points of failure and performance bottlenecks caused by servers, shortens data transmission paths, improves system performance and efficiency, and keeps the entire system running efficiently and stably.
D. Clustering technology, grid technology and distributed file systems
A cloud storage system is a collection of multiple storage devices, applications, and services working together; no single storage device on its own constitutes cloud storage.
Because cloud storage is made up of many storage devices, those devices must work together through technologies such as clustering, distributed file systems, and grid computing, so that they provide a single service to the outside world with greater capacity, stronger capabilities, and better data access performance. Without these technologies, cloud storage cannot truly be realized: the so-called cloud storage would be only a standalone system and could not form a cloud-like structure.
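A distributed file system lets many devices serve one file by cutting it into chunks and recording where each chunk lives. The sketch below shows only round-robin striping, with a layout list acting as the metadata; the function names and the tiny 4-byte chunk size are hypothetical (real systems use chunks many megabytes in size and add replication and dedicated metadata servers).

```python
def stripe(data: bytes, n_devices: int, chunk_size: int = 4):
    """Cut a file into fixed-size chunks and spread them round-robin over devices.

    Returns the per-device chunk lists and a layout list (chunk i -> device index),
    which plays the role of the file system's metadata.
    """
    devices = [[] for _ in range(n_devices)]
    layout = []
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        device = index % n_devices
        devices[device].append(data[offset:offset + chunk_size])
        layout.append(device)
    return devices, layout


def reassemble(devices, layout):
    """Read the chunks back in file order using the layout metadata."""
    next_chunk = [0] * len(devices)  # per-device read cursor
    parts = []
    for device in layout:
        parts.append(devices[device][next_chunk[device]])
        next_chunk[device] += 1
    return b"".join(parts)


data = b"one file spread over many devices"
devices, layout = stripe(data, n_devices=3)
print(layout)                               # [0, 1, 2, 0, 1, 2, 0, 1, 2]
print(reassemble(devices, layout) == data)  # True
```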
E. CDN content distribution, P2P technology, data compression technology, deduplication technology, data encryption technology
CDN content distribution systems and data encryption technology ensure that data in cloud storage cannot be accessed by unauthorized users, while various data backup and disaster recovery technologies ensure that the data is not lost, guaranteeing the security and stability of cloud storage itself. If the security of data in cloud storage were not guaranteed, I doubt anyone would dare to use it; the stored data would either soon be lost or end up known to the whole country!
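The heading above also lists data compression and deduplication. A minimal sketch of both, using only Python's standard `hashlib` and `zlib` modules, is a content-addressed store: each chunk is identified by its SHA-256 digest, stored once no matter how many files contain it, and kept compressed. The `DedupStore` class and its 1 KiB chunk size are assumptions for illustration; encryption is left out because the standard library has no general-purpose cipher.

```python
import hashlib
import zlib


class DedupStore:
    """Content-addressed store: identical chunks are kept once, compressed."""

    def __init__(self, chunk_size=1024):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> zlib-compressed chunk
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name, data):
        digests = []
        for offset in range(0, len(data), self.chunk_size):
            piece = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(piece).hexdigest()
            if digest not in self.chunks:  # deduplication: store new content only
                self.chunks[digest] = zlib.compress(piece)  # compression
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        return b"".join(zlib.decompress(self.chunks[d]) for d in self.files[name])


store = DedupStore()
block = b"A" * 1024
store.put("report-v1.doc", block * 3)
store.put("report-v2.doc", block + b"B" * 1024)
print(len(store.chunks))  # 2 unique chunks stored for 5 chunk references
```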
F. Storage virtualization technology, storage network management technology
Cloud storage contains a huge number of storage devices distributed across many regions. Implementing logical volume management, storage virtualization management, and multi-link redundancy management across devices from different vendors, of different models, and even of different types (such as FC storage and IP storage) is a major challenge. If it is not solved, the storage devices will become the performance bottleneck of the entire cloud storage system, the system will not be able to function as a whole, and later expansion of capacity and performance will be difficult.
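Storage virtualization hides device boundaries behind a logical address space. The sketch below shows only the simplest scheme, concatenating heterogeneous devices into one linear logical volume and translating logical block numbers into physical ones; the `PhysicalDevice` and `LogicalVolume` names, the device names, and the block counts are hypothetical, and multi-link redundancy is not modelled.

```python
from dataclasses import dataclass


@dataclass
class PhysicalDevice:
    name: str
    kind: str    # e.g. "FC" or "IP"; only a label in this sketch
    blocks: int  # capacity in blocks


class LogicalVolume:
    """Concatenates devices into one linear logical address space."""

    def __init__(self, devices):
        self.extents = []  # (first logical block, device), in ascending order
        start = 0
        for dev in devices:
            self.extents.append((start, dev))
            start += dev.blocks
        self.size = start

    def locate(self, logical_block):
        """Translate a logical block number into (device name, physical block)."""
        if not 0 <= logical_block < self.size:
            raise IndexError(f"block {logical_block} outside volume of {self.size}")
        # Walk backwards to find the last extent that starts at or before the block.
        for first, dev in reversed(self.extents):
            if logical_block >= first:
                return dev.name, logical_block - first
        raise AssertionError("unreachable: the first extent starts at block 0")


volume = LogicalVolume([
    PhysicalDevice("fc-array-1", "FC", 100),
    PhysicalDevice("iscsi-box-2", "IP", 50),
])
print(volume.size)         # 150
print(volume.locate(120))  # ('iscsi-box-2', 20)
```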
Another problem caused by the large number and wide geographic distribution of storage devices in cloud storage is the operation and management of those devices. Although users of cloud storage need not worry about these issues, cloud storage operators must use practical and effective means to handle centralized management, status monitoring, and fault maintenance while keeping labor costs down. Cloud storage therefore needs an efficient centralized management platform, similar to network management software, that can centrally manage and monitor the status of the storage devices, servers, and network devices in the system.
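The heart of such a platform is collecting status from every device and turning it into a fleet-wide summary with alerts. The sketch below assumes that status records have already been gathered by some collection mechanism; the `DeviceStatus` fields, the 85% usage threshold, and the device and region names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeviceStatus:
    name: str
    region: str
    online: bool
    used_tb: float
    capacity_tb: float


def check_fleet(statuses, usage_alert=0.85):
    """Summarise one polling round and list devices that need attention."""
    alerts = []
    for s in statuses:
        if not s.online:
            alerts.append(f"{s.name} ({s.region}): offline")
        elif s.used_tb / s.capacity_tb >= usage_alert:
            alerts.append(f"{s.name} ({s.region}): {s.used_tb / s.capacity_tb:.0%} full")
    online = [s for s in statuses if s.online]
    return {
        "devices": len(statuses),
        "online": len(online),
        "free_tb": sum(s.capacity_tb - s.used_tb for s in online),
        "alerts": alerts,
    }


fleet = [
    DeviceStatus("nas-01", "north", True, 9.0, 10.0),
    DeviceStatus("fc-02", "east", False, 1.0, 10.0),
    DeviceStatus("san-03", "south", True, 2.0, 10.0),
]
report = check_fleet(fleet)
for line in report["alerts"]:
    print(line)
# nas-01 (north): 90% full
# fc-02 (east): offline
```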