Master NVMe: Your Ultimate Guide to How It Works!

Discover the fundamentals of NVMe internal mechanics in one definitive article, perfect for tech enthusiasts and professionals.

Leo Zhi
April 19, 2024
9:00 am

New technologies usually emerge to address existing problems, and NVMe is no exception. It was developed to resolve a mismatch between the rapidly increasing performance of storage media and the limited performance of the interfaces that connect them. The performance of SSD-based storage devices is skyrocketing, but the SAS and SATA interfaces have seen no substantial improvement.

Currently, SAS (based on the SCSI protocol) and SATA (based on the ATA command set) each offer only a single queue with relatively low depth: 254 and 32 commands, respectively. NVMe, designed from the outset with this issue in mind, can support up to 64K queues (65,535 command queues and one administration queue), and each queue can have a depth of up to 64K. Compared to the older interfaces, it’s like the difference between a narrow country lane and a dual eight-lane highway.
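To put those limits side by side, the short calculation below multiplies queue count by queue depth for each interface. It is only a rough upper bound built from the figures quoted above (taking 64K as 65,536); real devices typically expose far fewer queues, and the names `LIMITS` and `max_outstanding` are illustrative.

```python
# Upper bound on outstanding commands = number of queues * depth per queue.
# The figures are the ones quoted in the text; the names are illustrative only.
LIMITS = {
    "SATA": (1, 32),
    "SAS": (1, 254),
    "NVMe": (65_535, 65_536),  # command queues only; the admin queue is excluded
}


def max_outstanding(queues: int, depth: int) -> int:
    return queues * depth


for name, (queues, depth) in LIMITS.items():
    print(f"{name:5s} {max_outstanding(queues, depth):>13,}")
```

For NVMe the product is 4,294,901,760, compared with 32 for SATA and 254 for SAS.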

⬆️ Figure 1: A narrow country lane compared with a highway

Basic Principles of NVMe

To make the relationship between the host and NVMe devices easier to follow, we simplify the internal structure of NVMe devices here. As shown in Figure 2 from the NVMe white paper, the host is referred to as “Host,” and the NVMe device as “Controller.” The host and the controller interact through queues in shared memory.

⬆️ Figure 2 NVMe Multi-Queue Diagram (Image Credit: Internet)

NVMe queues are of two types. The first is used for management and is known as the Admin Queue; there is only one. The second is the Command Queue (called an I/O queue in the NVMe specification), of which there can be up to 65,535. The number and mode of command queues are set through the Admin Queue. Each queue is actually a pair, consisting of a Submission Queue and a Completion Queue. The host uses the Submission Queue to send NVMe commands to the NVMe device, while the NVMe device uses the Completion Queue to report the status of command execution back to the host. NVMe also has another mode in which multiple Submission Queues share the same Completion Queue; it will not be discussed in this text.
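The queue layout described here can be modeled in a few lines. This is a minimal sketch, not driver code: the class names, the `create_pair` helper, and the depths are made up for illustration. The one detail taken from the NVMe specification rather than from the text is that the admin queue pair uses queue ID 0.

```python
from dataclasses import dataclass, field

ADMIN_QID = 0  # per the NVMe specification, queue ID 0 is the admin queue pair


@dataclass
class CompletionQueue:
    qid: int
    depth: int


@dataclass
class SubmissionQueue:
    qid: int
    depth: int
    cq: CompletionQueue  # where the controller reports status for this queue


@dataclass
class Controller:
    sqs: dict = field(default_factory=dict)
    cqs: dict = field(default_factory=dict)

    def create_pair(self, qid: int, depth: int) -> SubmissionQueue:
        # Hypothetical helper: one submission queue paired with one completion queue.
        cq = CompletionQueue(qid, depth)
        sq = SubmissionQueue(qid, depth, cq)
        self.cqs[qid] = cq
        self.sqs[qid] = sq
        return sq


ctrl = Controller()
admin = ctrl.create_pair(ADMIN_QID, depth=32)
io_queues = [ctrl.create_pair(qid, depth=1024) for qid in range(1, 5)]
print(len(ctrl.sqs), "submission queues,", len(ctrl.cqs), "completion queues")
```

The shared-completion-queue mode mentioned above would correspond to several `SubmissionQueue` objects pointing at the same `CompletionQueue`.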

NVMe Queue and Command Processing

From the previous discussion, we know that NVMe uses queues to pass control commands and other information. What exactly are these queues? The submission queue and the completion queue are just areas in memory. In terms of data structure principles, these queues are circular buffers, as shown in Figure 3.

⬆️ Figure 3 Circular Buffer (Image Credit: Internet)
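A circular buffer of the kind shown in Figure 3 can be sketched as follows. The class `RingQueue` is a hypothetical illustration, not part of any NVMe library. It follows the common convention of leaving one slot unused so that a full queue can be told apart from an empty one.

```python
class RingQueue:
    """Fixed-size circular buffer with head (consumer) and tail (producer) indices."""

    def __init__(self, depth: int):
        self.slots = [None] * depth
        self.depth = depth
        self.head = 0  # next slot to consume
        self.tail = 0  # next slot to fill

    def is_empty(self) -> bool:
        return self.head == self.tail

    def is_full(self) -> bool:
        # One slot stays unused so that "full" and "empty" are distinguishable.
        return (self.tail + 1) % self.depth == self.head

    def push(self, item) -> None:
        if self.is_full():
            raise BufferError("queue full")
        self.slots[self.tail] = item
        self.tail = (self.tail + 1) % self.depth  # wrap around at the end

    def pop(self):
        if self.is_empty():
            raise BufferError("queue empty")
        item = self.slots[self.head]
        self.head = (self.head + 1) % self.depth  # wrap around at the end
        return item
```

With `RingQueue(4)`, three pushes fill the queue; after one pop, the next push lands in the last slot and the tail index wraps back to 0.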

NVMe uses a doorbell mechanism to tell the controller that a submission queue holds new commands. Each queue has a doorbell register, which lives in the device’s register space and is mapped into the host’s address space. For the submission queue, the doorbell holds the queue’s tail pointer. After the host writes one or more commands to the submission queue, it writes the new tail value to the doorbell. At this point, the controller knows that new commands are waiting and can begin processing them.
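The doorbell idea can be illustrated with a toy model. The classes and method names below are hypothetical. A real driver writes the tail value to a memory-mapped register rather than calling a method, and it also checks for a full queue, which this sketch omits.

```python
class Controller:
    """Toy controller: it learns about new commands only through the doorbell write."""

    def __init__(self, sq_memory: list):
        self.sq_memory = sq_memory  # shared memory holding the submission queue
        self.sq_head = 0            # controller's consume position
        self.sq_tail_doorbell = 0   # the register the host writes

    def write_sq_tail_doorbell(self, new_tail: int) -> list:
        self.sq_tail_doorbell = new_tail
        fetched = []
        # Fetch every entry between the old head and the newly announced tail.
        while self.sq_head != self.sq_tail_doorbell:
            fetched.append(self.sq_memory[self.sq_head])
            self.sq_head = (self.sq_head + 1) % len(self.sq_memory)
        return fetched


class Host:
    def __init__(self, depth: int):
        self.sq_memory = [None] * depth
        self.sq_tail = 0
        self.controller = Controller(self.sq_memory)

    def queue(self, command: str) -> None:
        # Placing a command in memory does not notify the controller yet.
        self.sq_memory[self.sq_tail] = command
        self.sq_tail = (self.sq_tail + 1) % len(self.sq_memory)

    def ring_doorbell(self) -> list:
        return self.controller.write_sq_tail_doorbell(self.sq_tail)


host = Host(depth=8)
host.queue("read")
host.queue("write")
print(host.controller.sq_head)  # still 0: the doorbell has not been rung
print(host.ring_doorbell())     # ['read', 'write']
```

Because the doorbell carries the tail value, one register write can announce several queued commands at once.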

When the controller completes an NVMe command, it writes an entry to the completion queue to report the result. Unlike the submission queue, where the host notifies the controller through a doorbell, the completion queue relies on an interrupt (INTx, MSI, or MSI-X) to inform the host. Figure 4 shows the complete processing flow of a command.

⬆️ Figure 4 Complete Command Processing Flow (Image Credit: Internet)
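The flow can be traced step by step with a small simulation. This is a sketch under assumptions: the function `run_command`, the dictionary keys, and the log wording are invented for illustration, and the steps may not match Figure 4 one for one. The final step, where the host writes the completion queue head doorbell, is not described in the text above; it is included because the NVMe specification defines it as the way the host releases completion entries.

```python
from collections import deque


def run_command(command: dict) -> list:
    """Walk one command through the submission/completion cycle and log each step."""
    log = []
    sq, cq = deque(), deque()  # stand-ins for the two circular buffers

    sq.append(command)
    log.append("host: place command in submission queue")
    log.append("host: write submission queue tail doorbell")

    fetched = sq.popleft()
    log.append("controller: fetch command")
    status = 0  # 0 stands for success in this sketch
    log.append("controller: execute command")

    cq.append({"cid": fetched["cid"], "status": status})
    log.append("controller: post completion entry")
    log.append("controller: raise interrupt (INTx, MSI, or MSI-X)")

    completion = cq.popleft()
    log.append(f"host: process completion for cid={completion['cid']}")
    log.append("host: write completion queue head doorbell")
    return log


for step, line in enumerate(run_command({"cid": 7, "opcode": "read"}), start=1):
    print(step, line)
```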

NVMe Command Format

Having discussed the command sending and processing flow, let’s now look at what an NVMe command looks like. Figure 5 shows the specific format of an NVMe command. If you are familiar with the TCP/IP or SCSI protocols, then understanding this diagram will be quite straightforward. In Figure 5, each row consists of 8 bytes, with the command size totaling 64 bytes.

This command format includes several complex fields. Due to space limitations and to avoid overwhelming readers, not all details will be discussed here. Instead, we’ll briefly introduce a few key fields of this command format. The Command Identifier identifies a specific command. The Namespace Identifier indicates which namespace the command is sent to. Data Pointer 1 and Data Pointer 2 are used to specify the precise locations of the data.

Two points to note:

An NVMe controller can have multiple namespaces, identified by the Namespace ID.
Commands and data are separate, unlike in TCP, where the payload follows the header in the same packet. An NVMe command carries only pointers to the data.
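The 64-byte layout can be approximated with Python's `struct` module. This is a sketch, not a driver structure: the field order follows my reading of the NVMe base specification (first 4-byte word, namespace ID, 8 bytes treated as reserved here, a metadata pointer, the two data pointers, then six command-specific 4-byte words), and the helper name `build_command` is hypothetical.

```python
import struct

# Little-endian layout: first dword, namespace ID, 8 bytes treated as reserved,
# metadata pointer, data pointer 1, data pointer 2, then six command-specific dwords.
SQE = struct.Struct("<IIQQQQ6I")
assert SQE.size == 64  # eight rows of 8 bytes


def build_command(dword0: int, nsid: int, ptr1: int, ptr2: int = 0,
                  specific=(0,) * 6) -> bytes:
    # The command holds addresses of the data buffers, never the data itself.
    return SQE.pack(dword0, nsid, 0, 0, ptr1, ptr2, *specific)


cmd = build_command(dword0=0x0001_0002, nsid=1, ptr1=0x1000_0000)
print(len(cmd))  # 64
```

Note that the data pointers are plain memory addresses, which is what "commands and data are separate" means in practice.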

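Here, we focus on the first 4 bytes of the command (Command Dword 0 in the NVMe specification), which hold the Command Identifier together with the control fields described below. Although it’s only 4 bytes, it is divided into 3 major parts and 6 minor parts, as shown in Figure 6.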

⬆️ Figure 6 Command Identifier Format (Image Credit: Internet)

We will describe the meaning of each field from the least significant bit to the most significant bit:

OPC: Stands for Opcode, which is the operation code of the command being executed. Specifically, it defines what the controller is supposed to do, such as read data, write data, flush, etc.

⬆️ Figure 7 OPC Definition (Image Credit: Internet)

FUSE: Stands for Fused Operation, which indicates whether the command is a regular command or one half of a fused (compound) pair of commands. Table 8 from the white paper explains this field.

Value	Definition
00b	Normal operation
01b	Fused operation, first command
10b	Fused operation, second command
11b	Reserved

⬆️ Table 8 FUSE Definition

PSDT: Stands for PRP or SGL for Data Transfer. It indicates whether the memory holding the data is described with PRPs (Physical Region Pages) or SGLs (Scatter Gather Lists).
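The fields above can be packed into and unpacked from a 4-byte value with shifts and masks. This sketch assumes the bit positions given in the NVMe base specification for Command Dword 0 (OPC in bits 7:0, FUSE in bits 9:8, PSDT in bits 15:14, and the 16-bit command identifier in bits 31:16), since Figure 6 is not reproduced here. The function names are hypothetical, and the three opcodes are the Flush, Write, and Read values from the NVM command set.

```python
OPCODES = {"flush": 0x00, "write": 0x01, "read": 0x02}
FUSE_NORMAL, FUSE_FIRST, FUSE_SECOND = 0b00, 0b01, 0b10


def pack_dword0(opc: int, fuse: int, psdt: int, cid: int) -> int:
    if not (0 <= opc <= 0xFF and 0 <= fuse <= 0b11
            and 0 <= psdt <= 0b11 and 0 <= cid <= 0xFFFF):
        raise ValueError("field out of range")
    # Bits 13:10 are left as zero (reserved in this sketch).
    return opc | (fuse << 8) | (psdt << 14) | (cid << 16)


def unpack_dword0(dword: int) -> dict:
    return {
        "opc": dword & 0xFF,
        "fuse": (dword >> 8) & 0b11,
        "psdt": (dword >> 14) & 0b11,
        "cid": (dword >> 16) & 0xFFFF,
    }


dword = pack_dword0(OPCODES["read"], FUSE_NORMAL, psdt=0, cid=0x1234)
print(hex(dword))  # 0x12340002
print(unpack_dword0(dword))
```

Here `psdt=0` selects PRPs, and the identifier `0x1234` is an arbitrary example value chosen by the host.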

Performance of NVMe

Finally, let’s look at the performance comparison between NVMe, SAS, and SATA storage devices. To avoid any suspicion of advertising, this article will not name the manufacturers or models of the devices.

⬆️ Figure 9 Performance Comparison among PCIe or NVMe, SAS, and SATA (Image Credit: Internet)

The figure above clearly shows the performance gap between NVMe devices and SAS or SATA devices, especially for read operations, where NVMe has a decisive advantage.

Disclaimer: This article is created by the original author. The content of the article represents their personal opinions. Our reposting is for sharing and discussion purposes only and does not imply our endorsement or agreement. If you have any objections, please contact us through the provided channels.

Leo Zhi was born in August 1987 and majored in Electronic Engineering and Business English. He is an enthusiastic and responsible professional who is literate in computer hardware and software, with more than 10 years of experience in NAND flash products and strong critical thinking, leadership, teamwork, and interpersonal skills. He handles customers’ technical queries and issues, providing initial analysis and solutions. If you have any questions, please feel free to get in touch.
