Memory in use can suffer data errors, either hard errors or soft errors. These are rarely a problem for the average consumer, but for enterprise and data center servers they can have a significant impact on server performance. To guard against them, servers usually use ECC memory. So what is ECC memory, and how is it different from ordinary memory?
What’s ECC memory?
ECC memory is Error-Correcting Code memory, where ECC is a method of detecting and correcting errors in each unit of memory. This raises another question: how can memory make mistakes?
A memory error occurs when a value stored in memory changes unintentionally. Data in memory is stored in binary form, as bits with a value of 1 or 0. If a 1 switches to 0, or a 0 switches to 1, the bit is said to have “flipped”, and the data stored in memory changes.
For a simple example, the number 135 can be written as the binary string 010000111. The first line below is the original value, and each line after it shows what happens when one of its bits is flipped:
010000111 = 135
110000111 = 391
011000111 = 199
010100111 = 167
000000111 = 7
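The four flipped values above can be reproduced with a short Python sketch. The helper name flip_bit and the convention of counting bit positions from the least significant bit are assumptions made for this illustration, not something the article defines.

```python
def flip_bit(value: int, position: int) -> int:
    """Return `value` with the bit at `position` inverted (0 = least significant bit)."""
    return value ^ (1 << position)


original = 0b010000111  # 135, written as a 9-bit string like the example above

# Flip the same four bits as in the listing above, one at a time.
for position in (8, 6, 5, 7):
    flipped = flip_bit(original, position)
    print(f"{flipped:09b} = {flipped}")
```

Flipping a high-order bit changes the value far more than flipping a low-order one, which is why the effect of a single bit flip is so hard to predict.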
Depending on how the computer uses this data, a bit flip in memory can be as harmless as a brief performance glitch. On the other hand, it can be catastrophic, causing the entire system to crash or perform the wrong operation. By some estimates, 8GB of memory can produce about five such errors per hour of use. For the average computer user the impact is imperceptible, but for servers running intensive workloads these errors can have serious consequences.
There are many potential causes of bit flips. The most common is background radiation, mainly neutrons produced by cosmic rays. Cosmic rays are high-energy particles, usually protons, that travel at close to the speed of light. When they strike atoms, they produce large numbers of neutrons and other subatomic particles. These neutrons go on to have secondary interactions, which are thought to be the main cause of bit flips in memory.
Principle of error correction
So how does ECC memory prevent such errors? ECC builds on the idea of parity. A parity check adds one extra bit, 0 or 1, to the end of a byte so that the total number of 1s is always even (or always odd, depending on the convention). For example, with even parity, if a byte contains seven 1s, the parity bit is set to 1, bringing the total to eight. If the stored parity bit is 0 but the byte contains an odd number of 1s, the byte has been corrupted.
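As a rough illustration of the parity check, here is a minimal Python sketch that assumes the even-parity convention. The function names even_parity_bit and is_corrupted are hypothetical and chosen only for this example.

```python
def even_parity_bit(byte: int) -> int:
    """Return the parity bit that makes the total count of 1s (data + parity) even."""
    return bin(byte & 0xFF).count("1") % 2


def is_corrupted(byte: int, stored_parity: int) -> bool:
    """Report a mismatch between the stored parity bit and the recomputed one."""
    return even_parity_bit(byte) != stored_parity


data = 0b01111111                # a byte containing seven 1s
parity = even_parity_bit(data)   # odd count of 1s, so the parity bit is 1

one_flip = data ^ 0b00000100     # a single bit flipped
two_flips = data ^ 0b00000110    # two bits flipped

print(is_corrupted(data, parity))       # intact byte
print(is_corrupted(one_flip, parity))   # single flip is detected
print(is_corrupted(two_flips, parity))  # double flip goes unnoticed
```

The last call shows the limits of plain parity: it cannot see an even number of flips, and even when it detects an error it cannot tell which bit changed, so it cannot correct anything.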
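Of course, ECC memory does not only work on 8-bit bytes with a single parity bit. An error-correcting code can also produce a 7-bit check code for each 64 bits of data. The system generates this code when it stores 64 bits of data and recomputes it every time the data is read. The purpose of the check is to determine whether the two codes match. A mismatch means the data contains an error, and the pattern of the mismatch identifies the flipped bit so that ECC memory can correct it immediately. Seven check bits are enough to locate a single flipped bit in 64 bits of data; ECC modules typically store eight so that double-bit errors can also be detected.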
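To make the idea of a 7-bit code over 64 bits of data concrete, the sketch below uses a classic Hamming code with 7 check bits and 64 data bits. This is an assumption chosen for illustration: the article does not say which code real modules use, and the function names encode and decode are hypothetical. The sketch only corrects single-bit flips.

```python
# Positions 1..71 hold the codeword; powers of two hold the 7 check bits.
CHECK_POSITIONS = [1, 2, 4, 8, 16, 32, 64]
DATA_POSITIONS = [p for p in range(1, 72) if p not in CHECK_POSITIONS]  # 64 slots


def _syndrome(codeword: int) -> int:
    """XOR together the positions of all 1 bits; 0 means the codeword is consistent."""
    s = 0
    for p in range(1, 72):
        if (codeword >> p) & 1:
            s ^= p
    return s


def encode(data: int) -> int:
    """Spread 64 data bits over the data positions, then set the 7 check bits."""
    codeword = 0
    for i, p in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            codeword |= 1 << p
    s = _syndrome(codeword)
    for c in CHECK_POSITIONS:
        if s & c:                 # setting these bits drives the syndrome to 0
            codeword |= 1 << c
    return codeword


def decode(codeword: int) -> tuple[int, int]:
    """Return (data, syndrome); a nonzero syndrome is the position that was repaired."""
    s = _syndrome(codeword)
    if s:
        codeword ^= 1 << s        # flip the bit the syndrome points at
    data = 0
    for i, p in enumerate(DATA_POSITIONS):
        if (codeword >> p) & 1:
            data |= 1 << i
    return data, s


word = 0x0123456789ABCDEF
stored = encode(word)
damaged = stored ^ (1 << 37)      # simulate a single bit flip in memory
recovered, position = decode(damaged)
print(recovered == word, position)
```

Unlike a single parity bit, the recomputed code here does more than signal a mismatch: its value is the position of the flipped bit, which is what makes correction possible.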
Differences between ECC memory & regular memory
The most visible difference between ECC and ordinary memory is an extra memory chip on the module’s PCB, which holds the check data used to detect and correct errors.
However, ECC memory has disadvantages as well as advantages. It is more expensive than regular memory because of the additional memory chips and the added complexity. More importantly, ECC memory is about 2% slower than regular memory at reads, because of the extra time required to check the data in memory for errors.
When ECC memory is used in a server, it monitors memory data and corrects errors as they occur. First, this reduces the number of crashes, which matters most for systems that cannot tolerate corrupted in-memory data, such as computing applications or servers in the scientific and financial industries. Second, error correction maintains data integrity and improves system stability. In data centers, ECC memory is more reliable than regular memory.
In addition, it is important to know that most consumer PC hardware does not support ECC memory. For example, most Intel and AMD consumer and enthusiast CPUs do not officially support ECC; it is mainly server CPUs that do.
There is no absolute answer as to whether ECC or non-ECC memory is better; it depends on the application. If you work in finance, healthcare, or another industry that handles critical data, you should configure your data center servers with ECC memory. If you’re a regular PC user or don’t plan to use your device for major projects, regular memory is enough.