Mirroring is a data redundancy technique used by some RAID levels, in particular RAID level 1, to provide data protection on a RAID array. While mirroring has some advantages and is well-suited for certain RAID implementations, it also has some limitations. It has a high overhead cost, because fully 50% of the drives in the array are reserved for duplicate data; and it doesn’t improve performance as much as data striping does for many applications. For these reasons, a different way of protecting data is provided as an alternative to mirroring. It involves the use of parity information, which is redundancy information calculated from the actual data values.

You may have heard the term “parity” before, used in the context of system memory error detection; in fact, the parity used in RAID is very similar in concept to parity RAM. The principle behind parity is simple: take “N” pieces of data, and from them, compute an extra piece of data. Take the “N+1” pieces of data and store them on “N+1” drives. If you lose any one of the “N+1” pieces of data, you can recreate it from the “N” that remain, regardless of which piece is lost. Parity protection is used with striping, and the “N” pieces of data are typically the blocks or bytes distributed across the drives in the array. The parity information can either be stored on a separate, dedicated drive, or be mixed with the data across all the drives in the array.
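The two placement options can be sketched in a few lines of Python. This is only an illustration: the function names `parity_drive` and `stripe_layout` are made up for this example, and the simple "rotate by one drive per stripe" rule is an assumption, since real controllers use a variety of rotation schemes.

```python
def parity_drive(stripe: int, num_drives: int, distributed: bool) -> int:
    """Return the index of the drive that holds parity for one stripe."""
    if not distributed:
        # Dedicated parity: the last drive always holds the parity block.
        return num_drives - 1
    # Distributed parity (assumed rule): move parity one drive per stripe.
    return (num_drives - 1 - stripe) % num_drives


def stripe_layout(stripe: int, num_drives: int, distributed: bool) -> list[str]:
    """Label each drive's block in one stripe as 'P' or 'D1'..'DN'."""
    p = parity_drive(stripe, num_drives, distributed)
    labels = []
    data_index = 1
    for drive in range(num_drives):
        if drive == p:
            labels.append("P")
        else:
            labels.append(f"D{data_index}")
            data_index += 1
    return labels


if __name__ == "__main__":
    # Show four stripes on a four-drive array with distributed parity.
    for s in range(4):
        print(s, stripe_layout(s, 4, distributed=True))
```

With dedicated parity every stripe has "P" on the same drive; with the rotation above, each of the four drives takes a turn holding parity.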

The parity calculation is typically performed using a logical operation called “exclusive OR” or “XOR”. As you may know, the “OR” logical operator is “true” (1) if either of its operands is true, and false (0) if neither is true. The exclusive OR operator is “true” if and only if exactly one of its operands is true; it differs from “OR” in that if both operands are true, “XOR” is false. This truth table for the two operators will illustrate:

A  B  |  A OR B  |  A XOR B
0  0  |    0     |     0
0  1  |    1     |     1
1  0  |    1     |     1
1  1  |    1     |     0

Uh huh. So what, right? Well, the interesting thing about “XOR” is that it is a logical operation that, if performed twice in a row with the same operand, “undoes itself”. If you calculate “A XOR B” and then take that result and do another “XOR B” on it, you get back A, the value you started with. That is to say, “A XOR B XOR B = A”. This property is exploited for parity calculation under RAID. If we have four data elements, D1, D2, D3 and D4, we can calculate the parity data, “DP”, as “D1 XOR D2 XOR D3 XOR D4”. Then, if we know any four of D1, D2, D3, D4 and DP, we can XOR those four together and it will yield the missing element. The order in which the values are combined doesn’t matter, which is why this works no matter which element is lost.
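If you would rather see the “undoes itself” property on a computer than on paper, here is a minimal Python sketch. The values of `a` and `b` are arbitrary eight-bit numbers chosen for illustration; `^` is Python’s bitwise XOR operator.

```python
a = 0b10100101
b = 0b11110000

once = a ^ b       # A XOR B
twice = once ^ b   # (A XOR B) XOR B, which should give A back

print(f"{once:08b}")   # prints 01010101
print(f"{twice:08b}")  # prints 10100101, the original value of a

# The same holds for every combination of single bits.
all_bits_ok = all((x ^ y) ^ y == x for x in (0, 1) for y in (0, 1))
print(all_bits_ok)     # prints True
```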

Let’s take an example to show how this works; you can do this yourself easily on a sheet of paper. Suppose we have the following four bytes of data: D1=10100101, D2=11110000, D3=00111100, and D4=10111001. We can “XOR” them together as follows, one step at a time:

D1 XOR D2 XOR D3 XOR D4
= ( (D1 XOR D2) XOR D3) XOR D4
= ( (10100101 XOR 11110000) XOR 00111100) XOR 10111001
= (01010101 XOR 00111100) XOR 10111001
= 01101001 XOR 10111001
= 11010000

So “11010000” becomes the parity byte, DP. Now let’s say we store these five values on five hard disks, and hard disk #3, containing value “00111100”, goes el-muncho. We can retrieve the missing byte simply by XOR’ing together the other three original data pieces and the parity byte we calculated earlier, like so:

D1 XOR D2 XOR D4 XOR DP
= ( (D1 XOR D2) XOR D4) XOR DP
= ( (10100101 XOR 11110000) XOR 10111001) XOR 11010000
= (01010101 XOR 10111001) XOR 11010000
= 11101100 XOR 11010000
= 00111100

Which is D3, the missing value. Pretty neat, huh? This operation can be done on any number of bits, incidentally; I just used eight bits for simplicity. It’s also a very simple binary calculation, which is a good thing, because it has to be done for every bit stored in a parity-enabled RAID array.
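The same pencil-and-paper calculation can be sketched in Python. This is an illustration under stated assumptions, not how any particular RAID controller does it: the helper name `xor_blocks` is made up for this example, and each “drive” is represented by a `bytes` object of equal length.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte position by byte position."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) groups the bytes that share a position across all blocks.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# The four data bytes from the example, each as a one-byte block.
data = [bytes([0b10100101]), bytes([0b11110000]),
        bytes([0b00111100]), bytes([0b10111001])]

parity = xor_blocks(data)

# Simulate losing drive #3, then rebuild it from the survivors plus parity.
survivors = data[:2] + data[3:]
rebuilt = xor_blocks(survivors + [parity])

print(f"{parity[0]:08b}")   # prints 11010000
print(f"{rebuilt[0]:08b}")  # prints 00111100
```

Because the helper works position by position, the blocks can be any length, as long as they are all the same length.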

Compared to mirroring, parity (used with striping) has some advantages and disadvantages. The most obvious advantage is that parity protects data against any single drive in the array failing without requiring the 50% “waste” of mirroring; only the equivalent of one of the “N+1” drives holds redundancy information. (The overhead of parity is equal to (100/T)% where T is the total number of drives in the array, that is, T = N+1.) Striping with parity also lets you benefit from the performance advantages of striping. The chief disadvantages of striping with parity relate to complexity: all those parity bytes have to be computed, millions of them per second, and that takes computing power. This means a hardware controller that performs these calculations is required for high performance; if you do software RAID with striping and parity, the system CPU will be dragged down doing all these computations. Also, while you can recover from a lost drive under parity, the missing data all has to be rebuilt, which has its own complications; recovering from a lost mirrored drive is comparatively simple.
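To put numbers on the overhead comparison, here is a small Python sketch. The function name `parity_overhead_percent` is made up for this example, and it assumes single parity, where the equivalent of one drive out of the total holds redundancy information.

```python
MIRRORING_OVERHEAD_PERCENT = 50.0  # half the drives hold duplicate data


def parity_overhead_percent(total_drives: int) -> float:
    """Percentage of raw capacity used for redundancy with single parity."""
    if total_drives < 2:
        raise ValueError("parity needs at least two drives")
    # One drive's worth of capacity out of the whole array holds parity.
    return 100.0 / total_drives


if __name__ == "__main__":
    for drives in range(3, 9):
        print(f"{drives} drives: parity {parity_overhead_percent(drives):.1f}%, "
              f"mirroring {MIRRORING_OVERHEAD_PERCENT:.1f}%")
```

For example, a five-drive array gives up 20% of its capacity to parity, and the percentage keeps shrinking as drives are added.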

All of the RAID levels from RAID 3 to RAID 7 use parity; the most popular of these today is RAID 5. RAID 2 uses a concept similar to parity but not exactly the same.

By Charles M. Kozierok
