RAID Arrays

In the server world, we use RAID. RAID stands for “redundant array of independent disks” and is a fancy way of saying “I have a bunch of hard drives working together as one.” The main purpose of RAID is to keep your data available when a drive fails, and some RAID types also make reads and writes faster.
A long time ago, hardware RAID was the norm. That’s when a dedicated RAID controller, either an add-in card or the motherboard’s firmware (configured through the BIOS), controlled how the data was written to the disks. Hardware RAID still exists, but software RAID, where your operating system (OS) decides how the data will be written to the disks, has largely taken over for home servers. This way is much better, since your array isn’t tied to one particular controller.
There are many types of RAID, but only a few are still in use today, and of those, only a handful make sense for real-world use.
RAID0 (stripe)

While this is indeed RAID, it’s probably the worst idea for RAID ever. RAID0 takes your data, breaks it into pieces, and spreads it out over two or more drives. Say your file is File A. File A is broken up into five parts and spread across five disks (see the picture above). This is fast, because every disk reads and writes its piece at the same time. The problem is what happens when one of those drives dies (all server owners know this is an inevitability). File A is now missing a piece, so File A is no longer accessible. In other words, your data is gone. And not just our example file: your entire hard drive array is now garbage, because all the data depends on all the disks functioning. When one disk dies, the data across all disks is lost. RAID0 has no redundancy at all, and every disk you add makes a failure more likely. Never use RAID0.
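To make the failure mode concrete, here is a toy model of RAID0 striping in Python. It is a sketch, not how any real RAID driver works: each “disk” is just a list of byte chunks, and `CHUNK_SIZE` (the stripe size) is a hypothetical value kept tiny for readability; real arrays use much larger chunks.

```python
from __future__ import annotations

# Hypothetical stripe size, deliberately tiny so the example is easy to follow.
CHUNK_SIZE = 4


def stripe(data: bytes, num_disks: int) -> list[list[bytes]]:
    """Break data into chunks and deal them out round-robin across the disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    for index, chunk in enumerate(chunks):
        disks[index % num_disks].append(chunk)
    return disks


def read_back(disks: list[list[bytes] | None]) -> bytes:
    """Reassemble the file: chunk row 0 from every disk, then row 1, and so on."""
    if any(disk is None for disk in disks):
        # RAID0 keeps no second copy and no parity, so a missing disk is fatal.
        raise OSError("a disk has failed: the striped data cannot be reassembled")
    out = bytearray()
    rows = max(len(disk) for disk in disks)
    for row in range(rows):
        for disk in disks:
            if row < len(disk):
                out += disk[row]
    return bytes(out)


file_a = b"File A is broken up and spread across five disks."
array: list[list[bytes] | None] = list(stripe(file_a, 5))
print(read_back(array) == file_a)  # all disks healthy: File A comes back intact
array[2] = None                    # one disk dies
try:
    read_back(array)
except OSError as err:
    print("File A is gone:", err)
```

Notice that losing any single disk breaks the whole read, not just the chunks that lived on it.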
RAID1 (mirror)

RAID1 is amazing. A RAID1 array is usually made of two disks, and every write goes to both of them at the same time, so the second disk is always an exact copy of the first. For example, if you save File A to your hard drive, your computer actually saves two copies of File A in case one of your disks dies. This means two things:
- If one disk dies, you have a complete second copy that is up-to-date. Your array keeps running, and you won’t even notice the failure other than your OS telling you to replace the failed drive. You can go about your day without catastrophe.
- This costs the most money per terabyte of usable space. If we buy two 4TB hard drives and put them in a RAID1 array, we only get 4TB of usable space, because the second drive holds nothing but a copy of the first. So even though we purchased 8TB of hard drives, we only get 4TB of usable space. That’s the cost of safety.
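Here is a similar toy model of a mirror in Python, purely illustrative. The `Mirror` class and its methods are hypothetical names for this sketch: each “disk” is a dictionary of file names to contents, and a failed disk is marked as `None`.

```python
from __future__ import annotations


class Mirror:
    """Toy RAID1: every write lands on every healthy disk; reads use any healthy one."""

    def __init__(self, num_disks: int = 2) -> None:
        # Each disk maps file names to contents; None marks a failed disk.
        self.disks: list[dict[str, bytes] | None] = [{} for _ in range(num_disks)]

    def write(self, name: str, data: bytes) -> None:
        for disk in self.disks:
            if disk is not None:
                disk[name] = data

    def read(self, name: str) -> bytes:
        for disk in self.disks:
            if disk is not None:
                return disk[name]
        raise OSError("every disk in the mirror has failed")

    def fail(self, index: int) -> None:
        self.disks[index] = None


def mirror_usable_tb(disk_sizes_tb: list[float]) -> float:
    """A mirror can only hold as much as its smallest member."""
    return min(disk_sizes_tb)


m = Mirror()
m.write("File A", b"important data")
m.fail(0)                  # the first disk dies...
print(m.read("File A"))    # ...but File A is still readable from the second one
print(mirror_usable_tb([4, 4]), "TB usable out of 8TB purchased")
```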
Resilvering
Now is a good time to explain what happens when one of your hard drives dies. First, don’t panic. The idea behind RAID is that everything is going to be OK. Once your OS tells you a hard drive is dead, simply replace it with a new one of equal or greater size (this is where hot-swapping comes in handy). Your OS will then rebuild the missing data onto the new drive, either by copying it from the surviving mirror or by recalculating it from parity (more on parity below). This rebuilding process is called resilvering; the term comes from ZFS, and other RAID systems usually just call it a rebuild. Resilvering is stressful for your drives, because the surviving disks have to read back all of their data, and on large drives that can take many hours. That’s why many people replace disks at night, so that by morning everything is done.
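A sketch of resilvering a two-disk mirror, using the same toy idea of disks as dictionaries (the function name `resilver` is hypothetical). It only copies data that actually exists on the surviving disk, which is roughly how ZFS resilvering behaves; many traditional RAID rebuilds copy the entire disk, used or not. Either way, everything has to be read from the survivor, which is where the stress comes from.

```python
from __future__ import annotations


def resilver(disks: list[dict[str, bytes] | None]) -> list[dict[str, bytes]]:
    """Replace each failed disk (None) with a fresh one and copy the survivor's data onto it."""
    survivor = next((disk for disk in disks if disk is not None), None)
    if survivor is None:
        raise OSError("no surviving disk to rebuild from")
    rebuilt: list[dict[str, bytes]] = []
    for disk in disks:
        if disk is None:
            new_disk: dict[str, bytes] = {}
            for name, data in survivor.items():  # read every file from the survivor...
                new_disk[name] = data            # ...and write it to the new disk
            rebuilt.append(new_disk)
        else:
            rebuilt.append(disk)
    return rebuilt


array = [{"File A": b"important data", "File B": b"photos"}, None]  # disk 2 died
array = resilver(array)
print(array[0] == array[1])  # both disks hold the same data again
```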
RAID5 (RAIDZ1)

RAIDZ1 is ZFS’s version of RAID5, and the two names are often used interchangeably. This type of RAID requires at least three drives and uses parity spread across all drives to keep your data safe. I’m not going to go into how parity works because it’s technical, and you don’t have to do anything other than let your OS handle it. RAID5 can tolerate one drive failure, which means if one drive dies, your data is still safe. However, if two drives die, it’s the same situation as RAID0: your data is gone. What you need to know is how much space this safety costs you. It takes one hard drive’s worth of space to store all the parity, even though the parity is spread evenly across all drives (see image above). For example, if you have three 4TB drives, that’s 12TB of raw space. But since parity costs one drive, your usable space is 8TB. People who use RAID5 usually do it with five or more drives, which is more cost-effective. Say you have five 4TB drives, which is 20TB of space. RAID5 gives you 16TB of usable space, since one drive’s worth (4TB) goes to parity.
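For anyone curious anyway, here is a small Python sketch of single parity. It assumes the common textbook approach where the parity chunk is the XOR of the equal-sized data chunks in a stripe (RAIDZ1’s actual on-disk layout is more involved), and it models only one stripe, whereas real RAID5 rotates which disk holds parity from stripe to stripe. It also checks the capacity arithmetic above. All function names are hypothetical.

```python
from __future__ import annotations

from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length chunks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(chunks: list[bytes]) -> bytes:
    """Single parity: the XOR of every data chunk in one stripe."""
    return reduce(xor_bytes, chunks)


def rebuild_missing(surviving: list[bytes]) -> bytes:
    """Any ONE lost chunk (data or parity) is the XOR of all the chunks that survived."""
    return reduce(xor_bytes, surviving)


def raid5_usable_tb(num_disks: int, disk_tb: float) -> float:
    """One disk's worth of space goes to parity."""
    if num_disks < 3:
        raise ValueError("RAID5 needs at least three disks")
    return (num_disks - 1) * disk_tb


# One stripe on a three-disk array: two data chunks plus one parity chunk.
data_chunks = [b"File", b" A!!"]
parity_chunk = parity(data_chunks)

# Disk 1 dies; recover its chunk from disk 2's data and the parity.
recovered = rebuild_missing([data_chunks[1], parity_chunk])
print(recovered == data_chunks[0])

print(raid5_usable_tb(3, 4), raid5_usable_tb(5, 4))  # three and five 4TB drives
```

This also shows why a second failure is fatal: with two chunks missing from the same stripe, one XOR equation cannot recover two unknowns.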
RAID5 has fallen out of favor because of the risk during resilvering. If one of your hard drives dies and another drive dies while the replacement is resilvering, your data is gone, and resilvering is exactly when every drive is working its hardest. The likelihood of this happening goes up with more disks (more drives that can fail) and larger disks (a longer resilver). A common rule of thumb is to use no more than eight disks, each no larger than 4TB, in a RAID5 array; treat this as a guideline rather than a guaranteed safe limit.
RAID6 (RAIDZ2)

RAIDZ2 is ZFS’s version of RAID6. It works like RAID5, but it requires at least four drives and stores two sets of parity spread across all drives, which takes two disks’ worth of space. RAID6 is what is in favor today for large arrays. However, the increased safety comes at a cost: since RAID6’s parity takes up two disks’ worth of space, arrays with many disks are commonplace here to spread that cost out. It’s not crazy to see people with a RAID6 array of eight 16TB drives, which yields 96TB of usable space from 128TB of raw space. The reason RAID6 is preferred shows up when a disk fails. With one drive gone, the array still has one drive’s worth of parity left, so if a second disk fails during resilvering, your data survives. RAID6 is considered safe enough for enterprise use, which is where it is regularly found.
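The capacity arithmetic for every layout in this article fits in one small Python function. It is a sketch that assumes identical disks and two-way mirrors for RAID10, and it ignores filesystem overhead and the TB vs. TiB difference, so the space your OS reports will be somewhat lower. The function name `usable_tb` is hypothetical.

```python
def usable_tb(level: str, num_disks: int, disk_tb: float) -> float:
    """Usable space for an array of identical disks (ignores filesystem overhead)."""
    if level == "raid0":
        min_disks, data_disks = 2, num_disks        # no redundancy at all
    elif level == "raid1":
        min_disks, data_disks = 2, 1                # every disk holds the same copy
    elif level == "raid5":
        min_disks, data_disks = 3, num_disks - 1    # one disk's worth of parity
    elif level == "raid6":
        min_disks, data_disks = 4, num_disks - 2    # two disks' worth of parity
    elif level == "raid10":
        if num_disks % 2:
            raise ValueError("RAID10 with two-way mirrors needs an even number of disks")
        min_disks, data_disks = 4, num_disks // 2   # half of every pair is a copy
    else:
        raise ValueError(f"unknown RAID level: {level}")
    if num_disks < min_disks:
        raise ValueError(f"{level} needs at least {min_disks} disks")
    return data_disks * disk_tb


print(usable_tb("raid6", 8, 16))   # eight 16TB drives -> 96
print(usable_tb("raid5", 5, 4))    # five 4TB drives -> 16
print(usable_tb("raid10", 4, 4))   # two mirrored pairs of 4TB drives -> 8
```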
RAID10 (striped mirror)

RAID10 is also known as a “striped mirror.” As you can see from the picture, RAID10 is a hybrid of RAID1 + RAID0. I know I said “RAID0 is bad” earlier, but this implementation of it is totally fine, because each member of the stripe is a mirror rather than a single disk. Basically, RAID10 is two (or more) RAID1 arrays side-by-side, with your data striped across them RAID0-style. RAID1 is a great place to start with RAID because it takes the fewest drives of any redundant layout, thus saving money. However, adding more disks to a RAID1 mirror only adds more copies of the same data, not more space. What if you need more capacity than just two disks can offer? Simple: use RAID10 by adding another pair of mirrored disks alongside your existing RAID1 array (in ZFS, this means adding a second mirror to the pool). You can keep doing this until you run out of space to put all of those hard drives.
From a safety perspective, each RAID1 pair can lose only one drive. Looking at the picture above as an example of two RAID1 arrays (disks 1 and 2 in one mirror, disks 3 and 4 in the other), any two disks can fail as long as they aren’t in the same mirror: disks 1 and 3, 2 and 4, 1 and 4, or 2 and 3. However, if disks 1 and 2, or 3 and 4, were to fail, all data would be lost across the entire array. A second failure only destroys the array if it hits the one partner of the disk that already died, so RAID10 is considered very safe, although unlike RAID6 it can’t guarantee surviving any two failures.
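The two-failure cases above can be checked by brute force. This Python sketch assumes the same layout as the example (disks 1 and 2 form one mirror, disks 3 and 4 the other); `MIRRORS` and `survives` are hypothetical names.

```python
from __future__ import annotations

from itertools import combinations

# Assumed layout from the example: two RAID1 pairs striped together.
MIRRORS = [{1, 2}, {3, 4}]


def survives(failed: set[int], mirrors: list[set[int]]) -> bool:
    """The array lives as long as every mirror still has at least one working disk."""
    return all(mirror - failed for mirror in mirrors)


all_disks = sorted(set().union(*MIRRORS))
pairs = list(combinations(all_disks, 2))
for pair in pairs:
    print(pair, "OK" if survives(set(pair), MIRRORS) else "all data lost")

ok = sum(survives(set(pair), MIRRORS) for pair in pairs)
print(f"{ok} of {len(pairs)} possible two-disk failures are survivable")  # 4 of 6
```

Scaling this up, a RAID10 of n two-way mirrors loses data on a second failure only if it hits the one partner of the first failed disk, a chance of 1 in (2n - 1), so adding mirrors makes any single second failure less likely to be fatal.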
Which should I use?
For beginners, start with a mirror. It takes only two disks and is very safe. To expand, buy two more disks and add them to the array as a second mirror. If you know you want to go big with your storage, use RAIDZ2. It is super safe and will give you a ton of space for your large files. All other types of RAID should only be used in special cases, or really not at all.