Tips for fixing a broken RAID 1

As is often the case, I don’t really have the time to turn this into a proper guide, and others already exist, such as Replacing A Failed Hard Drive In A Software RAID1 Array. So I’ll just list some useful commands to diagnose a broken software RAID and to fix it when there is no physical issue with the disk (i.e. I won’t cover fixing bad sectors, but I will cover checking for them).

1) Detecting that the RAID is out of sync:
cat /proc/mdstat
If an array is synchronized, you’ll see [UU]; if one of its members is missing, you’ll see [U_] or [_U], as in:

cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda3[0]
      972141184 blocks super 1.2 [2/1] [U_]

md1 : active (auto-read-only) raid1 sda2[0]
      1999040 blocks super 1.2 [2/1] [U_]

md0 : active raid1 sda1[0]
      488192 blocks [2/1] [U_]

unused devices: <none>
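
If you have several arrays to watch, the check above can be scripted. Here is a minimal sketch, assuming the status map is the last field of each array's second line, as in the output above. The function name degraded_arrays is made up for this example.

```shell
#!/bin/sh
# List md arrays whose member map (e.g. [U_]) shows a missing device.
# $1: file to read, defaulting to /proc/mdstat.
degraded_arrays() {
    awk '
        # Remember which array the following lines belong to.
        /^md[0-9]+ :/ { name = $1 }
        # The status map is the last field; "_" marks a missing member.
        $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name, $NF }
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays with no argument reads /proc/mdstat and prints one line per degraded array (array name and status map). No output means every array shows all members up.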

2) Checking for bad sectors, using smartctl. NB: if you don’t have it installed (possible if your host provided you with a “minimal” install), you should be able to install it with apt-get install smartmontools on Debian/Ubuntu.
Short tests (a few minutes each):
smartctl -d ata -t short /dev/sda
smartctl -d ata -t short /dev/sdb

Reading the results (the tests run in the background, so wait until they have finished):
smartctl -a /dev/sda
smartctl -a /dev/sdb

If the short tests show no issues, run the longer tests (a few hours each):
smartctl -d ata -t long /dev/sda
smartctl -d ata -t long /dev/sdb
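
The full smartctl -a report is long, so here is a rough sketch to pull out the counters most often associated with bad sectors. It assumes the usual ATA attribute table layout, with the attribute name in column 2 and the raw value in column 10. It also assumes the attribute names smartmontools commonly uses, which some drives report differently or not at all. The function name bad_sector_counters is made up for this example.

```shell
#!/bin/sh
# Print the raw values of SMART attributes commonly tied to bad sectors.
# Reads "smartctl -A" (or "smartctl -a") output on stdin.
# ASSUMPTION: standard ATA attribute table, name in column 2, raw value in column 10.
bad_sector_counters() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2, $10 }'
}
```

Typical use would be smartctl -A /dev/sda | bad_sector_counters. Non-zero raw values are worth a closer look in the full report, but this is a shortcut for reading it, not a replacement.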

3) If there are errors on one of the drives, use hdparm to get more info to identify the drive (notably its model name and serial number), so as to be able to ask your host to change the right drive.
hdparm -I /dev/sdb
hdparm -I /dev/sda
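
hdparm -I prints a lot more than the two fields you need for a support ticket. A small sketch to keep only those, assuming the report labels them "Model Number:" and "Serial Number:" (the function name drive_identity is made up for this example):

```shell
#!/bin/sh
# Keep only the model and serial number from "hdparm -I" output read on stdin.
# ASSUMPTION: the fields are labelled "Model Number:" and "Serial Number:".
drive_identity() {
    sed -n \
        -e 's/^[[:space:]]*Model Number:[[:space:]]*/model: /p' \
        -e 's/^[[:space:]]*Serial Number:[[:space:]]*/serial: /p'
}
```

For example: hdparm -I /dev/sdb | drive_identity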

4) To list the partitions on all disks (it can be interesting to compare this to the output of cat /proc/mdstat):
fdisk -l

5) In my case, it turned out that the RAID had desynchronized but no disk was damaged: the second drive had simply stopped and fallen out of the RAID for some reason… So I was able to rebuild the RAID straight away by running the following command for each partition (do not forget to adapt the names of both the RAID device and the physical disk partition):
mdadm --manage /dev/md2 --add /dev/sdb3
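
To avoid typos when adapting the names, here is a sketch that prints (but does not run) the command for every array that is still degraded. It assumes sdX-style device names. It also assumes that the disk being added back has the same partition numbers as the surviving one, which was true on my server but is worth checking with fdisk -l. The function name readd_commands is made up for this example.

```shell
#!/bin/sh
# Print (do not run) one "mdadm --manage ... --add" command per degraded array.
# $1: disk to add back (e.g. sdb); $2: mdstat file, defaulting to /proc/mdstat.
# ASSUMPTION: sdX-style names and identical partition numbers on both disks.
readd_commands() {
    awk -v disk="$1" '
        /^md[0-9]+ :/ {
            name = $1
            members = $0
            member = $NF                  # last listed member, e.g. "sda3[0]"
            sub(/\[.*$/, "", member)      # strip the role suffix -> "sda3"
            match(member, /[0-9]+$/)      # locate the trailing partition number
            part = substr(member, RSTART) # -> "3"
        }
        $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ {
            target = disk part
            # Skip arrays that already list the target (e.g. rebuild in progress).
            if (index(members, " " target "[") == 0)
                printf "mdadm --manage /dev/%s --add /dev/%s\n", name, target
        }
    ' "${2:-/proc/mdstat}"
}
```

Running readd_commands sdb prints the commands so you can review them before pasting them in. I would not pipe the output straight into a shell without reading it first.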

And then a few snapshots of the rebuild progress (it takes quite a bit of time, of course, since the whole partition is copied, not just the used space):

cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb3[2] sda3[0]
      972141184 blocks super 1.2 [2/1] [U_]
      [=>...................]  recovery =  8.4% (82301952/972141184) finish=1017.9min speed=14568K/sec

md1 : active (auto-read-only) raid1 sda2[0]
      1999040 blocks super 1.2 [2/1] [U_]

md0 : active raid1 sda1[0]
      488192 blocks [2/1] [U_]
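
If you only care about the percentage and the estimated time left, here is a small sketch that extracts them from the recovery line. It assumes the line looks like the one above, with a field ending in % and a finish= field. The function name rebuild_progress is made up for this example.

```shell
#!/bin/sh
# Print "array percent time-left" for each array that is currently rebuilding.
# $1: file to read, defaulting to /proc/mdstat.
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /(recovery|resync) *=/ {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/) pct = $i                 # e.g. "8.4%"
                if ($i ~ /^finish=/) {                  # e.g. "finish=1017.9min"
                    eta = $i
                    sub(/^finish=/, "", eta)
                }
            }
            print name, pct, eta
        }
    ' "${1:-/proc/mdstat}"
}
```

To follow the rebuild without retyping anything, watch -n 60 cat /proc/mdstat also does the job.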

After the md2 partition is resynchronized:

cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb3[2] sda3[0]
      972141184 blocks super 1.2 [2/2] [UU]

md1 : active (auto-read-only) raid1 sda2[0]
      1999040 blocks super 1.2 [2/1] [U_]

md0 : active raid1 sda1[0]
      488192 blocks [2/1] [U_]

Adding back the other two arrays:
mdadm --manage /dev/md1 --add /dev/sdb2
mdadm --manage /dev/md0 --add /dev/sdb1

And once md1 and md0 were done too:

cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb3[2] sda3[0]
      972141184 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sdb2[2] sda2[0]
      1999040 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
      488192 blocks [2/2] [UU]

Bonus, because I don’t want to lose the link and I’m not sure it deserves a whole new post: a few methods to wipe free disk space (I usually do that before giving back the servers I rent): http://superuser.com/questions/19326/how-to-wipe-free-disk-space-in-linux

Posted in hardware, Linux, servers.

