As is often the case, I don’t really have the time to turn this into a proper guide. Besides, others already exist, such as Replacing A Failed Hard Drive In A Software RAID1 Array. So I’ll just list some useful commands to diagnose a broken RAID and fix it when there is no physical issue on the disk (i.e. I won’t cover fixing bad sectors, but I will cover checking for them).
1) Detecting the RAID is out of sync:
cat /proc/mdstat
If the RAID is synchronized, you’ll see [UU], if not you’ll see [U_] or [_U], such as:
cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda3[0]
      972141184 blocks super 1.2 [2/1] [U_]
md1 : active (auto-read-only) raid1 sda2[0]
      1999040 blocks super 1.2 [2/1] [U_]
md0 : active raid1 sda1[0]
      488192 blocks [2/1] [U_]
unused devices: <none>
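If you would rather not eyeball the output, here is a minimal sketch that lists the degraded arrays. The function name degraded_arrays is my own, and it assumes arrays are named md followed by a number (as above); it simply looks for an underscore in the [UU]-style status field.

```shell
#!/bin/sh
# List md arrays whose status field (e.g. [U_]) shows a missing member.
# Reads the file given as $1, defaulting to /proc/mdstat.
# Assumption: array names look like md0, md1, ... as in the output above.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }              # remember the current array
        match($0, /\[[U_]+\]/) {                 # status field such as [U_]
            if (index(substr($0, RSTART, RLENGTH), "_")) print name
        }
    ' "${1:-/proc/mdstat}"
}

# Usage:
#   degraded_arrays                  # reads /proc/mdstat
#   degraded_arrays /tmp/mdstat.txt  # reads a saved copy
```

An empty result means every array reports all its members as up.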
2) Checking for bad sectors, using smartctl. NB: if you don’t have it installed (possible if your host provided you with a “minimal” install), you should be able to install it with apt-get install smartmontools on Debian/Ubuntu.
Short tests (a few minutes each):
smartctl -d ata -t short /dev/sda
smartctl -d ata -t short /dev/sdb
Reading the results (once the tests have finished, since they run in the background):
smartctl -a /dev/sda
smartctl -a /dev/sdb
If the short tests show no issues, run the longer tests (a few hours each):
smartctl -d ata -t long /dev/sda
smartctl -d ata -t long /dev/sdb
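The full smartctl -a report is long. As a rough sketch, the filter below keeps only the attributes usually associated with bad sectors and prints their raw values. The function name bad_sector_attrs is made up, and the attribute names are an assumption: they are common but vary between vendors, so an empty result does not prove the disk is healthy.

```shell
#!/bin/sh
# Print "<attribute> <raw value>" for the SMART attributes most often
# associated with bad sectors. Reads `smartctl -A` output on stdin.
# Assumption: the usual 10-column attribute table, raw value in column 10.
bad_sector_attrs() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ {
        print $2, $10
    }'
}

# Usage (as root):
#   smartctl -A /dev/sda | bad_sector_attrs
#   smartctl -l selftest /dev/sda    # shows only the self-test log
```

Non-zero raw values here are the ones worth reporting to your host.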
3) If there are errors on one of the drives, use hdparm to get more info to identify the drive (notably its model name and serial number), so as to be able to ask your host to change the right drive.
hdparm -I /dev/sdb
hdparm -I /dev/sda
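Since hdparm -I prints a lot more than you need for a support ticket, here is a small sketch that keeps only the model and serial lines. The function name drive_identity is hypothetical, and it assumes the usual "Model Number:" and "Serial Number:" labels in the output.

```shell
#!/bin/sh
# Keep only the model and serial number lines from `hdparm -I` output (stdin),
# trimming the leading indentation and any trailing padding.
drive_identity() {
    awk -F': *' '/Model Number:|Serial Number:/ {
        sub(/^[ \t]+/, "", $1)     # drop leading indentation from the label
        sub(/[ \t]+$/, "", $2)     # drop trailing padding from the value
        print $1 ": " $2
    }'
}

# Usage (as root):
#   hdparm -I /dev/sdb | drive_identity
```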
4) To list the partitions on all disks (it can be interesting to compare this to the output of cat /proc/mdstat):
fdisk -l
5) In my case, it turned out that my RAID had desynchronized, but no disk was damaged: the second drive had simply stopped and fallen out of the RAID for some reason… So I was able to rebuild the RAID right away, by running the following command for each partition (do not forget to adapt the names of both the RAID partition and the physical disk partition):
mdadm --manage /dev/md2 --add /dev/sdb3
And then a few snapshots of the reconstruction progress (it takes quite a bit of time, of course, since the whole disk is read, not just the used space):
cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb3[2] sda3[0]
      972141184 blocks super 1.2 [2/1] [U_]
      [=>...................] recovery = 8.4% (82301952/972141184) finish=1017.9min speed=14568K/sec
md1 : active (auto-read-only) raid1 sda2[0]
      1999040 blocks super 1.2 [2/1] [U_]
md0 : active raid1 sda1[0]
      488192 blocks [2/1] [U_]
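To follow the rebuild without rereading the whole output each time, a sketch like the following pulls out just the percentage per array. The function name recovery_progress is my own, and it assumes the progress line contains "recovery =" or "resync =" as in the snapshot above.

```shell
#!/bin/sh
# Print "<array> <percent>" for every array that is currently rebuilding.
# Reads the file given as $1, defaulting to /proc/mdstat.
recovery_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }              # remember the current array
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)            # the percentage is the field ending in %
                if ($i ~ /%$/) print name, $i
        }
    ' "${1:-/proc/mdstat}"
}

# Usage:
#   recovery_progress
#   watch -n 60 cat /proc/mdstat    # or simply refresh the raw output every minute
```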
After the md2 partition is resynchronized:
cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb3[2] sda3[0]
      972141184 blocks super 1.2 [2/2] [UU]
md1 : active (auto-read-only) raid1 sda2[0]
      1999040 blocks super 1.2 [2/1] [U_]
md0 : active raid1 sda1[0]
      488192 blocks [2/1] [U_]
Adding back the md1 and md0 ones:
mdadm --manage /dev/md1 --add /dev/sdb2
mdadm --manage /dev/md0 --add /dev/sdb1
And now after md1 and md0 were done too:
cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb3[2] sda3[0]
      972141184 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sdb2[2] sda2[0]
      1999040 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sdb1[1] sda1[0]
      488192 blocks [2/2] [UU]
Bonus because I don’t want to lose the link and I’m not sure it deserves an entire new post: a few methods to wipe empty disk space (I usually do that before giving back the servers I rent): http://superuser.com/questions/19326/how-to-wipe-free-disk-space-in-linux