
Our server exploded

Well, not really, but the result is much the same. The server crashed early in the night, and this time a reboot didn’t help. There had been hard drive issues for a while, but I really didn’t have time for yet another migration, and I was hoping RAID 1 would keep the server online long enough for my schedule to clear up (one of the drives had issues, but the other worked fine, without a single error in its S.M.A.R.T. diagnostics).

So, well, a few sites are back up; the others will take longer.

Some interesting lessons were learned, though:
– Murphy’s law has once again been verified
– when a server starts behaving strangely (like the Tor daemon stopping for no reason), trash it
  * particularly when you have diagnosed hard drive issues
  * and when the strange behavior also includes random crashes
– don’t rely on RAID 1 for proper redundancy at EUserv: their RAID controllers seem… well, a bit wacky. Either that, or I’ve been really, massively unlucky with them. Their vKVM thing for dealing with non-booting servers is quite neat, though 😉
– don’t use a trial offer as a secondary DNS just because the quota is about good enough. At BuddyDNS I used to burn through my 300k monthly query quota in about 25 days and finish the month without a secondary, but after just 7 hours of downtime today that quota was used up. I may have missed a few e-mails :s
– don’t blindly set your DNS serial to the “recommended” YYYYMMDDXX format without thinking it through first: once you’ve done that, it’s very hard to go back to 1, 2, 3, etc., because secondaries only pull a zone whose serial compares as newer than the one they already have (this is how I ended up on BuddyDNS in the first place: trying to go back to 1, 2, 3 broke my previous secondary DNS provider)
– some other lessons were learned, but promptly forgotten before making it into this post. The lesson here: write down the lessons you learn as soon as you learn them.
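The serial rule behind the DNS point above can be sketched in Python. This is a minimal illustration assuming standard RFC 1982 serial number arithmetic, which is what secondaries use to decide whether a zone has changed; the function names and the example serial 2012061501 are hypothetical. A date-style serial is around two billion, and 1 compares as older than that, so the usual workaround is to bump the serial by just under 2^31, wait for every secondary to transfer the zone, and only then set the small value.

```python
# Sketch of RFC 1982 serial number arithmetic for 32-bit DNS SOA serials.
# Function names and example values are hypothetical, for illustration only.

SERIAL_BITS = 32
MOD = 2 ** SERIAL_BITS          # serials wrap around at 2^32
HALF = 2 ** (SERIAL_BITS - 1)   # 2^31: the comparison "horizon"


def serial_gt(s1: int, s2: int) -> bool:
    """Return True if serial s1 is 'newer' than s2 under RFC 1982 rules."""
    return (s1 < s2 and s2 - s1 > HALF) or (s1 > s2 and s1 - s2 < HALF)


def rollback_plan(current: int, target: int) -> list[int]:
    """Return the sequence of serials to publish to get from current to target.

    Each intermediate serial must be picked up by every secondary before
    publishing the next one; the largest single forward jump is HALF - 1.
    """
    steps = []
    serial = current
    while not serial_gt(target, serial):
        serial = (serial + HALF - 1) % MOD
        steps.append(serial)
    steps.append(target)
    return steps


if __name__ == "__main__":
    current = 2012061501  # a YYYYMMDDXX-style serial
    print(serial_gt(1, current))      # False: secondaries ignore serial 1
    print(rollback_plan(current, 1))  # [4159545148, 1]
```

Publishing 4159545148 first is a legal forward step (just under 2^31 above the old serial), and from there 1 also counts as a forward step because the serial space wraps around at 2^32.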

Last but not least, very sorry for the downtime, folks 🙁

Edit: well, it turns out that all it took to bring the server back online was running fsck from the vKVM… Now smartmontools doesn’t detect any fault on the “bad” hard drive and, more surprisingly, not a single reallocated sector either. The “trash it” lesson still applies, though (when I have the time), and this notepad stays on the new server anyway ^^

Posted in Uncategorized.
