Fault tree analysis of the September 19 downtime

A few weeks ago we had another downtime. The previous one was directly caused by a network outage at our provider, but this time the causes I could see were a bit richer, so I thought it would be fun to apply a risk management technique to it: fault tree analysis. It’s my favorite method, because I just like the concept of “why why why” 😀

Note that after some previous undetected downtimes, I had taken steps to improve downtime detection. They proved useful: this time the downtime was detected within a few minutes of onset.

The tree follows below, in PNG for the preview and in SVG for the zoomed version:

I put the root causes linked to my now former host, 1&1, in red. It’s quite obvious that most of the causes are linked to them, particularly the huge 6h delay to process the payment, which in this day and age is just inconceivable… As for the “set it and forget it” part, that’s something I’ve always disliked about 1&1: they force you to let them store your credit card info (a bit like Amazon, except that Amazon lets you delete that info) so that they can renew automatically. This makes it easy to forget: my other hosts have manual renewal, and I never forgot to renew there…
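For those who prefer code to diagrams, here is a rough Python sketch of the “why why why” idea: each event points to its causes, and the root causes are simply the events with no further “why” behind them. This is only an illustration, not a transcription of the actual tree: the `Event` class, the `root_causes` and `render` functions, and the node labels are all made up for the example (the labels are loosely inspired by the causes mentioned above, and “Stored card rejected” in particular is purely hypothetical).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """One node of a fault tree: an event plus the causes behind it."""
    label: str
    gate: str = "OR"  # "OR": any one cause is enough; "AND": all causes are needed
    causes: List["Event"] = field(default_factory=list)


def root_causes(event: Event) -> List[str]:
    """Collect the leaf labels, i.e. the events with no further 'why'."""
    if not event.causes:
        return [event.label]
    leaves: List[str] = []
    for cause in event.causes:
        leaves.extend(root_causes(cause))
    return leaves


def render(event: Event, depth: int = 0) -> List[str]:
    """Return the tree as indented text lines, one per event."""
    gate = f" [{event.gate}]" if event.causes else ""
    lines = ["  " * depth + event.label + gate]
    for cause in event.causes:
        lines.extend(render(cause, depth + 1))
    return lines


# Illustrative tree only: labels are hypothetical, not the real analysis.
tree = Event("Site down", "OR", [
    Event("Hosting suspended", "AND", [
        Event("Renewal payment not made in time", "OR", [
            Event("Renewal forgotten"),
            Event("Stored card rejected"),  # hypothetical cause
        ]),
        Event("Payment processed late by host"),
    ]),
    Event("Network outage at provider"),
])

print("\n".join(render(tree)))
print("Root causes:", root_causes(tree))
```

The AND/OR gates are only stored and displayed here; a fuller version could use them to work out which combinations of root causes are enough to trigger the top event.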

I find this risk analysis method really straightforward. If you’re interested in further reading, here are some more links (the first one is in English, the others are in French):
https://en.wikipedia.org/wiki/Fault_tree_analysis
https://fr.wikipedia.org/wiki/Arbre_des_causes
http://eocastle.birdsallinteractive.com/images/arbre-des-causes
http://www.travailler-mieux.gouv.fr/IMG/pdf/CRAM_bourgogne.pdf
http://hse.iut.u-bordeaux1.fr/lesbats/H-arbre%20des%20causes/ADC.HTM
http://fr.cyclopaedia.net/wiki/Arbre-Des-Causes

Posted in security, servers.

By patheticcockroach – 2013-10-07
