Monthly Archives: October 2011
Reporting from Flagstaff: this is my last post before heading into the Grand Canyon tomorrow, where I will be leading a trip down the Colorado River. For the next 21 days I will be rowing a 2,000-pound, 18′ raft 220 miles downriver. There is no connectivity there, so don’t expect any blog posts for a while. If you follow me on Facebook or Twitter (@davehabz), then you have already been following my journey.
As a BlackBerry Deployment Engineer for Microsoft’s Office 365 cloud service, I am sometimes privy to confidential information. In this case, I will leak to you that RIM had a huge BlackBerry service outage this week. OK, so maybe you already heard that. While the full root cause analysis (RCA) will take time to complete, RIM did report that it came down to a network switch that failed and a backup that did not take over as expected. The result was a cascade of system failures. Right now it sucks to be RIM, and it is easy to sit back and admonish RIM for not having been better prepared. I’m sure they will learn from this mistake. When I was growing up, as my parents sent me off to school, they would always say, “Have a great day and make lots of mistakes!” Why? Because they knew that we all learn from our mistakes. Since then I have come to a new conclusion: I can’t afford to make all the mistakes I need to learn from. So I have adopted a new philosophy:
If Intelligence is the ability to learn from your mistakes, then Wisdom is the ability to learn from the mistakes of others.
In this case, I really don’t want to make the same mistake RIM made. So what can we learn from it? For the most critical systems, have multiple redundancies, not just the single backup system RIM had. Cave divers always have 3 systems to keep them alive. Medical systems often have 3 redundant systems. Football teams have third-string players for key positions. The space shuttle had 3 to 5 redundancies for its most critical systems! Murphy’s Law states, “Anything that can go wrong will go wrong,” and one of its many corollaries states, “Everything goes wrong at once.”
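To put rough numbers on why a third system helps, here is a minimal back-of-the-envelope sketch. It assumes every system fails independently with the same probability, and the 1% figure is made up purely for illustration; it is not a number from RIM or anyone else. The function name is hypothetical too. Given the corollary above, real failures are often correlated, so treat this as the best case rather than a prediction.

```python
def failure_probability(p_single: float, n_systems: int) -> float:
    """Chance that ALL n systems are down at the same time.

    Assumption: the systems fail independently, each with probability
    p_single. Correlated failures (a shared power feed, a shared bug)
    make the real number worse than this.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_systems < 1:
        raise ValueError("n_systems must be at least 1")
    return p_single ** n_systems


# Hypothetical figure: each system is down 1% of the time.
P_SINGLE = 0.01

if __name__ == "__main__":
    for n in (1, 2, 3):
        p_all_down = failure_probability(P_SINGLE, n)
        print(f"{n} system(s): all down with probability {p_all_down:.6%}")
```

Under these assumptions, each extra independent system cuts the chance of a total outage by another factor of 100, which is the arithmetic behind carrying a third system instead of stopping at one backup.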
Take a moment to learn from RIM’s mistake. For your most critical of mission-critical systems, have multiple redundancies. If it is a hard sell to management, just point them to Black(Berry) Monday, October 10, 2011.