Why IT infrastructure and planning are mission-critical
“Beginning at approximately 11:18 PM on July 2nd and continuing through the present time Fisher Plaza experienced a significant power event that required all power systems including street power, UPS, and Generator power to be completely shut down in Plaza East.”
So begins a message on the website of AdHost, an IT hosting provider that shared the Fisher Plaza facility with several Seattle-based companies, including Redfin, Authorize.net, and Bing Travel. What the message calls a "power event" was actually a fire, and it took all of these sites offline, along with many others. Redfin was back up the following morning, but some sites took longer: Bing Travel was not back online until 1 PM on July 4th. Imagine that your retail site, or one of your vendors, had hosted its entire operation at that data center. How much money would you have lost in the hours after the fire?
The term “disaster recovery” may be scary, but it’s not nearly as scary as the thought of losing revenue during an outage that could last days. Once a retailer realizes that outages can and do happen, there are four things they should do to prevent such outages from impacting their business:
- Set up operations across multiple data centers. Plain and simple, having multiple data centers means that if something happens to one (or more!) of them, the others keep your site up and running. Here at richrelevance we run five geographically distributed data centers across the country, with two more on the way. This came in handy several months ago, when one of our data centers had a fire and was offline for days: shoppers on Sears.com and Walmart.com noticed zero impact on their product recommendations.
- Build failover protection and scale into your platform. Hosting your site across multiple data centers is not enough on its own. Architect your platform so that if one data center goes down, the others seamlessly pick up its traffic without any manual adjustments, and serve that traffic at your standard level of performance. Our previous experience contributing to the software and hardware platform architecture at Amazon, Overstock, Hotmail, and Akamai has helped us here at richrelevance build failover-safe, scalable infrastructure.
- Keep it simple! Don't make your system so complex that, even when it is distributed, it remains vulnerable to interruptions and slow to recover after an outage. For example, here is Microsoft's public explanation of the Bing Travel outage: "Bing Travel is a complex system of servers, databases and networking hardware that runs at massive scale," explained Microsoft spokeswoman Whitney Burk via email. "It takes a bit of time after an interruption of power such as this one to bring it back online. Given power was restored at 2am today, we feel we had the service back up as quickly as was possible." Bing visitors would likely have preferred a working website to a public statement.
- Have a disaster recovery plan in place. Know what you're going to do if disaster does strike. If, say, two of your three data centers go down, know exactly what you will do so that the last data center isn't carrying your site on its own for long.
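For readers who want to see what "the others seamlessly pick up the traffic" can mean in practice, here is a minimal sketch of automatic failover routing. It is illustrative only, not richrelevance's actual system: the `DataCenter` class, the `pick_datacenter` function, and the data center names are hypothetical, and the `healthy` flag stands in for the real health checks (typically run by a load balancer or DNS failover service) that would detect an outage.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    # Hypothetical model of a site; "healthy" stands in for a real health check result.
    name: str
    healthy: bool = True


def pick_datacenter(datacenters, preferred_order):
    """Return the first healthy data center, walking the preference order.

    No manual step is needed: once a site is marked unhealthy, the next
    request is routed to the next healthy site in the list.
    """
    by_name = {dc.name: dc for dc in datacenters}
    for name in preferred_order:
        dc = by_name.get(name)
        if dc is not None and dc.healthy:
            return dc
    # Every site is down: this is where the disaster recovery plan takes over.
    raise RuntimeError("no healthy data center available")


if __name__ == "__main__":
    dcs = [DataCenter("seattle"), DataCenter("chicago"), DataCenter("dallas")]
    order = ["seattle", "chicago", "dallas"]
    print(pick_datacenter(dcs, order).name)  # seattle
    dcs[0].healthy = False  # simulate a fire taking the first site offline
    print(pick_datacenter(dcs, order).name)  # chicago
```

The key design point is that failover is decided per request from current health state, so traffic moves as soon as a site is marked down. A production setup would also need enough spare capacity in the remaining sites to serve the shifted traffic at normal performance.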
Consider this a wake-up call. Unless you were hosted at that Seattle data center, consider yourself lucky that this call didn't cost you revenue. Invest time with your IT team to make sure your systems are distributed, scalable, simple, and otherwise ready, and then make sure your vendors' systems are as well. Your customers (and investors!) will be glad you did.
You can learn more about the Seattle incident here: