7/29/05 Outages: What Happened

Here's a rundown of today's server problem, what it affected, and what was done to fix it.

WHAT HAPPENED?
Your server's primary disk was failing, causing various problems throughout the system. The data center replaced the faulty drive with a new one.

WHY WAS IT FAILING?
Basically, entropy. All disks fail eventually, and unfortunately this one decided to do so today.

WHAT COULD HAVE BEEN DONE TO PREVENT IT?
In this case, not much. As we said above, all disks fail eventually. There are safeguards in place, however, such as a daily backup, to make sure that a disk failure doesn't become a disaster. FictCo's servers are backed up nightly, and it is from that backup that we restored most of our data (we were also able to pull a lot of data off the drive from mid-afternoon). We are working with our data center to determine what can be done to minimize downtime should a similar failure happen in the future.

WHAT HAPPENED TO TODAY'S DATA?
The data center was able to restore some data to the new disk from a backup made later in the day (at approximately 3:30pm), so changes made to your site before then should, contrary to our earlier expectations, still show up.

The exception is SQL databases, which are being restored from today's 12:00 noon backup. Any database changes made between noon and 6:15pm will have been lost.

WHAT ABOUT EMAIL?
Email was trickier. Depending on when it was received, it will behave as follows:
5:30am - 11:30am (during the original problems): Regretfully, emails received by the server during this time may have been lost.
11:30am - 3:30pm: Emails received by the server during this time should have been preserved. If you did not download them during this time, they are probably still on the server.
3:30pm - 6:15pm (server was down so the data center could perform the backup): Emails received by the server during this time probably bounced. The sender's server will likely try to redeliver them, and they should eventually reach you.

Once again, we're sorry for the downtime and inconvenience. As we mentioned, we'll be talking at length with the data center to make sure they can handle any similar problems better, and more quickly, in the future.

Everything should be working perfectly now; if you experience any additional problems, don't hesitate to contact us.

Thanks again for your patience,

Jonathan & Jesse
FictCo

Posted by JVG at 07.29.2005

Upcoming Maintenance: No Downtime Expected

Description of Maintenance: Preventative. The server farm is making additions to its core routing equipment. This should not affect customer performance, and no downtime is expected.

Date of Maintenance: 11PM EST July 12, 2005 - 6AM EST July 13, 2005

Affected Service Types: IP network traffic should remain unaffected.

Posted by JVG at 07.08.2005