June 23, 2007

I might actually know what I'm doing

Last night, I had to work on $customer. We were failing their primary application server (a RHEL3 box) over to their failover server (previously reinstalled as CentOS 4). Why? So we could nuke the primary server and install CentOS 4. The gig started at 21:00 EDT, and by 02:30 EDT I was completely done and had failed the application back to the primary server. In a mere 5.5 hours, I installed our app on the failover server, moved the SAN over to it, brought it up, nuked the primary and installed CentOS 4 on it (via iLO over the Internet), installed our app back onto the primary, moved the SAN back, and restarted services. Application downtime was maybe 45 minutes total. The only issue I faced was when the newly rebuilt primary server wanted to fsck the SAN on boot while it was attached to the failover server. Thankfully, I was able to interrupt that action and beat the server into submission. And did I mention that I did all this while running a Webex to show the process to $customer's new 'support' person?