June 23, 2007
Last night, I had to work on $customer. We were failing their primary
application server (a RHEL3 box) over to their failover server
(previously reinstalled as CentOS 4). Why? So we could nuke the primary
server and install CentOS 4. The gig started at 21:00 EDT, and by 02:30
EDT I was completely done and had failed back the application to the
primary server. In a mere 5.5 hours, I installed our app on the failover
server, moved the SAN over to it, brought it up, and then nuked the
primary and installed CentOS 4 on it (via iLO over the Internet),
installed our app back onto the primary, moved the SAN back, and
restarted services. I had an
application downtime of maybe 45 minutes total. And the only issue I
faced was when the newly rebuilt primary server wanted to fsck the SAN
on boot while it was attached to the failover server. Thankfully, I was
able to interrupt that action and beat the server into submission. And
did I mention that I did all this while running a Webex to show the
process to $customer's new 'support' person?
Posted in Work at 11:53:43