Contingency planning for move
- From: Owen Taylor <otaylor@redhat.com>
- To: gnome-infrastructure@gnome.org
- Subject: Contingency planning for move
- Date: Thu, 10 Dec 2009 14:52:36 -0500
Spent a little time thinking about contingency plans if servers don't
survive the move. (Actually, this should be turned into a standing
contingency plan - almost nothing here is specific to the move, I'm
just worried about jostling during a move triggering latent hardware
failures.)
Of particular concern are the four old servers that don't have active
service contracts; if these suffered a failure, we wouldn't have an
easy path to getting them repaired in a timely fashion. We might be able
to cajole someone in Red Hat IT into putting in a replacement drive if
we mailed one out there, but that's about all.
container.gnome.org (Sep. 2003, HP donation)
window.gnome.org (Apr. 2004)
menubar.gnome.org (Apr. 2004)
button.gnome.org (Apr. 2004)
(Clearly in the near future we need to look into replacing these
machines; it might be possible to recertify them but I doubt it makes
sense.)
The three newer Red Hat-donated servers should have active 24x7 onsite
service contracts:
label.gnome.org (May 2006)
vbox.gnome.org (Dec. 2008)
drawable.gnome.org (Dec. 2008)
So the basic contingency plan for these would be to get them repaired
(restoring from backups if necessary, but they are all RAID-1 or
RAID-10, so hopefully not). That should be faster than trying to
move stuff around.
The main potential problem would be if they got dropped and destroyed
or lost during the move; there is insurance, but it could take
weeks to get them replaced, especially with the holidays.
I'm not sure about the Sun-donated server:
fixed.gnome.org (2006?)
But it doesn't run any essential services, so I'm less concerned about
it. Diving into detail:
container.gnome.org
===================
What it runs:
NFS export of /home/users, /home/admin, and mail archives
Cobbler
sysadmin.gnome.org
Contingency plan:
We have 90G of unallocated disk space on drawable.gnome.org
(and 40G more that could be used at a pinch). /home/users
is 35G, /home/admin 1G, so there's no problem putting them
onto a partition on drawable, and drawable has tons of
spare IO capacity. Bugzilla isn't stressing it at all.
Mail archives are 30G; they could also go on drawable.gnome.org
to keep things simple, or could be exported from vbox,
where we have lots of unallocated (slow) disk space.
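As a quick sanity check on those numbers, a throwaway sketch (the
sizes are the estimates quoted above, not fresh df/du output):

    # Rough capacity check for moving container's data onto drawable.
    # Sizes are the estimates quoted above, in GB.
    needed = {"/home/users": 35, "/home/admin": 1, "mail archives": 30}
    unallocated = 90    # free space on drawable.gnome.org
    reserve = 40        # extra space usable "at a pinch"

    total = sum(needed.values())
    print("need %dG, have %dG unallocated (+%dG in reserve)"
          % (total, unallocated, reserve))
    assert total <= unallocated   # 66G fits without touching the reserve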
menubar.gnome.org
=================
What it runs:
ns-master.gnome.org
Cluster email
Mailman
Contingency:
Create a VM on vbox.gnome.org, restore to that. Menubar is
actually not very loaded, either for CPU or disk, so I think
we could get away with running it on vbox.gnome.org without
impacting mail or the other services on vbox (git, bugzilla).
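For reference, a minimal sketch of what "create a VM on vbox" might
look like, assuming vbox is running a libvirt-managed hypervisor -
the guest name, sizing, bridge and install tree below are placeholders,
not settings from our setup:

    #!/usr/bin/env python
    # Hypothetical provisioning sketch, assuming vbox runs libvirt/KVM.
    # Name, sizing, bridge and install tree are placeholders.
    import subprocess

    subprocess.check_call([
        "virt-install",
        "--name", "menubar-tmp",
        "--ram", "2048",
        "--vcpus", "2",
        "--disk", "path=/var/lib/libvirt/images/menubar-tmp.img,size=40",
        "--network", "bridge=br0",
        "--location", "http://mirror.example.com/centos/5/os/x86_64/",
        "--nographics",
        "--extra-args", "console=ttyS0",
    ])
    # After the install: restore config, mailman data and DNS zones from
    # backup, then repoint ns-master and the MX records at the new guest.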
window.gnome.org
================
What it runs:
master.gnome.org
www.gnome.org, planet.gnome.org, art.gnome.org,
library.gnome.org, other miscellaneous websites
Contingency:
Create a VM on vbox.gnome.org, restore to that. window is
pretty heavily loaded, and I wouldn't be happy about it
putting more load on vbox.gnome.org's disks, but it should
be OK for a short period of time. We could investigate moving
high-load services (art.gnome.org, planet.gnome.org) to
fixed.gnome.org, which is basically unused, or scramble to
find new hardware.
button.gnome.org
================
What it runs:
Mango
Miscellaneous databases:
blogs.gnome.org, artweb, gnomejournal, rt3, mango
Contingency:
Create a VM on vbox.gnome.org, restore to that. Migrate databases
to drawable.gnome.org after getting initial functionality back.
Mango could stay on a VM on vbox.gnome.org indefinitely.
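If we do end up shuffling the button databases over, the copy itself
should be simple. A rough sketch, assuming these are MySQL databases
(not stated above) and that empty target databases and grants already
exist on drawable; the names are just the list from this section:

    #!/usr/bin/env python
    # Hypothetical dump-and-load of button's databases onto drawable.
    # Assumes MySQL, ~/.my.cnf credentials on both ends, and that the
    # empty target databases/grants already exist on drawable.
    import subprocess

    DATABASES = ["blogs", "artweb", "gnomejournal", "rt3", "mango"]

    for db in DATABASES:
        dump = subprocess.Popen(["mysqldump", "--single-transaction", db],
                                stdout=subprocess.PIPE)
        load = subprocess.Popen(["mysql", "--host=drawable.gnome.org", db],
                                stdin=dump.stdout)
        dump.stdout.close()       # so mysqldump sees SIGPIPE if mysql dies
        load.communicate()
        if dump.wait() != 0 or load.returncode != 0:
            raise SystemExit("migration of %s failed" % db)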
label.gnome.org
===============
What it runs:
LDAP
Wikis (live.gnome.org, gnome-db.org, pango.org)
XMPP server (Openfire)
Contingency:
Get machine repaired if possible.
In case of complete loss, temporarily migrate services to
fixed.gnome.org, which is basically unused, while waiting for
replacement.
drawable.gnome.org
==================
What it runs:
bugzilla.gnome.org database
Contingency:
Get machine repaired if possible.
In case of complete loss, get a replacement as fast as possible,
try to get a loaner machine from Red Hat IT.
(Maybe could run the database on vbox.gnome.org, but it is
doing a lot already, and its disks weren't spec'ed for database
operation.)
vbox.gnome.org
==============
What it runs:
bugzilla.gnome.org
git.gnome.org
puppet
Contingency:
Get machine repaired if possible.
In case of complete loss, get a replacement as fast as possible,
try to get a loaner machine from Red Hat IT.
(There's not really any machine where we could move stuff; maybe
could set up git on fixed.gnome.org temporarily.)
fixed.gnome.org
===============
What it runs:
build.gnome.org (master server for buildbot)
Mock environment for package builds
Contingency:
build.gnome.org could be set up in a VM on vbox.