status of signal



Team -

After our downtime this past evening I took an interest in our
current monitoring solution (if you could call it that). The details I
found are listed below, and I think they clear up some misconceptions
I've (we've) had about this box.

signal.gnome.org is, as we know, hosted at OSUDL. It is a 2-CPU VM
(QEMU Virtual CPU version 0.11.1), with 256M RAM and about 7.5G
storage. Currently it is running nagios3 on Apache 1.3, plus a MySQL
server (a requirement of nagios3?).

The current monitoring configuration is poor and looks like it has
been for some time. It is only monitoring a handful of services, and
even the key services are not configured properly. As an example, the
window.gnome.org HTTP service is reported as down for 246d 16h 33m
12s. Most configured services are like this. It's mostly red across
the board, and I'm sure it's simply misconfiguration.

It can be cleaned up to provide rudimentary monitoring without a lot
of work. This is what I'd like to do:

1) update to apache2 (why is it even on apache 1.3??)
2) define as a group the critical services we want monitored (I'd
suggest http for bugzilla and the wiki for starters)
3) configure SSL for the signal webserver. Auth is done by htpasswd,
so without SSL the credentials go over the wire in plain text. We all
know plain text is bad.
4) configure the nagios3 path as the default DocumentRoot. Currently /
shows some generic message, the wiki points to /nagios/, but the
actual monitoring is at /nagios3/
5) as an extra, perhaps add a DNS cname/alias for 'nagios.gnome.org'
which points to signal.
6) /etc/aliases only defines specific admins as email recipients. I
think notification emails should be sent team-wide.
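
As a rough illustration of item 2, here is a minimal Python sketch of
an HTTP check over a small group of critical services. It is not what
nagios3 runs on signal. The exit codes follow the standard Nagios
plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the
function names, the choice to treat 4xx as WARNING, and the URLs in
the comment are assumptions made for the example.

```python
import urllib.error
import urllib.request

# Nagios plugin exit-code convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(status_code):
    """Map an HTTP status code (None = no response) to a Nagios state.

    The 4xx -> WARNING mapping is a choice made for this sketch.
    """
    if status_code is None:
        return CRITICAL
    if 200 <= status_code < 400:
        return OK
    if 400 <= status_code < 500:
        return WARNING
    return CRITICAL


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, connection refused, timeout, and so on.
        return None


def check_all(urls, fetch=fetch_status):
    """Check every URL; return (worst_state, {url: state}).

    An empty group yields UNKNOWN, since nothing was actually checked.
    """
    results = {url: classify(fetch(url)) for url in urls}
    worst = max(results.values(), default=UNKNOWN)
    return worst, results


# Example (placeholder URLs, not taken from the real config):
#   worst, results = check_all(["http://bugzilla.example/",
#                               "http://wiki.example/"])
#   raise SystemExit(worst)
```

Passing the fetcher in as a parameter keeps the classification logic
testable without touching the network.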

All of this would take me maybe a couple of hours tomorrow. I'm
interested in any other feedback re: services monitored, notification
methods (emails to specific sysadmins per-host? emails to -sysadmin?
emails to -infrastructure?).

In the meantime I'll get started on some basic maintenance, such as
fixing the monitoring that is there.

Thanks,
Christer

