Re: Status of the build.gnome.org service



Is there anything we could monitor through Nagios so that we get
notified about service outages?
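
One possible shape for such a check, as a rough sketch only: a Nagios
plugin that looks at how recently some file touched by the build was
modified and maps the age to the standard plugin exit codes (0 OK,
1 WARNING, 2 CRITICAL, 3 UNKNOWN). The path and the thresholds below are
hypothetical; I don't know which file on the builder would be the right
heartbeat.

```python
import os
import sys
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_age(age_s, warn_s, crit_s):
    """Map the age of the last build activity (seconds) to a Nagios state."""
    if age_s >= crit_s:
        return CRITICAL, "CRITICAL: last build activity %d s ago" % age_s
    if age_s >= warn_s:
        return WARNING, "WARNING: last build activity %d s ago" % age_s
    return OK, "OK: last build activity %d s ago" % age_s


def check_path(path, warn_s, crit_s, now=None):
    """Check the mtime of `path`; UNKNOWN if it cannot be read."""
    try:
        mtime = os.path.getmtime(path)
    except OSError as exc:
        return UNKNOWN, "UNKNOWN: cannot stat %s: %s" % (path, exc)
    now = time.time() if now is None else now
    return classify_age(int(now - mtime), warn_s, crit_s)


def main(argv):
    # Hypothetical default path; pass the real heartbeat file as argv[1].
    path = argv[1] if len(argv) > 1 else "/srv/build/heartbeat"
    code, message = check_path(path, warn_s=6 * 3600, crit_s=24 * 3600)
    print(message)
    return code
```

Wired up as a plugin, the script would end with `sys.exit(main(sys.argv))`
so that Nagios sees the exit code.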

A systemd unit file plus proper documentation (SOPs) covering common
use cases might be a good starting point, especially now that we are
receiving new boxes with SSDs and generally better hardware to migrate
GNOME OSTree to.

2015-12-03 18:16 GMT+01:00 Colin Walters <walters verbum org>:
On Thu, Dec 3, 2015, at 12:14 PM, Colin Walters wrote:
On Thu, Dec 3, 2015, at 09:55 AM, Andrea Veri wrote:

On this side the whole service seemed to be down for around two days
(from the 1st to the 3rd) and the only way we noticed the outage was
someone joining #sysadmin and reporting the downtime.

I think what happened is that `git fetch` on
https://gitlab.com/groups/uhttpmock got stuck again.
I need to add timeout support, ideally detecting this case: I think
GitLab can sometimes start a response where the TCP connection
stays alive but the server side never sends anything.
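
A minimal sketch of what timeout support could look like, assuming the
fetch is driven from Python (the actual build code may be structured
quite differently). The child is started in its own process group so
that helper processes it spawns are killed with it. The function names
and the 600-second default are made up for illustration. The two
`http.lowSpeed*` settings are existing git config options that abort an
HTTP transfer staying below the given bytes/second for the given number
of seconds, which may cover the "connection alive but silent" case on
its own; the hard timeout is a backstop. POSIX only.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it was killed on timeout."""
    # start_new_session=True gives the child its own process group, so
    # killpg below also reaches any helper processes it started.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        print("timeout after %s s: %s" % (timeout_s, " ".join(cmd)),
              file=sys.stderr)
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the child so it does not linger as a zombie
        return None


def fetch_with_timeout(repo_dir, timeout_s=600):
    """Hypothetical wrapper: `git fetch` in repo_dir with two safety nets."""
    cmd = [
        "git", "-C", repo_dir,
        # Abort if the transfer stays under 1000 bytes/s for 60 s.
        "-c", "http.lowSpeedLimit=1000",
        "-c", "http.lowSpeedTime=60",
        "fetch",
    ]
    return run_with_timeout(cmd, timeout_s)
```

A caller would treat a `None` result as "stuck, skip this module for
now" rather than blocking the whole build loop.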

FTR, what I did is log in, run `ps auxwf|grep git`, notice the `git fetch`
had been running for a long time, then `kill <pid>`.
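
The same manual check could be scripted, roughly as below. This is a
sketch under assumptions: it relies on the `etimes` (elapsed seconds)
output column of procps-ng `ps`, and the one-hour threshold and function
names are arbitrary choices for illustration. It only reports
candidates; killing them is left to whoever reads the output.

```python
import subprocess


def find_long_running(ps_output, needle="git fetch", max_age_s=3600):
    """Return (pid, elapsed_s, args) for matching processes older than max_age_s."""
    stuck = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header line and anything that does not parse.
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        pid, elapsed_s, args = int(parts[0]), int(parts[1]), parts[2]
        if needle in args and elapsed_s > max_age_s:
            stuck.append((pid, elapsed_s, args))
    return stuck


def list_processes():
    """Return `ps` output with pid, elapsed seconds and command line."""
    result = subprocess.run(
        ["ps", "-eo", "pid,etimes,args"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Usage would be along the lines of
`for pid, age, args in find_long_running(list_processes()): print(pid, age, args)`.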



-- 
Cheers,

Andrea

Debian Developer,
Fedora / EPEL packager,
GNOME Infrastructure Team Coordinator,
GNOME Foundation Board of Directors Secretary,
GNOME Foundation Membership & Elections Committee Chairman

Homepage: http://www.gnome.org/~av

