Re: Bugzilla.gnome.org, live.gnome.org, etc down
- From: Olav Vitters <olav bkor dhs org>
- To: Ross Golder <ross golder org>
- Cc: gnome-infrastructure gnome org, bugmaster gnome org
- Subject: Re: Bugzilla.gnome.org, live.gnome.org, etc down
- Date: Tue, 27 Mar 2007 17:55:02 +0200
On Tue, Mar 27, 2007 at 10:05:59PM +0700, Ross Golder wrote:
> Olav Vitters wrote:
> > [ The people who reported Bugzilla being down are bcc'ed. Many thanks
> > for informing us. ]
> >
> > Bugzilla.gnome.org is back up thanks to Matthew Galgoci.
> >
> >
> > Background info: the OOM-killer on the Bugzilla server killed (among
> > others) the LDAP process (contains the user names, etc). This broke the
> > LDAP replication from the master LDAP server (label) to the Bugzilla
> > server. Also broke the LDAP database on the Bugzilla server. Because of
> > the Bugzilla server not accepting the replication from the LDAP server
> > to the Bugzilla server, LDAP refused to answer LDAP queries anymore.
^^^^^^^^^^^^
Here I meant the master LDAP server (label) and answering queries
externally. Locally (on label) everything seemed to work.
> > Initially I tried to recreate the LDAP database on the Bugzilla server,
> > but failed miserably. Fortunately Matthew was able to get it running
> > again.
Steps to fix:
<mgalgoci> mv /var/lib/ldap /var/lib/ldap.mjg
<mgalgoci> took a db dump from label with slapcat
<mgalgoci> scp'd the file over to box
<mgalgoci> made a new /var/lib/ldap on box owned by ldap.ldap
<mgalgoci> started up ldap on box
<mgalgoci> used slapadd to load the db
<mgalgoci> stopped ldap after the load was done
<mgalgoci> ran slapd_db_recover
<mgalgoci> started up ldap
<mgalgoci> started up nscd
<mgalgoci> verified it worked by doing getent passwd bugzilla
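
For reference, the steps in the log can be written out as a dry-run plan. This is only a sketch under assumptions: the log does not give exact command lines, so the LDIF file name, the "service" init wrapper, the mkdir/chown step and the option letters are my guesses at what was typed. Nothing is executed; the function only prints which command would run on which host. The order follows the log, including loading with slapadd while ldap was started, although the slapadd man page generally advises running it with slapd stopped, so read it as a record rather than a recommendation.

```shell
#!/bin/sh
# Dry-run sketch of the rebuild steps from the IRC log.
# Assumptions (not in the original mail): the LDIF file name, the
# "service" init wrapper, the mkdir/chown step and the exact options.
MASTER="${MASTER:-label}"             # master LDAP server
SLAVE="${SLAVE:-box}"                 # Bugzilla server (LDAP slave)
LDIF="${LDIF:-/tmp/ldap-dump.ldif}"   # hypothetical dump file name

# Print each step prefixed with the host it would run on.
# Nothing is executed here.
ldap_rebuild_plan() {
    cat <<EOF
$SLAVE: mv /var/lib/ldap /var/lib/ldap.mjg
$MASTER: slapcat -l $LDIF
$MASTER: scp $LDIF $SLAVE:$LDIF
$SLAVE: mkdir /var/lib/ldap
$SLAVE: chown ldap:ldap /var/lib/ldap
$SLAVE: service ldap start
$SLAVE: slapadd -l $LDIF
$SLAVE: service ldap stop
$SLAVE: slapd_db_recover -h /var/lib/ldap
$SLAVE: service ldap start
$SLAVE: service nscd start
$SLAVE: getent passwd bugzilla
EOF
}

ldap_rebuild_plan
```

Setting MASTER, SLAVE or LDIF in the environment changes the printed plan, which makes it easy to review the sequence before typing anything on a real machine.
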
Instead of the above, I had tar'ed /var/lib/ldap on label and unpacked it
on box. However, because of the 32-bit vs 64-bit difference between the
machines, this only broke box even more. I also tried the slapcat method,
but that asked for a password (it was not the master LDAP password).
Not sure about starting nscd. I think Stric mentioned that it can cause
problems.
> Was it *just* the bugzilla LDAP slave server that went down? I saw a
> bunch of 'create-auth-*' script errors too around that time that suggest
> the master LDAP server was having problems too. Maybe a bit more to this
> than the above?
The LDAP server on label was up and running, however it did not respond
to backchannel queries (it might or might not have externally --
socket).
That was very confusing to diagnose (tcpdump was lacking).
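
A sketch of checks that might help separate "answers locally" from "answers over the network" next time. The hostname default, the use of ldapi:/// (which only works if slapd is configured to listen on its local socket) and the standard LDAP port 389 are assumptions on my part, not something we verified during the outage. The function only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: compare LDAP over TCP with LDAP over the local socket.
# Assumptions: hostname "label", slapd listening on ldapi:///, port 389.
LDAP_HOST="${LDAP_HOST:-label}"

# Print the probe commands; nothing is executed here.
ldap_probe_plan() {
    cat <<EOF
ldapsearch -x -H ldap://$LDAP_HOST -s base -b "" "(objectClass=*)"
ldapsearch -x -H ldapi:/// -s base -b "" "(objectClass=*)"
tcpdump -n -i any port 389
EOF
}

ldap_probe_plan
```

The first query reads the root DSE over TCP, the second does the same over the local socket on the server itself, and tcpdump shows whether queries arrive on the wire at all.
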
> All seems to be working hunky-dory now, though. Thanks, mgalgoci.
--
Regards,
Olav