Re: Bugzilla.gnome.org, live.gnome.org, etc down



On Tue, Mar 27, 2007 at 10:05:59PM +0700, Ross Golder wrote:
> Olav Vitters wrote:
> > [ The people who reported Bugzilla being down are bcc'ed. Many thanks
> >   for informing us. ]
> > 
> > Bugzilla.gnome.org is back up thanks to Matthew Galgoci.
> > 
> > 
> > Background info: the OOM-killer on the Bugzilla server killed (among
> > others) the LDAP process (contains the user names, etc). This broke the
> > LDAP replication from the master LDAP server (label) to the Bugzilla
> > server. Also broke the LDAP database on the Bugzilla server. Because of
> > the Bugzilla server not accepting the replication from the LDAP server
> > to the Bugzilla server, LDAP refused to answer LDAP queries anymore.
                            ^^^^^^^^^^^^

Here I meant that the master LDAP server (label) refused to answer
queries coming from outside. Locally (on label) everything seemed to work.

> > Initially I tried to recreate the LDAP database on the Bugzilla server,
> > but failed miserably. Fortunately Matthew was able to get it running
> > again.

Steps to fix:
<mgalgoci> mv /var/lib/ldap /var/lib/ldap.mjg
<mgalgoci> took a db dump from label with slapcat
<mgalgoci> scp'd the file over to box
<mgalgoci> made a new /var/lib/ldap on box owned by ldap.ldap
<mgalgoci> started up ldap on box
<mgalgoci> used slapadd to load the db
<mgalgoci> stopped ldap after the load was done
<mgalgoci> ran slapd_db_recover
<mgalgoci> started up ldap 
<mgalgoci> started up nscd
<mgalgoci> verified it worked by doing getent passwd bugzilla
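
The steps in the log can be sketched as a shell script. This is only a
rough sketch, not what Matthew actually typed: the dump file name, the
`run` dry-run helper, the init script names (`service ldap`,
`service nscd`) and the `root@box` scp target are assumptions. Only the
tools (slapcat, slapadd, slapd_db_recover, getent), the paths and the
ordering come from the log. By default it just prints the commands.

```shell
#!/bin/sh
# Sketch of the recovery steps from the IRC log.
# Assumptions (not from the log): the dump file name, the init script
# names "ldap" and "nscd", and the scp target "root@box".
# With DRY_RUN=1 (the default) commands are printed, not executed.

DUMP="${DUMP:-ldap-dump.ldif}"

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run on label (the master): dump the database and copy it over.
dump_on_label() {
    run slapcat -l "$DUMP"
    run scp "$DUMP" "root@box:$DUMP"
}

# Run on box (the Bugzilla server): rebuild the local database.
rebuild_on_box() {
    run mv /var/lib/ldap /var/lib/ldap.mjg   # keep the broken copy
    run mkdir /var/lib/ldap
    run chown ldap:ldap /var/lib/ldap        # "owned by ldap.ldap"
    run service ldap start
    run slapadd -l "$DUMP"                   # load the dump
    run service ldap stop
    run slapd_db_recover -h /var/lib/ldap
    run service ldap start
    run service nscd start
    run getent passwd bugzilla               # verify lookups work
}
```

Calling `rebuild_on_box` with the default settings only lists the
commands; setting DRY_RUN=0 would really run them. Note that slapadd is
normally run while slapd is stopped; the sketch simply keeps the order
given in the log.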

Instead of the steps in the log, I had tar'ed /var/lib/ldap from label
and unpacked it on box. However, due to the 32-bit vs 64-bit difference
between the two machines, this only broke box even more. I also tried
the slapcat method, but that asked for a password (and it was not the
master LDAP password).

Not sure about starting nscd. I think Stric mentioned that it can cause
problems.

> Was it *just* the bugzilla LDAP slave server that went down? I saw a 
> bunch of 'create-auth-*' script errors too around that time that suggest 
> the master LDAP server was having problems too. Maybe a bit more to this 
> than the above?

The LDAP server on label was up and running; however, it did not respond
to backchannel queries (it might or might not have responded externally
-- socket). That was very confusing to diagnose (tcpdump was lacking).

> All seems to be working hunky-dory now, though. Thanks, mgalgoci.

-- 
Regards,
Olav


