We need to fix our ldap setup

From: Owen Taylor <otaylor redhat com>
To: gnome-infrastructure gnome org
Subject: We need to fix our ldap setup
Date: Wed, 31 Oct 2007 14:59:51 -0400

OK, so we just had all services down for a couple hours because:

A) label is our LDAP master
B) label ran out of memory (probably because of live.gnome.org)

We need to avoid this single point of failure. Some things:

* Shouldn't we move the LDAP master to a box that isn't as
liable to run out of memory (one that doesn't handle web requests
in Python)? I think we moved LDAP to label from button because
at that point label was RHEL 4 and button RHEL 3? But they
are all RHEL 5 at this point.

* We seem to have been replicating to box, which is out of
service at the moment. Should we be replicating to a different
machine? Can we configure failover to the replica?

* Can we figure out how to make login work for the wheel group,
who should be in /etc/passwd and /etc/group on all machines,
even when LDAP is down? Or is nss-ldap just irretrievably
busted?
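
For the wheel question, a first sanity check is whether every wheel member really has a local passwd entry on a given machine. The following is only a rough sketch: the function name missing_wheel_users is made up for this example, and it reads whatever group and passwd files it is pointed at. It confirms that the local entries exist; it says nothing about whether a login would still stall on an LDAP lookup while the server is down.

```shell
#!/bin/sh
# Sketch: print wheel members that have no entry in a passwd file.
# Usage: missing_wheel_users GROUP_FILE PASSWD_FILE
# (missing_wheel_users is a hypothetical helper, not an existing tool.)
missing_wheel_users() {
    group_file=$1
    passwd_file=$2
    # Field 4 of the wheel line is the comma-separated member list.
    awk -F: '$1 == "wheel" { print $4 }' "$group_file" | tr ',' '\n' |
    while read -r user; do
        # Skip the empty line produced by a wheel group with no members.
        [ -n "$user" ] || continue
        # Exact match on the login name field, so "bob" does not match "bobby".
        awk -F: -v u="$user" '$1 == u { found = 1 } END { exit !found }' \
            "$passwd_file" || echo "$user"
    done
}
```

On a real machine this would be run as "missing_wheel_users /etc/group /etc/passwd"; empty output means every wheel member is present locally.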

- Owen

P.S. - Two notes on recovery:

* When we brought label back up, slapd immediately ran out of
file descriptors because all the other machines flooded
it. I worked around this by shutting off the other machines
with iptables and opening up to them one by one.

* slapd was complaining:

Checking configuration files for slapd: bdb_db_open: unclean shutdown detected; attempting recovery.
bdb_db_open: Recovery skipped in read-only mode. Run manual recovery if errors are encountered.
config file testing succeeded

I shut it down, and ran:

# /usr/sbin/slapd_db_recover -v -h /var/lib/ldap

(found via Google), and after two more restarts things were happy,
but I'm not sure this step was necessary. Maybe it would have
done the recovery itself if given time.
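
The iptables workaround from the first note could be scripted roughly as follows. This is a sketch under assumptions, not what was actually run: the port (389, plain LDAP), the function names, and the delay between clients are all guesses for illustration. By default it only prints the iptables commands; setting RUN=1 would execute them, which needs root.

```shell
#!/bin/sh
# Sketch: re-admit LDAP clients one at a time after restarting slapd.
# Prints the iptables commands unless RUN=1 is set in the environment.
LDAP_PORT=389   # assumption: plain LDAP; adjust if clients use another port

run() {
    # Dry-run wrapper: echo the command by default, execute it if RUN=1.
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

block_all_clients() {
    # Reject LDAP connections from everyone (this includes loopback).
    run iptables -I INPUT -p tcp --dport "$LDAP_PORT" -j REJECT
}

admit_client() {
    # -I inserts at the top of the chain, so this ACCEPT is checked
    # before the REJECT rule added by block_all_clients.
    run iptables -I INPUT -p tcp -s "$1" --dport "$LDAP_PORT" -j ACCEPT
}

readmit_clients() {
    # Usage: readmit_clients DELAY_SECONDS HOST...
    delay=$1
    shift
    block_all_clients
    for host in "$@"; do
        admit_client "$host"
        sleep "$delay"   # let this client reconnect before the next one
    done
}
```

For example, "readmit_clients 30 client1.example.org client2.example.org" (hypothetical host names) blocks everyone and then opens up to each host 30 seconds apart. The rules stay in place afterwards, so they would need to be deleted again with iptables -D and the same rule specification once things are stable.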
