Around 4:30 - 4:45 am US EDT (-0400), the system ran out of memory
for unknown reasons. Ross called me about an hour later, and I
called Matt Galgoci, but we were unable to perform a remote reboot
through the Dell Remote Access Card (DRAC).

(We had the same problem last week; this needs to be
investigated.)

After some delays, we got a colo technician on site at about 11:00 am,
who rebooted the system; it came up fine. There are no clues in the
syslog as to what caused the out-of-memory (OOM) situation.

Future remediation:

 - Fix the DRAC configuration
 - Debug why we are running out of memory (the memory usage logs
   from mrtg suggest that memory runs out gradually in some cases
   and more abruptly in others)
 - If we can find out which parts of the system are triggering
   the OOM, look at limiting them via ulimit (?)
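
As a rough illustration of the ulimit idea above, here is a minimal
sketch of starting a process under an address-space cap (the same
limit that "ulimit -v" sets in a shell). It assumes Linux and Python 3;
the function name run_with_memory_limit and the 1 GiB figure are
hypothetical, and which processes would actually need limiting is
still unknown.

```python
import resource
import subprocess


def run_with_memory_limit(cmd, max_bytes):
    """Run cmd with its virtual address space capped at max_bytes.

    Hypothetical helper for illustration only. Returns the exit code.
    """

    def apply_limit():
        # Runs in the child after fork() and before exec().
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard != resource.RLIM_INFINITY:
            # An unprivileged process cannot raise its hard limit,
            # so never ask for more than the current one.
            limit = min(max_bytes, hard)
        else:
            limit = max_bytes
        # Lower both soft and hard limits so the child cannot undo the cap.
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

    return subprocess.run(cmd, preexec_fn=apply_limit).returncode


if __name__ == "__main__":
    import sys

    # Illustrative value: cap a child interpreter at 1 GiB.
    one_gib = 1024 ** 3
    code = run_with_memory_limit(
        [sys.executable, "-c", "x = bytearray(2 * 1024 ** 3)"], one_gib
    )
    print("child exit code:", code)
```

With a cap like this, a runaway process has its own allocations fail
instead of exhausting memory for the whole machine. The trade-off is
that a limit set too low will break a process that is behaving
normally, so the numbers would have to come from the mrtg data.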

