Re: No Outage (not at 0650Z anyway!)



Eric Rescorla wrote:
So, this is still puzzling. What's going on at 0630 to cause increased
processing?

I agree.

Based on everything, I am guessing that:

1. Things were already busy at that time (outside cron jobs running against the datatracker, spam runs, etc.).
2. The addition of TMDA increased the load that the spam runs put on the server.
3. The need for additional memory page buffers caused the oom-killer to start up, which, in turn, partially shut down the server.

Based on that, I am hoping that:

1. The new TMDA script will more correctly engage TMDA (i.e., bounces were going through it before; now they are not).
2. The memory adjustments I made will keep the server from running out of buffers.
3. The availability of buffers will keep the oom-killer from starting.

And of course, if that fails:

1. I've set the system to panic instead of oom-killing, and
2. I've set the system to reboot on panic, so
3. If this problem happens again, it should recover itself with just a reboot.
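
For anyone who wants to confirm the fallback on a Linux box, here is a minimal sketch of a check. It assumes the two behaviours above are controlled by the standard sysctls vm.panic_on_oom and kernel.panic; the exact values set on the server are not stated here, and the function names are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def oom_recovery_settings(root="/proc/sys"):
    """Read vm.panic_on_oom and kernel.panic from the sysctl tree under root.

    Assumption: a Linux system exposing the standard sysctl files.
    """
    base = Path(root)
    panic_on_oom = int((base / "vm" / "panic_on_oom").read_text().strip())
    panic_timeout = int((base / "kernel" / "panic").read_text().strip())
    return panic_on_oom, panic_timeout


def will_self_recover(panic_on_oom, panic_timeout):
    """Return True if an out-of-memory event should end in an automatic reboot.

    vm.panic_on_oom: 0 runs the oom-killer; non-zero panics instead.
    kernel.panic: 0 hangs after a panic; positive reboots after that many
    seconds; negative reboots immediately.
    """
    return panic_on_oom != 0 and panic_timeout != 0


if __name__ == "__main__":
    oom, timeout = oom_recovery_settings()
    print(f"vm.panic_on_oom={oom} kernel.panic={timeout}")
    print("self-recovering:", will_self_recover(oom, timeout))
```

If either value is 0, the chain is broken: the oom-killer runs as before, or the machine sits in the panic until someone resets it by hand.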

Glen