Re: No Outage (not at 0650Z anyway!)
- To: Eric Rescorla <ekr at networkresonance.com>
- Subject: Re: No Outage (not at 0650Z anyway!)
- From: Glen <glen at amsl.com>
- Date: Tue, 13 May 2008 11:31:41 -0700
- Cc: Working Group Chairs <wgchairs at ietf.org>, testlist at mail.ietf.org, Russ Housley <housley at vigilsec.com>, Lisa Winkler <lwinkler at amsl.com>, Henrik Levkowetz <henrik at levkowetz.com>, Matt Larson <mlarson at amsl.com>, Ray Pelletier <rpelletier at isoc.org>, jari.arkko at piuha.net, danny at tcb.net, Bill Fenner <fenner at fenron.com>, Karen Moreland <kmoreland at amsl.com>
- Delivered-to: ietfarch-wgchairs-web-archive at core3.amsl.com
- Delivered-to: wgchairs at core3.amsl.com
- In-reply-to: <20080513144605.726B55081A@romeo.rtfm.com>
- List-archive: <https://www.ietf.org/mailman/private/wgchairs>
- List-help: <mailto:wgchairs-request@ietf.org?subject=help>
- List-id: Working Group Chairs <wgchairs.ietf.org>
- List-post: <mailto:wgchairs@ietf.org>
- List-subscribe: <https://www.ietf.org/mailman/listinfo/wgchairs>, <mailto:wgchairs-request@ietf.org?subject=subscribe>
- List-unsubscribe: <https://www.ietf.org/mailman/listinfo/wgchairs>, <mailto:wgchairs-request@ietf.org?subject=unsubscribe>
- References: <4828704D.5010809@amsl.com> <48289AEB.6000706@levkowetz.com> <48293EBA.50106@amsl.com> <20080513144605.726B55081A@romeo.rtfm.com>
- Sender: wgchairs-bounces at ietf.org
- User-agent: Thunderbird 2.0.0.14 (Windows/20080421)
Eric Rescorla wrote:
> So, this is still puzzling. What's going on at 0630 to cause increased
> processing?
I agree.
Based on everything, I am guessing that:
1. Things were already busy at that time (outside cron jobs running
against the datatracker, spam runs, etc.)
2. The addition of TMDA increased the loading effect of the spam runs on
the server.
3. The need for additional memory page buffers caused the oom-killer to
start up, which, in turn, partially shut down the server.
Based on that, I am hoping that:
1. The new TMDA script will more correctly engage TMDA (i.e., bounces
were going through it before; now they are not.)
2. The memory adjustments I made will keep the server from running out
of buffers.
3. The availability of buffers will keep the oom-killer from starting.
And of course, if that fails:
1. I've set the system to panic instead of oom-killing, and
2. I've set the system to reboot on panic, so
3. If this problem happens again, it should recover itself with just a
reboot.
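
On Linux, a panic-then-reboot fallback of this kind is usually controlled by two sysctls, vm.panic_on_oom and kernel.panic. The exact settings and values used on this server are not stated, so the following is only a minimal sketch, assuming those two sysctls; the helper names are hypothetical. It checks whether a host would panic and reboot instead of running the oom-killer:

```python
from pathlib import Path

# Assumed sysctl locations (standard on Linux); not named in the message.
PANIC_ON_OOM = Path("/proc/sys/vm/panic_on_oom")
PANIC_TIMEOUT = Path("/proc/sys/kernel/panic")


def recovers_by_reboot(panic_on_oom: int, panic_timeout: int) -> bool:
    """Return True if running out of memory leads to a panic and then a reboot.

    panic_on_oom: 0 lets the oom-killer run; 1 panics in most cases, 2 always.
    panic_timeout: 0 hangs on panic; any nonzero value makes the kernel reboot.
    """
    return panic_on_oom != 0 and panic_timeout != 0


def read_int(path: Path) -> int:
    """Read a single integer sysctl value from /proc."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    try:
        oom = read_int(PANIC_ON_OOM)
        timeout = read_int(PANIC_TIMEOUT)
    except (OSError, ValueError) as exc:
        print(f"could not read sysctls: {exc}")
    else:
        print(f"vm.panic_on_oom={oom} kernel.panic={timeout}")
        if recovers_by_reboot(oom, timeout):
            print("out-of-memory should end in a reboot")
        else:
            print("out-of-memory will not end in an automatic reboot")
```

Both values have to be nonzero for the recovery path described above: panic_on_oom alone would leave the machine hung at the panic screen.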
Glen