Jeffrey Hutzelman wrote:
The key point here is that the problem is not that you are running the system out of VM, but that you are exhausting the memory available for kernel data structures which cannot be swapped out.
I saw suggestions of this during my research of the oom-killer yesterday. That was what was bothering me, because there was no way I could think of that an additional 4GB of RAM could be consumed so quickly. But when I considered the page structures - aHA! - that made more sense. So I modified min_free_kbytes and cranked it up to hold more memory open for the kernel memory tables. (Contrary to what I first thought, this seems to me to be the most correct choice. I'll feel better about this after Wednesday night / Thursday. :-) )
The oom killer also applies a weighting algorithm intended to kill less important processes and bigger memory users first. If it is killing daemons like sshd, cron, and syslog, then it is getting pretty desperate.
Yeah, I looked at the process scores. Some of them seemed to be quite random.
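For anyone wanting to look at the same scores, here is a rough Python sketch that lists processes by their current oom_score, highest first. It assumes a Linux /proc where each process directory has an oom_score file; the comm file used for the process name is missing on older kernels, so the sketch falls back to "?" there. The function name oom_scores and its proc_root parameter are made up for this example.

```python
import os

def oom_scores(proc_root="/proc"):
    """Return (score, pid, name) tuples, highest oom_score first.

    proc_root is a hypothetical parameter so the function can be
    pointed at a test directory instead of the real /proc.
    """
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
        except (OSError, ValueError):
            continue  # process exited, or the file is unreadable
        try:
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
        except OSError:
            name = "?"  # assumption: older kernels have no comm file
        results.append((score, int(entry), name))
    # Highest score first; ties broken by ascending pid
    results.sort(key=lambda r: (-r[0], r[1]))
    return results

if __name__ == "__main__":
    for score, pid, name in oom_scores()[:10]:
        print(f"{score:>8} {pid:>7} {name}")
```

Run as root to see every process; the processes at the top of the list are the ones the oom killer currently rates as the most likely victims.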
Good, because throwing another server at it likely won't solve the problem.
I agree completely!
You may be able to do one better. If you set the sysctl parameter vm.panic_on_oom to 2 (do this in /etc/sysctl.conf, if SuSE has that, or else echo 2 into /proc/sys/vm/panic_on_oom), then the system will panic instead of starting to kill processes. This may result in a usable backtrace in the system logs, will result in a crash dump if you have crash dumps enabled, and will also eventually result in a reboot, if you have set a nonzero panic timeout (this is the value of the sysctl parameter kernel.panic, in seconds; the default is 0, which disables automatic reboot).
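To make the interaction between the two parameters concrete, here is a small Python sketch that parses sysctl.conf-style text and reports what the combination means. It only covers the cases described above (vm.panic_on_oom set to 2, and kernel.panic zero or positive); the function names parse_sysctl and describe_oom_policy are invented for this example and are not part of any tool.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines, skipping blanks and # or ; comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings

def describe_oom_policy(settings):
    """Summarize the OOM behaviour for the cases discussed in this thread."""
    # Both parameters default to 0 when not set
    panic_on_oom = int(settings.get("vm.panic_on_oom", "0"))
    panic_timeout = int(settings.get("kernel.panic", "0"))
    if panic_on_oom != 2:
        return "panic_on_oom is not 2; other values are not interpreted here"
    if panic_timeout > 0:
        return f"panic on OOM, reboot after {panic_timeout}s"
    if panic_timeout == 0:
        return "panic on OOM, no automatic reboot"
    return "panic on OOM; negative kernel.panic is not interpreted here"

example = """
kernel.panic=5
vm.panic_on_oom=2
"""
print(describe_oom_policy(parse_sysctl(example)))
# prints "panic on OOM, reboot after 5s"
```

The point of the check is that vm.panic_on_oom=2 on its own leaves the machine sitting in the panic; it is the nonzero kernel.panic that brings it back up.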
Yes, thanks. I had set that up last night but left it commented out. I think I'll just go ahead and apply it:
kernel.panic=5
kernel.panic_on_oops=1
vm.min_free_kbytes=16384
vm.panic_on_oom=2

But I'm going to turn it back off briefly Wednesday night. In the event that something bad is happening on schedule, I still want to try to catch it in the act. :-)
THANK YOU for these additional comments!

Glen