In the latest version of the proc filesystem the OOMKiller has had some adjustments. The valid range is now -1000 to +1000; previously it was -16 to +15, with a special score of -17 to disable it outright. It also now uses /proc/<pid>/oom_score_adj instead of /proc/<pid>/oom_adj. You can read the finer details here.
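To make the change of range concrete, here is a minimal sketch of how an old oom_adj value maps onto the new oom_score_adj scale. The function name is my own, and the scaling rule (multiply by 1000, divide by 17, with +15 pinned to +1000) is my understanding of the kernel's compatibility handling for the legacy file rather than something stated in the docs linked above, so treat it as an assumption.

```bash
# Hypothetical helper: convert a legacy oom_adj value (-17..15) to the
# oom_score_adj scale (-1000..1000).
# Assumption: the kernel scales by 1000/17 and special-cases the old maximum
# (+15) so that +1000 stays reachable.
oom_adj_to_score_adj() {
    local oom_adj="$1"
    if [ "$oom_adj" -eq 15 ]; then
        echo 1000
    else
        # Shell integer division truncates toward zero.
        echo $(( oom_adj * 1000 / 17 ))
    fi
}

# Example: the old "disable" value maps onto the new one.
# oom_adj_to_score_adj -17
```

The only value that matters for the rest of this post is the old -17, which lands on -1000.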
Given that, systemd now includes OOMScoreAdjust specifically for altering this. To fully disable the OOMKiller for a service, simply add OOMScoreAdjust=-1000 directly underneath the [Service] definition, as follows.
...
[Service]
OOMScoreAdjust=-1000
...
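If you would rather not edit a unit file that a package owns, the same setting can live in a systemd drop-in. The sketch below is only an illustration: the function name, the file name oom.conf, and the my-daemon.service unit in the usage comment are all placeholders of mine, not anything from a real package.

```bash
# Hypothetical helper: write a drop-in that sets OOMScoreAdjust for a unit.
# $1 is the unit's drop-in directory, $2 is the score to apply.
write_oom_dropin() {
    local unit_dir="$1" score="$2"
    mkdir -p "$unit_dir" || return 1
    # "oom.conf" is an arbitrary name; systemd reads any *.conf in the directory.
    printf '[Service]\nOOMScoreAdjust=%s\n' "$score" > "$unit_dir/oom.conf"
}

# Example (as root), with a placeholder unit name:
# write_oom_dropin /etc/systemd/system/my-daemon.service.d -1000
# systemctl daemon-reload && systemctl restart my-daemon.service
```

The service has to be restarted for the new score to apply to its processes.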
The score does not have to be -1000. If you want to ensure the parent PID lives but are happy for child processes to be reaped, set it to something like -999; then, if “/bin/parent” has spawned “/bin/parent --memory-hungry-child”, the child will be killed first.
If you have a third-party daemon (like Datadog, used in the example below) which manages itself and uses a sysvinit script, you can still calm the OOMKiller. A good way I’ve found to do this is to adjust the oom_score_adj manually, at whatever regular interval you choose.
As a raw example, using all datadog processes, I’ve done the following (as root):
pgrep -f "/opt/datadog-agent/embedded/bin/python" | while read PID; do echo -1000 > /proc/$PID/oom_score_adj; done;
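The one-liner above is fine for a quick fix. A slightly more defensive sketch of the same loop follows. It skips anything that isn't a plain PID and tolerates processes that exit between the pgrep and the write. The function name and the optional second argument (an alternative proc root, handy for a dry run against a fake directory tree) are my own additions.

```bash
# Hypothetical helper: read PIDs on stdin and write SCORE to each one's
# oom_score_adj. $1 is the score; $2 is the proc root (defaults to /proc).
set_oom_score_adj() {
    local score="$1" proc_root="${2:-/proc}" pid
    while read -r pid; do
        # Ignore empty lines and anything that is not purely numeric.
        case "$pid" in ''|*[!0-9]*) continue ;; esac
        # The process may already have exited since pgrep listed it.
        [ -e "$proc_root/$pid/oom_score_adj" ] || continue
        printf '%s\n' "$score" > "$proc_root/$pid/oom_score_adj" \
            || echo "could not adjust PID $pid" >&2
    done
}

# Example (as root):
# pgrep -f "/opt/datadog-agent/embedded/bin/python" | set_oom_score_adj -1000
```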
Here is an example Ansible playbook that allows you to exclude more than one group of processes:
default.yml
---
oomkiller_exclusions:
  - "/opt/datadog-agent/embedded/bin/python"
  - "/opt/my-process/bin/foo"
main.yml
---
- name: Exclude processes from oomkiller
  cron:
    name: "Exclude {{ item }} from oomkiller"
    job: "pgrep -f \"{{ item }}\" | while read PID; do echo -1000 > /proc/$PID/oom_score_adj; done"
    minute: "*/5"
    state: present
  with_items: "{{ oomkiller_exclusions }}"
  tags:
    - oomkiller
Note:
It’s not stated directly in any of the docs that I linked, but some commenters mentioned that child processes inherit their parent’s oom_score_adj. I confirmed this with some quick testing. In the output below, 11745 is a Python CLI, and 12203 is subprocess.call(["sleep", "60"]) called by 11745.
$ cat /proc/11745/oom_score_adj
0
$ echo -1000 > /proc/11745/oom_score_adj
$ cat /proc/12203/oom_score_adj
-1000
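If you want to repeat the inheritance check on your own kernel without touching a real service, something like the following sketch should do. The function name is mine. It relies on the assumption that a process may raise its own oom_score_adj without extra privileges, so it sets a positive value inside a throwaway subshell and then asks a child process to report its own score.

```bash
# Hypothetical helper: set oom_score_adj to $1 in a subshell, then print the
# value seen by a child process started from that subshell.
# Returns non-zero if the value could not be written (e.g. no /proc).
child_oom_score_adj() {
    local value="$1"
    (
        # /proc/self here is the subshell, not your interactive shell.
        printf '%s\n' "$value" > /proc/self/oom_score_adj || exit 1
        # cat runs in a child of the subshell and reads its *own* score.
        child_value=$(cat /proc/self/oom_score_adj) || exit 1
        printf '%s\n' "$child_value"
    )
}

# Example: prints the child's score, which matches the argument
# if the value was inherited.
# child_oom_score_adj 500
```

Because everything happens in a subshell, the score of the shell you run it from is left alone.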
In which kernel do you find that the oom_score is NOT inherited by the child process?
Currently 3.10.0-514.2.2.el7.x86_64, will be writing a small script to determine if there are any cases where this is not true, or if I’ve simply had a very anecdotal experience.
> Any children processes spawned by the main service process are still susceptible to be killed by the OOMKiller
Nope, by default a child process inherits oom_score_adj from its parent.
Can you link the relevant chunk of documentation for this? Section 3.1 and 3.2 on /proc in the document I linked unfortunately have nothing to say about this.
I didn’t find it in the kernel documentation; I found it somewhere on the internet and checked manually on my machine.