Out-of-memory (OOM) in Kubernetes – Part 4: Pod evictions, OOM scenarios and flows leading to them

This is part 4 of a four-part article that looks into what happens in detail when Kubernetes runs into out-of-memory (OOM) situations and how it responds to them. You can find the article’s full table of contents here as part of the first post.

Pod evictions

What are pod evictions in Kubernetes? They’re an automatic action that Kubernetes takes on a node experiencing low resources, whereby one or more pods are terminated in an effort to reduce the pressure. As this article deals with memory, we’ll talk exclusively about memory as the resource the node is running short of.

Previously (in OOM killer and Cgroups and the OOM killer) we’ve seen how the OOM killer ensures the available memory on the node doesn’t go below critical levels. So it’s quite apparent these two mechanisms – pod evictions and the OOM killer – have the same goal: ensure the node doesn’t end up without any memory left. So why have both of them active at the same time?

Kubernetes doesn’t have direct control over the OOM killer. Remember that it’s a Linux kernel feature. What Kubernetes does – and to be more specific, the Kubelet on each node – is adjust the “knobs” of the OOM killer: e.g. by setting different values for oom_score_adj it alters the latter’s behavior as to which victim gets chosen first. This still doesn’t answer the initial question of why both mechanisms are needed, but it does tell us that Kubernetes has to “live with” the OOM killer. Let’s press on for now; the answer will gradually emerge as we get further along.
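To make the oom_score_adj “knob” a bit more concrete, here’s a minimal sketch (not from the original article) that simply reads back the values the Kubelet and container runtime have applied to a process on a node. The PID is a placeholder you’d replace with a real container process id, e.g. one found via ps or your runtime’s inspect command.

```python
# Minimal sketch: read the OOM-related values the kernel tracks for a process.
# Run this directly on the node; the PID below is a hypothetical placeholder.
from pathlib import Path

def oom_values(pid: int) -> dict:
    """Return the OOM killer inputs for a given process."""
    proc = Path(f"/proc/{pid}")
    return {
        # Adjustment set by the Kubelet/runtime; -1000 (never kill) .. 1000.
        "oom_score_adj": int((proc / "oom_score_adj").read_text()),
        # Score computed by the kernel; higher means a likelier victim.
        "oom_score": int((proc / "oom_score").read_text()),
    }

if __name__ == "__main__":
    print(oom_values(1234))  # hypothetical PID of a container process
```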

Continue reading

Out-of-memory (OOM) in Kubernetes – Part 3: Memory metrics sources and tools to collect them

This is part 3 of a four-part article that looks into what happens in detail when Kubernetes runs into out-of-memory (OOM) situations and how it responds to them. You can find the article’s full table of contents here as part of the first post.

Metrics components

There are several components that generate and consume metrics for a Kubernetes cluster. In this section, we’ll go over some of the components that can provide useful memory metrics for us, see what they do and how they interconnect, look at the endpoints they expose, and the format the data is provided in. We’ll briefly discuss some Prometheus components now, even though they’re not built into Kubernetes, as we’ll make use of them further into the article. Grafana and Prometheus will be treated in depth separately in their own section later on.
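As a quick taste of the kind of data these components expose, here’s a small sketch (not part of the original article) that runs an instant query against a Prometheus server assumed to already be scraping the cluster’s cAdvisor metrics; the URL is a placeholder, e.g. a port-forwarded Prometheus instance.

```python
# Sketch only: query a Prometheus server that already scrapes cAdvisor metrics.
# The URL is an assumption (e.g. kubectl port-forward of the Prometheus service).
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # placeholder

def query(promql: str) -> list:
    """Run an instant PromQL query and return the result samples."""
    params = urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(f"{PROMETHEUS_URL}/api/v1/query?{params}") as resp:
        return json.load(resp)["data"]["result"]

# Working set memory per container; label names may differ on older versions.
for sample in query('container_memory_working_set_bytes{container!=""}'):
    labels = sample["metric"]
    value_bytes = float(sample["value"][1])
    print(f'{labels.get("pod")}/{labels.get("container")}: {value_bytes / 2**20:.1f} MiB')
```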

As of now (Dec 2021) there is overlapping functionality between some of the components we’ll go over, since some are slated for decommissioning. Note also that things are moving fast in Kubernetes land, so by the time you read this some of the items presented might have already changed.

Keep in mind that we won’t discuss “grand” topics such as types of monitoring pipelines (for which you can find a starter here) nor the Kubernetes monitoring architecture (which is discussed in a proposal document – although dated by now – here). Nor will we touch upon components that don’t have a direct bearing on the problem we’re trying to solve, such as the Custom Metrics API. We’ll stick to the point and only take detours when strictly required.

Continue reading

Out-of-memory (OOM) in Kubernetes – Part 2: The OOM killer and application runtime implications

This is part 2 of a four-part article that looks into what happens in detail when Kubernetes runs into out-of-memory (OOM) situations and how it responds to them. You can find the article’s full table of contents here as part of the first post.

Overcommit

We need to discuss overcommit, otherwise some of what comes next won’t make sense, particularly if – like me – you’re coming from the Windows world. What does overcommit mean? Simply put, the OS hands out more memory to processes than it can safely guarantee.

An example: we have a Linux system running with 12 GB RAM and 4 GB of swap. The system can be bare-metal, a VM, a WSL distribution running on Windows, etc., as it doesn’t matter from the standpoint of how overcommit works. Assume the OS uses 1 GB for its kernel and the various components it runs. Process A comes along and requests to allocate 8 GB. The OS happily confirms, and process A gets a region of 8 GB inside its virtual address space that it can use. Next, process B shows up and asks for 9 GB of memory. With overcommit turned on – which is usually the default setting – the OS is happy to grant this request as well, so now process B has a region of 9 GB inside its own virtual address space. Of course, if we sum 1+8+9 this shoots over the total quantity of memory the OS knows it has at its disposal (12+4), but as long as processes A and B don’t need to use all their memory at once, all is well.
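If you want to see this behavior on a real Linux box, here’s a small sketch (mine, not from the article) mirroring the numbers above. With the default heuristic overcommit policy on recent kernels, each reservation is granted because it is individually smaller than RAM plus swap, even though the two together are not, and no physical memory is consumed until the pages are actually written to. The sizes are illustrative; the exact outcome depends on your machine’s RAM, swap and vm.overcommit_memory setting.

```python
# Sketch: reserve more virtual memory than the machine can back, as in the
# 8 GB + 9 GB example above. Behavior depends on the overcommit policy.
import mmap
from pathlib import Path

# Current overcommit policy: 0 = heuristic (default), 1 = always, 2 = never.
policy = Path("/proc/sys/vm/overcommit_memory").read_text().strip()
print(f"vm.overcommit_memory = {policy}")

GiB = 1024**3
# "Process A": reserve 8 GiB of virtual address space (fileno=-1 makes the
# mapping anonymous; MAP_PRIVATE keeps it out of shared memory accounting).
region_a = mmap.mmap(-1, 8 * GiB, flags=mmap.MAP_PRIVATE)
# "Process B": reserve another 9 GiB (here in the same process, for simplicity).
region_b = mmap.mmap(-1, 9 * GiB, flags=mmap.MAP_PRIVATE)
print("17 GiB of virtual memory reserved; resident usage is still near zero")

# Physical pages are only consumed on first write, one page per fault:
region_a[0] = 1
region_b[0] = 1

region_a.close()
region_b.close()
```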

Continue reading

Out-of-memory (OOM) in Kubernetes – Part 1: Intro and topics discussed

Intro

You’ve seen it all before: that important application service running inside Kubernetes goes down at the most inconvenient time. There’s a tug of war between the DevOps team – which maintains that Kubernetes did nothing wrong and the service simply exited with an error message (“can’t they just fix their code?”) – and the Development team – which maintains that all the logs show the application was alive and well, and there must be something Kubernetes did that resulted in the service being taken down (“can’t they just make that Kubernetes thing stable?”). Sometimes an “OOMKilled” tell-tale status is observed for one of the containers, other times some Kubernetes pods are seen in an “Evicted” state. There are times when there’s a strange 139 exit code, which maps to something called “SIGSEGV”, but that doesn’t solve the mystery either. The investigation continues, and nodes are found that at times show weird numbers like 107% memory usage. Is that number normal? Is there some glitch in how Kubernetes reports memory on that very node? You start suspecting there might be a memory issue, so you turn to the Kubernetes logs to get more details. But where are those to begin with? This continues, with more questions coming up as you go along and hours spent trying to figure out what went wrong. And at the end of the day there’s always management wanting an answer to “how can we prevent this from happening in the future?”…

This 4-part article will look into what happens in detail when Kubernetes runs into out-of-memory (OOM) situations and how it responds to them. The discussion will include the concepts involved, details around how the various mechanisms work, the metrics that describe memory and the tools used to gather them, diagrams to show how things come together, and in-depth analysis about some of the situations that can be encountered.

Why would you want to read this article? You’re using Kubernetes,…

Continue reading