Kubernetes is an orchestration tool for deploying containerized applications across a cluster of servers and scaling them up or down as needed. It also manages how much of each server’s CPU and memory those applications can consume. There are a few things to keep in mind when setting CPU limits in Kubernetes. First, it’s important to understand how Kubernetes allocates CPU time. By default, a container with no CPU limit can use any CPU capacity that’s idle on its Node. A CPU limit caps that usage, which is only useful when you deliberately want to restrict a workload. Second, it’s important to understand how CPU limits affect overall performance. Limits that are set too low throttle workloads even when the Node has spare capacity, while workloads that lack appropriate CPU requests can be starved when the Node is busy. Test your proposed resource settings before deploying them to production so you know how they’ll affect performance.


Managing the resources available to your Pods and containers is a best practice for Kubernetes administration. You need to prevent Pods from greedily consuming your cluster’s CPU and memory. Excess utilization by one set of Pods can cause resource contention that slows down neighboring containers and destabilizes your hosts.

Kubernetes resource management is often misunderstood, though. Two mechanisms are provided to control allocations: requests and limits. This leads to four possible settings per Pod, if you set a request and a limit for both CPU and memory.

Setting all four is the obvious path, but it’s usually sub-optimal: CPU limits are best omitted because they harm performance and waste spare capacity. This article will explain the problem so you can run a more effective cluster.

How Requests and Limits Work

Requests are used for scheduling. New Pods will only be allocated to Nodes that can satisfy their requests. If there’s no matching Node, the Pod will stick in the Pending state until resources become available.

Limits define the maximum resource utilization the Pod is allowed. When the limit is reached, the Pod can’t use any more of the resource, even if there’s spare capacity on its Node. The actual effect of hitting the limit depends on the resource concerned: reaching a CPU limit results in throttling, while going beyond a memory limit causes the kernel’s out-of-memory (OOM) killer to terminate container processes.

For example, a Pod with a 500m CPU request and a 1000m CPU limit will only schedule to Nodes that can provide 500m (equivalent to 0.5 CPU cores). At runtime, it can consume up to 1000m if the Node has capacity available, after which it’s throttled.
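The scheduling side of this can be sketched in a few lines of Python. This is a simplified model, not the real scheduler: it only assumes that a Pod fits when its CPU request, added to the requests already placed on the Node, stays within the Node’s allocatable CPU. The helper names `parse_cpu` and `fits` are hypothetical; the `requests` and `limits` keys mirror the fields found under `spec.containers[].resources` in a Pod manifest.

```python
def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "500m" or "0.5" into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


# Resource settings as they would appear under spec.containers[].resources.
resources = {
    "requests": {"cpu": "500m"},   # used for scheduling
    "limits": {"cpu": "1000m"},    # runtime ceiling, enforced by throttling
}


def fits(node_allocatable_m: int, already_requested_m: int, request: str) -> bool:
    """Simplified check: the request must fit in the Node's unreserved CPU."""
    return already_requested_m + parse_cpu(request) <= node_allocatable_m


# A quad-core Node (4000m) with 3600m already requested cannot take the Pod...
print(fits(4000, 3600, resources["requests"]["cpu"]))  # False
# ...but the same Node with 3000m requested can.
print(fits(4000, 3000, resources["requests"]["cpu"]))  # True
```

Note that only the request takes part in this check. The limit plays no role in deciding where the Pod runs.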

Why CPU Limits Are Dangerous

To understand why CPU limits are problematic, consider what happens if a Pod with the resource settings shown above (500m request, 1000m limit) gets deployed to a quad-core Node with a total CPU capacity of 4000m. For simplicity’s sake, there are no other Pods running on the Node.

The Pod schedules onto the Node straightaway because the 500m request is immediately satisfied. The Pod transitions into the Running state. Load is low at first, with CPU use around a few hundred millicores.

Then there’s a sudden traffic spike: requests are flooding in and the Pod’s CPU demand jumps to 2000m. Because of the CPU limit, its usage is throttled down to 1000m. The Node’s not running any other Pods though, so it could provide the full 2000m if the Pod weren’t being restricted by its limit.

The Node’s capacity has been wasted and the Pod’s performance reduced unnecessarily. Omitting the CPU limit would let the Pod use the 2000m it needs, and up to the Node’s full 4000m under heavier load, potentially serving requests up to four times as quickly as the 1000m limit allows.
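The arithmetic of this scenario is easy to check with a small sketch. It models a single Pod on an otherwise idle Node and assumes the Pod simply receives the smallest of its demand, its limit (if any), and the Node’s capacity. The function name `effective_cpu` is hypothetical.

```python
def effective_cpu(demand_m, node_capacity_m, limit_m=None):
    """CPU (in millicores) a lone Pod receives on an otherwise idle Node."""
    ceiling = node_capacity_m if limit_m is None else min(node_capacity_m, limit_m)
    return min(demand_m, ceiling)


NODE = 4000   # quad-core Node
SPIKE = 2000  # CPU the Pod wants during the traffic spike

with_limit = effective_cpu(SPIKE, NODE, limit_m=1000)
without_limit = effective_cpu(SPIKE, NODE)

print(with_limit, without_limit)  # 1000 2000
print(NODE - with_limit)          # 3000 millicores sit idle while the Pod is throttled
```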

No Limit Still Prevents Pod Resource Hogging

Omitting CPU limits doesn’t compromise stability, provided you’ve set appropriate requests on each Pod. When multiple Pods compete for CPU time on the same Node, each Pod’s share gets scaled in proportion to its request.

Consider what happens to two Pods without limits when they’re deployed to an 8-core (8000m) Node and each simultaneously demands as much CPU as it can get: the Node’s 8000m is divided between them in proportion to their requests.
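As a rough illustration, the proportional split can be computed directly. The request values below (500m and 1500m) are assumptions chosen for this sketch, and `contended_shares` is a hypothetical helper that models only the fully contended case, where every Pod wants more CPU than it can get.

```python
def contended_shares(capacity_m, requests_m):
    """Split a Node's CPU between busy Pods in proportion to their requests."""
    total = sum(requests_m.values())
    return {pod: capacity_m * req // total for pod, req in requests_m.items()}


# Assumed requests: Pod 1 asks for 500m, Pod 2 for 1500m, on an 8000m Node.
shares = contended_shares(8000, {"pod-1": 500, "pod-2": 1500})
print(shares)  # {'pod-1': 2000, 'pod-2': 6000}
```

Pod 2 requested three times as much CPU as Pod 1, so it receives three times as much CPU time under contention. Both Pods still get well beyond their requests because the Node has capacity to spare.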

If Pod 1 is in a quieter period, then Pod 2 is free to use even more CPU cycles, because the capacity Pod 1 leaves idle is handed to whichever Pod still needs it.
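A slightly fuller sketch also covers the quiet-neighbor case. It is an idealized model of request-weighted CPU sharing, not a reproduction of the kernel scheduler: each Pod is offered a slice proportional to its request, and any part of a slice that a Pod doesn’t need is redistributed among the Pods that still want more. The function name, the request values, and the demand values are all assumptions for illustration, and requests are assumed to be greater than zero.

```python
def allocate_cpu(capacity_m, pods):
    """pods maps name -> (request_m, demand_m); returns name -> allocated millicores."""
    allocation = {}
    active = dict(pods)            # Pods whose demand is not yet satisfied
    remaining = float(capacity_m)
    while active:
        total_requests = sum(req for req, _ in active.values())
        # Offer each unsatisfied Pod a slice proportional to its request.
        offers = {n: remaining * req / total_requests for n, (req, _) in active.items()}
        satisfied = [n for n, (_, demand) in active.items() if demand <= offers[n]]
        if not satisfied:
            # Every remaining Pod wants more than its slice: hand out the slices.
            allocation.update(offers)
            break
        # Cap satisfied Pods at their demand and redistribute what they leave.
        for n in satisfied:
            demand = active.pop(n)[1]
            allocation[n] = float(demand)
            remaining -= demand
    return allocation


# Both Pods busy: the split follows the 500m : 1500m requests.
print(allocate_cpu(8000, {"pod-1": (500, 8000), "pod-2": (1500, 8000)}))
# {'pod-1': 2000.0, 'pod-2': 6000.0}

# Pod 1 quiet (only needs 1000m): Pod 2 picks up everything Pod 1 leaves idle.
print(allocate_cpu(8000, {"pod-1": (500, 1000), "pod-2": (1500, 8000)}))
# {'pod-1': 1000.0, 'pod-2': 7000.0}
```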

CPU Requests Still Matter

These examples demonstrate why CPU requests matter. Setting appropriate requests prevents contention by ensuring Pods only schedule to Nodes that can support them. It also guarantees weighted distribution of the available CPU cycles when multiple Pods are experiencing increased demand.

CPU limits don’t offer these benefits. They’re only valuable in situations when you want to throttle a Pod above a certain performance threshold. This is almost always undesirable behavior; you’re asserting that your other Pods will always be busy, when they could be idling and creating spare CPU cycles in the cluster.

Not setting limits allows those cycles to be utilized by any workload that needs them. This results in better overall performance because available hardware’s never wasted.

What About Memory?

Memory is managed in Kubernetes using the same request and limit concepts. However, memory is a fundamentally different kind of resource from CPU time, and it demands its own allocation approach. Memory is non-compressible: it can’t be throttled, and it can’t be taken back from a container process once allocated without terminating that process. Processes share the CPU as it becomes available but they’re given individual portions of memory.

Setting an identical request and limit is the best practice approach for Kubernetes memory management. This allows you to reliably anticipate the total memory consumption of all the Pods in your cluster.

It might seem logical to set a relatively low request with a much higher limit. However, using this technique for many Pods can have a destabilizing effect: if several Pods reach above their requests, a Node’s memory capacity could be exhausted. The OOM killer will intervene to terminate container processes, potentially causing disruption to your workloads. Any of your Pods could be targeted for eviction, not just the one that caused the memory to be exhausted.

Using equal requests and limits prevents a Pod from scheduling unless the Node can provide the memory it requires. It also enforces that the Pod can’t use any more memory than its explicit allocation, eliminating the risk of over-utilization when multiple Pods exceed their requests. Over-utilization will become apparent when you try to schedule a Pod and no Node can satisfy the memory request. The error occurs earlier and more predictably, without impacting any other Pods.
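The difference between the two strategies shows up when you compare what the scheduler accounts for (the sum of requests) with what the Pods may actually try to use (the sum of limits). The sketch below uses made-up numbers for a single Node with 8192 MiB of allocatable memory; `memory_exposure` is a hypothetical helper.

```python
def memory_exposure(node_allocatable_mib, pods):
    """pods is a list of (request_mib, limit_mib) pairs.

    Returns (total requested, worst-case usage, whether worst case exceeds the Node).
    """
    requested = sum(req for req, _ in pods)
    worst_case = sum(limit for _, limit in pods)
    return requested, worst_case, worst_case > node_allocatable_mib


NODE_MIB = 8192

# Low requests with high limits: all four Pods pass the scheduling check
# (4 x 1024 = 4096 <= 8192), yet together they may try to use 16384 MiB.
print(memory_exposure(NODE_MIB, [(1024, 4096)] * 4))  # (4096, 16384, True)

# Equal requests and limits: what the scheduler admits is all that can be used.
print(memory_exposure(NODE_MIB, [(2048, 2048)] * 4))  # (8192, 8192, False)
```

With equal values, a fifth 2048 MiB Pod would simply fail to schedule on this Node instead of pushing it toward an out-of-memory condition at runtime.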

Summary

Kubernetes allows you to distinguish between the quantity of resources that a container requires and an upper bound that it’s allowed to scale up to but cannot exceed. However, this mechanism is less useful in practice than it might seem at first glance.

Setting CPU limits prevents your processes from utilizing spare CPU capacity as it becomes available. This unnecessarily throttles performance when a Pod could be temporarily using cycles that no neighbor requires.

Use a sensible CPU request to prevent Pods scheduling onto Nodes that are already too busy to provide good performance. Leave the limit field unset so Pods can access additional resources when performing demanding tasks at times when capacity is available. Finally, assign each Pod a memory request and limit, making sure to use the same value for both fields. This will prevent memory exhaustion, creating a more stable and predictable cluster environment.
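These recommendations can be turned into a simple check. The sketch below inspects one container entry shaped like an item of `spec.containers` in a Pod manifest and reports where it deviates from the advice above. The function `check_resources`, the container name, and the image are hypothetical. The comparison is a plain string match, so equivalent quantities written differently (such as `512Mi` and `0.5Gi`) would be flagged as different.

```python
def check_resources(container):
    """Return a list of deviations from the recommendations for one container spec."""
    resources = container.get("resources", {})
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    problems = []
    if "cpu" not in requests:
        problems.append("missing CPU request")
    if "cpu" in limits:
        problems.append("CPU limit set")
    if "memory" not in requests or "memory" not in limits:
        problems.append("memory request and limit must both be set")
    elif requests["memory"] != limits["memory"]:
        problems.append("memory request and limit differ")
    return problems


recommended = {
    "name": "app",                   # hypothetical container name
    "image": "example.com/app:1.0",  # hypothetical image
    "resources": {
        "requests": {"cpu": "500m", "memory": "512Mi"},
        "limits": {"memory": "512Mi"},  # no CPU limit; memory limit equals request
    },
}

print(check_resources(recommended))  # []
```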