Exit code 137 occurs when a process is terminated because it’s using too much memory. Your container or Kubernetes pod will be stopped to prevent the excessive resource consumption from affecting your host’s reliability.
Processes that end with exit code 137 need to be investigated. The problem could be that your system simply needs more physical memory to meet user demands. However, there might also be a memory leak or sub-optimal programming inside your application that’s causing resources to be consumed excessively.
In this article, you’ll learn how to identify and debug exit code 137 so your containers run reliably. This will reduce your maintenance overhead and help stop inconsistencies caused by services stopping unexpectedly. Although some causes of exit code 137 can be highly specific to your environment, most problems can be solved with a simple troubleshooting sequence.
What is exit code 137?
All processes emit an exit code when they terminate. Exit codes provide a mechanism for informing the user, operating system, and other applications why the process stopped. Each code is a number between 0 and 255. The meaning of codes below 125 is application-dependent, while higher values have special meanings.
A 137 code is issued when a process is terminated externally because of its memory consumption. The operating system’s out of memory manager (OOM) intervenes to stop the program before it destabilizes the host.
When you start a foreground program in your shell, you can read the
? variable to inspect the process exit code:
As this example returned
137, you know that
demo-binary was stopped because it used too much memory. The same thing happens for container processes, too—when a memory limit is being approached, the process will be terminated, and a 137 code issued.
Pods running in Kubernetes will show a status of OOMKilled when they encounter a 137 exit code. Although this looks like any other Kubernetes status, it’s caused by the operating system’s OOM killer terminating the pod’s process. You can check for pods that have used too much memory by running Kubectl’s
get pods command:
$ kubectl get pods
Memory consumption problems can affect anyone, not just organizations using Kubernetes. You could run into similar issues with Amazon ECS, RedHat OpenShift, Nomad, CloudFoundry, and plain Docker deployments. Regardless of the platform, if a container fails with a 137 exit code, the root cause will be the same: there’s not enough memory to keep it running.
For example, you can view a stopped Docker container’s exit code by running
docker ps -a:
$ docker ps -a
|cdefb9ca658c||demo-org/demo-image:latest||"demo-binary"||2 days ago||Exited (137) 1 day ago|
The exit code is shown in brackets under the
STATUS column. The
137 value confirms this container stopped because of a memory problem.
Causes of container memory issues
Understanding the situations that lead to memory-related container terminations is the first step towards debugging exit code 137. Here are some of the most common issues that you might experience.
Container memory limit exceeded
Kubernetes pods will be terminated when they try to use more memory than their configured limit allows. You might be able to resolve this situation by increasing the limit if your cluster has spare capacity available.
Application memory leak
Poorly optimized code can create memory leaks. A memory leak occurs when an application uses memory, but doesn’t release it when the operation’s complete. This causes the memory to gradually fill up, and will eventually consume all the available capacity.
Natural increases in load
Sometimes adding physical memory is the only way to solve a problem. Growing services that experience an increase in active users can reach a point where more memory is required to serve the increase in traffic.
Requesting more memory than your compute nodes can provide
Kubernetes pods configured with memory resource requests can use more memory than the cluster’s nodeshave if limits aren’t also used. A request allows consumption overages because it’s only an indication of how much memory a pod will consume, and doesn’t prevent the pod from consuming more memory if it’s available.
Running too many containers without memory limits
Running several containers without memory limits can create unpredictable Kubernetes behavior when the node’s memory capacity is reached. Containers without limits have a greater chance of being killed, even if a neighboring container caused the capacity breach.
Preventing pods and containers from causing memory issues
Debugging container memory issues in Kubernetes—or any other orchestrator—can seem complex, but using the right tools and techniques helps make it less stressful. Kubernetes assigns memory to pods based on the requests and limits they declare. Unless it resides in a namespace with a default memory limit, a pod that doesn’t use these mechanisms can normally access limitless memory.
Setting memory limits
Pods without memory limits increase the chance of OOM kills and exit code 137 errors. These pods are able to use more memory than the node can provide, which poses a stability risk. When memory consumption gets close to the physical limit, the Linux kernel OOM killer intervenes to stop processes that are using too much memory.
Making sure each of your pods includes a memory limit is a good first step towards preventing OOM kill issues. Here’s a sample pod manifest:
The requests field indicates the pod wants 256 Mi of memory. Kubernetes will use this information to influence scheduling decisions, and will ensure that the pod is hosted by a node with at least 256 Mi of memory available. Requests help to reduce resource contention, ensuring your applications have the resources they need. It’s important to note, though, that they don’t prevent the pod from using more memory if it’s available on the node.
This sample pod also includes a memory limit of 512 Mi. If memory consumption goes above 512 Mi, the pod becomes a candidate for termination. If there’s too much memory pressure and Kubernetes needs to free up resources, the pod could be stopped. Setting limits on all of your pods helps prevent excessive memory consumption in one from affecting the others.
Investigating application problems
Once your pods have appropriate memory limits, you can start investigating why those limits are being reached. Start by analyzing traffic levels to identify anomalies as well as natural growth in your service. If memory use has grown in correlation with user activity, it could be time to scale your cluster with new nodes, or to add more memory to existing ones.
If your nodes have sufficient memory, you’ve set limits on all your pods, and service use has remained relatively steady, the problem is likely to be within your application. To figure out where, you need to look at the nature of your memory consumption issues: is usage suddenly spiking, or does it gradually increase over the course of the pod’s lifetime?
A memory usage graph that shows large peaks can point to poorly optimized functions in your application. Specific parts of your codebase could be allocating a lot of memory to handle demanding user requests. You can usually work out the culprit by reviewing pod logs to determine which actions were taken around the time of the spike. It might be possible to refactor your code to use less memory, such as by explicitly freeing up variables and destroying objects after you’ve finished using them.
Memory graphs that show continual increases over time usually mean you’ve got a memory leak. These problems can be tricky to find, but reviewing application logs and running language-specific analysis tools can help you discover suspect code. Unchecked memory leaks will eventually fill all the available physical memory, forcing the OOM killer to stop processes so the capacity can be reclaimed.
Exit code 137 means a container or pod is trying to use more memory than it’s allowed. The process gets terminated to prevent memory usage ballooning indefinitely, which could cause your host system to become unstable.
Excessive memory usage can occur due to natural growth in your application’s use, or as the result of a memory leak in your code. It’s important to set correct memory limits on your pods to guard against these issues; while reaching the limit will prompt termination with a 137 exit code, this mechanism is meant to protect you against worse problems that will occur if system memory is depleted entirely.
When you’re using Kubernetes, you should proactively monitor your cluster so you’re aware of normal memory consumption and can identify any spikes.
If you’re looking for a way to build a dashboard around monitoring your Kubernetes clusters or a pipeline to process your logs to catch errors like this, try using Airplane. With Airplane, you can create internal tools like these and many more to support your engineering workflows. If you need a frontend, Airplane offers a UI framework called Views that helps even non-technical team members understand and use your team’s internal tools.