Unexpected Kubernetes pod terminations can be frustrating when you’re left with a vague error message. Unclear root causes often delay remediation, prolonging the duration of problems inside your cluster.
“Terminated with exit code 1” is one such generic problem that you might encounter from time to time. It occurs when the foreground process inside a container stops because of an error.
In this article we’ll look at some of the possible causes, show how to identify when a pod terminates with exit code 1, and walk through your options for debugging the issue. This should equip you to address these errors in your cluster, reducing failure rates to maximize your application’s uptime.
What is an exit code 1 error?
Processes emit a numerical exit code when they terminate. A command that successfully runs to completion should emit a 0 exit code. All other codes (from 1 to 255) indicate the program stopped unexpectedly, often because of an internal error or invalid arguments.
You can view a command’s exit code by inspecting the $? variable in your shell. For example, running cat against a file path that doesn’t exist produces an exit code of 1 because the command received an invalid argument. Had you specified a valid file path, cat would have successfully read its content, leading to an exit code of 0.
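As a minimal sketch of that behavior, the snippet below runs cat against a path that is assumed not to exist and then against /dev/null, capturing $? each time. The path is a placeholder chosen for illustration.

```shell
# Reading a missing file fails; capture the exit code from $?.
missing_status=0
cat /path/that/does/not/exist 2>/dev/null || missing_status=$?
echo "missing file: $missing_status"   # prints "missing file: 1"

# Reading a file that exists succeeds, so the exit code stays 0.
ok_status=0
cat /dev/null || ok_status=$?
echo "existing file: $ok_status"       # prints "existing file: 0"
```

Capturing the code with || keeps the snippet safe to paste into scripts that abort on errors. At an interactive prompt, running echo $? straight after the command shows the same value.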
Exit codes between 1 and 128 are generally available for applications to report their own errors, while codes between 129 and 255 conventionally indicate that a process was stopped by an external signal, with the code equal to 128 plus the signal number. One example is code 137 (128 + 9): this means the process received a SIGKILL signal, perhaps because the operating system needed to resolve a low memory situation.
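The 128-plus-signal convention can be observed locally. This is a small sketch, assuming bash or another common POSIX shell; it starts a child shell that kills itself with SIGKILL and then reads the resulting code.

```shell
# Start a child shell that sends SIGKILL (signal 9) to itself.
killed_status=0
sh -c 'kill -KILL $$' || killed_status=$?

# The parent shell reports 128 + 9.
echo "exit code: $killed_status"   # prints "exit code: 137"
```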
An exit code 1 error can mean many different things depending on the process you’re working with. It’s a generic code that applications can use freely. A loosely held convention among Unix utility commands sees exit code 1 used to report bad inputs, such as the invalid file path in the example above. Other programs may use exit code 1 for internal or unhandled errors.
Instances of this error may be surfaced as Exited (1) or Terminated with exit code 1 in a container’s logs. Next, you’ll see how to identify and diagnose this exit code when working with Kubernetes applications.
Viewing Kubernetes pod exit codes
Pods that have stopped because of a non-zero exit code will show an Error status when you list them using kubectl’s get pods command. You can see this by adding an intentionally broken pod to your cluster. Save the following YAML to demo-pod.yaml in your working directory:
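The manifest itself is not reproduced in this copy of the article. The sketch below writes one that matches the details given elsewhere in the text: a pod named demo-pod, the busybox image, and a restart policy of Never. The container name and the failing cat command are assumptions chosen for illustration.

```shell
# Write a hypothetical manifest for an intentionally broken pod.
cat > demo-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  restartPolicy: Never
  containers:
    - name: demo-container            # assumed name
      image: busybox:latest
      # Assumed command: reading a missing file makes cat exit with code 1.
      command: ["cat", "/path/that/does/not/exist"]
EOF
```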
Next, use kubectl apply -f demo-pod.yaml to add the pod to your cluster.
List your pods with the get pods command:
$ kubectl get pods
The pod has ended up in the Error state. This is because its restart policy is set to Never, so Kubernetes won’t automatically start a new container when one terminates. If you were using the Always (default) restart policy, the pod may have a status of CrashLoopBackOff instead:
$ kubectl get pods
NAME       READY   STATUS             RESTARTS     AGE
demo-pod   0/1     CrashLoopBackOff   1 (4s ago)   9s
Kubernetes has tried to restart the pod, but it has failed on multiple consecutive attempts. It’ll keep retrying, with an exponentially longer backoff delay before each attempt.
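To get a feel for how quickly the delay grows, the loop below prints a doubling schedule. It assumes the values described in the Kubernetes documentation (a 10 second starting delay, doubling on each attempt, capped at five minutes); the exact numbers may differ in your cluster.

```shell
delay=10    # assumed first delay, in seconds
cap=300     # assumed five-minute ceiling
delays=""
attempt=1
while [ "$attempt" -le 7 ]; do
  delays="$delays $delay"
  # Double the delay for the next attempt, without exceeding the cap.
  delay=$((delay * 2))
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
  attempt=$((attempt + 1))
done
echo "backoff delays (s):$delays"   # prints "backoff delays (s): 10 20 40 80 160 300 300"
```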
Whether you’re allowing automatic restarts or not, you can inspect a pod’s last exit code using the describe pod command, for example kubectl describe pod demo-pod.
The output is relatively verbose. Piping the command through awk can display the exit code in isolation, without the extraneous supporting information.
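One way to do that filtering is sketched below. The helper name pod_exit_code is hypothetical, and the awk pattern assumes the describe output contains a line of the form "Exit Code: 1", which is how kubectl normally reports a terminated container.

```shell
# Print only the value from each "Exit Code:" line of the describe output.
pod_exit_code() {
  kubectl describe pod "$1" | awk '/Exit Code:/ { print $3 }'
}
```

Calling pod_exit_code demo-pod would then print just the numeric code for the demo pod.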
Troubleshooting unexpected exit codes
Now that you’ve identified that a container is exiting with status code 1, it’s time to start solving the problem. There’s no guaranteed resolution path, because this is a catch-all error where the cause naturally varies between applications. Here are some techniques that should help uncover the problem.
Check container logs
As exit code 1 is issued from within a pod, checking its logs should be your first troubleshooting step. Although containers may seem to crash on startup, they will be briefly running until the termination occurs. Most applications will write logs that can help you debug.
Use the kubectl logs command to retrieve the logs for the first container in your pod. When the pod’s stuck in a restart loop, this will be the container created by the most recent restart attempt.
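A small sketch of this step follows. The helper name show_pod_logs is hypothetical. The --previous flag, which is not mentioned in the original text, asks kubectl for the logs of the previous container instance and can be useful when the newest container has not written anything yet.

```shell
# Show logs from the current container, then from the instance before it.
show_pod_logs() {
  pod="$1"
  echo "--- current container ---"
  kubectl logs "$pod"
  echo "--- previous container ---"
  # This call reports an error if the pod has never restarted.
  kubectl logs "$pod" --previous
}
```

For the demo pod, this would be invoked as show_pod_logs demo-pod.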
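The logs immediately reveal the root cause of the exit code 1 produced by our basic example. You can use this information to fix the command field in the pod’s YAML file. Because most fields of an existing pod can’t be changed in place, delete the failed pod first, then re-apply the manifest to your cluster with kubectl apply -f demo-pod.yaml.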
Carefully inspect names and arguments
Sometimes the logs won’t help you. Perhaps the application’s simply crashing too early in its lifecycle to record something useful. In this situation, the best approach is to start with the basics.
Check your pod’s YAML file for simple typos that could be executing the wrong command or providing invalid arguments. Although it’s far from universal, many applications do use exit code 1 to signal an input error, so it’s worth looking for mistakes like passing a misnamed flag where --hostname is expected.
Make sure the image tag reference is correct, too. Specifying the wrong version of an image, such as my-image:1 instead of my-image:2, could trigger unexpected incompatibilities that leave your container unable to interpret your input.
Try running the command yourself
Running the command on your local machine can help identify problems that stem from the container’s environment. The application might depend on certain external characteristics that aren’t satisfied by your container image. There may even be an incompatibility with other programs, libraries, or your Kubernetes distribution, although this is rare.
You can also try manually starting a container using the same image. This can help further narrow down the possibilities:
Here, Docker is used to run the busybox image with equivalent arguments to our Kubernetes pod manifest. The application still failed in the same way, confirming the problem isn’t something specific to the Kubernetes deployment.
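A local reproduction along these lines might look like the sketch below. The helper name run_locally is hypothetical, and the cat command and path are placeholders for illustration, since the original pod command is not shown in this copy of the article. Substitute the command from your own manifest.

```shell
# Run the same image outside Kubernetes and report the container's exit code.
run_locally() {
  status=0
  # docker run exits with the exit code of the container's process.
  docker run --rm busybox:latest cat /path/that/does/not/exist || status=$?
  echo "exit code: $status"
}
```

If the reported code matches what Kubernetes showed, the fault most likely lies in the image or its arguments rather than in the cluster.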
Completely recreate the pod
Sometimes an “off and on again” approach can prove effective. Delete the pod completely, then add it back into your cluster. This can help to resolve transient issues that could be specific to a single Kubernetes node.
This isn’t guaranteed to succeed, as an exit code of 1 originates from inside the container. However, it could help resolve any environmental issues that are preventing the command from successfully running.
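The delete and re-add steps can be wrapped in a small helper, sketched below. The name recreate_pod is hypothetical; the second argument is the path to the pod’s manifest.

```shell
# Delete a pod (if it exists), then create it again from its manifest.
recreate_pod() {
  # By default, kubectl delete waits until the pod is actually gone.
  kubectl delete pod "$1" --ignore-not-found
  kubectl apply -f "$2"
}
```

For the demo pod, this would be invoked as recreate_pod demo-pod demo-pod.yaml.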
Manage resource consumption
Sometimes, you might find a pod only crashes after it’s been running for a while. This suggests that the application could have a memory leak, cache mismanagement, or another transient fault that occurs under specific conditions.
Checking the resource utilization of the hardware that hosts your Kubernetes cluster can be helpful in this situation. If your cluster’s routinely encountering low memory scenarios, your applications could break in unexpected ways. It’s possible this can provoke an exit code 1 error if the code crashes because it can’t use any more memory.
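One way to check utilization from the command line is kubectl top, sketched below. This assumes the Metrics Server (or an equivalent metrics API provider) is installed in the cluster, which the original text does not mention. The helper name show_resource_usage is hypothetical.

```shell
show_resource_usage() {
  # CPU and memory usage for every node in the cluster.
  kubectl top nodes
  # Usage for a single pod, broken down by container.
  kubectl top pod "$1" --containers
}
```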
Provisioning extra nodes to serve your workloads is a good way to address this problem. Kubernetes will be able to horizontally scale your application across additional hardware, making it less likely that faults will occur. You can also try increasing the resource limits on your individual pods. A container limited to 100 MB of memory, for instance, may not have enough for a busy workload.
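A memory limit of this kind is set in the container’s resources field. The sketch below writes a hypothetical manifest, demo-pod-limits.yaml, with a 100 MB limit; the pod name, container name, and sleep command are assumptions for illustration.

```shell
# Write a hypothetical manifest whose container is capped at 100 MB of memory.
cat > demo-pod-limits.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod-limits               # assumed name
spec:
  containers:
    - name: demo-container            # assumed name
      image: busybox:latest
      command: ["sleep", "3600"]      # placeholder long-running command
      resources:
        limits:
          memory: 100M                # raise this value for busier workloads
EOF
```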
Kubernetes pod terminations that report exit code 1 indicate something has gone wrong inside the pod’s container. The application will have crashed, causing the container’s foreground process to stop and emit the exit code. This signals to Kubernetes that an error occurred.
These problems are usually caused by issues with your container image or the config parameters you supply. They can also be due to programming bugs that allow exceptions to propagate without being caught. Reviewing your Kubernetes pod logs can help you spot troublesome sections of code. Transient or recoverable issues could be mitigated by registering a catch-all error handler at the start of your program, allowing subsequent issues to be gracefully dealt with. Failing to address exit code 1 errors could leave you facing downtime if pods keep terminating.