The `kubectl scale` command is used to immediately scale your application by adjusting the number of running replicas. This is the quickest and easiest way to change a deployment's replica count, and it can be used to react to spikes in demand or prolonged quiet periods.
In this article, you'll see how to use `kubectl scale` to scale a simple deployment. You'll also learn about the options you can use when you need a more sophisticated change. Finally, you'll look at the best practices for running `kubectl scale`, as well as some alternative methods for adjusting Kubernetes replica counts.
Kubectl scale use cases
The kubectl scale command is used to change the number of running replicas inside Kubernetes deployment, replica set, replication controller, and stateful set objects. When you increase the replica count, Kubernetes will start new pods to scale up your service. Lowering the replica count will cause Kubernetes to gracefully terminate some pods, freeing up cluster resources.
You can run `kubectl scale` to manually adjust your application's replica count in response to changing service capacity requirements. Increased traffic loads can be handled by increasing the replica count, providing more application instances to serve user traffic. When the surge subsides, the number of replicas can be reduced. This helps keep your costs low by releasing resources you no longer need.
The most basic usage of `kubectl scale` looks like this:
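```shell
# "demo-deployment" is the example deployment used throughout this article
kubectl scale --replicas=3 deployment/demo-deployment
```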
Executing this command will adjust the deployment called `demo-deployment` so it has three running replicas. You can target a different kind of resource by substituting its name in place of `deployment/demo-deployment`.
Now we'll look at a complete example of using `kubectl scale` to scale a deployment. Here's a YAML file defining a simple deployment:
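```yaml
# A minimal example manifest; the "app: demo" labels and the image tag
# are illustrative choices, not prescribed by this article.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: nginx
          image: nginx:latest
```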
Save this YAML to `demo-deployment.yaml` in your working directory. Next, use kubectl to add the deployment to your cluster:
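```shell
kubectl apply -f demo-deployment.yaml
```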
Now run the `get pods` command to view the pods that have been created for the deployment:
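```shell
kubectl get pods
```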
Only one pod is running. This is expected, as the deployment's manifest declares one replica in its `spec.replicas` field.
A single replica isn't sufficient for a production application. You could experience downtime if the node hosting the pod goes offline for any reason. Use `kubectl scale` to increase the replica count and provide more headroom:
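```shell
kubectl scale --replicas=5 deployment/demo-deployment
```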
Repeat the `get pods` command to confirm that the deployment has been scaled successfully:
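```shell
kubectl get pods
```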
There are now five pods running for the `demo-deployment` deployment. You can see from the `AGE` column that the `scale` command retained the original pod and added four new ones.
After further consideration, you might decide five replicas are unnecessary for this application. It's only running a static NGINX web server, so resource consumption per user request should be low. Use the `scale` command again to lower the replica count and avoid wasting cluster capacity:
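```shell
kubectl scale --replicas=3 deployment/demo-deployment
```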
Check the progress of the operation with another `get pods` command:
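```shell
kubectl get pods
```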
Kubernetes has marked two of the running pods for termination. This will reduce the running replica count down to the requested three pods. The pods selected for eviction are sent a SIGTERM signal and allowed to gracefully terminate. They’ll be removed from the pod list once they’ve stopped.
Sometimes you might want to scale a resource, but only if there’s a specific number of replicas already running. This avoids unintentional overwrites of previous scaling changes, such as those made by other users in your cluster.
Include the `--current-replicas` flag in the command to use this behavior:
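```shell
# Scale to five replicas, but only if exactly three are currently running
kubectl scale --replicas=5 --current-replicas=3 deployment/demo-deployment
```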
This example scales the `demo-deployment` deployment to five replicas, but only if there are currently three replicas running. The `--current-replicas` value is always matched exactly; you can't express a condition as "less than" or "greater than" a particular count.
Scaling multiple resources
The `kubectl scale` command can scale several resources at once when you supply more than one name as arguments. Each of the resources will be scaled to the same replica count, set by the `--replicas` flag.
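For example, to scale two deployments in one command (the `app` deployment name here is an illustrative stand-in):

```shell
kubectl scale --replicas=5 deployment/app deployment/database
```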
This command scales the listed deployments, including `database`, to five replicas each.
You can scale every resource of a particular type by supplying the `--all` flag, such as this example that scales all the deployments in your active namespace:
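```shell
# The replica count here is an arbitrary example
kubectl scale --replicas=3 deployment --all
```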
This selects every matching resource inside the currently active namespace. The objects that were scaled are shown in the command’s output.
You can obtain granular control over the objects that are scaled with the `--selector` flag. This lets you use standard selection syntax to filter objects based on their labels. Here's an example that scales all the deployments carrying a particular label:
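```shell
# Assumes a hypothetical app=demo label on the target deployments
kubectl scale --replicas=3 --selector=app=demo deployment
```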
Changing the timeout
The `--timeout` flag sets the time Kubectl will wait before it gives up on a scale operation. By default, there's no waiting period. The flag accepts time values in human-readable format, such as `5s` or `1m`.
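For instance (the timeout value is an arbitrary example):

```shell
kubectl scale --replicas=5 --timeout=1m deployment/demo-deployment
```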
This lets you avoid lengthy terminal hangs if a scaling change can't be immediately fulfilled. Although `kubectl scale` is an imperative command, scaling changes can sometimes take several minutes to complete while new pods are scheduled to nodes.
Kubectl scale best practices
`kubectl scale` is generally the fastest and most reliable way to scale your workloads. However, there are some best practices to remember for safe operations. Here are a few tips.
- Avoid scaling too often. Changes to replica counts should be in response to specific events, such as congestion that’s causing requests to run slowly or be dropped. It’s best to analyze your current service capacity, estimate the capacity needed to satisfactorily handle all the traffic, then add an extra buffer on top to anticipate any future growth. Avoid scaling your application too often, as each operation can cause delays while pods are scheduled and terminated.
- Scaling down to zero will stop your application. You can run `kubectl scale --replicas=0`, which will remove all the pods across the selected objects. You can scale back up again by repeating the command with a positive value.
- Make sure you’ve selected the correct objects. There’s no confirmation prompt, so be sure to pay attention to the objects you’re selecting. Manually selecting objects by name is the safest approach, and prevents you from accidentally scaling other parts of your application, which could cause an outage or waste resources.
- Use --current-replicas to avoid accidents. Using the `--current-replicas` flag increases safety by ensuring the scale only changes if the current count matches your expectation. Otherwise, you might unintentionally overwrite scaling changes applied by another user or the Kubernetes autoscaler.
Alternatives to kubectl scale
`kubectl scale` is an imperative operation that has a direct effect on your cluster. You're instructing Kubernetes to supply a specific number of replicas as soon as possible. This is logical if you created the object with the imperative `kubectl create` command, but it's inappropriate if you originally ran `kubectl apply` with a declarative YAML file, as shown above. After you run the `scale` command, the number of replicas in your cluster will differ from that defined in your YAML's `spec.replicas` field. It's better practice to modify the YAML file instead, then re-apply it to your cluster.
First, change the `spec.replicas` field to your new desired replica count:
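```yaml
# demo-deployment.yaml (excerpt); three replicas is an example value
spec:
  replicas: 3
```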
Now repeat the `kubectl apply` command with the modified file:
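```shell
kubectl apply -f demo-deployment.yaml
```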
Kubectl will automatically diff the changes and take action to evolve the state of your cluster toward what is declared in the file. This will result in pods being automatically created or terminated, so the number of running instances matches the `spec.replicas` field again.
Another alternative to `kubectl scale` is Kubernetes' support for autoscaling. Configuring this mechanism allows Kubernetes to automatically adjust replica counts between a configured minimum and maximum, based on metrics such as CPU usage and network activity.
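As a brief sketch, a Horizontal Pod Autoscaler can be created imperatively with `kubectl autoscale`; the bounds and CPU target here are arbitrary examples:

```shell
kubectl autoscale deployment/demo-deployment --min=2 --max=10 --cpu-percent=80
```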
The `kubectl scale` command is an imperative mechanism for scaling your Kubernetes deployments, replica sets, replication controllers, and stateful sets. It targets one or more objects on each invocation and scales them so a specified number of pods are running. You can optionally set a condition, so the scale is only changed when there's a specific number of existing replicas, avoiding unintentional resizes in the wrong direction.
You can track the number of replicas in your cluster by using a dedicated Kubernetes monitoring platform.