How to use Kubectl Scale

May 30, 2022
7 min read

The kubectl scale command is used to immediately scale your application by adjusting the number of running replicas (pods). This is the quickest and easiest way to change a deployment’s replica count, and it can be used to react to spikes in demand or to prolonged quiet periods.

In this article, you’ll see how to use kubectl scale to scale a simple deployment. You’ll also learn about the options you can use when you need a more sophisticated change. Finally, you’ll look at best practices for running kubectl scale and at some alternative ways to adjust Kubernetes replica counts.

Kubectl Scale use cases

The kubectl scale command changes the number of running replicas in Kubernetes deployments, replica sets, replication controllers, and stateful sets. When you increase the replica count, Kubernetes starts new pods to scale up your service. Lowering the replica count causes Kubernetes to gracefully terminate some pods, freeing up cluster resources.

You can run kubectl scale to manually adjust your application’s replica count as its capacity requirements change. Increased traffic can be handled by raising the replica count, which provides more application instances to serve user requests. When the surge subsides, you can reduce the number of replicas again. This keeps costs down by releasing resources you no longer need.

Using kubectl scale

The most basic usage of kubectl scale looks like this:

kubectl scale deployment demo-deployment --replicas=3

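Executing this command adjusts the deployment called demo-deployment so it has three running replicas. You can target a different kind of resource by substituting its type for deployment, such as replicaset, replicationcontroller, or statefulset. For example, this scales a hypothetical stateful set named web:

kubectl scale statefulset web --replicas=3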

Basic scaling

Now we’ll look at a complete example of using kubectl scale to scale a deployment. Here’s a YAML file defining a simple deployment:
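
As a minimal sketch, a manifest matching this description (one replica of a static NGINX web server) could be written straight to the file with a shell heredoc. The app-name: demo-app label, the nginx container name, and the nginx:latest image are assumptions, chosen so the label lines up with the --selector example later in this article.

```shell
# Write a minimal single-replica NGINX deployment to demo-deployment.yaml.
# Assumed values: the app-name=demo-app label, container name, and image tag.
cat > demo-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
  labels:
    app-name: demo-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app-name: demo-app
  template:
    metadata:
      labels:
        app-name: demo-app
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
EOF
```

The label appears on the deployment itself as well as on its pods, because kubectl scale --selector filters the deployment objects by their own labels.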


Save this YAML to demo-deployment.yaml in your working directory. Next, use kubectl to add the deployment to your cluster:

kubectl apply -f demo-deployment.yaml

Now run the get pods command to view the pods that have been created for the deployment:

kubectl get pods
NAME                              READY   STATUS    RESTARTS   AGE
demo-deployment-86897ddbb-jl6r6   1/1     Running   0          33s

Only one pod is running. This is expected, as the deployment’s manifest declares one replica in its spec.replicas field.

A single replica isn’t sufficient for a production application. You could experience downtime if the node hosting the pod goes offline for any reason. Use kubectl scale to increase the replica count to provide more headroom:

kubectl scale deployment demo-deployment --replicas=5

Repeat the get pods command to confirm that the deployment has been scaled successfully:

kubectl get pods

There are now five pods running for the demo-deployment deployment. You can see from the AGE column that the scale command retained the original pod and added four new ones.

After further consideration, you might decide five replicas are unnecessary for this application. It’s only running a static NGINX web server, so resource consumption per user request should be low. Use the scale command again to lower the replica count and avoid wasting cluster capacity:

kubectl scale deployment demo-deployment --replicas=3

Repeat the get pods command:

kubectl get pods

Kubernetes has marked two of the running pods for termination, bringing the replica count down to the requested three pods. The pods selected for removal are sent a SIGTERM signal and given a grace period to shut down cleanly (30 seconds by default, set by the pod’s terminationGracePeriodSeconds field); containers still running when it expires are killed with SIGKILL. The pods are removed from the list once they’ve stopped.
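
Graceful termination only helps if the application actually handles SIGTERM. As a minimal sketch, assuming a container whose entrypoint is a shell script (run_worker is a hypothetical function name), the process can trap the signal, finish its current unit of work, and exit with status 0 before the grace period runs out:

```shell
# Hypothetical entrypoint loop for a container that may be removed by a scale-down.
# On SIGTERM it stops taking new work, lets the current iteration finish, and exits 0.
run_worker() {
  running=1
  # Flip a flag instead of exiting immediately, so in-flight work can complete.
  # The trap runs once the current foreground command (the sleep) returns.
  trap 'running=0' TERM
  while [ "$running" -eq 1 ]; do
    # Placeholder for one unit of real work.
    sleep 1
  done
  echo "SIGTERM received, shut down cleanly"
}
```

Calling run_worker as the last step of the entrypoint keeps the container running until its pod is removed. Any work that outlasts the grace period is cut off by SIGKILL, so each unit of work should be short.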

Conditional scaling

Sometimes you might want to scale a resource, but only if its replica count currently has a specific value. This avoids unintentionally overwriting earlier scaling changes, such as those made by other users in your cluster.

Include the --current-replicas flag in the command to use this behavior:

kubectl scale deployment demo-deployment --current-replicas=3 --replicas=5

This example scales the demo-deployment deployment to five replicas, but only if its current replica count is three; otherwise kubectl reports an error and leaves the count unchanged. The --current-replicas value must match exactly; you can’t express a condition as “less than” or “greater than” a particular count.
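
The --current-replicas flag also supports a compare-and-swap style of scripting: read the current count, compute a new one, and pass the value you read as the precondition, so kubectl rejects the change if someone else scaled the object in between. A minimal sketch, assuming a deployment in the current namespace; scale_by is a hypothetical helper name:

```shell
# Hypothetical helper: change a deployment's replica count by a relative amount
# without clobbering a concurrent change. It reads spec.replicas, then passes
# that value to --current-replicas, so kubectl refuses the update if another
# user or an autoscaler changed the count in the meantime.
scale_by() {
  local name="$1" delta="$2" current desired
  current="$(kubectl get deployment "$name" -o jsonpath='{.spec.replicas}')" || return 1
  # Guard against an empty or non-numeric response before doing arithmetic.
  case "$current" in
    '' | *[!0-9]*)
      echo "could not read the replica count for $name" >&2
      return 1
      ;;
  esac
  desired=$((current + delta))
  if [ "$desired" -lt 0 ]; then
    echo "refusing to scale $name below zero replicas" >&2
    return 1
  fi
  kubectl scale deployment "$name" --current-replicas="$current" --replicas="$desired"
}
```

For example, scale_by demo-deployment 2 would take the deployment from three to five replicas, and scale_by demo-deployment -1 would remove one.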

Scaling multiple resources

The kubectl scale command can scale several resources at once when you supply more than one name as arguments. Each of the resources will be scaled to the same replica count set by the --replicas flag.

kubectl scale deployment app database --replicas=5

This command scales the app and database deployments to five replicas each.

You can scale every resource of a particular type by supplying the --all flag. This example scales all the deployments in your current namespace to three replicas:

kubectl scale deployment --all --replicas=3

This selects every matching resource inside the currently active namespace. The objects that were scaled are shown in the command’s output.

You can obtain granular control over the objects that are scaled with the --selector flag (-l for short). This lets you use standard label selector syntax to filter objects based on their labels. Here’s an example that scales all the deployments with an app-name=demo-app label to three replicas:

kubectl scale deployment --selector app-name=demo-app --replicas=3

Changing the timeout

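The --timeout flag sets how long kubectl waits for a scale operation to finish before it gives up. By default (0s), there’s no waiting period: kubectl returns as soon as the new replica count has been submitted. The flag accepts durations with a unit, such as 5m or 1h:

kubectl scale deployment demo-deployment --replicas=5 --timeout=5m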

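With a timeout set, kubectl waits until the object’s status reports the requested replica count (which doesn’t guarantee every new pod is ready yet), and gives up once the timeout elapses. Although kubectl scale is an imperative command, a scaling change can take several minutes to complete while new pods are scheduled to nodes, so waiting is useful when your next steps depend on the new replicas.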
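
kubectl exits with a non-zero status if that wait times out, so --timeout fits naturally into deployment scripts. A minimal sketch; the scale_and_wait helper and its 5m default are assumptions:

```shell
# Hypothetical helper: scale a deployment and block until kubectl reports the
# new replica count, failing with a message once the timeout (default 5m) elapses.
scale_and_wait() {
  local name="$1" count="$2" timeout="${3:-5m}"
  if kubectl scale deployment "$name" --replicas="$count" --timeout="$timeout"; then
    echo "$name scaled to $count replicas"
  else
    echo "$name did not reach $count replicas within $timeout" >&2
    return 1
  fi
}
```

For example, scale_and_wait demo-deployment 5 10m waits up to ten minutes, and a calling script can stop on the non-zero return value.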

Best practices

Using kubectl scale is generally the fastest and most reliable way to scale your workloads. However, there are some best practices to remember for safe operations. Here are a few tips.

  • Avoid scaling too often. Change replica counts in response to specific events, such as congestion that’s causing requests to run slowly or be dropped. It’s best to analyze your current service capacity, estimate the capacity needed to handle all the traffic satisfactorily, then add an extra buffer on top to anticipate future growth. Each scaling operation can cause delays while pods are scheduled and terminated.
  • Scaling down to zero will stop your application. Running kubectl scale with --replicas=0 removes all the pods belonging to the selected objects. You can scale back up again by repeating the command with a positive value.
  • Make sure you’ve selected the correct objects. There’s no confirmation prompt, so be sure to pay attention to the objects you’re selecting. Manually selecting objects by name is the safest approach, and prevents you from accidentally scaling other parts of your application, which could cause an outage or waste resources.
  • Use --current-replicas to avoid accidents. The --current-replicas flag increases safety by ensuring the replica count only changes if the current count matches your expectation. Otherwise, you might unintentionally overwrite scaling changes applied by another user or by the Kubernetes autoscaler.
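
Since kubectl scale never asks for confirmation, a small wrapper that lists the matching objects first can catch a mistyped selector before anything changes. A minimal sketch, assuming label-based selection as in the --selector example above; confirm_scale is a hypothetical helper name:

```shell
# Hypothetical helper: list the deployments matching a label selector, ask for
# confirmation on stdin, and only then scale them all to the requested count.
confirm_scale() {
  local selector="$1" count="$2" matches answer
  matches="$(kubectl get deployments --selector "$selector" -o name)" || return 1
  if [ -z "$matches" ]; then
    echo "no deployments match $selector" >&2
    return 1
  fi
  printf 'These deployments will be scaled to %s replicas:\n%s\n' "$count" "$matches"
  printf 'Continue? [y/N] ' >&2
  read -r answer
  case "$answer" in
    y | Y) kubectl scale deployment --selector "$selector" --replicas="$count" ;;
    *) echo "aborted" >&2; return 1 ;;
  esac
}
```

The check isn’t atomic: a deployment that gains the label between the listing and the scale would still be included, so treat the preview as a sanity check rather than a guarantee.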

Alternatives to kubectl scale

Running kubectl scale is an imperative operation that has a direct effect on your cluster. You’re instructing Kubernetes to supply a specific number of replicas as soon as possible. This is logical if you created the object with the imperative kubectl create command, but it’s inappropriate if you originally ran kubectl apply with a declarative YAML file, as shown above. After you run the scale command, the number of replicas in your cluster will differ from the spec.replicas field in your YAML, and re-applying the unchanged file will typically reset it. It’s better practice to modify the YAML file instead, then re-apply it to your cluster.

First, change the spec.replicas field in demo-deployment.yaml to your new desired replica count.
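
If you’d rather script the edit, a sed substitution can rewrite the value. A minimal sketch, assuming the file contains a single deployment whose only replicas: key is spec.replicas; set_manifest_replicas is a hypothetical helper name:

```shell
# Hypothetical helper: set the replicas: value in a manifest file.
# Assumes the only line whose key is "replicas:" is spec.replicas; any other
# such line would be rewritten too. minReplicas/maxReplicas are not matched.
set_manifest_replicas() {
  local file="$1" count="$2"
  # Reject anything that isn't a non-negative integer.
  case "$count" in
    '' | *[!0-9]*)
      echo "replica count must be a non-negative integer" >&2
      return 1
      ;;
  esac
  # Write to a temporary file, then replace the original (portable across sed variants).
  sed -E "s/^([[:space:]]*replicas:[[:space:]]*)[0-9]+/\1${count}/" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}
```

For example, set_manifest_replicas demo-deployment.yaml 3 sets the count to three. For manifests more complex than a single deployment, a YAML-aware tool is safer than a regular expression.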


Now repeat the kubectl apply command with the modified file:

kubectl apply -f demo-deployment.yaml

Kubectl will automatically diff the changes and take action to evolve the state of your cluster towards what is declared in the file. This will result in pods being automatically created or terminated, so the number of running instances matches the spec.replicas field again.

Another alternative to kubectl scale is Kubernetes’ support for autoscaling through the HorizontalPodAutoscaler. Configuring this mechanism allows Kubernetes to automatically adjust replica counts between a configured minimum and maximum based on metrics such as CPU usage or, via custom metrics, network activity.
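
As a minimal sketch, assuming the cluster runs a metrics source such as metrics-server and the deployment’s pod template sets CPU requests, an autoscaling/v2 HorizontalPodAutoscaler for demo-deployment could be written out like this. The range of 2 to 5 replicas and the 80% CPU target are assumed values:

```shell
# Write an autoscaling/v2 HPA that keeps demo-deployment between 2 and 5
# replicas, targeting 80% average CPU utilization (all assumed values).
cat > demo-hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-deployment
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
EOF
```

Apply it with kubectl apply -f demo-hpa.yaml. While the autoscaler is active it manages the replica count itself, so manual kubectl scale changes may be reverted; the Kubernetes documentation also recommends removing spec.replicas from the deployment’s manifest so that re-applying it doesn’t conflict with the autoscaler.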

Final thoughts

The kubectl scale command is an imperative mechanism for scaling your Kubernetes deployments, replica sets, replication controllers, and stateful sets. It targets one or more objects on each invocation and scales them so a specified number of pods are running. You can optionally set a condition, so the scale is only changed when there’s a specific number of existing replicas, avoiding unintentional resizes in the wrong direction.

You can track the number of replicas in your cluster by using a dedicated Kubernetes monitoring platform. However, if you’re looking to move away from managing your own infrastructure and would prefer a serverless solution that handles scaling for you, consider Airplane.

With Airplane, you get a robust, performant, maintenance-free platform that can run all sorts of tools you might run with Kubernetes. Shell scripts, Python functions, JavaScript files, REST API calls, and SQL queries can all be turned into Tasks on Airplane, allowing you to build tooling for your team quickly and easily. If you need a frontend, Airplane comes with a UI framework called Views that helps even non-technical stakeholders understand and use your team’s internal tools.

To start building internal tools that work for your whole team without the complexities of managing your own infrastructure, sign up for a free Airplane account or book a demo.

James Walker
James Walker is the founder of Heron Web, a UK-based digital agency providing bespoke software development services to SMEs.