How cron might be a bottleneck in your automation
Cron is a time-based job scheduler in Linux and other Unix-like operating systems. It runs commands and scripts at specified times, dates, and intervals.
Cron jobs are handy for automating tasks such as creating automated posts, sending regular emails, and even running server security checks. They do, however, have limitations. For example, cron's finest scheduling granularity is one minute, so it can't natively run jobs more often than that.
Additionally, it's tough to diagnose what went wrong when a cron job fails. Cron doesn't make it easy to get automatic alerts or to review job runs for errors. Cron also doesn't scale well: every job on a machine shares that machine's CPU, memory, and disk, so the more jobs you add, the slower the entire system can become.
In this article, we discuss cron's automation bottlenecks and touch on Airplane as an easier-to-use and scalable alternative to cron.
How cron causes bottlenecks
Cron is one of the most well-known utilities for automating repetitive operations in Unix-like operating systems. It does, however, have its limitations.
The most common of its issues (and the ones we'll cover in this article) include: system resource depletion, large dataset challenges, overlapping jobs, infrastructure changes, insufficient error handling, and scaling issues. We go into each of these in more detail below.
System resource depletion
If we don’t properly clean up after cron jobs, we could run out of resources and slow our system.
Let's take a simple example. Say we set up a cron job that regularly sends thank-you emails to customers who have made a purchase. This seems straightforward, but there could be thousands of transactions, and thousands of thank-you emails, every day. By default, cron also emails any output a job produces to the crontab's owner (or to the address set in MAILTO), so if the script prints each email it sends, we could end up receiving a copy of every thank-you email. This could quickly clog our mailbox.
This is a simple case, but if we don't clean up the mailbox frequently, the growing backlog could start delaying delivery of the thank-you emails themselves.
One way to solve this kind of issue is to stop the cron job from generating output on every run, using a crontab entry like this:
0 * * * * /command/to/run > /dev/null 2>&1 || true
This entry redirects standard output (stdout) to /dev/null, sends standard error (stderr) to the same place with 2>&1, and uses || true so that the entry still returns a 0 exit code even if the command fails.
Regularly cleaning up log files helps too. After many runs, a cron job's log files can grow large enough that they need to be compressed or removed for the system to keep working correctly.
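A minimal Bash sketch of that cleanup is shown below. It gzips a job's .log files once they are older than a given number of days and deletes the compressed copies after a longer period. The prune_logs name, the directory argument, and the default thresholds are illustrative assumptions; on most Linux distributions, the logrotate utility covers this more completely.

```bash
#!/usr/bin/env bash
# Hypothetical cleanup helper for a cron job's log directory (a sketch).
# Usage: prune_logs DIR [COMPRESS_DAYS] [DELETE_DAYS]
prune_logs() {
  local dir="$1" compress_days="${2:-1}" delete_days="${3:-14}"
  # gzip plain .log files whose last modification is older than
  # compress_days (find -mtime counts whole days); gzip keeps the
  # original modification time on the .gz file
  find "$dir" -type f -name '*.log' -mtime +"$compress_days" -exec gzip -f {} +
  # delete compressed logs older than delete_days
  find "$dir" -type f -name '*.log.gz' -mtime +"$delete_days" -delete
}
```

Wrapped in a small script, it could itself run nightly from cron, for example by calling prune_logs /var/log/myjob 1 14 (a placeholder path). Note that find -mtime +N counts whole days, so +1 matches files last modified at least two days ago.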
Large dataset challenges
Cron jobs can be a significant bottleneck while working with large datasets or running data science scripts.
One use of cron jobs is for data-intensive tasks such as batch processing. Batch jobs can sometimes take hours to run, especially when they involve running inference with a machine learning model. The scripts might also use large amounts of memory or generate output after every iteration to be displayed or sent as notifications.
This could slow processing or cause the cron job to fail. Unfortunately, without failure-handling logic, we wouldn't even know the job had stopped working.
One option to handle this is to execute a cron job more frequently in order to reduce the amount of data it handles each run. For example, let’s say we have a large dataset that is frequently updated. Our data must first be processed, then sent to its destination. Rather than running a daily job that takes several hours and generates a large output, we can run smaller jobs hourly that go through this process incrementally.
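One way to make each run incremental is to keep a checkpoint recording how far the previous run got. The Bash sketch below assumes the input is an append-only file with one record per line; process_record is a hypothetical placeholder for the real processing and sending steps, and the state file simply stores a line count.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the real work (process, then send to destination)
process_record() {
  printf 'processed: %s\n' "$1"
}

# Process only the lines added to INPUT since the last run (a sketch).
# Usage: process_new_records INPUT STATE_FILE
process_new_records() {
  local input="$1" state="$2" done_count=0 total
  # How many lines earlier runs already handled (0 on the first run)
  [ -f "$state" ] && done_count=$(<"$state")
  # Snapshot the current line count so lines appended mid-run wait for next time
  total=$(wc -l < "$input")
  while IFS= read -r line; do
    # Stop without moving the checkpoint if any record fails
    process_record "$line" || return 1
  done < <(tail -n +"$((done_count + 1))" "$input" | head -n "$((total - done_count))")
  # Advance the checkpoint only after the whole batch succeeded
  echo "$total" > "$state"
}
```

Because the checkpoint is written only after every record in the batch succeeds, a failed run leaves it unchanged and the next run retries the same records, so the processing step should be safe to repeat.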
Overlapping jobs
When using cron, overlapping jobs often cause issues and operational bottlenecks.
For example, we might schedule a system or file update to run every two hours. If the first update is larger than expected and takes more than two hours, the next scheduled run starts while the first is still going, and the two jobs overlap.
Because both jobs run at the same time, they may process the same data twice, producing duplicate records or even corrupting data. The extra load can also slow the system enough that later cron jobs are delayed or fail to run.
Cron has no built-in protection against overlap, so preventing it takes some manual work. One option is to space jobs far enough apart that each has time to finish, and to verify this by logging each run's output along with its execution time to a separate file. A more robust option is to block a job from starting if the previous run is still in progress. This article provides a good example of how to prevent overlap.
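Blocking can be sketched with the flock utility from util-linux, which takes an exclusive lock on a file. In the Bash function below, a second run that finds the lock already held skips itself instead of waiting. The run_exclusive name, the message, and the choice to exit 0 when skipping (so a skipped run isn't reported as a failure) are assumptions.

```bash
#!/usr/bin/env bash
# Run a command only if no other run holding the same lock file is active.
# Usage: run_exclusive LOCKFILE COMMAND [ARGS...]
run_exclusive() {
  local lockfile="$1"; shift
  (
    # -n: give up immediately instead of waiting if the lock is held
    flock -n 9 || { echo "previous run still active; skipping" >&2; exit 0; }
    "$@"
  ) 9>"$lockfile"   # open the lock file on file descriptor 9 for the subshell
}
```

For a single job, flock can also go straight into the crontab entry, e.g. */10 * * * * flock -n /tmp/sync.lock /path/to/sync.sh, where the lock path and script are placeholders. The lock is released automatically when the job's processes exit, even if they crash.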
Infrastructure changes
When making changes to database connections or upstream APIs, it's easy to forget about cron jobs, especially when the jobs run on a separate server. This oversight could cause cron job failures, which in turn lead to system failures and bottlenecks.
When using cron, we must make sure to update our cron jobs whenever we make infrastructure changes, such as moving applications from on-premises servers to the cloud or upgrading legacy software.
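One way to make forgotten dependencies fail loudly is a preflight check at the top of each job script. The Bash helpers below are a sketch: require_env and require_cmd are hypothetical names, and DATABASE_URL in the usage note is a placeholder. This matters with cron in particular because jobs run with a minimal environment, so variables and PATH entries that work in an interactive shell may be missing.

```bash
#!/usr/bin/env bash
# Fail if any of the named environment variables is unset or empty.
require_env() {
  local var missing=0
  for var in "$@"; do
    # ${!var} is Bash indirect expansion: the value of the variable named by $var
    if [ -z "${!var:-}" ]; then
      echo "preflight: environment variable $var is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Fail if any of the named commands cannot be found in PATH.
require_cmd() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "preflight: command $cmd not found in PATH" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A job script might start with require_env DATABASE_URL || exit 1, so a moved or renamed dependency produces a clear error message (which cron can mail to you if mail is configured) instead of a confusing failure halfway through the run.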
Insufficient error handling
Without error handling, making sure tasks perform as expected is time-consuming, inconsistent, and often infeasible. It can also be dangerous not to know immediately when jobs have failed. Apart from optionally emailing a job's output, cron has no built-in error handling: it doesn't retry failed jobs or send alerts based on whether a job succeeded. This makes failed cron tasks hard to detect and fix, which delays any of your system's processes that depend on them.
Suppressing a job's output, as in the /dev/null example above, saves disk space and keeps mailboxes clean. Without some other logging or notification, however, we won't know whether the job succeeded.
Additionally, we often configure cron jobs to perform tasks that need little to no human intervention. For example, a developer might schedule a cron job to run every ten minutes overnight while nobody is at work. If it fails in the middle of the night, they won't find out until the following day.
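A small wrapper can add the retries and alerts that cron lacks. In the Bash sketch below, run_with_retries retries a command a fixed number of times and then calls notify. The notify function is a hypothetical hook that only appends to a local file (ALERT_FILE, an assumed path); in practice it would send an email, a chat message, or a page.

```bash
#!/usr/bin/env bash
ALERT_FILE="${ALERT_FILE:-/tmp/cron-alerts.log}"   # assumed location

# Hypothetical alert hook: replace with mail, a chat webhook, or a pager
notify() {
  printf '%s ALERT: %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$1" >> "$ALERT_FILE"
}

# Usage: run_with_retries ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
run_with_retries() {
  local attempts="$1" delay="$2"; shift 2
  local i status=0
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    status=$?
    echo "attempt $i/$attempts failed with exit code $status" >&2
    # Wait before retrying, but not after the final attempt
    (( i < attempts )) && sleep "$delay"
  done
  notify "command '$*' failed after $attempts attempts (last exit code $status)"
  return "$status"
}
```

A crontab entry could then call a wrapper script that runs something like run_with_retries 3 60 /path/to/nightly_job.sh (a placeholder path), so a transient failure is retried and a persistent one raises an alert right away instead of waiting until morning.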
One thing we can do is periodically check the cron job to verify it's working, but this is labor-intensive and impractical at scale. To make it a bit more manageable, we can append the job's output to a specific file (running_script.log) and review that file later for errors. The 2>&1 at the end of the entry below sends error messages (stderr) to the same file; without it, only standard output would be captured:
*/30 * * * * /tmp/running_script.sh >> /tmp/running_script.log 2>&1
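Going a step further, a wrapper can stamp each run with its start time, end time, duration, and exit code, which makes failures easy to find in the log and also shows how close runs come to overlapping. This is a minimal Bash sketch; log_run is a hypothetical helper, and the log path is chosen by the caller.

```bash
#!/usr/bin/env bash
# Usage: log_run LOGFILE COMMAND [ARGS...]
# Appends START/END markers plus the command's stdout and stderr to LOGFILE
# and returns the command's own exit code.
log_run() {
  local logfile="$1"; shift
  local start end status=0
  start=$(date +%s)
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] START $*" >> "$logfile"
  "$@" >> "$logfile" 2>&1 || status=$?
  end=$(date +%s)
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] END exit=$status duration=$((end - start))s" >> "$logfile"
  return "$status"
}
```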
Scaling issues
When multiple machines wake up at the same time and query the same backend, they all try to get their work done at once, but the backend can only serve so many requests at a time. This competition continues until every job has finished.
Meanwhile, all the processes are competing for resources, potentially stalling the computer.
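A common mitigation is to add random jitter, so machines that share a schedule don't all start at the same second. The Bash sketch below sleeps a random number of seconds (below a chosen maximum) before running the job; run_with_jitter is a hypothetical name, and the maximum must be a positive number of seconds.

```bash
#!/usr/bin/env bash
# Print a random delay in the range 0..MAX-1 seconds.
# $RANDOM is 0..32767, so the modulo is slightly uneven, which is fine here.
random_delay() {
  local max="$1"
  echo $(( RANDOM % max ))
}

# Usage: run_with_jitter MAX_SECONDS COMMAND [ARGS...]
run_with_jitter() {
  local max="$1"; shift
  sleep "$(random_delay "$max")"
  "$@"
}
```

With a maximum of 300, for example, jobs scheduled for the top of the hour on many machines would spread across the first five minutes instead of arriving together.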
Log rotation, a scheduled job that compresses and archives log files, is one of the most common culprits. Log files can sometimes grow as large as 1 TB. Compressing files that large takes significant CPU time and disk I/O, and if the files are left unrotated they can fill the disk within a few days, slowing the system or making it impossible to write or open files. Large logs also cause problems for programs that need to read log data and turn it into a usable format.
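When a large log file has to be compressed, running the compression at low priority keeps it from starving other work. The Bash sketch below uses nice -n 19 for the lowest CPU priority and, where ionice is available and permitted, the idle I/O class (-c3); compress_low_priority is a hypothetical helper name. This makes compression slower but gentler, and it does not reduce how much disk the logs use, so regular rotation is still needed.

```bash
#!/usr/bin/env bash
# Usage: compress_low_priority FILE   (replaces FILE with FILE.gz)
compress_low_priority() {
  local file="$1"
  if command -v ionice >/dev/null 2>&1 && ionice -c3 true 2>/dev/null; then
    # Lowest CPU priority plus the idle I/O class: disk time only when idle
    nice -n 19 ionice -c3 gzip -f -- "$file"
  else
    # Fall back to CPU priority only if ionice is missing or not permitted
    nice -n 19 gzip -f -- "$file"
  fi
}
```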
Consider two scenarios: one where a cron job runs a script to send the user automated emails, and the other where the email must be sent depending on user inquiries. The first may be for customers to get scheduled email notifications about product updates and offers, while the second could be for users to receive automatic email answers to their inquiries.
The first may not need close monitoring, because its load is predictable and no user action triggers it. In the second case, however, a system outage or technical problem could quickly push the number of inquiries past hundreds per minute. Each inquiry requires its own set of actions, such as verifying the user's credentials and responding to their question. If the system is backed up, these actions are delayed, and cron jobs start to overlap.
Since overlapping jobs can become exceptionally problematic at scale, we recommend setting up jobs so that they don't overlap and including blocking logic (block subsequent jobs from starting when previous jobs are still running). Ensuring a cron job only executes after the previous job completes will help with bottlenecks at scale. For a more detailed explanation of how cron jobs can overlap, and how to prevent this from occurring, you can take a look at this article.
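Blocking logic has a catch: if one run hangs, every later run is skipped. Capping each run's duration with the GNU coreutils timeout command avoids that. The Bash sketch below wraps it; run_with_timeout is a hypothetical name, and it relies on timeout exiting with status 124 when the limit is reached.

```bash
#!/usr/bin/env bash
# Usage: run_with_timeout LIMIT COMMAND [ARGS...]
# LIMIT uses timeout(1) syntax, e.g. 30 (seconds), 9m, or 1h.
run_with_timeout() {
  local limit="$1"; shift
  local status=0
  timeout "$limit" "$@" || status=$?
  # timeout exits with 124 when it had to stop the command
  if [ "$status" -eq 124 ]; then
    echo "job exceeded ${limit} and was stopped" >&2
  fi
  return "$status"
}
```

In a crontab, the two ideas combine naturally, e.g. */10 * * * * flock -n /tmp/sync.lock timeout 9m /path/to/sync.sh (placeholder paths), which skips a run while the previous one is active and stops any run that takes longer than nine minutes.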
Replacing cron with Airplane scheduled tasks
Although cron is one of the most popular ways for developers to perform jobs regularly, it has limitations and can be challenging to use as we've discussed. Even the most seasoned administrator will spend a significant amount of time monitoring job status and modifying jobs while using cron. Cron tasks are also tough to debug, don't scale effectively, and don’t offer error handling or alerts.
Airplane is a platform that lets engineers transform APIs, SQL queries, and scripts into sharable applications in minutes. Airplane supports serverless, easy-to-use schedules and provides first-class support for permissions and audit logs, helping to avoid many of the bottlenecks caused by cron.
In addition to running scheduled tasks, you can use Airplane to automate other operational workflows at your company. You may have traditionally run scripts or SQL queries on behalf of your non-technical teammates. Airplane transforms these activities into self-service solutions for support, operations, and other teams.
Getting set up with Airplane takes minutes; sign up for a free Airplane account to try it out today.