Anyone who has used cron to schedule tasks in Linux knows that diagnosing and solving errors can be quite difficult. Cron jobs can fail for many reasons, and cron is not known for its error-handling prowess.
This article aims to help with some of the difficulties faced when diagnosing cron errors. We'll walk through the most common reasons why your cron job isn't running: schedule errors, environmental changes, depleted resources, and overlapping jobs. We’ll also share some additional troubleshooting tips along with code samples to help get your cron jobs back up and running.
If you're looking for an alternative to cron for scheduled jobs, we introduce Airplane at the end. Airplane is a developer platform for quickly building internal tools that supports serverless, maintenance-free schedules.
Four reasons your cron job isn’t running
1. Schedule errors
If your cron job isn’t operating as expected, first examine the job’s schedule expressions.
Writing schedule expressions can be tricky. For example, the regular cron expression contains:
<minute> <hour> <day-of-month> <month> <day-of-week> <command>
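A single character can make a big difference in an expression's functionality. For example, replacing "-" with "/" in the expressions below drastically alters how they work.
Expression 1 : 0-10 14 * * ?
Expression 2 : 0/10 14 * * ?
Expression 1 runs the cron job every minute from 2:00 PM through 2:10 PM every day. Expression 2 runs the cron job every 10 minutes from 2:00 PM through 2:50 PM every day.
Note that the ? placeholder and the 0/10 step form come from Quartz-style schedulers. The standard Linux crontab format uses * in place of ? and documents steps only after * or a range, so the portable spelling of Expression 2 is */10 14 * * *.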
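To see how one character changes a schedule, it can help to expand the minute field by hand. The sketch below is only an illustration, not how cron itself parses entries: expand_minutes is a hypothetical helper that handles a single value, a range such as 0-10, or a step such as */10 (the portable spelling of "every 10 minutes"), and it does no input validation.

```shell
#!/bin/sh
# expand_minutes FIELD
# Hypothetical helper: prints the minutes (0-59) matched by one minute field
# of the form N, A-B, */S or A-B/S. Comma-separated lists are not handled.
expand_minutes() {
    field=$1
    step=1

    # Split off an optional "/step" suffix.
    case $field in
        */*)
            step=${field#*/}
            field=${field%/*}
            ;;
    esac

    # Work out the first and last minute of the range.
    case $field in
        '*') first=0; last=59 ;;
        *-*) first=${field%-*}; last=${field#*-} ;;
        *)   first=$field; last=$field ;;
    esac

    out=
    minute=$first
    while [ "$minute" -le "$last" ]; do
        out="${out:+$out }$minute"
        minute=$((minute + step))
    done
    echo "$out"
}
```

Calling expand_minutes '0-10' lists every minute from 0 through 10, while expand_minutes '*/10' lists only 0, 10, 20, 30, 40, and 50.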
To address potential scheduling errors, you'll want to first locate your scheduled job, then make any necessary changes.
Find your job
You can find a job by typing crontab -l to display the current user's crontab.
Adding a file to /etc/cron.d/ or an entry to /etc/crontab is a standard way to create jobs, so be sure to check these locations.
If you don't see the cron job you're looking for, another user may have created it. With root privileges, you can run crontab -u username -l to confirm.
If you still can't find the job, you should check your permissions. Another user might have created the cron job and you may not be authorized to execute it. Note that root must own jobs added as files in a /etc/cron.*/ directory. That means that if you don't have root permissions, you might not be able to access the job.
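The locations above can be searched in one pass. The following is a minimal sketch rather than a definitive tool: find_cron_entries and its optional root-directory argument are names introduced here for illustration, and the second argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# find_cron_entries PATTERN [ROOT]
# Hypothetical helper: prints every crontab line that mentions PATTERN.
# ROOT defaults to empty, meaning the real filesystem.
find_cron_entries() {
    pattern=$1
    root=${2:-}

    # The current user's crontab, checked only on the real system.
    if [ -z "$root" ] && command -v crontab >/dev/null 2>&1; then
        crontab -l 2>/dev/null | grep -- "$pattern" | sed 's|^|crontab -l: |'
    fi

    # System-wide crontab and drop-in files. Passing /dev/null as a second
    # file makes grep prefix each match with the file name.
    for file in "$root/etc/crontab" "$root"/etc/cron.d/*; do
        [ -f "$file" ] || continue
        grep -- "$pattern" "$file" /dev/null 2>/dev/null || true
    done
    return 0
}
```

For example, find_cron_entries script.sh prints each matching line together with the place it was found. Files readable only by root are skipped silently unless you run the function with sudo.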
2. Environmental incompatibilities
You might discover that your job works when run via the command line, but won’t run via cron.
This may be due to the location you're executing from or the PATH you're using. The use of relative paths is a common cause of cron issues. In cron, PATH is typically set by default to /usr/bin:/bin, so only those two directories are searched for executables.
By default, cron jobs are executed from the user's home directory. This means that a relative path like ../documents/script.sh is resolved against the home directory rather than the directory you tested from, so cron may not find the file.
If you really want to use a file from a location other than your home directory, you should use an absolute path. Absolute paths include the full file path. For example:
1 * * * * /path/to/script.sh
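Alternatively, you can define your own PATH on a line at the top of the crontab, for example PATH=/usr/local/bin:/usr/bin:/bin. Cron then uses it to resolve the commands in the schedule lines that follow.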
It's also possible that your job relies on features that are unavailable under cron, such as environment variables and advanced bash features. Errors can happen when using environment variables since cron doesn't load .bashrc and related files. Similarly, sophisticated bash capabilities can cause difficulties because, by default, cron runs commands with /bin/sh rather than bash.
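One way to reproduce this kind of failure without waiting for the next scheduled run is to launch the command in an environment that resembles cron's. The sketch below is an approximation under stated assumptions: run_like_cron is a hypothetical helper, and the PATH value mirrors the common cron default, which may differ on your system.

```shell
#!/bin/sh
# run_like_cron COMMAND...
# Hypothetical helper: runs COMMAND through /bin/sh with an almost empty
# environment, approximating what cron provides by default.
run_like_cron() {
    # env -i starts from an empty environment; only the variables listed
    # here are passed on to the command.
    env -i \
        HOME="${HOME:-/}" \
        LOGNAME="${LOGNAME:-$(id -un)}" \
        SHELL=/bin/sh \
        PATH=/usr/bin:/bin \
        /bin/sh -c "$*"
}
```

For example, run_like_cron '/path/to/script.sh' runs the script without anything exported from your interactive shell. If the script works in your terminal but fails here, a missing variable, a bash-only feature, or a command outside /usr/bin and /bin is a likely cause.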
3. Depleted resources
Cron tasks may fail due to resource depletion. A job may run out of disk space, or a lack of available resources may leave the operating system unable to start new threads.
There are various ways to check how much disk space is used and what it's used by.
For example, you can use the df command. df stands for disk free and shows the available and used disk space in Linux. Here are some ways you can use the df command:
df -h: human-readable format of available disk space
df -a: all filesystems, including pseudo filesystems that report a size of 0
df -T: filesystem type (for example, xfs, ext2, or ext3) along with disk use
You can also use the du command to view the disk use in kilobytes. Here are some ways you can use the du command:
du -h: human-readable sizes for all directories and subdirectories
du -a: disk use for all files, not just directories
du -s: total disk space used by a specific directory
When you need to see disk size along with disk partitioning information, you can also use fdisk -l to show disk size.
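The df output can also be checked from inside a job, so that it stops early instead of failing halfway through. The following is a minimal sketch: disk_usage_percent and disk_ok are hypothetical helper names, and the parsing assumes the filesystem name contains no spaces.

```shell
#!/bin/sh
# disk_usage_percent [PATH]
# Prints the used-space percentage (digits only) of the filesystem that
# holds PATH. df -P forces the one-line-per-filesystem POSIX layout, in
# which the fifth column is the capacity in percent.
disk_usage_percent() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# disk_ok THRESHOLD [PATH]
# Succeeds when usage is below THRESHOLD percent. A cron script could call
# this guard before starting its real work.
disk_ok() {
    [ "$(disk_usage_percent "${2:-/}")" -lt "$1" ]
}
```

A job could begin with something like disk_ok 90 /var || exit 1, where the 90 percent threshold is an arbitrary example value to adjust for your system.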
Most of the above commands, particularly the df command, only display disk space for mounted filesystems. You might have multiple operating systems sharing the same disk space or have multiple disks.
In this case, you must first mount the disks before determining the used space. To do so, you can use sudo mount /dev/sdb1 /mnt.
The most common file systems you'll encounter while checking for disk use are:
tmpfs (temporary file system) holds temporary files in virtual memory.
udev stores information on plugged-in devices like USB devices, network cards, CD-ROMs, and external keyboards.
/dev/loop (or loop devices) are virtual devices that make regular files accessible as block devices. You don't need to count their space use individually because they're beneath the root.
If you prefer a GUI over the command line, you can get a graphical view using gnome-disk-utility in GNOME desktops.
4. Overlapping jobs
When a second task begins to run while the first is still running, they overlap. This overlap might cause your cron job to slow down or even crash.
Common measures to take include using a locking mechanism, wrapper scripts, or a concurrency policy to prevent multiple copies of the same cron job from running. When implementing this method, it's good to save past run times. This way you can identify, for example, if a job that typically takes 30 seconds took more than 2 minutes to finish or if the system is slowing down.
Say you want to initiate a system backup every 30 minutes using a script called script1.sh. For this backup to occur automatically every 30 minutes, you can schedule cron like so:
*/30 * * * * /tmp/script1.sh >> /tmp/script1.log 2>&1
The output of script1.sh is appended to script1.log, and 2>&1 sends error messages to the same file, making each run easy to follow and easy to troubleshoot if the cron job fails.
This is great in theory, but what if the backup requires more than 30 minutes to run? This could result in two backup creations at the same time, which not only slows the system but also includes bulky, repetitive data.
To prevent this, you can use pgrep to check whether your script is already running. Using pgrep --list-full bash | grep '/tmp/script1.sh', you can detect every script invocation related to script1.sh, including the present one. If you don't want to see the current script invocation, you can use grep -v "^$$ " to remove the line beginning with the current PID, which is the current invocation:
pgrep -a bash | grep -v "^$$ " | grep --quiet '/tmp/script1.sh'
In this code, --quiet is what prevents the output from showing up in the log files.
You can then log each run's execution time and halt the job if it overlaps with the previous one.
Checking to see if/when jobs are overlapping can help you determine if you should increase the time between the execution of the two cron jobs.
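As a complement to the process-list check, a lock can stop a second copy from starting while also recording how long each run takes. The sketch below is one possible approach, not the only one: run_exclusive and the lock path are hypothetical, and the lock relies on mkdir being atomic, so only one caller can create the directory.

```shell
#!/bin/sh
# run_exclusive LOCKDIR COMMAND...
# Hypothetical wrapper: runs COMMAND only if no other run holds LOCKDIR,
# then reports how many seconds the run took.
run_exclusive() {
    lockdir=$1
    shift

    # mkdir fails if the directory already exists, which means another
    # run is still active (or a previous run died without cleaning up).
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') previous run still active, skipping"
        return 1
    fi

    start=$(date +%s)
    status=0
    "$@" || status=$?
    end=$(date +%s)

    rmdir "$lockdir"
    echo "$(date '+%Y-%m-%d %H:%M:%S') finished in $((end - start))s (exit $status)"
    return "$status"
}
```

Inside script1.sh, the backup step could be wrapped as run_exclusive /tmp/script1.lock followed by the backup command. Because the messages go to standard output, they land in the job's log, where unusually long runs stand out. One caveat: if the job is killed, the lock directory is left behind and must be removed by hand. On systems with util-linux, flock -n /tmp/script1.lock /tmp/script1.sh is an alternative that releases the lock automatically when the process exits.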
Other troubleshooting tips
The four examples above are some of the most common reasons your cron job may not run as expected. There are other possible reasons for cron failure, so we outline a few more ways to troubleshoot cron below.
Check system logs
When cron tries to execute a command, it records the attempt in syslog. To check your logs for errors, you'll need root or sudo access to search /var/log/syslog using this command:
grep filename.sh /var/log/syslog
If you can't locate your command in the syslog using grep, someone may have deleted the log or moved it to another place.
If a command doesn’t show in syslog after two minutes, the issue might be with the underlying cron daemon, crond.
To help locate the command, you can search for cron-related content directly in the syslog using the following:
grep CRON /var/log/syslog
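The two searches above can be combined so that only cron's own lines about a given script are shown. This is a minimal sketch under stated assumptions: cron_log_lines is a hypothetical helper, and the fallback to /var/log/cron reflects where Red Hat style systems usually write cron messages.

```shell
#!/bin/sh
# cron_log_lines PATTERN [LOGFILE]
# Hypothetical helper: prints cron-related log lines that mention PATTERN.
# LOGFILE defaults to the first readable file among common locations.
cron_log_lines() {
    pattern=$1
    logfile=${2:-}

    if [ -z "$logfile" ]; then
        for candidate in /var/log/syslog /var/log/cron; do
            if [ -r "$candidate" ]; then
                logfile=$candidate
                break
            fi
        done
    fi

    if [ -z "$logfile" ]; then
        echo "no readable cron log found; try running with sudo" >&2
        return 1
    fi

    # -i because the daemon tag is CRON on some systems and crond on others.
    grep -i 'cron' "$logfile" | grep -- "$pattern"
}
```

For example, sudo sh -c '. ./cronlog.sh; cron_log_lines filename.sh' would run it with the access the log usually requires, where cronlog.sh is a placeholder for wherever you save the function. On systemd-based systems that no longer write these files, the same messages may be available through journalctl instead.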
Check system resources
Sometimes depleted system resources can cause an error. Suppose your cron job requires you to send an email notification after completing every job. This could fill the available space quickly and slow the system at each iteration.
To help prevent that from happening, check whether enough system resources are available to run the job you have set up and adjust accordingly, for example by making more memory available or adding more RAM.
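Available memory can be read directly from the kernel before a job starts. The sketch below is Linux-specific and rests on an assumption about your kernel: mem_available_mb is a hypothetical helper that reads the MemAvailable field of /proc/meminfo, which older kernels do not provide.

```shell
#!/bin/sh
# mem_available_mb
# Hypothetical helper: prints the kernel's estimate of memory available to
# new processes, in whole MiB, as reported in /proc/meminfo (value is in kB).
mem_available_mb() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}
```

A job could compare this number with what it is known to need and exit with a clear log message when the system is short on memory, rather than failing partway through.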
Check for infrastructure changes
It’s easy to overlook the cron jobs on each server while making changes to your app configuration and code. These changes can cause cron jobs to fail. This is why it's worth checking if an infrastructure change may be causing the failure, especially if infra changes were made recently.
These changes could include server updates or migrations, or upgrading or changing routers/devices. These changes could also include things like software updates, internet regulation changes, and firewall or security changes. For example, if a cron job relies on a file path that changes due to data migration, the job will fail. Similarly, a job can fail if it relies on software functionality that is altered or removed with the release of a new version. New security measures might change the level of access required to successfully run a job.
Consider transaction size
Large data sets might cause queries to timeout or file transfers to become excessively slow, resulting in cron job failure.
Although there is no universal transaction size above which your cron jobs are likely to fail, you should check your data set’s size to determine if that might be the cause of your issue. The exact upper limit for transactions handled by your cron job will vary based on your choice of database and architecture, the specific workflow you have in place, and the capability of your system. However, you can typically consider data transactions involving more than 100,000 records to be too large.
A common indication of an overly large dataset is an “out of memory” error, which looks something like this:
Fatal error: Allowed memory size of 262349807 bytes exhausted (tried to allocate 4723456 bytes)...
It’s common to see this error when a job receiving a large response to a query is scheduled to run concurrently with smaller, more frequent jobs. For example, this error might occur if you’re working on a dataset of 200,000 users and try to pull the info for the most recent purchases, which could be 10,000 purchases, simultaneously.
It’s common to handle large queries and other data transactions by breaking the job into smaller fragments. You may be able to use temp tables, offsets, or more complex queries to process the dataset in chunks. If you are able to handle the added overhead, you can also use stored procedures to return subsets of the results you need.
However, although these options create more scalable solutions, they may not always solve query timeouts or slowdowns without creating additional management challenges. Smaller jobs running in parallel can still strain your available memory, so you may need to serialize the workload by rescheduling your jobs to work around the allotted time intervals between them.
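To make the chunking idea concrete, the sketch below generates one query per batch using offsets. Everything specific in it is a placeholder: chunk_queries is a hypothetical helper, the purchases table and id column are invented for illustration, and the LIMIT and OFFSET syntax assumes a database such as MySQL, PostgreSQL, or SQLite.

```shell
#!/bin/sh
# chunk_queries TOTAL BATCH_SIZE
# Hypothetical helper: prints one SQL statement per batch so that no single
# query has to return all TOTAL rows at once.
chunk_queries() {
    total=$1
    batch=$2

    # A batch size of zero or less would never advance the offset.
    [ "$batch" -gt 0 ] || return 1

    offset=0
    while [ "$offset" -lt "$total" ]; do
        echo "SELECT * FROM purchases ORDER BY id LIMIT $batch OFFSET $offset;"
        offset=$((offset + batch))
    done
}
```

With 250 rows and a batch size of 100, chunk_queries 250 100 prints three statements with offsets 0, 100, and 200. Each statement can then be run one after another, which keeps memory use bounded at the cost of a longer total run time.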
Test for code bugs
Sometimes, the failure may not be cron-related. It's challenging to test cron tasks thoroughly in a development environment, and an issue might only exist in production code. Another troubleshooting tip is to check your code for relevant bugs. For example, maybe you made a recent change to the code base which fails to work as expected or there are other production-related code issues that would also cause issues with your jobs.
Airplane for job scheduling
While cron is one of the most popular methods for scheduling jobs, it may sometimes be difficult to maintain and challenging to troubleshoot. We've explored how to diagnose and handle some of the top reasons why cron may not be running, but even experienced developers may find tracking down and solving cron issues time-consuming.
Airplane is a developer platform that lets you turn code into internal apps quickly, supporting quick and easy job scheduling. Using Airplane, you can turn APIs, SQL queries, and scripts into tasks and run those tasks manually or configure them to run on schedules seamlessly. Airplane also comes with permissions, audit logs, and approval flows so you can manage your operations safely.
Check out the Airplane blog for more cron how-to guides such as: How to create Golang cron jobs, How to run cron in containers, and How to start, stop, and restart cron jobs. If you think Airplane could be useful, get in touch with the team by email or sign up for a free account.