Repeated Task Scheduling in Python

"History repeats itself"

~ Laptop and the Lady, Laptop and the Lady, Laptop and the Lady

Introduction

In this post, we'll go over what task scheduling is. You'll see examples of scheduled tasks, and we'll look at some approaches to scripting periodic tasks. Hopefully, you'll learn something useful that you can apply to your program.

What Is Task Scheduling?

Task scheduling is the process of allocating system resources to specific tasks. Task schedulers allow you to schedule jobs and, in certain cases, track them in batches. By executing predefined task statements, task schedulers may start and control tasks automatically. Task scheduling is commonly used for jobs that repeat themselves, such as:

Auditing and logging
Notifications, such as when an incident happens or when an event occurs
Generating monthly reports

Some other tasks that you can repeat include deleting inactive accounts from an existing database, deleting temporary files, and sending reminder emails.

Scheduling Mechanisms

In scripting, there are two frequently used mechanisms for scheduling repeated tasks:

Cron-style scheduling
Interval-based execution

Cron-style scheduling uses the expression format of the UNIX cron utility to specify when a task should run. Cron expressions are widely used, and they rely on special characters with specific meanings.

In a cron expression, * means all values. ? means any value. - represents a range of values. , separates a list of values. / specifies the amount by which values in a field are incremented. The examples in this post use the extended format that starts with a seconds field and supports ?, as found in Quartz-style schedulers. A standard UNIX crontab entry has five fields and no ? character.

For example, the cron expression 0 */20 * ? * * means that the job runs every 20 minutes, at minutes 0, 20, and 40 of every hour. The cron expression 0 5,10,55 * ? * * indicates that the job will run every hour at the 5th, 10th, and 55th minutes of that hour. The cron expression 0 44 7 * * ? 2022 means that the job runs at 7:44 AM every day in 2022.
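To make the special characters concrete, here's a minimal sketch that expands a single cron field into the values it matches. The function name expand_field is made up for this post, and treating ? the same as * is a simplifying assumption. Real cron implementations handle more cases, such as month and weekday names.

```python
def expand_field(field: str, low: int, high: int) -> list[int]:
    """Expand one cron field (for example, the minutes field) into sorted values."""
    values: set[int] = set()
    for part in field.split(","):  # "," separates list items
        base, _, step_text = part.partition("/")  # "/" introduces a step
        step = int(step_text) if step_text else 1
        if base in ("*", "?"):  # assumption: "?" is treated like "*" here
            start, end = low, high
        elif "-" in base:  # "-" gives an inclusive range
            first, last = base.split("-")
            start, end = int(first), int(last)
        else:  # a single value; with a step it counts up to the field maximum
            start = int(base)
            end = high if step_text else start
        values.update(range(start, end + 1, step))
    return sorted(values)


print(expand_field("*/20", 0, 59))     # [0, 20, 40]
print(expand_field("5,10,55", 0, 59))  # [5, 10, 55]
print(expand_field("9-12", 0, 23))     # [9, 10, 11, 12]
```

The first two calls expand the minutes fields from the examples above. The third shows a range in an hours field.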

Drawback of Cron Expressions

Cron expressions are great, but they have their shortcomings. Consider a task that should run every 53 minutes. The obvious cron configuration, 0 */53 * ? * *, doesn't do that. A step value restarts at the top of every hour, so */53 matches only minutes 0 and 53, and the task runs at :00 and :53 of each hour. That's 48 runs a day, with gaps that alternate between 53 and 7 minutes.

A true "every 53 minutes" schedule would run about 27 times a day, since a day has exactly 1440 minutes and 1440 / 53 ≈ 27.2. Cron can't express this because 53 doesn't divide evenly into 60. Fortunately, interval-based scheduling runs tasks at fixed time intervals regardless of the clock, and several packages support it.
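We can check the difference with a few lines of Python. This is only a sketch: the helper names are made up, the date is arbitrary, and the interval version assumes the first run happens at midnight. Note that a step of */53 in the minutes field matches minute 0 as well as minute 53.

```python
from datetime import datetime, timedelta


def cron_step_runs(step: int) -> list[str]:
    """Times of day matched by a minutes field of */step."""
    minutes = range(0, 60, step)  # the step restarts at minute 0 every hour
    return [f"{hour:02d}:{minute:02d}" for hour in range(24) for minute in minutes]


def interval_runs(step: int) -> list[str]:
    """Times of day reached by running every `step` minutes, starting at midnight."""
    start = datetime(2022, 1, 1)  # arbitrary date, used only for illustration
    runs = []
    current = start
    while current < start + timedelta(days=1):
        runs.append(current.strftime("%H:%M"))
        current += timedelta(minutes=step)
    return runs


cron = cron_step_runs(53)
interval = interval_runs(53)
print(len(cron), cron[:4])          # 48 ['00:00', '00:53', '01:00', '01:53']
print(len(interval), interval[:4])  # 28 ['00:00', '00:53', '01:46', '02:39']
```

The interval version reports 28 run times because it counts the initial run at midnight plus the 27 full intervals that fit into the day.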

Task Scheduling in Python

In Python, we may define tasks and schedule them using a variety of approaches, such as:

Looping
Using threaded loops
Scheduling libraries

When it comes to task scheduling, it all comes down to use cases. Nevertheless, depending on the size of the task and the main objective, some options are inefficient or overly complex. Let's have a look at the first approach, which is looping.

Spoiler alert 🚨 : this is quite inefficient.

Looping

Many beginners use this method of scheduling. Here, we use a while loop. Suppose we have a task that should run every day at 7:44 AM. We can set up such a task with a while loop that keeps checking the clock.

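import datetime
import time

def task() -> None:
    print("Task is running...")

last_run = None  # date on which the task last ran
while True:
    now = datetime.datetime.now()  # current date and time
    # hour is on a 24-hour clock, so 7 matches 7 AM only
    if now.hour == 7 and now.minute == 44 and last_run != now.date():
        task()  # run the task once for today
        last_run = now.date()  # avoid re-running during the same minute
    time.sleep(5)  # wait 5 seconds before checking again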

In the above code snippet, the task runs every day at 7:44 AM. Because datetime reports the hour on a 24-hour clock, now.hour == 7 never matches 7:44 PM, so an evening run would need a separate check for hour 19. While this may appear simple at first, it becomes extremely complicated if we need a certain time interval. Furthermore, scheduling tasks in this manner is highly inefficient. Once Python enters this infinite loop, no other code in the program gets to run. Any code that follows the loop is never reached, which makes the loop a blocking call, and that's undesirable.
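If we stay with a loop, one way to cut down on the constant clock checks is to compute how long to wait until the next run and sleep for that long. Here's a minimal sketch. The helper name seconds_until is made up, and it assumes naive local datetimes, so it ignores daylight saving changes. The loop would still block the rest of the program.

```python
import datetime


def seconds_until(hour: int, minute: int, now: datetime.datetime) -> float:
    """Seconds from `now` until the next occurrence of hour:minute (24-hour clock)."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:  # today's slot has already passed, so aim for tomorrow
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()


# Hypothetical usage inside the loop:
#     time.sleep(seconds_until(7, 44, datetime.datetime.now()))
#     task()
morning = datetime.datetime(2022, 1, 1, 7, 0)
evening = datetime.datetime(2022, 1, 1, 19, 44)
print(seconds_until(7, 44, morning))  # 2640.0 (44 minutes)
print(seconds_until(7, 44, evening))  # 43200.0 (12 hours)
```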

Threaded Loops

Now that we've seen the blocking nature of scheduling with a simple while loop, we can use threading to design a better scheduler. A process can contain multiple threads. A thread is a unit of execution that shares resources, such as memory and open files, with the other threads in the same process.

Each thread has its own sequence of instructions, a thread ID that uniquely identifies it, and its own execution context, which is saved and restored on a context switch. A process typically delegates tasks to its threads so that work can continue while one thread is waiting, for example on input/output operations.

Suppose we have a similar task to schedule at 9:40 AM every day. We can use the threaded approach as follows:

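import datetime
import threading
import time

def task() -> None:
    print("Task is running at 9:40 AM...")
    time.sleep(20)  # simulate 20 seconds of work

def looped_schedule() -> None:
    last_run = None  # date on which the task last ran
    while True:
        now = datetime.datetime.now()
        # hour is on a 24-hour clock, so 9 matches 9:40 AM only
        if now.hour == 9 and now.minute == 40 and last_run != now.date():
            print("Thread is running...")
            task()
            last_run = now.date()  # run only once per day
        time.sleep(5)  # pause between checks instead of spinning the CPU

t = threading.Thread(target=looped_schedule)
t.start()
print("Main thread is done...")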

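In the above snippet, our program allocates a separate thread for the scheduled task using the threading.Thread class. The main thread doesn't wait for the loop, so it prints "Main thread is done..." right away while the scheduler thread keeps checking the time.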
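The same idea also works for interval-based execution. The sketch below is one possible way to do it, not the only one: the function name run_periodically and the tiny 0.05 second interval are made up for the demo. It uses threading.Event so that the main thread can ask the worker to stop.

```python
import threading
import time


def run_periodically(task, interval_seconds: float, stop: threading.Event) -> None:
    """Call `task` every `interval_seconds` until `stop` is set."""
    # Event.wait returns False on timeout and True once the event is set,
    # so it doubles as a sleep that can be interrupted.
    while not stop.wait(interval_seconds):
        task()


runs = []
stop = threading.Event()
worker = threading.Thread(
    target=run_periodically,
    args=(lambda: runs.append("ran"), 0.05, stop),
    daemon=True,  # don't keep the process alive if the main thread exits
)
worker.start()
time.sleep(0.3)  # the main thread is free to do other work here
stop.set()       # ask the worker to finish
worker.join()
print(f"Task ran {len(runs)} times")
```

Each run starts interval_seconds after the previous one finishes, so a slow task stretches the schedule. For a 53-minute interval, we'd pass 53 * 60 as interval_seconds.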

Using threads is good. However, there are various libraries available in Python that make scheduling repeated tasks easier.

Scheduling Libraries

The Python community has a variety of packages for task scheduling. Two popular packages are APScheduler and Schedule. Some schedulers also need a job store or message queue to persist tasks. We'll illustrate how to schedule tasks using both the APScheduler and Schedule packages.

Option 1: Schedule Package

This package is ideal for in-process scheduling that requires no message queue. Because jobs exist only in the memory of the running process, it's not ideal for tasks that need to persist if the process stops, or if the task fails or times out.

To install the Schedule package, we run the following command in the terminal:

pip install schedule

In our Python file for scheduling, we add the following snippet:

import time

import schedule

def delete_inactive_users() -> None:
    """Function to delete inactive users"""
    print("Task is running...")
    print("Deleting user accounts that have been inactive...")

# Runs this task every 30 days at 7:44 AM.
# Use the plural .days for an interval other than 1, and a zero-padded HH:MM time.
schedule.every(30).days.at("07:44").do(delete_inactive_users)

while True:
    schedule.run_pending()
    time.sleep(10)  # simulates other code logic

From the above code snippet, we can immediately note that the Schedule package's API is human-readable. The line schedule.every(30).days.at("07:44").do(delete_inactive_users) registers delete_inactive_users to run every 30 days at 7:44 AM, and schedule.run_pending() in the loop executes it when it's due. In a real application, delete_inactive_users would contain the logic that deletes accounts from the application's database.

The Schedule package is great for basic usage. However, some applications require back-end support for storing jobs, which the Schedule package doesn't have.

Option 2: APScheduler Package

Similar to the Schedule package, Advanced Python Scheduler (APScheduler) provides a means to schedule tasks. It also supports integrations with popular frameworks like asyncio and Django, and back ends for job stores such as Redis and SQLAlchemy.

To install APScheduler, we run the following command in the terminal:

pip install apscheduler

In our Python file, we can add the following code snippet:

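import time
from apscheduler.schedulers.background import BackgroundScheduler

sched = BackgroundScheduler(daemon=True)

def send_daily_report() -> None:
    print("Sending daily system report to admins...")
    # business logic for sending report goes here

sched.add_job(send_daily_report, 'cron', day_of_week='mon-fri', hour=12, minute=11)
sched.start()
# The scheduler runs in a daemon thread, so a standalone script must stay alive.
try:
    while True:
        time.sleep(1)
except (KeyboardInterrupt, SystemExit):
    sched.shutdown()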

In the above snippet, we set up a cron-triggered job that sends a daily report to the application administrators from Monday to Friday at 12:11 PM. The job runs in the background using the BackgroundScheduler class. We can also configure job stores and executors that handle the running of the jobs.

APScheduler is great for scheduling tasks. However, if our application requires monitoring, time limits, or rate limiting, APScheduler becomes too complex to work with. We need a different package that handles more advanced processing.

Some other packages for scheduling periodic tasks include Celery and Dramatiq. Celery is a distributed task queue in which clients hand tasks to workers. For Celery to work, it requires a message broker to send and receive tasks. On the other hand, Dramatiq is a simple background task processing library that supports chaining and code auto-reload.

Luckily, with the help of developer communities, building scripts for frequently repeating tasks is getting easier. However, other processes in task scheduling continue to receive minimal support from developer communities. These processes include logging, addressing task failures, and maintenance.

Right Now, Often, Always, and Forever

As a society, we're seeing an increase in the number of applications that favor automated processes. Automation has been ingrained in our everyday lives. Programming will continue to include scripting operations that may be performed at predetermined intervals or on a regular basis.

In this post, we covered both cron-style and interval-based task scheduling. Although the process is inefficient, we illustrated how to establish task schedules using infinite loops. Finally, we looked at how to use threads and Python packages to schedule tasks.

I shall stroll through the park and perhaps use the hidden pin money to purchase some cookies. Laptop and this Lady bid you farewell 👩🏾‍💻