What is backfilling?¶
Backfilling is basically a
crontabber app that receives a date to its
run() function. For example:
import datetime from crontabber.base import BaseCronApp from crontabber.mixins import as_backfill_cron_app @as_backfill_cron_app class MyBackfillApp(BaseCronApp): app_name = 'my-backfill-app' def run(self, date): with open(self.app_name + '.log', 'a') as f: f.write('Date supplied: %s\n' % date)
date parameter is a Python
<datetime.datetime> instance variable
with timezone information.
crontabber guarantees is that that method will never be called
with the same
date value twice.
The point of all this is if the app was to fail, it will be retried
crontabber and when it does that needs to know exactly
what dates have been tried before.
An example explains it¶
Suppose that you have a stored procedure in a PostgreSQL database. It needs to be called exactly once every day. Internally the stored procedure is programmed to raise an exception if the same day is supplied twice. For
example it might do something like this:
CREATE OR REPLACE FUNCTION cleanup(report_date DATE) RETURNS boolean LANGUAGE plpgsql AS $$ BEGIN SELECT 1 FROM reports_clean WHERE report_date = report_date; IF FOUND THEN RAISE ERROR 'Already run for %.',report_date; RETURN FALSE; END IF; INSERT INTO reports_clean ( name, sex, dob, report_date ) SELECT name, sex, dob, report_date FROM ( SELECT TRIM(both ' ' from full_name) gender, date_of_birth::DATE FROM data_collection WHERE collection_date = report_date AND gender = 'male' OR gender = 'female' ); RETURN TRUE; END; $$;
The example is not a real-world example but it demonstrates the importance of really making sure the same date isn’t passed into the function twice. If it was, you’d have duplicates for a particular date and that would be bad.
When does the magic kick in?¶
When things go wrong. If for example, you have some network outtage or a
bug in your code or something then the triggering will cause an error.
That’s OK because
crontabber will catch that and take note of exactly
what date it tried to pass in.
Then, the next time
crontabber runs it will re-attempt to execute the
job app with the same date, even if the wall clock says it’s the next day.
It will also know which other days it has not been able to execute and
re-attempt those too.
Suppose you have a daily app that is configured to be backfillable. The app
depends on presence of some external third party service which
unfortunately goes offline for three days. It’s not a problem,
will try and try till it works and will accordinly pass in the correct dates.
A caveat about backfillable jobs¶
Because the integrity of which apps have been passed with which dates is
important, it means you can’t use
crontabber to run an individual job as
a “one off”. That means that if you try:
crontabber --admin.conf=crontabber.ini --job=my-backfill-app
It will deliberately ignore that since there’s a risk it then “disrupts” its predictable rythem. Otherwise it could potentially be calling the same app with the same date twice.