What is backfilling?
A backfill app is basically a crontabber app whose run() function receives a date. For example:
    import datetime

    from crontabber.base import BaseCronApp
    from crontabber.mixins import as_backfill_cron_app


    @as_backfill_cron_app
    class MyBackfillApp(BaseCronApp):
        app_name = 'my-backfill-app'

        def run(self, date):
            with open(self.app_name + '.log', 'a') as f:
                f.write('Date supplied: %s\n' % date)
The date parameter is a Python datetime.datetime instance with timezone information.
What crontabber guarantees is that the run() method will never be called with the same date value twice.
The point of all this is that if the app fails, crontabber retries it automatically, and to do that it needs to know exactly which dates have been tried before.
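To make the shape of the date argument concrete, here is a small standard-library sketch of a timezone-aware datetime like the one described above, and of how an app might reduce it to a calendar day. The helper name report_day and the example value are made up for illustration, and treating the day as a UTC day is an assumption, not something crontabber prescribes.

```python
import datetime


def report_day(date):
    """Return the UTC calendar day for a timezone-aware datetime.

    Hypothetical helper: not part of crontabber.
    """
    if date.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Normalise to UTC first so the same instant always maps to the same day.
    return date.astimezone(datetime.timezone.utc).date()


# An example value of the kind a backfill app's run() might receive.
example = datetime.datetime(2024, 1, 15, 3, 0, tzinfo=datetime.timezone.utc)
print(report_day(example))  # 2024-01-15
```

Normalising to a single timezone before taking the day avoids two different days being derived from the same instant.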
An example explains it
Suppose that you have a stored procedure in a PostgreSQL database. It needs to be called exactly once every day. Internally, the stored procedure is programmed to raise an exception if the same day is supplied twice. For example, it might do something like this:
    CREATE OR REPLACE FUNCTION cleanup(p_report_date DATE)
    RETURNS boolean
    LANGUAGE plpgsql
    AS $$
    BEGIN
        -- Refuse to process the same day twice.
        PERFORM 1 FROM reports_clean WHERE report_date = p_report_date;
        IF FOUND THEN
            -- RAISE EXCEPTION aborts the function, so no RETURN is needed here.
            RAISE EXCEPTION 'Already run for %.', p_report_date;
        END IF;

        INSERT INTO reports_clean (name, sex, dob, report_date)
        SELECT
            TRIM(both ' ' from full_name),
            gender,
            date_of_birth::DATE,
            p_report_date
        FROM data_collection
        WHERE collection_date = p_report_date
          AND (gender = 'male' OR gender = 'female');

        RETURN TRUE;
    END;
    $$;
This is not a real-world example, but it demonstrates why it matters that the same date is never passed into the function twice. If it were, you’d have duplicate rows for that date, and that would be bad.
When does the magic kick in?
When things go wrong. If, for example, you have a network outage or a bug in your code, the job will raise an error when it is triggered. That’s OK, because crontabber will catch the error and take note of exactly which date it tried to pass in.
Then, the next time crontabber runs it will re-attempt to execute the job app with the same date, even if the wall clock says it’s the next day. It will also know which other days it has not been able to execute and re-attempt those too.
Suppose you have a daily app that is configured to be backfillable. The app depends on the presence of some external third-party service which unfortunately goes offline for three days. That’s not a problem: crontabber will keep trying until it works and will pass in the correct dates accordingly.
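The three-day outage scenario above can be simulated with a toy scheduler. This is a minimal sketch of the idea only, not crontabber's actual implementation: the names backfill, ToyApp and ServiceDown are hypothetical, and stopping at the first failing date is an assumption made to keep the dates in order.

```python
import datetime


class ServiceDown(Exception):
    """Raised by the toy app while its external service is offline."""


class ToyApp:
    """Stand-in for a backfillable app; records every date it processed."""

    def __init__(self):
        self.offline = False
        self.dates_seen = []

    def run(self, date):
        if self.offline:
            raise ServiceDown("third-party service unreachable")
        self.dates_seen.append(date)


def backfill(run, last_success, now, frequency=datetime.timedelta(days=1)):
    """Call run(date) for each date due since last_success, oldest first.

    Stops at the first failure so that date is retried on the next
    invocation. Returns the updated last_success value.
    """
    due = last_success + frequency
    while due <= now:
        try:
            run(due)
        except Exception:
            break  # leave last_success untouched; this date is retried later
        last_success = due
        due += frequency
    return last_success


def simulate():
    """Wake up once a day from 2 to 6 January; service is down on 2, 3, 4."""
    utc = datetime.timezone.utc
    app = ToyApp()
    last_success = datetime.datetime(2024, 1, 1, tzinfo=utc)
    for day in range(2, 7):
        app.offline = day in (2, 3, 4)
        now = datetime.datetime(2024, 1, day, tzinfo=utc)
        last_success = backfill(app.run, last_success, now)
    return app.dates_seen


print([d.day for d in simulate()])  # [2, 3, 4, 5, 6]
```

On 5 January the service is back, so the scheduler catches up on the 2nd, 3rd and 4th before handling the 5th, and each date reaches the app's successful path exactly once.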
A caveat about backfillable jobs
Because crontabber must keep an accurate record of which apps have been called with which dates, you can’t use it to run an individual backfillable job as a “one off”. That means that if you try:
crontabber --admin.conf=crontabber.ini --job=my-backfill-app
crontabber will deliberately ignore that, since there’s a risk it would disrupt the job’s predictable rhythm. Otherwise it could end up calling the same app with the same date twice.