Scheduling with GitHub Actions & cron¶
Let’s assume you have already imported a Data Source to your Tinybird account and properly defined its schema and partition key. Once everything is set up, you can use the Data Sources API to periodically append to or replace data in your Data Sources. This guide walks through some examples.
Using crontab¶
Crontab is a native UNIX tool that schedules command execution at a specified time or time interval. It works by defining the schedule, along with the command to execute, in a text file, which you can usually edit by running `sudo crontab -e`. You can learn more about using crontab in many places on the internet.
The cron table format¶
This is the cron table format, though you can also use external tools to help you define your cron job schedules:
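```
# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12)
# │ │ │ │ ┌───────────── day of week (0 - 6, Sunday = 0)
# │ │ │ │ │
# * * * * * command to execute
```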
These would be typical cron schedules to execute a command:

- Every five minutes: `*/5 * * * *`
- Every day at midnight: `0 0 * * *`
- Every first day of the month, at midnight: `0 0 1 * *`
- Every Sunday at midnight: `0 0 * * 0`
Appending Data periodically¶
It’s very common to have a Data Source that grows over time, such as an events Data Source. Very often there is an ETL process that extracts this data from the transactional database and generates CSV files with the last X hours or days of data, so you might want to append those recently generated rows to your Tinybird Data Source. Imagine you generate new CSV files every day at 00:00 and want to append them to Tinybird every day at 00:10.
With a shell script¶
You would first need to create a shell script file containing the Tinybird API request operation.
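Here is a minimal sketch of such a script. It assumes a Data Source named `events` and a hypothetical CSV URL; replace the token, Data Source name, and URL with your own. The `mode=append` parameter of the Data Sources API appends the rows in the CSV to the existing Data Source:

```bash
#!/bin/bash
# append.sh — appends the rows of a remote CSV file to the `events` Data Source.
TOKEN="<your_auth_token>"  # a Tinybird Auth token with append permissions
CSV_URL="https://example.com/events_$(date +%Y-%m-%d).csv"  # hypothetical URL

curl \
  -H "Authorization: Bearer $TOKEN" \
  -X POST "https://api.tinybird.co/v0/datasources?name=events&mode=append" \
  --data-urlencode "url=$CSV_URL"
```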
and then add a new line to your crontab file.
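Assuming you saved the script as /opt/cronjobs/append.sh (the filename is just an example), this entry runs it every day at 00:10:

```bash
10 0 * * * sh /opt/cronjobs/append.sh
```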
Basics about crontab: type `sudo crontab -e` in your terminal to start adding your cron jobs.
Using GitHub Actions¶
If your project is hosted on GitHub, you can also use GitHub Actions to schedule periodic jobs. Create a new file called .github/workflows/append.yml with this code to append data from a CSV, given its URL, every day at 00:10.
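A minimal sketch of such a workflow, assuming an `events` Data Source and a hypothetical CSV URL (adjust both to your project), with your Tinybird Auth token stored as a `TOKEN` repository secret:

```yaml
name: Append data to Tinybird

on:
  schedule:
    # every day at 00:10 UTC
    - cron: '10 0 * * *'

jobs:
  append:
    runs-on: ubuntu-latest
    steps:
      - name: Append CSV to the events Data Source
        run: |
          curl \
            -H "Authorization: Bearer ${{ secrets.TOKEN }}" \
            -X POST "https://api.tinybird.co/v0/datasources?name=events&mode=append" \
            --data-urlencode "url=https://example.com/events.csv"
```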
Replacing Data periodically¶
Think again about your events Data Source, but now imagine a scenario where you want to replace the whole Data Source, on the first day of every month, with a CSV file available at a publicly accessible URL.
With a shell script¶
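As before, create a shell script containing the API request. This is a minimal sketch, assuming an `events` Data Source and a hypothetical CSV URL; the `mode=replace` parameter tells the Data Sources API to replace the whole Data Source with the contents of the CSV:

```bash
#!/bin/bash
# replace.sh — replaces the entire `events` Data Source with the contents of a remote CSV.
TOKEN="<your_auth_token>"  # a Tinybird Auth token with the required scope
CSV_URL="https://example.com/events.csv"  # hypothetical URL

curl \
  -H "Authorization: Bearer $TOKEN" \
  -X POST "https://api.tinybird.co/v0/datasources?name=events&mode=replace" \
  --data-urlencode "url=$CSV_URL"
```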
Then edit the crontab file, which will take care of executing your script periodically. This can be done by typing `sudo crontab -e` in your terminal.
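Assuming you saved the script as /opt/cronjobs/replace.sh, this entry runs it at midnight on the first day of every month:

```bash
0 0 1 * * sh /opt/cronjobs/replace.sh
```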
Be sure you save your scripts in the right location: for the crontab entries in this guide to work, save your shell scripts in the /opt/cronjobs/ folder.
With GitHub Actions¶
Create a new file called .github/workflows/replace.yml with this code to replace all your data, given the URL of the CSV containing the new data, at 00:10 on the first day of every month.
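A minimal sketch, again assuming an `events` Data Source, a hypothetical CSV URL, and a `TOKEN` repository secret:

```yaml
name: Replace Tinybird Data Source

on:
  schedule:
    # at 00:10 UTC on the first day of every month
    - cron: '10 0 1 * *'

jobs:
  replace:
    runs-on: ubuntu-latest
    steps:
      - name: Replace the events Data Source with a remote CSV
        run: |
          curl \
            -H "Authorization: Bearer ${{ secrets.TOKEN }}" \
            -X POST "https://api.tinybird.co/v0/datasources?name=events&mode=replace" \
            --data-urlencode "url=https://example.com/events.csv"
```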
Replacing just one month of data¶
Having your API call inside a shell script allows you to script more complex ingestion processes. For example, imagine that you want to replace the last month of events data, every day. Each day, you would export a CSV file to a publicly accessible URL and name it something like events_YYYY-MM-DD.csv.
With a shell script¶
To do so, you could script a process that performs a conditional data replacement.
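Here is a minimal sketch, assuming the `events` Data Source has a `date` column and that the daily CSV exports live at a hypothetical URL. The `replace_condition` parameter of the Data Sources API restricts the replacement to the rows matching the given SQL condition:

```bash
#!/bin/bash
# daily_replace.sh — replaces the last month of rows in the `events` Data Source
# with the contents of today's CSV export.
TOKEN="<your_auth_token>"  # a Tinybird Auth token with the required scope

TODAY=$(date +%Y-%m-%d)
# GNU date syntax; on macOS/BSD use: date -v-1m +%Y-%m-%d
ONE_MONTH_AGO=$(date -d "1 month ago" +%Y-%m-%d)
CSV_URL="https://example.com/events_${TODAY}.csv"  # hypothetical URL

curl \
  -H "Authorization: Bearer $TOKEN" \
  -X POST "https://api.tinybird.co/v0/datasources?name=events&mode=replace" \
  --data-urlencode "replace_condition=(date >= '${ONE_MONTH_AGO}')" \
  --data-urlencode "url=$CSV_URL"
```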
Then, after saving that file to /opt/cronjobs/daily_replace.sh, add the following line to crontab to run it every day at midnight:
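```bash
0 0 * * * sh /opt/cronjobs/daily_replace.sh
```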
With GitHub Actions¶
Create a new file called .github/workflows/replace_last_month.yml with this code to replace all the data for the last month, every day at 00:10.
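A minimal sketch, using DATASOURCE and CSV_URL as placeholders (see the note below) and a `TOKEN` repository secret:

```yaml
name: Replace last month of data in Tinybird

on:
  schedule:
    # every day at 00:10 UTC
    - cron: '10 0 * * *'

jobs:
  replace_last_month:
    runs-on: ubuntu-latest
    env:
      DATASOURCE: events                       # replace with your Data Source name
      CSV_URL: https://example.com/events.csv  # replace with your CSV URL
    steps:
      - name: Replace the last month of data
        run: |
          ONE_MONTH_AGO=$(date -d "1 month ago" +%Y-%m-%d)
          curl \
            -H "Authorization: Bearer ${{ secrets.TOKEN }}" \
            -X POST "https://api.tinybird.co/v0/datasources?name=${DATASOURCE}&mode=replace" \
            --data-urlencode "replace_condition=(date >= '${ONE_MONTH_AGO}')" \
            --data-urlencode "url=${CSV_URL}"
```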
Use GitHub secrets: store TOKEN as an encrypted secret to avoid hardcoding secret keys in your repositories, and replace DATASOURCE and CSV_URL with their values, or save them as secrets as well.