Hi guys,
I want to create the following 2 workflows in Airflow:
The first workflow should be triggered every five minutes, consume the new Kafka messages, and upload them to S3 under the daily partition.
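
For the first one, I'm imagining roughly the sketch below, assuming kafka-python and boto3 (the topic, broker, and bucket names are all placeholders I made up):

from datetime import datetime, timedelta

import boto3
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from kafka import KafkaConsumer


def consume_and_upload(ds, execution_date, **kwargs):
    # Drain whatever is currently in the topic for this consumer group; Kafka
    # commits the offsets, so each 5-minute run resumes where the last stopped.
    consumer = KafkaConsumer(
        'events',                        # placeholder topic
        bootstrap_servers='kafka:9092',  # placeholder broker
        group_id='airflow-s3-uploader',
        auto_offset_reset='earliest',
        consumer_timeout_ms=10000,       # stop iterating once the topic is drained
    )
    records = [msg.value.decode('utf-8') for msg in consumer]
    consumer.close()
    if not records:
        return
    # One object per run, written into the daily partition (dt=YYYY-MM-DD).
    key = 'events/dt={}/{}.json'.format(
        ds, execution_date.strftime('%Y%m%dT%H%M%S'))
    boto3.client('s3').put_object(
        Bucket='my-bucket',              # placeholder bucket
        Key=key,
        Body='\n'.join(records).encode('utf-8'),
    )


dag = DAG('kafka_to_s3',
          start_date=datetime(2016, 1, 1),
          schedule_interval=timedelta(minutes=5))

PythonOperator(task_id='consume_and_upload',
               python_callable=consume_and_upload,
               provide_context=True,
               dag=dag)
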
The second workflow should be triggered once an hour and load all new files (there could be any number of files, spread across several daily partitions) to Redshift.
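
For the second one, something like this sketch, assuming psycopg2 and assuming the messages are JSON; the host, table, and credentials are placeholders, and it COPYs a manifest of the not-yet-loaded files that an upstream task would build (one way to build it is sketched under question 2 below):

from datetime import datetime

import psycopg2
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

COPY_SQL = """
COPY events
FROM 's3://my-bucket/manifests/{stamp}.manifest'
CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...'
MANIFEST
JSON 'auto';
"""


def load_new_files(execution_date, **kwargs):
    # Run the COPY against the manifest written for this hourly run.
    stamp = execution_date.strftime('%Y%m%dT%H%M%S')
    conn = psycopg2.connect(host='redshift-host',  # placeholder endpoint
                            port=5439, dbname='analytics',
                            user='loader', password='secret')
    with conn, conn.cursor() as cur:
        cur.execute(COPY_SQL.format(stamp=stamp))
    conn.close()


dag = DAG('s3_to_redshift',
          start_date=datetime(2016, 1, 1),
          schedule_interval='@hourly')

PythonOperator(task_id='load_new_files',
               python_callable=load_new_files,
               provide_context=True,
               dag=dag)
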
A few questions:
1. Is it a good idea to move the consumption workflow to Airflow? I saw here that Airbnb is using Secor.
2. I need a way to link the output of the first workflow's runs to the second workflow, i.e. to know which new files haven't been uploaded to Redshift yet.
Do you think Airflow could help here? How would you implement such a solution? One idea I had is sketched after this list.
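
The idea: keep a ledger of already-loaded keys in S3, diff it against a fresh bucket listing each hour, and write a Redshift COPY manifest with only the new files. All the names below are made up, and it assumes boto3:

import json

import boto3

s3 = boto3.client('s3')
BUCKET = 'my-bucket'                   # placeholder, same bucket as above
LEDGER_KEY = 'state/loaded_keys.json'  # placeholder ledger object


def build_manifest(stamp):
    # Keys we have already COPYed into Redshift.
    try:
        body = s3.get_object(Bucket=BUCKET, Key=LEDGER_KEY)['Body'].read()
        loaded = set(json.loads(body))
    except s3.exceptions.NoSuchKey:
        loaded = set()
    # Everything currently under the events prefix, across all daily partitions.
    current = set()
    for page in s3.get_paginator('list_objects_v2').paginate(
            Bucket=BUCKET, Prefix='events/'):
        current.update(obj['Key'] for obj in page.get('Contents', []))
    new = sorted(current - loaded)
    if not new:
        return None
    # A Redshift COPY manifest that points at only the new files.
    manifest = {'entries': [{'url': 's3://{}/{}'.format(BUCKET, k),
                             'mandatory': True} for k in new]}
    manifest_key = 'manifests/{}.manifest'.format(stamp)
    s3.put_object(Bucket=BUCKET, Key=manifest_key,
                  Body=json.dumps(manifest).encode('utf-8'))
    # Record the new state here; in the DAG this would probably be better as a
    # separate task that runs only after the COPY succeeds, so a failed load
    # gets retried.
    s3.put_object(Bucket=BUCKET, Key=LEDGER_KEY,
                  Body=json.dumps(sorted(current)).encode('utf-8'))
    return manifest_key

An alternative would be to derive the candidate keys from each hourly schedule window, but that would miss late files landing in older partitions, which is why I lean toward the ledger/manifest diff. Is there a more idiomatic Airflow way to do this?
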
Thanks,
Or.