Create-Schedule-Monitor Workflows
Intro
Apache Airflow is the workflow management system to create, schedule and monitor workflows.
- A basic unit in Airflow is DAG - collection of tasks organized in a way that reflects their defined dependencies and relationships.
- While we treat DAG as conatiner for tasks, what triggers certain actions is operator - it says what kind of action we want to perform.
- Task - instantiated operator which describes single task in workflow. Instantiating task requires providing unique task id and a DAG container.
Features
App includes following features:
Demo
Apache Airflow:
- Scheduling workflow for scraping webpage for getting currencies rates, processing it with expense data and mailing outcome.
Python:
- Preparing 3 programs:
- scraper.py
- procesor.py
- mailer.py - Defining worflow:
- scraping >> processing >> mailing
Setup
Following installation required:
- pip install apache-airflow