Create-Schedule-Monitor Workflows

Intro

Apache Airflow is the workflow management system to create, schedule and monitor workflows.

  • A basic unit in Airflow is DAG - collection of tasks organized in a way that reflects their defined dependencies and relationships.
  • While we treat DAG as conatiner for tasks, what triggers certain actions is operator - it says what kind of action we want to perform.
  • Task - instantiated operator which describes single task in workflow. Instantiating task requires providing unique task id and a DAG container.

Features

App includes following features:

  • Apache Airflow

Demo

Apache Airflow:

  • Scheduling workflow for scraping webpage for getting currencies rates, processing it with expense data and mailing outcome.

Python:

  • Preparing 3 programs:
    - scraper.py
    - procesor.py
    - mailer.py
  • Defining worflow:
    - scraping >> processing >> mailing

Setup

Following installation required:

  • pip install apache-airflow

Source Code

You can view the source code: HERE