Asynchronous Selenium Tasking

Intro

Running time-consuming tasks asynchronously means running them in parallel rather than sequentially. Executing long-running tasks at once saves a lot of time compared with launching them one by one, where each task starts only after the previous one finishes.

Features

The app is built on the following components:

  • Selenium
  • Pandas
  • Celery
  • Redis

Demo

Workflow:

  1. Start the task broker, Redis.
  2. Start the Celery worker in a terminal.
    - the Celery worker stands by, listening for tasks from Python
  3. Run Python in a terminal.
    - import the main function we want to run several times asynchronously,
    - define the phrases in p_list; the list length equals the number of times the main function will be executed asynchronously,
    - send the function to the Celery task queue by calling .delay() on it.
  4. Celery executes the tasks in parallel, reporting their status.

Outcome:

  • The outcome is a set of CSV files containing the scraped job descriptions and the most frequently occurring words.
  • The files can serve as a quick review of job offers.
    - that way we can see the current requirements for specific job positions,
    - we can see the most frequent keywords: azure, experience, python, etc.
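The word-frequency step could look like the following pandas sketch. The column name and the sample descriptions are assumptions, not the project's actual CSV layout:

```python
import pandas as pd

# Toy stand-ins for scraped job descriptions; real data would be
# loaded from the CSV files the scraper produces.
df = pd.DataFrame({"description": [
    "Experience with Python and Azure required",
    "Python developer with cloud experience",
]})

# Lowercase, split each description into words, flatten, and count.
words = df["description"].str.lower().str.split().explode()
counts = words.value_counts()

print(counts.head())  # most frequent keywords first
```

In practice a stop-word filter would be added so that words like "with" and "and" do not crowd out the meaningful keywords.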

Conclusions:

  • With Celery we can make the web scraping process much faster.
  • For the 3 phrases 'Big Data', 'Data Analyst', and 'Data Engineer' we could run the same script 3 times simultaneously.
  • While Celery executed the tasks, I could see 3 Chrome browsers working in the background, each accepting commands from Selenium.
  • Web scraping is a time-consuming process, so it is a perfect case for an asynchronous task queue.
  • The outcome is saved as CSV files that are convenient to view and analyse.

Setup

The following Python libraries are required:

  • pip install celery
  • pip install selenium
  • pip install pandas

Redis must also be installed.
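Once everything is installed, the broker and the worker can be started with commands like these (the `tasks` module name is an assumption about how the Celery app is defined):

```shell
# Start the Redis broker (listens on port 6379 by default)
redis-server

# In a second terminal, start a Celery worker;
# -A tasks points at the (assumed) module defining the Celery app
celery -A tasks worker --loglevel=info
```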

Source Code

You can view the source code: HERE