Asynchronous Selenium Tasking
Intro
Running time-consuming tasks asynchronously means running them in parallel. Executing long-running tasks at once saves a lot of time compared with launching them one by one, where each task starts only after the previous one finishes.
Features
The app includes the following features:
Demo
Workflow:
- Run the task broker, Redis.
- Run the Celery server in a terminal.
- The Celery server stands by, listening for Python tasks.
- Run Python in a terminal.
- Import the main function we want to run several times asynchronously.
- Define phrases in p_list; the list length equals the number of times the main function will be executed asynchronously.
- Send the function to the Celery task queue by calling .delay() on the function name; Celery executes the tasks in parallel and reports their status.
Outcome:
- The outcome is a set of CSV files containing the searched job descriptions and the most often appearing words.
- The files can serve as a fast review of job offers:
- we can see the current requirements for specific job positions,
- we can see the most frequent keywords: azure, experience, python, etc.
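A minimal sketch of how such a word-frequency CSV could be produced, assuming scraped job descriptions are available as a list of strings; the sample descriptions, the file name `word_counts.csv`, and the column names are all illustrative, not taken from the app itself.

```python
from collections import Counter

import pandas as pd

# Hypothetical scraped job descriptions (the real ones come from Selenium).
descriptions = [
    "Python and Azure experience required",
    "Experience with Python and SQL",
]

# Count the most often appearing words across all descriptions.
words = Counter(w.lower() for d in descriptions for w in d.split())

# Save as a CSV that is convenient to view or analyse later.
df = pd.DataFrame(words.most_common(), columns=["word", "count"])
df.to_csv("word_counts.csv", index=False)
```

Opening the CSV then gives an at-a-glance view of which keywords dominate the offers.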
Conclusions:
- With Celery we could make the web scraping process much faster.
- For the 3 items 'Big Data', 'Data Analyst' and 'Data Engineer' we could run the same script 3 times at the same time.
- While Celery executed the tasks, I could see 3 Chrome browsers working in the background, accepting commands from Selenium.
- Web scraping is a time-consuming process, so it is a perfect case for an asynchronous task queue.
- The outcome saves itself as CSV files that are convenient to view and analyse.
Setup
The following Python libraries are required:
- pip install celery
- pip install selenium
- pip install pandas
Redis must also be installed.
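The setup steps could be run roughly as follows; this is a sketch that assumes the Celery app lives in a module named `tasks` (any module name would work with the matching `-A` flag).

```shell
# Terminal 1: start the Redis broker (default port 6379).
redis-server

# Terminal 2: start a Celery worker; `tasks` is the assumed module name.
celery -A tasks worker --loglevel=info

# Terminal 3: open Python, import the function and dispatch it with .delay().
python
```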