Web scraping

How do you gather all the data you want while staying under the radar?

Issue:

My client wanted to selectively scrape multiple websites by category and use the same sorting options those websites offer. They also wanted to filter results by dealer and listing type at scrape time.

Solution:

The finished product is a dashboard the client can use to control what and how much to scrape, as well as to filter and sort the results.
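The filtering and sorting step can be pictured with a minimal sketch. The `Listing` fields and the `filter_and_sort` helper below are hypothetical names chosen for illustration; the real project's data model is not shown in this write-up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    # Hypothetical fields; the actual scraped schema may differ.
    title: str
    dealer: str
    listing_type: str
    price: float


def filter_and_sort(listings, dealers=None, listing_types=None,
                    sort_key="price", descending=False):
    """Keep listings matching the chosen dealers/types, then sort them.

    Passing None for a filter means "do not filter on this field".
    """
    kept = [
        item for item in listings
        if (dealers is None or item.dealer in dealers)
        and (listing_types is None or item.listing_type in listing_types)
    ]
    return sorted(kept, key=lambda item: getattr(item, sort_key),
                  reverse=descending)
```

Applying the filter while scraping, rather than afterwards, means unwanted listings never reach the output or trigger media downloads.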

The dashboard uses the ScrapingBee service (https://www.scrapingbee.com/) to avoid detection. The program uses asynchronous execution to scrape listings and download photos concurrently, which lets it scrape thousands of products and photos relatively quickly.
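As a rough sketch of the idea, the snippet below routes each request through ScrapingBee and caps how many run at once. It assumes ScrapingBee's HTML API endpoint with `api_key` and `url` query parameters; the function names, the concurrency limit, and the use of the standard library instead of a dedicated async HTTP client are illustrative choices, not the project's actual code.

```python
import asyncio
import urllib.parse
import urllib.request

# Assumed ScrapingBee HTML API endpoint; check the service docs before use.
SCRAPINGBEE_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def build_proxy_url(api_key: str, target_url: str) -> str:
    """Wrap the target URL in a ScrapingBee request URL."""
    query = urllib.parse.urlencode({"api_key": api_key, "url": target_url})
    return f"{SCRAPINGBEE_ENDPOINT}?{query}"


def fetch_blocking(url: str) -> bytes:
    """Plain blocking GET; swapped out for a fake in tests."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


async def scrape_all(target_urls, api_key, max_concurrency=10,
                     fetch=fetch_blocking):
    """Fetch every URL concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def scrape_one(target_url):
        async with semaphore:
            # Run the blocking fetch in a worker thread (Python 3.9+).
            return await asyncio.to_thread(
                fetch, build_proxy_url(api_key, target_url)
            )

    # gather() returns results in the same order as target_urls.
    return await asyncio.gather(*(scrape_one(u) for u in target_urls))
```

A call such as `asyncio.run(scrape_all(listing_urls, api_key))` would then return one response body per listing URL. The semaphore keeps the request rate within whatever concurrency the ScrapingBee plan allows.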

Details:

  1. Study how the website is structured so that its structure can be exploited to get data as efficiently as possible.

  2. Asynchronous execution for concurrent web requests.

  3. Multiprocessing to speed up video downloads.

  4. Plotly Dash dashboard with extensive configurability.
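The multiprocessing step for media downloads could look roughly like the sketch below. The helper names, the worker count, and the file-naming rule (last segment of the URL path) are assumptions made for illustration; the project's real download code is not shown here.

```python
import shutil
import urllib.parse
import urllib.request
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def plan_downloads(media_urls, out_dir):
    """Pair each URL with a local path named after the URL's last segment."""
    out = Path(out_dir)
    jobs = []
    for url in media_urls:
        name = Path(urllib.parse.urlparse(url).path).name
        jobs.append((url, str(out / name)))
    return jobs


def download_file(job):
    """Stream one file to disk; must be top-level so it can be pickled."""
    url, destination = job
    Path(destination).parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=120) as response, \
            open(destination, "wb") as handle:
        shutil.copyfileobj(response, handle)
    return destination


def download_all(media_urls, out_dir, workers=4):
    """Spread the downloads across a pool of worker processes."""
    jobs = plan_downloads(media_urls, out_dir)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(download_file, jobs))
```

On platforms that start worker processes with spawn (Windows, macOS), `download_all` should be called from inside an `if __name__ == "__main__":` guard.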

Dashboard to control scraper

Page, category, and products to scrape

Final, organized output

Download images and videos as well