Web Scraping: Interesting!

conjurer

A cool term:
cron = a time-based job scheduler on Unix-like systems that runs tasks automatically at specified intervals
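To make "at specified intervals" concrete: a cron schedule is written as five fields (minute, hour, day of month, month, day of week), and cron runs a job whenever the current time matches all of them. The sketch below is a simplified matcher I wrote for illustration, not cron's actual code. It only understands `*`, `*/n`, plain numbers, and comma lists, and it ignores ranges like `1-5`, month/day names, Sunday written as `7`, and real cron's rule that day-of-month and day-of-week are OR-ed when both are restricted.

```python
from datetime import datetime


def field_matches(field: str, value: int, lowest: int) -> bool:
    """Match one cron field: '*', '*/n', a number, or a comma list of those."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            # '*/n' means every n-th value, counting from the field's lowest value
            if (value - lowest) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """True if `when` matches a 5-field expression: minute hour day month weekday."""
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = (when.weekday() + 1) % 7  # cron: Sunday=0; Python: Monday=0
    return (
        field_matches(minute, when.minute, 0)
        and field_matches(hour, when.hour, 0)
        and field_matches(day, when.day, 1)
        and field_matches(month, when.month, 1)
        and field_matches(weekday, cron_weekday, 0)
    )


# "0 */6 * * *" = minute 0 of every sixth hour (00:00, 06:00, 12:00, 18:00)
print(cron_matches("0 */6 * * *", datetime(2024, 5, 1, 12, 0)))  # True
print(cron_matches("0 */6 * * *", datetime(2024, 5, 1, 13, 0)))  # False
```

In practice you would not write this yourself: a crontab line such as `0 */6 * * * python3 scrape.py` (where `scrape.py` is a hypothetical scraper script) tells cron to run the scraper every six hours.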

Web what?

When researching a project, we usually copy information from various sites into a diary, an Excel sheet, a document, and so on.
In effect, we are scraping the web and extracting data manually.

Web scraping automates this: a program downloads the pages and extracts the data for us.
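Here is a minimal sketch of those two steps (download, then extract) using only Python's standard library. It assumes a hypothetical page where each product sits in `<div class="product">` with `<span class="name">` and `<span class="price">` inside; real sites use their own markup, so the class names are placeholders you would adapt.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ProductParser(HTMLParser):
    """Collects product names and prices from the hypothetical markup above."""

    def __init__(self):
        super().__init__()
        self.items = []    # one dict per product found
        self.field = None  # "name" or "price" while inside that element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "product" in classes:
            self.items.append({})
        elif self.items and ("name" in classes or "price" in classes):
            self.field = "name" if "name" in classes else "price"

    def handle_endtag(self, tag):
        self.field = None

    def handle_data(self, data):
        if self.field:
            item = self.items[-1]
            item[self.field] = item.get(self.field, "") + data


def parse_products(html: str) -> list:
    """Extract step: turn raw HTML into a list of {"name", "price"} dicts."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return [{key: value.strip() for key, value in item.items()} for item in parser.items]


def scrape(url: str) -> list:
    """Download step plus extract step (the URL is whatever page you target)."""
    with urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return parse_products(html)


sample = (
    '<div class="product"><span class="name">Runner X</span>'
    '<span class="price">$59.99</span></div>'
)
print(parse_products(sample))  # [{'name': 'Runner X', 'price': '$59.99'}]
```

Keeping parsing separate from downloading makes the extractor easy to test on saved HTML before pointing it at a live site.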

(Figure: web scraping flow)

Example

When you google, say, sneakers, you get a list of websites with products and prices, and the Shopping tab shows an even more detailed listing, right?
In effect, Google has gathered data from many websites so it can show you sneakers from different stores in one place.
This technique is widely used by large companies, especially as the amount of data online keeps growing rapidly.
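The "one place" part is an aggregation step: results scraped from several sites are merged into one list and sorted. The sketch below shows only that step. The site names and prices are made up for illustration, and this is not how Google Shopping is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    site: str
    product: str
    price: float


def merge_listings(results: dict) -> list:
    """Combine per-site results {site: [(product, price), ...]} into one list, cheapest first."""
    merged = [
        Listing(site, product, price)
        for site, items in results.items()
        for product, price in items
    ]
    return sorted(merged, key=lambda listing: listing.price)


# Hypothetical scraped results from two made-up shops
results = {
    "shop-a.example": [("Runner X", 59.99), ("Court Classic", 74.50)],
    "shop-b.example": [("Runner X", 54.00)],
}
for listing in merge_listings(results):
    print(f"{listing.price:>7.2f}  {listing.product}  ({listing.site})")
```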

Web Crawler

Crawling also fetches information, but it differs from scraping: a crawler follows links to discover pages across many websites and indexes them (this is how search engines build their index), whereas scraping extracts specific data from particular pages or sites.

Crawling is used for things like SEO analysis, while scraping is used for gathering data.
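A crawler can be sketched as a breadth-first search over links: fetch a page, record its words in an index, queue any links not seen yet, repeat. In the sketch below, `fetch` is a function you supply (a hypothetical parameter, so the crawler can be tried on an in-memory "web"); a real one would download the page, for example with `urllib.request.urlopen`, and should be paced politely. The word index is deliberately naive (it also picks up script and style text).

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class PageParser(HTMLParser):
    """Collects link targets and text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML or None.

    Returns the visited URLs and an inverted index: word -> set of URLs.
    """
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    index = defaultdict(set)
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = PageParser()
        parser.feed(html)
        # Index every word on the page
        for word in re.findall(r"[a-z0-9]+", " ".join(parser.text).lower()):
            index[word].add(url)
        # Queue unseen http(s) links, resolved relative to the current page
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).scheme in ("http", "https") and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, index


# A tiny made-up "web" held in memory
pages = {
    "https://a.example/": '<a href="/shoes">Shoes</a> <a href="https://b.example/">B</a> welcome',
    "https://a.example/shoes": '<p>sneakers on sale</p><a href="/">home</a>',
    "https://b.example/": "<p>more sneakers</p>",
}
visited, index = crawl("https://a.example/", pages.get)
print(sorted(index["sneakers"]))  # ['https://a.example/shoes', 'https://b.example/']
```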

Famous web scraping technologies:

Issues!

Notice that it is not a person requesting information from the site; it is the code we wrote! If a website detects that the requests are automated, it may quickly block the IP address.
Websites use several measures that make automated scraping harder:

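  1. CAPTCHAs: challenges designed to tell humans and bots apart

  2. Rate limiting: capping how many requests a client (often identified by IP address) can make in a given time

  3. Dynamic content: data loaded by JavaScript after the page opens, so it is missing from the raw HTML a simple scraper downloads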
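To see rate limiting from the website's side, here is a sketch of one common approach, a sliding-window limiter: each IP may make at most `limit` requests in any `window`-second span, and extra requests are refused (a server would typically answer those with HTTP 429 Too Many Requests). The class and its parameters are illustrative, not any particular site's setup; timestamps are passed in explicitly so the logic is easy to follow.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per IP within any `window`-second span."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.history = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now):
        timestamps = self.history[ip]
        # Forget requests that have slid out of the window
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) < self.limit:
            timestamps.append(now)
            return True
        return False  # too many recent requests: refuse


limiter = SlidingWindowLimiter(limit=3, window=10.0)
print([limiter.allow("203.0.113.7", t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
```

A scraper firing requests as fast as it can hits this limit almost immediately, which is exactly the machine-like pattern that gets an IP blocked.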

Goal: make the scraper behave more like a human user!
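One simple, polite piece of this is pacing: a person does not load pages back to back, so a scraper can wait a minimum interval plus some random jitter between requests and stay under a site's rate limit. The sketch below is a hypothetical helper; the 2-second interval and 1-second jitter are arbitrary assumptions, and the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import random
import time


class Throttle:
    """Keep consecutive requests at least min_interval (+ random jitter) seconds apart."""

    def __init__(self, min_interval=2.0, jitter=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.jitter = jitter
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request, if any

    def wait(self):
        """Call right before each request; sleeps only if the last one was too recent."""
        if self.last is not None:
            gap = self.min_interval + random.uniform(0, self.jitter)
            remaining = gap - (self.clock() - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()


# Usage: call throttle.wait() before every fetch in your scraping loop
throttle = Throttle(min_interval=2.0, jitter=1.0)
```

Randomizing the gap avoids the perfectly regular timing that makes automated traffic easy to spot.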

Bright Data automates this job. Its paid version can even rotate IP addresses to keep the user anonymous and help unblock sites.

Shoutout to JSM for the wonderful explanation.
PS:

(Image: captcha)

Lol!
