One of the most common questions about web scraping: Is Web Scraping Illegal?

The short answer:


The longer answer:

Of course, there are rules and regulations regarding unauthorized or unethical scraping. 'Unethical' scraping may include running a scraper at such a high frequency that it degrades the performance of the website you are collecting data from. 'Unauthorized' scraping refers to a breach of a website's terms of service. Notably, breaching a website's terms of service often does not make your scraper 'illegal' - a website's terms of service are not the same thing as the law.

And of course, who is there to enforce the rules of the internet? Does a website hosted in India fall under the same regulations as a website hosted in the USA? Practically, it is almost impossible to enforce rules consistently across the internet. That is not to say that you shouldn't consider the ethical and legal implications of creating a web scraper. Often some common sense is required - for example, how will your scraper impact the target website? If it runs at too high a frequency, will it degrade the target site's performance?
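One way to act on the frequency concern is to put a minimum delay between requests to the same site. The sketch below is only an illustration: the `PoliteThrottle` class, its parameter names, and the two-second interval are assumptions made for this example, not part of any library or any rule a particular website sets.

```python
import time


class PoliteThrottle:
    """Hypothetical helper: enforces a minimum delay between requests to one site."""

    def __init__(self, min_interval_seconds, clock=time.monotonic, sleep=time.sleep):
        # clock and sleep are injectable so the behaviour can be checked without waiting.
        self.min_interval = min_interval_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_request = None  # time of the previous request, if any

    def wait(self):
        """Pause until min_interval has passed since the last call; return seconds slept."""
        delay = 0.0
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last_request = self._clock()
        return delay


# Example use (the interval is an assumed value, tune it to the target site):
# throttle = PoliteThrottle(min_interval_seconds=2.0)
# for url in urls:
#     throttle.wait()   # sleeps only if the previous request was too recent
#     fetch(url)        # fetch is a placeholder for your own request code
```

Calling `wait()` before every request keeps the scraper from hitting the site more often than once per interval, whatever the speed of the rest of the code.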

In addition, many legitimate businesses use web scraping to power their main business model.

For example, Google scrapes data from various websites to answer your query without you having to follow one of its links:

(Image: Google web scraping)

DuckDuckGo also scrapes data, in this case from Wikipedia:

(Image: DuckDuckGo web scraping)


Despite potentially stealing traffic from the websites they scrape, this is a perfectly legal example of huge companies scraping data for their own benefit.