The History of Web Scraping and What the Future Holds

A recent interim decision by the U.S. Ninth Circuit Court of Appeals has concluded that the collection of publicly available data does not violate the Computer Fraud and Abuse Act (CFAA), following a legal challenge from LinkedIn. But what are the origins of this technology, what role does it play in today’s business landscape, and what will the outcome of the decision mean for the future?

The origins of web scraping

The first instance of web crawling dates back to 1993, which was an important year for this technology. In June of that year, Matthew Gray developed the World Wide Web Wanderer to measure the size of the web. Later that year, this was used to generate an index called “Wandex”, resulting in the first web search engine. Today we take this for granted, with the major search engines delivering a wealth of results almost instantly. Remarkably, before the launch of JumpStation’s web scraping technology, data was collected manually by an administrator who gathered and formatted datasets in the hope that they would be what users were looking for.

Information is power in the digital age

Nearly twenty years of collecting publicly available data is a critical foundation for many businesses across a multitude of industries. Indeed, the Internet has become the largest data resource in the world, and information for businesses no longer comes only from legacy channels, such as reports and manual databases, but also from live web data. Public web scraping enables leaders to make more informed decisions that have a significant impact on their organizational and operational strategies as well as business results.

There are many compelling academic and commercial use cases that highlight the importance of collecting and analyzing public web data. For example, large companies use this technology to gather information on the state of markets, competitor intelligence such as prices and inventory levels, and consumer sentiment. Researchers, academics, investors, and journalists also use public web scraping in their data strategies to gain real-time insights and base their reports on credible data points. These include insight into audience sentiment and well-being, organizational team structures, growth prospects, and the competitive landscape for target audience engagement.

The challenges of web scraping

Despite the clear and extensive benefits of web scraping, in 2017 LinkedIn attempted to block hiQ Labs, a data analytics company that collects publicly available data from LinkedIn profiles, from accessing its website. hiQ’s technology is used by companies to retain highly desirable employees, as well as to identify knowledge and skills gaps within the organization. The LinkedIn ban prevented hiQ Labs from operating one of its services, and a legal battle in the United States followed.

This resulted in a court case in which a district court ruled in favor of hiQ. This sparked a series of appeals in recent years, after which the case came before the Ninth Circuit. In April 2022, the Ninth Circuit upheld the preliminary injunction granted to hiQ, meaning LinkedIn could not stop hiQ from accessing its website. The court ruled that LinkedIn’s claims that hiQ violated laws such as the CFAA were unfounded because the data in question is publicly available.

What comes next?

The Ninth Circuit’s decision reaffirms the foundation on which the Internet, the largest database ever created, was built: the democratization of information for all. The ruling makes it clear that scraping publicly available data from the Internet does not violate the CFAA.

Although the final outcome of this case is not yet known and there may be more legal challenges to come, the latest decision by the U.S. courts is a major victory for archivists, scholars, researchers, journalists and businesses that rely on web scraping for the insight it can provide. The future is bright for web scraping as the amount of data online continues to explode and can be turned into information and put to use by people around the world.

Written by Or Lenchner, CEO, Bright Data

Photo credit: Anat Pamela Sharon

Disclaimer: The information provided here does not constitute and is not intended to constitute legal advice. All information, content and materials available here are for general information purposes only. The information contained herein may not constitute the most up-to-date legal or other information.
