France Issues Guidelines On Web Scraping And Reuse Of Publicly Available Online Data

On March 20, the French Data Protection Authority (CNIL) published guidelines on extracting web users' personal data from public online spaces. Working alongside the General Data Protection Regulation (GDPR), the European Union's (EU) law on data protection and privacy, the guidelines are designed to limit how users of web scraping tools use personal data for direct marketing campaigns.

In response to complaints it received, the CNIL has set out rules for companies that use web scraping tools to collect personal data, such as phone numbers appearing in ads on consumer-to-consumer websites or in online directories.

The CNIL's investigation into the complaints found that companies had used web scraping tools to build databases of people to whom they then sent direct marketing communications, including people who may have previously objected to receiving such content.

Specific complaints targeted companies that created and sold databases of individuals who had published real estate ads, and companies that collected personal data from online directories for their own marketing campaigns.

Violations had occurred

The CNIL confirmed that a number of companies had used web scraping or data extraction software and services to collect web users' data from online listings and directories. It also noted other breaches of the GDPR and the French Data Protection Act, such as failing to tell individuals how their data had been collected and failing to obtain their consent before contacting them.

The CNIL's guidelines stress that even when individuals' contact details have been published in public online spaces, they are still personal data and therefore fall under the GDPR. Under the GDPR, such data may not be reused or processed without the individuals' knowledge, and it may not be collected for individuals who appear on opt-out lists.

New guidelines for web data scraping

Before using web scraping tools, companies are now directed to verify the nature and origin of the data to be scraped and to respect the source site's terms and conditions, which often prohibit extracting and reusing data for marketing purposes.

Users of web scraping tools must also take care to avoid collecting irrelevant or excessive information. The guidelines place particular emphasis on this where the information is sensitive, such as data about health, religion, or sexual orientation.

Companies that scrape individuals' information for direct marketing must say so in their first communication with those individuals.

Companies that engage a web scraping service provider must ensure that the provider follows the CNIL's measures, and service providers are required to document and report their procedures.

Companies must also have a proper data processing agreement in place with the service provider, as the GDPR requires. In some cases, a Data Protection Impact Assessment (DPIA) is required before processing can begin, and the CNIL's guidelines suggest it is good practice to carry one out in every case.

Web data scraping used the right way

The CNIL's new guidelines, together with its promise to keep checking that businesses follow these practices, are not a judgment on the value of web scraping. Rather, they are an attempt to ensure that those collecting alternative data stay within the bounds of the GDPR.

Thousands, perhaps millions, of companies use web data scraping, standardization, and integration for business insights, catalog building, news curation, travel pricing, and much more. Companies such as Kayak and Zillow are built on data drawn from other sources: Kayak collects data from airlines, car rental agencies, and hotels, while Zillow aggregates data from real estate listing services.

GDPR and the CNIL 

The CNIL's investigation and response targeted companies that used web scraping tools specifically to collect personal information, such as phone numbers, email addresses, and other personally identifiable information, that individuals had posted online for purposes other than receiving marketing.

These guidelines are likely to have little to no impact on companies that use alternative data for business insights and already comply with the GDPR. Even sensitive information, such as a customer's race or religion, may fall outside the GDPR's protections if it cannot be linked to a name or any other data that would allow a specific person to be identified.

The EU made great strides in protecting individual privacy with the GDPR, but the law was not designed to stand alone. The CNIL's guidelines fill gaps that were not anticipated when the GDPR was first enacted. Over time, other countries are likely to add guidelines relevant to their own populations as well.

In the future, a compilation of regional addenda could be used to establish guidelines for an internet-connected world community.
