
France Issues Guidelines On Web Scraping And Reuse Of Publicly Available Online Data

By Luke Fitzpatrick

Posted on July 14, 2020

On March 20, the French Data Protection Authority (CNIL) published guidelines on extracting web users’ personal data from online public spaces. Working alongside the General Data Protection Regulation (GDPR), the European Union’s (EU) data protection and privacy law, the guidelines are designed to limit how users of web scraping tools may use personal data in direct marketing campaigns.

Responding to complaints it had received, the CNIL set out rules for companies that use web scraping tools to collect personal data, such as phone numbers appearing in ads on consumer-to-consumer websites or in online directories.

The investigation into these complaints found that companies had used web scraping tools to build databases of people to whom they then sent direct marketing communications, including people who had previously objected to receiving such content.

Specific complaints targeted companies that created and sold databases of individuals who had published real estate ads, as well as companies that collected personal data from online directories for their own marketing campaigns.

Violations had occurred

The CNIL confirmed that a number of companies had used web scraping or data extraction software and services to collect web users’ data from online listings and directories. It also noted other breaches of the GDPR and the French Data Protection Act, such as failing to tell recipients how their data had been collected and failing to obtain their consent before contacting them.

The CNIL’s guidelines stress that even when individuals’ contact details have been published in online public spaces, they are still personal data and therefore fall under the GDPR. The GDPR states that such data may not be reused or processed without the individuals’ knowledge, and it also prevents the collection of data about individuals included in opt-out lists.

New guidelines for web data scraping

Before using web scraping tools, companies are now directed to verify the nature and origin of the data to be scraped and to comply with the source site’s terms and conditions, which is typically where any prohibition on extracting and reusing data for marketing purposes is stated.

Users of web scraping tools must also take great care to avoid collecting irrelevant or excessive information. The guidelines place particular emphasis on sensitive information, such as data on health, religion, or sexual orientation.
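As a rough illustration of what avoiding excessive collection can look like in practice, here is a minimal sketch, assuming scraped records arrive as Python dictionaries. It keeps only an allowlist of fields a hypothetical campaign needs and skips anyone whose phone number is on an opt-out list. The field names, the `ALLOWED_FIELDS` and `SENSITIVE_FIELDS` sets, and the `minimize` helper are all hypothetical, and a simple field filter like this is not a substitute for deciding, case by case, what data a campaign genuinely requires.

```python
from typing import Iterable

# Hypothetical: the only fields this example campaign needs to keep.
ALLOWED_FIELDS = {"name", "phone"}

# Hypothetical sensitive categories that must never be retained.
SENSITIVE_FIELDS = {"health", "religion", "sexual_orientation"}

# Guard against an allowlist that accidentally includes a sensitive field.
assert not ALLOWED_FIELDS & SENSITIVE_FIELDS


def minimize(records: Iterable[dict], opted_out_phones: set[str]) -> list[dict]:
    """Keep only allowlisted fields and skip anyone on the opt-out list."""
    kept = []
    for record in records:
        if record.get("phone") in opted_out_phones:
            continue  # this person has objected to marketing, so drop the record
        # Copy only the allowlisted fields; everything else is discarded.
        kept.append({k: v for k, v in record.items() if k in ALLOWED_FIELDS})
    return kept


if __name__ == "__main__":
    scraped = [
        {"name": "A. Martin", "phone": "0600000001", "religion": "x", "listing": "flat"},
        {"name": "B. Durand", "phone": "0600000002", "health": "y"},
    ]
    print(minimize(scraped, opted_out_phones={"0600000002"}))
    # prints [{'name': 'A. Martin', 'phone': '0600000001'}]
```

An allowlist is used rather than a blocklist so that any new field a scraper happens to pick up is discarded by default instead of being retained by accident.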

When individuals’ information has been scraped for direct marketing, companies using web scraping tools must tell them so in their first communication with them.

Companies that engage a web scraping service provider must ensure that the provider follows the CNIL’s measures, and service providers are required to document and report their procedures.

Companies must also ensure that a proper data processing agreement, as required by the GDPR, is in place with the service provider. In some cases, a Data Protection Impact Assessment (DPIA) is required before processing can begin, and the CNIL guidelines recommend carrying one out in every case as good practice.

Web data scraping used the right way

The CNIL’s new guidelines, and its promise of continued vigilance in checking that businesses follow these practices, are not a judgment on the value of web scraping. Rather, they are an attempt to ensure that those collecting alternative data stay within the GDPR.

Thousands, perhaps millions, of companies use web data scraping, standardization, and integration for business insights, catalog building, curated news, travel pricing, and much more. Companies such as Kayak and Zillow are built on data derived from other sources: Kayak collects data from airlines, car rental agencies, and hotels, while Zillow aggregates data from real estate listing services around the world.

GDPR and the CNIL

The CNIL’s investigation and response targeted companies that used web scraping tools specifically to collect personal information, such as phone numbers, email addresses, and other personally identifiable information, that people had published online for purposes other than receiving marketing.

These guidelines are likely to have little to no impact on companies that use alternative data for business insights and already comply with the GDPR. Even sensitive information, such as a customer’s race or religion, may not be considered protected if it is not paired with a name or other data that would allow a specific person to be identified.

The EU made great strides in protecting individual privacy with the GDPR, but the law was not designed to stand alone. The CNIL’s guidelines fill gaps that were not contemplated when the GDPR was first enacted. Over time, other countries are likely to add guidelines relevant to their own populations as well.

In the future, a compilation of regional addenda could be used to establish guidelines for an internet-connected world community.


TechBullion
