PEReN – Center of expertise for digital platform regulation
Data science expertise at the service of digital regulation
The breakthrough of generative AI has brought to light new issues surrounding the automated collection of content from websites, particularly press websites. At the request of the Ministry of Culture, PEReN is publishing an inventory of opt-out (reservation of rights) protocols, carried out in 2024 for the Ministry's Directorate-General for Media and Cultural Industries (DGMIC).
The proliferation of generative artificial intelligence systems raises questions about both innovation and the rights of press publishers, whose content is widely used by these systems on terms that the parties involved do not always agree on. While these systems can exploit any content accessible online, European regulations allow rights holders to exercise their right of reservation (or opt-out), in other words to oppose the use of their content.
As part of its 2024 work programme, PEReN conducted an inventory of these opt-out protocols at the request of the Ministry of Culture, which has now requested this publication. It provides an overview of
A minimal technical framework exists that allows site publishers to declare which parts of their publications are off-limits to data collection robots, using opt-out protocols such as robots.txt. The latter, by far the most widely used, has a number of limitations. On the one hand, it is not always fit for purpose (lack of granularity, or simply poor configuration by some sites). On the other hand, it relies on trust: the robot declares its own identity and is then expected to respect the instructions given by the site it is visiting, without the site always being able to verify this objectively.
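As an illustration of the mechanism described above, here is a minimal sketch in Python using the standard library's urllib.robotparser module. The robots.txt content, the site example.com, the crawler names "ExampleAIBot" and "OtherBot", and the helper is_allowed are all hypothetical and chosen only for this example; they are not taken from the PEReN inventory.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary press site (example.com).
# "ExampleAIBot" is a made-up crawler name used only for illustration.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /subscribers/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is the check a well-behaved crawler is expected to run on
    its own side before collecting a page.
    """
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://example.com/news/article-1"
    premium = "https://example.com/subscribers/report"

    # The crawler that declares itself as ExampleAIBot is opted out everywhere.
    print(is_allowed(ROBOTS_TXT, "ExampleAIBot", article))
    # Any other declared name only loses access to /subscribers/.
    print(is_allowed(ROBOTS_TXT, "OtherBot", article))
    print(is_allowed(ROBOTS_TXT, "OtherBot", premium))
```

The sketch also makes the trust limitation visible: the decision depends only on the name the robot chooses to declare and on the robot actually running this check. Nothing in the file lets the site confirm either point, and the rules apply to path prefixes rather than to individual uses of the content.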
These limitations can erode trust between the players involved. Conversely, proper adoption of these reservation technologies on the one hand, and compliance with collection instructions on the other, could give all players a more transparent and objective view of the issues at stake. That is a prerequisite for striking a balance between deriving value from data and content and fostering innovation, on an Internet that remains open.