Gouvernement
PEReN – Center of expertise for digital platform regulation
Data science expertise at the service of digital regulation
Within this framework of intervention, PEReN develops its work program in close consultation with partner administrations and organizes its activity into projects, following an agile approach.
In 2024, PEReN confirmed the key areas identified in previous years, with an ever-increasing emphasis on its coordinating role. With projects on a growing scale that build on its previous achievements, PEReN contributed to many topical issues for public authorities: protection of minors and online age verification, interoperability of online services, detection of artificial content on social networks, development of AI evaluation methodologies, studies related to online advertising, operation of mobile versions of platforms, etc.
Technical support for European Commission services in the development of tools for segmenting and classifying videos from social networks and conducting statistical tests.
In collaboration with Arcom, and as part of the implementation of Article 40 (access to platform data for research) of the Digital Services Act, configuration of an aggregator of resources, data, and tools useful to the academic community, civil society, and public authorities.
Technical support for Arcom in identifying players that fall within the scope of the Digital Services Act (hosting services including online platforms, mere conduit services, caching services, search engines) and are likely to fall under the jurisdiction of France.
On behalf of the DGCCRF, assessment of how well the advertising repositories of major online platforms (Art. 39 of the DSA) lend themselves to automated use for detecting fraudulent advertising.
Technical support for the services of the Defender of Rights via Equinet, its European network in the field of non-discrimination, which is represented on the CEN/CLC JTC 21 technical standardization committee. The support concerned the standardization process for implementing the European regulation on artificial intelligence, particularly the transparency and human oversight of so-called high-risk AI systems.
Work related to that carried out by the Higher Council for Literary and Artistic Property on the implementation of the AI Regulation, and in particular Article 53, which requires providers of general-purpose AI models to make publicly available a “sufficiently detailed summary” of the content used for training, in accordance with a template provided by the AI Office.
In anticipation of the Data Act, the law aimed at securing and regulating the digital space (SREN law) has entrusted Arcep with the authority to specify rules for interoperability, portability, and openness of interfaces for cloud services. PEReN has conducted concrete tests of the migration of virtual machines, a simple subset of cloud services, between different service providers.
As part of the 2023 edition of the ARPE report on activity indicators for workers on delivery and ride-hailing platforms, contribution to the data analysis and publication of the corresponding source code (open source code).
At the request of the CNIL, development of a tool to check whether a selection of websites has adopted the Topics API from the Privacy Sandbox, and initial measurements.
Technical support for the work of the Digital Economy Department of the DGE on the attention economy, which also led to the publication of a “Shedding light on…” article explaining the technical issues raised by the hypothesis of unbundling different social media features (study available here).
On behalf of ANSSI and in conjunction with the DGE, analysis of the behavior of xPay systems present on smartphones, as examples of electronic banking wallets that could provide useful information for preparing for the implementation of the eIDASv2 regulation.
Supporting the General Secretariat for European Affairs in discussions on the draft European regulation and analysis of client-side scanning (CSS) technologies for detecting child pornography content.
At the request of the Directorate-General for Media and Cultural Industries (DGMIC), production of a summary of techniques for detecting child pornography content (existing and new) deployed by major platforms.
Technical support in developing the Arcom reference framework setting out the minimum technical requirements applicable to age verification systems for the protection of minors against online pornography.
In support of the DGMIC, creation of an overview of age verification solutions currently implemented by various major platforms accessible to minors.
Supporting ARPE on algorithmic issues, particularly those related to worker disconnections, facial recognition, and “algorithmic management” in the context of its biannual report on the working conditions of self-employed workers.
Following on from experimental work carried out in 2022 and 2023, PEReN has developed a prototype tool that can be used to verify, on a large scale and in an automated and statistically sound manner, route recommendations on mobile applications and their potential impact on the use of different types of road networks.
In 2023, PEReN developed a toolkit for implementing causal discovery processes in algorithmic auditing. In 2024, these resources, including the source code produced, were made available to CNIL agents, who were trained with a view to using these tools on real cases.
At the request of the General Inspectorate of Cultural Affairs (IGAC), PEReN provided technical support for assessing the impact of the individual portion of the Culture Pass on the intensification and diversification of cultural practices among young people. In line with previous work on recommendation systems and the assessment of their diversity, PEReN audited the algorithmic systems of the Culture Pass (operation, use of data, training and performance of models) and made recommendations.
In support of the CNIL's work on the risks associated with the generation of deepfakes (identity theft, fraud, pornographic disclosure, disinformation, etc.), exploration of the ease with which these deepfakes can be produced (study available here).
As part of its participation in the National Institute for AI Evaluation and Safety (INESIA), PEReN has replicated tests conducted by international counterparts to evaluate the performance of large language models. This work has made it possible to compare the methodologies used by each party to address the same AI evaluation issue.
For the Transport Regulatory Authority, development of a platform enabling the querying and collection of travel solutions proposed by mobile route calculators, via the instrumentation of phones or emulators.
For Arcom, development of modules for harvesting data relating to works featured on catch-up TV and video-on-demand interfaces.
As part of ongoing work with the CNIL, enhancement and refinement of an automated consent banner collection and processing tool: addition of features for anonymizing screenshots and emulating mobile browsers, creation of a large database of banner types.
Provision of an online programming interface (web API) for the automated collection of publicly accessible data to all partners with scraping needs who can demonstrate that they have the appropriate legal framework. In 2024, developments to this interface pursued two main objectives: to produce an automated collection result that is as close as possible to that of a manual collection, and to enable users to minimize the amount of data collected.
Development for the French Financial Markets Authority (AMF) of an experimental prototype for monitoring signs of abnormal activity on online forums, to detect both spikes in activity and the presence of specific profiles that may be artificial.
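As an illustration of what detecting spikes in activity can mean in practice, here is a minimal sketch in Python. It is not the AMF prototype: the function name `detect_spikes`, the trailing-window z-score rule, the parameter values, and the sample counts are all assumptions made for this example.

```python
from statistics import mean, stdev


def detect_spikes(counts, window=7, threshold=3.0):
    """Return the indices whose count exceeds the mean of the preceding
    `window` values by more than `threshold` standard deviations."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        # Floor the deviation at 1.0 so a perfectly flat history
        # does not cause a division by zero.
        sigma = max(stdev(history), 1.0)
        if (counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


# Hypothetical number of messages posted per day on a forum thread.
daily_posts = [12, 15, 11, 14, 13, 12, 16, 14, 95, 13, 15]
print(detect_spikes(daily_posts))  # [8]
```

A trailing window keeps the rule usable on a live feed, since each day is compared only with the days before it. Detecting profiles that may be artificial would require other signals and is not covered by this sketch.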
In support of the missions of the National Gaming Authority (ANJ), which can block websites offering illegal gambling and games of chance, development of a more automated tool for monitoring illegal websites and detecting mirror sites of websites that have already been banned.
Support for the CNIL through the automated and reproducible collection of data on the bids placed by advertisers in advertising auctions, with a view to analyzing the impact of a user's web browsing history on bid levels.
At the request of Arcep and Arcom, development and implementation of a testing protocol to measure the environmental impact of online video consumption using different codecs and devices. The tests were conducted using automation and programmatic tools. (Open source code for the tools: wattmeter instrumentation; instrumentation of the smartphones and computers used; generation of the study result graphs.)
Through an academic literature review, production of a scientific and technical state of the art of methods for detecting generated content, in particular content produced by generative text and image models. This review focused on issues related to measuring the artificiality of content and on existing watermarking solutions (testing the effectiveness and detection of this marking).
Analysis of the state of the art in terms of data poisoning risks in training or retraining databases for generative AI systems, and more specifically LLMs, as well as possible defenses.
Exploration of scientific research on the issue of information unlearning in AI systems, a technique that involves causing a machine learning model to forget certain elements that it has been “taught,” without causing too much loss of performance.
Conducted in partnership with Arcom, the project aimed to develop a definition for measuring pluralism of opinion and schools of thought by reviewing scientific literature and proposing an initial experimental methodology on a content-sharing platform.
Launch of a research project to study how data is collected for advertising purposes via SDKs (Software Development Kits), software components embedded in mobile applications, in this case in an Android environment.
This project aims to develop an accessible tool for detecting hate speech in French. By collaborating with researchers and evaluating open-source large language models (LLMs), the project seeks to address the lack of annotated data in French.
Organization of regular meetings aimed at promoting exchanges between administrations responsible for regulating digital platforms and the research community. At these meetings, a researcher or expert is invited to present their work on topics related to PEReN's activities.
Quarterly publications as part of the “Shedding light on…” series, which provides technical analysis on topics related to the regulation of digital platforms in a manner accessible to a general audience. (Issues available here).
A new section of this website, “PEReN Lab”, aims to share, through blog-style tech articles, the center's state-of-the-art expertise on the technologies deployed in its projects and, where applicable, the source code developed. (Published articles available here)
Open source publication, in a dedicated space, of the source code of application services and tools put into production.
Organization of the “Digital Services (h)Acked” hackathon, co-organized with the European Commission (DG Connect and ECAT) in Brussels in February 2024. At the time of the full entry into force of the Digital Services Act (DSA), this competition aimed to provide innovative open-source tools to facilitate in-depth study of the algorithms that influence our daily lives, from our purchasing habits to our social interactions.
As part of preparations for the AI Action Summit (February 2025), PEReN has partnered with VIGINUM to build a standardized interface for evaluating the performance and robustness of artificial content detection models. How does the tool work? It brings together a series of state-of-the-art detectors behind a common interface to which new models can easily be added, so that the detectors can be benchmarked on content similar to that found on social media and their effectiveness and complementarity explored.
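The idea of a common interface to which detectors can easily be added might be sketched as follows. This is not the PEReN and VIGINUM tool: the registry, the score convention (a value between 0 and 1, higher meaning more likely artificial), the two toy detectors, and the sample texts are all hypothetical and serve only to show the pattern.

```python
from typing import Callable, Dict, List, Tuple

# Assumed convention: a detector maps content to a score in [0, 1],
# where a higher score means "more likely artificial".
Detector = Callable[[str], float]

REGISTRY: Dict[str, Detector] = {}


def register(name: str):
    """Decorator that adds a detector to the registry under `name`."""
    def wrap(fn: Detector) -> Detector:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("toy_repetition")
def repetition_detector(text: str) -> float:
    # Toy heuristic (not a real detector): share of repeated words.
    words = text.lower().split()
    return 1.0 - len(set(words)) / len(words) if words else 0.0


@register("toy_length")
def length_detector(text: str) -> float:
    # Toy heuristic (not a real detector): longer texts score higher.
    return min(len(text) / 100.0, 1.0)


def benchmark(samples: List[Tuple[str, bool]],
              threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy of every registered detector on (content, is_artificial) pairs."""
    results = {}
    for name, detector in REGISTRY.items():
        correct = sum((detector(text) >= threshold) == label
                      for text, label in samples)
        results[name] = correct / len(samples)
    return results


# Hypothetical labelled samples.
samples = [
    ("great great great great deal", True),
    ("see you at the station tomorrow", False),
]
print(benchmark(samples))  # {'toy_repetition': 1.0, 'toy_length': 0.5}
```

Because every detector shares the same signature, adding a new model only requires one decorated function, and the same benchmark loop then covers it without further changes.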
Overview of how robots that explore and collect data from websites work and the issues involved: explanation of exploration robots, the importance of this data collection for AI systems and the openness of the internet, exclusion protocol.
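To make the exclusion protocol concrete, the following sketch uses Python's standard `urllib.robotparser` module to check what a `robots.txt` file allows. The file content, the bot names (`ExampleAIBot`, `OtherBot`), and the URLs are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one AI crawler is excluded from the whole site,
# every other robot is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
print(parser.can_fetch("ExampleAIBot", "https://example.org/articles/1"))  # False
print(parser.can_fetch("OtherBot", "https://example.org/articles/1"))      # True
print(parser.can_fetch("OtherBot", "https://example.org/private/x"))       # False
```

The protocol is declarative: the file states the site's wishes, and it is up to each robot to run a check like this one before fetching a page.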
Following the publication of “Shedding light on…” No. 7 (“Open source and AI: synergies to be rethought”), which explored licensing formats and their suitability for stakeholders' needs, PEReN designed and published a graphical interface for comparing the degrees of openness of certain generative AI models with regard to the new definition of open source put forward by the Open Source Initiative (OSI). This interactive comparison tool allows users to determine which models are compatible with their use cases or philosophy by configuring the criteria according to their preferences. Given the considerable number of models, the tool cannot be exhaustive; it covers a selection of widely used models or models associated with specific licenses. (Link to the comparison tool; “Shedding light on…” No. 7 available here)