Publication of a study on evaluation methods for content recommandation algorithms

The DMGIC, as part of its participation to an international multi-stakeholder working group on the diversity of online content, commissioned PEReN and Inria/Regalia to conduct a feasibility study evaluating the conditions under which audits of content recommendation algorithms would be possible.

Publication of a study on evaluation methods for content recommandation algorithms

The DGMIC (Direction Générale des Médias et des Industries Culturelles), as part of its participation to an international multi-stakeholder working group on the diversity of online content, commissioned PEReN and Inria/Regalia to conduct a feasibility study evaluating the conditions under which audits of content recommendation algorithms would be possible.

In this study, we describe the evaluation methods of recommendation algorithms that could be implemented, based on a dual progression of cost to the regulator and the audited platform, and on the granularity of the audit. The reports starts by presenting the operating principles of a recommendation algorithm, understood here as an algorithm that filters objects from a (potentially extremely large) catalogue for a given user or segment of users, based on some of their characteristics (platform use, centres of interest, economic value of the content, etc.), while taking into account the user’s feedback (likes, comments, time spent, abandon, etc.)

The report then presents the methods that can be used by a third party auditor to collect representative data, which is essential for any statistical analysis of a black-box algorithm. Such an analysis, which we refer to as “input/output testing", is a third way of considering a more detailed analysis than by simply sending out forms, while lowering the associated risks (intellectual property, personal data, etc.) of a full audit of source code or data (white-box audit). In particular, the possibilities of using occasional audits of samples of source code or data sets, voluntary or paid feedback from users, or even direct access to information through the platform’s public interfaces in order to build up a more substantial data set are analysed according to this dual progression of costs and audit granularity.

Finally, the study analyses the use of such data to conduct audits of recommendation algorithms, according to three typical scenarios of interest for the regulator:

  • Cross-exposure measurements according to a double segmentation of users and contents (the segmentation of users and contents being in that case known in advance). It could be, for example, making sure that a moderation algorithm does not recommend more violent contents to users identified as being male than to users identified as being female;
  • Identification of potential biases within a recommendation algorithm (i.e. determining a segmentation of content or users), which could involve identifying which content categories benefit most from algorithmic recommendations;
  • A more refined understanding of an algorithmic mechanism, by the realization of a simplified (virtual) model of the algorithm, which could then be used to determine the main explanatory parameters of the recommendations of the algorithm.

PEReN and Inria/Regalia were able to present this work and the lessons learned from this study to the multi-stakeholder working group on the diversity of online content led by the Department of Canadian Heritage on May, 28th. This study, which lays the first theoretical foundations for black-box auditing methods co-developed by PEReN and Inria/Regalia, now paves the way for the validation and experimental demonstration of these black-box auditing methods on specific algorithms.

A complete translation of the study can be downloaded in English . The original French version is also available. This file may not be suitable for users of assistive technology, you can request an accessible format.