INESIA - Results of an AI agent evaluation exercise by the International Network of AI Safety Institutes

As part of the International Network of AI Safety Institutes, INESIA, including PEReN, has taken part in a new joint exercise on the evaluation of AI agents.

As part of a new international cooperation involving INESIA, PEReN has taken part in the evaluation of models integrated within AI agents. This work, conducted jointly with the AI Safety Institutes of Singapore, Japan, Australia, Kenya, South Korea and the UK, as well as with the AI Office of the European Commission, was divided into two parts.

Multilingual assessment of fraud and sensitive information leakage risks

PEReN contributed to the first part, a multilingual assessment of two models (an open model and a proprietary model) embedded in an AI agent, focusing on the risks of fraud and leakage of sensitive information. The aim of this work was to explore the following key questions:

Do models present a high risk of fraud or leakage of sensitive information when integrated into an agent? Do these risks vary according to language?
Are “judge” models good evaluators of AI agent safety? Is this conclusion valid for all languages?

To this end, an English evaluation set was constructed from existing open evaluation sets and additional examples, then translated into 8 other languages. The behavior of the models on each example in the evaluation set was assessed in parallel by a “judge” model and by humans.
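
As a purely illustrative sketch, and not the method used in the exercise, the comparison between a “judge” model and human annotators can be summarised as a per-language agreement rate. The records, the "safe"/"unsafe" labels and the function name agreement_by_language below are all hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical toy records: (language, judge_label, human_label).
# These values are invented for illustration; they are not results from the exercise.
records = [
    ("en", "safe", "safe"),
    ("en", "unsafe", "unsafe"),
    ("fr", "safe", "unsafe"),
    ("fr", "unsafe", "unsafe"),
]

def agreement_by_language(rows):
    """Return, per language, the fraction of examples where judge and human labels match."""
    matches = defaultdict(int)
    totals = defaultdict(int)
    for language, judge_label, human_label in rows:
        totals[language] += 1
        if judge_label == human_label:
            matches[language] += 1
    return {language: matches[language] / totals[language] for language in totals}

rates = agreement_by_language(records)
for language, rate in sorted(rates.items()):
    print(f"{language}: {rate:.2f}")
```

A raw agreement rate is the simplest possible summary. A real analysis would likely also consider chance-corrected measures and the number of examples per language.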

Evaluation of cybersecurity capabilities

The second part of the project involved an evaluation of the cybersecurity capabilities of two open models (using the Cybench and InterCode datasets), with the additional aim of identifying variables with a strong influence on the robustness of the evaluation.

To find out more about the methodological elements and the results obtained, please consult:

this blog post published online by the British AI Security Institute;
and the detailed report.

INESIA in a nutshell…

The French National Institute for the Evaluation and Security of Artificial Intelligence (INESIA), jointly led by the General Secretariat for Defense and National Security (SGDSN) and the General Directorate for Enterprise (DGE), is a major initiative enabling France to engage with other world leaders in AI on issues of security and evaluation. Created in accordance with the Seoul Declaration, INESIA federates the actions of ANSSI, Inria, LNE and PEReN to structure French public efforts in the field of systemic risk analysis, regulation and evaluation of AI models. It also aims to develop concepts and tools for assessing the performance and reliability of AI systems. Through INESIA, France plays a key role in the international network of AI Safety Institutes, working alongside partners such as Canada, South Korea, the United States, Japan, Kenya, Singapore, the United Kingdom and the European Union to promote safety, inclusiveness and trust in AI.