INESIA - Results of joint testing for the improvement of LLM multilingual evaluation methodologies

Within the framework of the international network of AI safety institutes, PEReN, through INESIA, took part in a joint testing exercise to improve multilingual evaluation methodologies for LLMs. The report on the results has now been published.

To establish a common approach to testing the safety of large language models (LLMs) in several languages, the AI safety institutes and offices mandated by the governments of Singapore, Japan, Australia, Canada, the European Union, France, Kenya and South Korea, together with the UK’s AI Security Institute, conducted a joint study whose conclusions have now been published.

This evaluation exercise, in which the French institute INESIA (ANSSI, Inria, LNE, PEReN) took part, sought to answer the following three questions:

effectiveness of safeguards: do the models’ safety safeguards hold equally well across languages?
quality of refusals: when models refuse to respond, are they overly cautious, or do they remain helpful? Does this behaviour vary from one language to another?
quality of the evaluator: is using an LLM-as-a-judge a reliable means of multilingual evaluation?

The published report describes in detail the methodological elements and the results observed.

Download the evaluation report on the joint testing to improve LLM multilingual assessment methodologies

INESIA in a nutshell…

The French National Institute for the Evaluation and Security of Artificial Intelligence (INESIA), jointly steered by the General Secretariat for Defense and National Security (SGDSN) and the General Directorate for Enterprise (DGE), is a major initiative that enables France to engage with other leading AI nations on security and evaluation issues. Created in accordance with the Seoul Declaration, INESIA brings together the work of ANSSI, Inria, LNE and PEReN to structure French public efforts in systemic risk analysis, regulation and the evaluation of AI models. It also aims to develop concepts and tools for assessing the performance and reliability of AI systems. Through INESIA, France plays a key role in the international network of AI Safety Institutes, working alongside partners such as Canada, South Korea, the United States, Japan, Kenya, Singapore, the United Kingdom and the European Union to promote safety, inclusiveness and trust in AI.