PEReN – Center of expertise for digital platform regulation
Data science expertise at the service of digital regulation
The rise of generative AI has led to the usage of millions of original images to generate new ones. In this context, identifying which specific original works were used to produce a particular artificial image is not only very costly in terms of computational power but also uncertain. PEReN explores an alternative approach in this prototype, based on the nearest neighbors search method, which allows for a low-cost identification of the images in the training set that are most similar to a generated image. Although imperfect, this approach provides a method for objective comparisons, serving as a basis for subsequent discussion.
The research field of “Training Data Attribution” explores responses to the technical issue raised. Initially focused on studying the influence of training data on generated content, new research is deepening these techniques to identify the original contents used to train generative AI models.
Two types of methods are at work:
Causal methods offer robust explainability in a statistical and theoretical sense, but their implementation is complex and resource-intensive, even for lighter approaches such as SHAP or LIME. Designed to explain predictions of classifiers, they quickly reveal their limitations when facing the text-to-image generative models studied in this note, whose complexity and volume of training data make it difficult to precisely identify influences on the generated content.
We therefore explored a non-causal approach that is simpler, more lightweight, and can be applied in a fraction of the computation time required for generation, even on very large datasets. The proposed identification approach is based on a method relying on the proximity between original contents present in a training database and those generated by the generative AI trained on this database. For a generated image, the goal is thus to find the original images that are the most similar to it. As our tests will show, the concept of similarity between works can cover several dimensions and depends on the chosen embedding model: for example, similarity in content or similarity in style.
Similar content search, or "nearest neighbor search," relies on three steps (see Figure 1):
Figure 1 : Illustration of the nearest neighbors search system. In this example, the search was limited to 2 nearest neighbors.
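The search itself can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the prototype's implementation: the images are assumed to be already converted into embedding vectors (tiny hand-written vectors stand in for real embedder outputs), the scan is an exact brute-force search rather than an index such as HNSW or a KD-Tree, and the names `cosine_distance` and `nearest_neighbors` are hypothetical.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def nearest_neighbors(query, index, k):
    """Return the k (image_id, distance) pairs closest to the query embedding."""
    scored = [(image_id, cosine_distance(query, emb)) for image_id, emb in index.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]

# Toy index of original works (hypothetical embeddings, for illustration only).
index = {
    "original_a": [1.0, 0.0, 0.0],
    "original_b": [0.9, 0.1, 0.0],
    "original_c": [0.0, 1.0, 0.0],
}
generated = [1.0, 0.05, 0.0]  # toy embedding of a generated image

# As in Figure 1, the search is limited to 2 nearest neighbors.
top2 = nearest_neighbors(generated, index, k=2)
```

A brute-force scan is only practical on small indexes; on large ones, an index structure replaces the sort while the interface stays the same.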
The developed prototype underwent a series of tests to characterize its performance and limitations. The experimental framework focused on text-to-image models not integrated into systems such as RAG (Retrieval-Augmented Generation), which allows the inclusion of additional original images beyond those in the training dataset. In our case, the original content database that we use is a subset of the generative model’s training dataset. To select the attributions, we fixed a number K of nearest neighbors.
From the attribution dataset created by Wang et al. (2023) following the process illustrated in Figure 2, we selected a subset consisting solely of copyright-free works and images generated from them, comprising 1,576 original works and 8,400 generated images.
Figure 2: Principle of the attribution dataset derived from Wang et al.
The resulting reference dataset is divided into two subsets:
In the attribution dataset, each artist corresponds to a set of original works. The synthetic images were generated by Wang et al. using a text-to-image model retrained with the works belonging to the respective artist.
Initially, two methods for searching similar content were considered:
A more in-depth comparison of the two methods is available in Appendix 1.
Given the small size of our indexing database, we could not observe any performance difference between the exact search method (KD-Tree) and the approximate search method (HNSW). Therefore, we only detail the results obtained at the overall dataset level and at the level of each generated image for the HNSW method (example of attribution in Figure 3). The number K of assigned images, set between 1 and 100 for this experiment, is fixed for all generated images.
Figure 3 : Example of attributions obtained using the HNSW method for a generated image, based on the attribution rank (from 1 to 100). It is observed that as the rank increases, the attributed images become visually more distant from the generated image. This visualization highlights the importance of varying the number of attributions K to analyze the impact on the accuracy of the method used.
To evaluate performance at the overall dataset level, we measure the percentage of generated images with at least one assigned image from the actual artist who inspired the generated image. In most cases, the method successfully attributes part of the inspiration to the correct artist, as illustrated in Figure 4.
Figure 4 : Proportion of generated images with at least one correctly attributed image (the correct artist is present at least once), as a function of the fixed number of K nearest neighbors.
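The dataset-level measure plotted in Figure 4 can be sketched as follows. This is an illustrative reading of the metric, assuming each generated image has a ranked list of attributed artists and a single true artist; the function name `hit_rate_at_k` and the toy labels are hypothetical.

```python
def hit_rate_at_k(attributed_artists, true_artists, k):
    """Share of generated images whose top-k attributions include the true artist."""
    hits = sum(
        1
        for ranked, truth in zip(attributed_artists, true_artists)
        if truth in ranked[:k]
    )
    return hits / len(true_artists)

# Toy data: one ranked list of attributed artists per generated image.
attributed = [
    ["artist_a", "artist_b", "artist_a"],
    ["artist_c", "artist_d", "artist_e"],
    ["artist_f", "artist_f", "artist_g"],
]
truth = ["artist_a", "artist_e", "artist_g"]

# The rate can only stay flat or grow as K increases.
rates = {k: hit_rate_at_k(attributed, truth, k) for k in (1, 2, 3)}
```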
Two metrics, averaged across each dataset, evaluate the performance for the generated images (see Figure 5):
Figure 5 : Average precision and recall on the set of generated images, as a function of the fixed number K of nearest neighbors.
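For a single generated image, the two metrics of Figure 5 can be computed as in the sketch below. It assumes that precision and recall are counted over individual works (attributed works versus the works actually used as inspiration); the function name and the work identifiers are hypothetical.

```python
def precision_recall_at_k(attributed_ids, inspiration_ids, k):
    """Precision and recall of the top-k attributed works for one generated image."""
    top_k = attributed_ids[:k]
    correct = len(set(top_k) & set(inspiration_ids))
    precision = correct / len(top_k)          # share of attributions that are right
    recall = correct / len(inspiration_ids)   # share of inspirations that were found
    return precision, recall

# Toy example: ranked attributions and the works really used for generation.
attributed = ["w1", "w7", "w2", "w9"]
inspiration = ["w1", "w2", "w3"]

p2, r2 = precision_recall_at_k(attributed, inspiration, k=2)
p4, r4 = precision_recall_at_k(attributed, inspiration, k=4)
```

Averaging these per-image values over a dataset gives one point of each curve for a given K.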
We also observe that the method performs better on the “gpt” dataset (vague and abstract prompts) than on the “object” dataset (specific and concrete prompts). A possible explanation for this performance difference will be proposed later. In a second experiment, we varied K according to the number of real inspirational images used for each generated image. Based on the results (see Figure 13 in Appendix 2), we reached the same conclusions.
In the reference dataset used, each generated image is inspired by images from a single artist. It is therefore possible that a high diversity of artists in the images attributed by the method is linked to incorrect attribution: this could indicate that the style of the generated image was poorly discerned and could be attributed to several different artists. Conversely, if the method attributes images from a single artist, this could indicate greater certainty regarding the inspirational style of the generated image. If the hypothesis is confirmed, this provides an indicator of the confidence we can have in the attribution (see illustration Figure 6).
Figure 6 : Illustration of the link between the diversity of assigned artists and the success of the assignment. At the top: maximum diversity of assigned artists is associated with a poor assignment. At the bottom: minimal diversity of assigned artists is associated with a good assignment.
Statistical tests were conducted on the relationship between the diversity of artists attributed to a generated image and binary performance metrics. Diversity is calculated using Shannon entropy on the attributed artists, ranging between 0 and 1. It is equal to 1 when as many distinct artists are attributed as there are images, and 0 when only one artist is attributed.
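A diversity score with these two boundary properties can be obtained by dividing the Shannon entropy by the logarithm of the number of attributed images. The sketch below assumes this normalization, which the text implies but does not spell out; the function name `artist_diversity` is hypothetical.

```python
import math
from collections import Counter

def artist_diversity(artists):
    """Shannon entropy of the attributed artists, normalized to [0, 1] by log(K)."""
    n = len(artists)
    counts = Counter(artists)
    if n <= 1 or len(counts) == 1:
        return 0.0  # a single attributed artist means no diversity
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(n)  # log(n) is the entropy of n distinct artists

single = artist_diversity(["artist_a"] * 4)
mixed = artist_diversity(["artist_a", "artist_a", "artist_b", "artist_c"])
distinct = artist_diversity(["artist_a", "artist_b", "artist_c", "artist_d"])
```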
We first performed a Mann-Whitney test comparing the distributions of attributed artist diversity depending on whether the actual artist was present among the attributed images. The low p-value associated with this test (p = 5.4e-56) confirms the difference in distribution. We then performed the same test comparing cases where the nearest attributed work is by the actual artist with cases where it is not. A difference in distribution is once again identified (p = 5.9e-87). The results of these tests show a significant link between the diversity of attributed artists and:
The diversity of attributed artists can therefore serve as a partial indicator of confidence in the method’s predictions. However, this finding may not be generalizable to a reference dataset composed of images generated from original works of multiple artists.
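For readers unfamiliar with the test, the sketch below shows how a two-sided Mann-Whitney comparison of two groups of diversity scores can be computed. It is a simplified illustration, not the code behind the reported p-values: it uses the normal approximation with tie correction and no continuity correction, it assumes the values are not all identical, and the sample values are invented. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    combined = sorted(
        (value, group) for group, sample in enumerate((x, y)) for value in sample
    )
    ranks = [0.0] * len(combined)
    tie_term = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for pos in range(i, j):
            ranks[pos] = avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j
    # U statistic of the first sample, from its rank sum.
    r1 = sum(rank for rank, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u1, p_value

# Invented diversity scores: actual artist present vs. absent among attributions.
present = [0.0, 0.1, 0.2, 0.3, 0.25]
absent = [0.7, 0.8, 0.9, 1.0, 0.6]
u_stat, p_value = mann_whitney_u(present, absent)
```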
So far, we have used the distance between embeddings to identify the nearest neighbors, but this distance is also information in itself. We could use it as an indicator of the level of confidence we can assign to an attribution. Correctly attributed images would have a low distance because they are very close, while incorrectly attributed images would have a higher distance since they are farther away semantically. A distance between the embedding of the generated image and its nearest neighbor below a certain threshold could therefore indicate high confidence in the attribution (see illustration in Figure 7).
Figure 7 : Illustration of the link between distance from the nearest neighbor and successful attribution. Top: a generated image inspired by Fra Carnevale, falsely attributed to a work by Gustave Caillebotte. Bottom: a generated image inspired by Carracci, correctly attributed to a work by Carracci.
We performed three Mann-Whitney statistical tests on the distribution of distances between embeddings. These distributions are indeed different depending on the presence of the actual artist among the attributed images (p = 2.4e-54), the identification of a work by the actual artist as the closest neighbor (p = 2.1e-106), and the singular presence of the artist in the attributed images (p = 4.0e-42).
The results tend to show a statistically significant link between the distance to the nearest neighbor of the embedding of the generated image and:
A low distance between embeddings can therefore constitute another indicator of the relevance of the method’s attributions. This distance itself depends on the chosen embedder.
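Used as a confidence indicator, the distance reduces to a simple threshold rule. The sketch below is illustrative only: the threshold value of 0.2 and the distances are invented, and a usable threshold would have to be calibrated for each embedder, since the distance scale depends on it.

```python
def confidence_flags(nearest_distances, threshold):
    """Map each generated image to True when its nearest-neighbor distance
    is at or below the threshold (attribution considered more reliable)."""
    return {
        image_id: distance <= threshold
        for image_id, distance in nearest_distances.items()
    }

# Invented nearest-neighbor distances for three generated images.
nearest_distances = {"generated_1": 0.08, "generated_2": 0.41, "generated_3": 0.15}
flags = confidence_flags(nearest_distances, threshold=0.2)
```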
| | Real artist present | Nearest neighbor from the real artist | Real artist only |
|---|---|---|---|
| Hypothesis 1: diversity of artists | p = 5.4e-56 | p = 5.9e-87 | N/A |
| Hypothesis 2: distances between embeddings | p = 2.4e-54 | p = 2.1e-106 | p = 4.0e-42 |
Table 1 : Summary table of statistical test results for two hypotheses. Here, we consider the “gpt” dataset and a number of attributed images equal to the number of inspiration images for each generated image. For each hypothesis, the performed test is the Mann-Whitney test between the distributions of two classes defined by the metric in the column. For example, the p-value associated with the diversity values of artists for images where the real artist is present versus images where the real artist is not present is 5.4e-56.
An image can be characterized by both its content (e.g., the objects present in the image) and its style (artistic movement, colors used, etc.). Thus, insufficient consideration of the style of images by embedders could lead to attribution errors.
It is possible that the embedder used (CLIP) primarily encapsulates the content of images while neglecting the style. Indeed, the image-text pairs in CLIP’s training dataset come from a web corpus called WebImageText, which includes 400 million image-text pairs. These texts primarily describe the visual content of each image to ensure accessibility, rather than the image’s style.
The embeddings extracted in this way would lose part of the information related to the image’s style, influencing the distance calculation results: two images with different styles but featuring the same objects might appear very similar. Thus, the attribution errors of the method could be linked to inadequate embeddings: choosing a more relevant embedder, i.e., one adapted to the priorities and attribution objectives, could improve the method at a low cost.
Specialized embedders for extracting style information, such as ALADIN (Ruta et al., 2021), are not available under licenses permitting their use. We therefore sought to verify whether attributed images were closer, in terms of content, to the generated image than to reference images. This could suggest that CLIP-based attribution favors content over style.
Given the difficulty of finding an embedder we can be certain encapsulates strictly only the content, we decided to generate descriptions to extract only the content of the images. Indeed, due to the nature of the training data described above, multimodal text-to-image models seem better suited to extract the content of an image. To quantify the similarity between two images in terms of content, we followed these steps (see details in Appendix 3):
Figure 8 : Distribution of cosine similarity between embeddings capturing the content of attributed images and those capturing the content of real images used as inspiration. A higher similarity indicates more homogeneous content among the images. The Mann-Whitney statistical test is significant at the 0.0001 threshold. Here, we consider images attributed using the HNSW method, the “gpt” dataset, and a number of attributed images equal to the number of inspiration images for each generated image.
The results in Figure 8 show that the similarity distribution of content differs between attributed images and those that actually inspired the generated images (p = 6.6e-116). We also observe that embeddings are slightly more similar for attributed images than for reference images.
Thus, attributed images are more homogeneous in content than the real images that served as inspiration. This aligns with the hypothesis that the embedder encapsulates the content of images rather than their style (see illustration in Figure 9). However, this does not formally prove that images are attributed because they have similar content to the generated image.
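One way to quantify content homogeneity within a set of images is the mean pairwise cosine similarity of their content embeddings. The sketch below follows that reading, which is an assumption: the exact procedure is described in Appendix 3 and may differ. The vectors are invented stand-ins for embeddings of generated descriptions, and the function names are hypothetical.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all pairs in a set of at least two embeddings."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)

# Invented content embeddings: attributed images share similar content,
# while the real inspiration images have more varied content.
attributed_content = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]
inspiration_content = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]

homogeneity_attributed = mean_pairwise_similarity(attributed_content)
homogeneity_inspiration = mean_pairwise_similarity(inspiration_content)
```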
Figure 9 : Illustration of the bias in embedding towards image content rather than their artistic style. On the left: the source images for the generated image, with a similar artistic style but different content. On the right: the incorrectly attributed images, with a different artistic style but similar content (boat, sea, clouds).
This greater homogeneity in content among attributed images could also explain why the proposed method performs worse on the “object” dataset than on the “gpt” dataset: the former is generated using prompts that enforce a greater variety of content (flowers, animals, landscapes, etc.) than the “gpt” dataset. An embedder more sensitive to content could explain a higher number of false attributions due to content farther from the actual inspirational images.
To strengthen these observations, using an embedder specialized in recognizing artistic styles, such as ALADIN (Ruta et al., 2021), or fine-tuning a standard embedder on artistic visuals (paintings, sculptures, etc.) based on the reference dataset could be a relevant future experiment.
When evaluating the robustness of an image attribution process, it is necessary to consider the possibility that the initial index may be incomplete or erroneous. In this context, we introduce “noise,” referring to the potential addition of extra images to the index: either homogeneous images (same domain, art), potentially completing an initially incomplete base, or heterogeneous images (different domain, faces – FFHQ), which could disrupt the process.
To measure this robustness, we compare the initial attribution of original works (limited to a subset of the reference index) to the attribution after introducing noise.
The results in Figure 10 show that:
Figure 10 : Average matching percentage between the initial assignment (without noise) and that obtained after introducing noise, depending on the level (10%, 25%, 50%, 75%) and the type of noise (Homogeneous/Art vs. Heterogeneous/Faces).
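The matching percentage can be sketched as the share of initially attributed works that are still attributed once noise has been added to the index, averaged over generated images. This is one plausible definition, assumed here for illustration; the function names and identifiers are hypothetical.

```python
def matching_percentage(initial, noisy):
    """Percentage of the initially attributed works still attributed after noise."""
    kept = len(set(initial) & set(noisy))
    return 100.0 * kept / len(initial)

def average_matching(initial_by_image, noisy_by_image):
    """Mean matching percentage over all generated images."""
    scores = [
        matching_percentage(initial_by_image[image_id], noisy_by_image[image_id])
        for image_id in initial_by_image
    ]
    return sum(scores) / len(scores)

# Toy attributions before and after adding extra images to the index.
initial_by_image = {"gen_1": ["w1", "w2", "w3", "w4"], "gen_2": ["w5", "w6"]}
noisy_by_image = {"gen_1": ["w1", "noise_12", "w3", "noise_40"], "gen_2": ["w6", "w5"]}

score = average_matching(initial_by_image, noisy_by_image)
```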
These results highlight a strong sensitivity to the index composition. An incomplete or contaminated index risks critical omissions in identifying the original works that inspired the generated images, posing a major challenge for the reliability of the attribution system.
The experiments show that the proposed method, based on similarity search, offers promising efficiency, although the reliability of its results must be analyzed taking into account the number of artists attributed by the algorithm, the distance between the embeddings of original and generated images, and the presence of noise in the index.
An interesting property of this approach is that the computed similarity depends on the choice of the embedder, meaning that it is possible to choose the one best suited to the use case, whether by prioritizing the semantic information of the content in an image, its style, or some other parameter.
This prototype does not always make it possible to accurately identify the exact original works that inspired each generated image. Indeed, even if the truly influential artist is identified, other artists may also be attributed, whether mistakenly or legitimately (for example, artists from the same artistic movement with strong stylistic markers). Tolerance to this type of error is configurable (by varying the threshold and the number of attributions) and remains a field to be explored in order to obtain results that are consistent on average.
These attribution mechanisms could form the basis for considering models of artist remuneration distribution based on their estimated contribution to the generation of a work. Many fields could be explored regarding the remuneration methods that would result from this attribution (see an example in Figure 11). Two approaches can be mentioned:
Figure 11 : Illustration of the attribution of source images for an image generation. At the top, the two Cimabue paintings (“Virgin Enthroned with Angels” and “Madonna Enthroned”) used for generation according to the algorithm, each in equal parts, annotated “50%”. At the bottom, the two paintings actually used for generation, still indicated at 50% each, which turn out to be the same as those suggested by the algorithm, thus illustrating a successful example of attribution based on the similarity of embeddings.
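The equal split shown in Figure 11 can be expressed as a short computation. This sketch only illustrates that one rule (each attributed work receives the same share, then shares are summed per artist); it is not a proposed remuneration scheme, and the function name `equal_shares` is hypothetical.

```python
from collections import defaultdict

def equal_shares(attributed_works):
    """Give each attributed (title, artist) pair an equal share, then sum per artist."""
    share = 1.0 / len(attributed_works)
    per_work = {title: share for title, _ in attributed_works}
    per_artist = defaultdict(float)
    for _, artist in attributed_works:
        per_artist[artist] += share
    return per_work, dict(per_artist)

# The two works attributed in Figure 11.
attributed_works = [
    ("Virgin Enthroned with Angels", "Cimabue"),
    ("Madonna Enthroned", "Cimabue"),
]
per_work, per_artist = equal_shares(attributed_works)
```

Other weightings, for instance shares that decrease with embedding distance, would fit the same interface but are not evaluated in this note.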
Although the development of this prototype has focused on image-generative models, similar methods exist for audio. Regarding textual content generation, other techniques such as TracIN or DataInf are applicable, but the discrete nature of text generation can make identifying the proportion of training data that influenced the generation more complex.
These prototyping efforts, intended for exploratory and technical purposes, do not presume the cases in which they could be adapted or applied. They are intended to fuel emerging thinking. Many uncertainties remain regarding these different methodologies, inherent to the current limitations of AI systems in providing a clear and undeniable answer regarding the origin of generated content.
For our prototype, we compared two specific techniques:
| | Accuracy | Speed | Ease of implementation | Scalability and robustness to variations |
|---|---|---|---|---|
| Nearest Neighbor Search (HNSW) | Approximate search, good accuracy with a slight margin of error | Very fast after indexing, but the construction phase is longer | Requires precise parameter tuning | Reliable despite variations and highly effective on large bases and large dimensions |
| Nearest Neighbor Search (KD-Tree) | Exact search, effective on small datasets but less performant in high dimensions | Fast indexing, but search time increases with the size of the dataset | Easy to implement | Less reliable and efficient when the data is large, suitable for small databases |
Evaluating the method by assigning the same number K of images for all generated images is subject to a bias: not all generated images were created using the same number of real artworks. Between 1 and 67 artworks were used to retrain the generation model, depending on the generated image. Thus, generated images inspired by only two real images will necessarily have incorrect attributions if the method is asked to assign five inspirational images to them, not due to a flaw in the method itself.
To eliminate this bias, we assign for each generated image a number of images equal to the number of real images used to retrain the model that generated it. Consequently, in this framework, precision and recall metrics become equivalent. Figure 13 shows the attribution performance. We can also observe that the method performs significantly better on the "gpt" dataset than on the "object" dataset.
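The equivalence can be written out explicitly. The notation is introduced here for illustration and assumes the usual set-based definitions: $A_K$ is the set of the $K$ attributed images and $T$ the set of real images used for a given generated image.

```latex
\mathrm{precision}@K = \frac{|A_K \cap T|}{K},
\qquad
\mathrm{recall}@K = \frac{|A_K \cap T|}{|T|},
\qquad
K = |T| \;\Longrightarrow\; \mathrm{precision}@K = \mathrm{recall}@K .
```

With a fixed $K = 5$ and $|T| = 2$, the numerator is at most 2, so precision cannot exceed $2/5$ even for a perfect method, which is the bias described above.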