In recent times, copyright law has faced the following issues in relation to artificial intelligence (AI): (i) human authorship as a legal requirement for the "result" generated by AI to be considered a "work." Human authorship exists in AI-generated works when an author intervenes completely and absolutely or, when this is not the case, exercises creative, perceptible, and transformative control to direct the process or to select, arrange, dispose, and/or edit the "AI result" (in this case, an artistic work); (ii) large-scale reproduction of pre-existing works as training data or for Text & Data Mining (hereinafter "TDM") in AI systems, including "prompts" (input instructions) and "outputs" (generated results), accessible through digital platforms; (iii) exceptions to copyright for TDM, for research or teaching; (iv) "deepfakes," as a matter of rights against the impersonation of a person's image or identity; and (v) algorithmic transparency, as an operational obligation to report on model training or to label outputs. This obligation extends to aspects such as watermarking of content (which is not limited to works) and traceable logs for auditing AI processes.
I. INTRODUCTION
A. Legal Challenges of AI
AI has crossed the legal threshold and brought new legal challenges with it. It has raised questions and dilemmas in all fields of law. There is no specialty that AI has not touched. Its impact runs from substantive to procedural law and reaches legislation, case law, and doctrine. Nowadays, AI provides quick access to accurate and detailed information, which improves processes and reduces task completion times. But not only that: AI has evolved to such an extent that it generates "results" or "outputs" of creative or aesthetic value, such as artistic works. Generative AI is based on deep learning models that produce works or other results or outputs.
B. AI and Copyright
Copyright follows this trend. In fact, it is one of the legal disciplines in which AI has become most prevalent and has posed the greatest challenge. Programmers have developed systems using algorithms that allow works to be generated as if they had been created by humans. We call them works because they take recognized forms of expression. These include: i) written text, in the form of novels, essays, scripts, poetry, or computer programs; ii) music, whether instrumental, sung, or both; iii) graphic production, in the form of architectural plans, drawings, or designs; or iv) audiovisual production, whether animation or live action.
In principle, AI systems need human intervention to generate works. However, technologies have advanced to such an extent that they now do so with little or no human involvement. To a greater or lesser extent, AI systems generate, on their own, independent or autonomous results that are not called works because there is no author. This has sparked a debate about the meaning of work, authorship, originality and, in parallel, the legal protection of so-called "AI works." Authorship is not the only problem facing copyright law in relation to AI works. There are others that will be discussed in this paper.
C. Tensions
Generative AI has caused tension regarding copyright. Once again in history, the traditions of droit d'auteur and copyright are clashing to resolve issues arising from new technologies. The former focuses on personal rights resulting from human creativity. The latter focuses on the exclusive right arising from investment in the training of AI models, technologies, or processes. All laws around the world protect human authors. However, in some countries, especially those with a copyright system, the laws also protect the developer, coordinator, or investor of technical works, as if they were the author of artistic works.
II. GENERATIVE AI AND ARTISTIC WORKS
A. Presentation of Technical and Legal Aspects
- Conceptual Framework. Generative AI, rather than a means of communication, is a highly complex synthesis technology that has brought about a substantial change in the production of works. After "training" models with training data or TDM (the "corpus" or "datasets"), the systems can generate diverse "results" in the creative field. AI "learns" in a technical sense: it adjusts neural network parameters to model "patterns." It "thinks" in the sense that it optimizes a "loss" function: it updates its parameters so that the value of that function decreases.
- AI operating cycle. The AI operating cycle is described in five phases: i) "curation" and "selection" of "data" (deciding what data to use and why, in terms of quality and relevance), which comprises "lawfulness" (verifying data permission, legal basis, and limits) and "governance" (rules, roles, and controls regarding who selects the data and ensures legality, how traceability is achieved, that is, knowing the sources of information used, and how audits preserve those traces); ii) "pre-training" (learning general representations from a corpus, followed by fine-tuning, which specializes the model for a specific task), which comprises "segmentation" into "tokens" (breaking down the input into minimum computational units or "tokens" of text/NLP, image, audio, and video) and "normalization" (standardizing the form of the data before training to reduce noise and irrelevant variability); iii) "training" (adjusting weights to minimize a loss function and thereby learn useful representations of the data), which comprises "weight or parameter adjustment" (weights are the numbers that determine how the neural network transforms its input into output) and "representation learning" (vectors that encode useful features of text and audio, which appear in intermediate layers or "embeddings" and facilitate classification, generation, and semantic search tasks, among others); iv) "evaluation and alignment" (tests of model capabilities and risks, and techniques for adjusting the model to standards, including legal standards and human preferences); and v) "inference" (the model generates an output from "input instructions" or "prompts"), optionally with Retrieval-Augmented Generation or "RAG" (a pattern in which the system searches external databases for documents, retrieves relevant passages, and uses them to write text).
- Legal aspects. From a legal standpoint, AI processes or operational cycles normally require technical, temporary, and intermediate reproduction of artistic works. This links them to copyright law. Forms of reproduction serve various purposes, regardless of the operational phase in which they occur, for example: i) copies for non-expressive "analysis" or internal use. In industry jargon, unlike an "expressive" copy, a "non-expressive" copy is one of a technical nature, necessary for the operation of the system, which the user of a work does not perceive, know, or obtain; TDM and embedding are typical forms; ii) "samples" of a work or fragments thereof. In RAG, the operator or provider allows, and the user requests, the download of a work or fragments of it to view, read, or listen to in some medium. This contrasts with systems that merely "calculate" and "decide" without disseminating the work, or that disseminate only a sample reduced to metadata; and iii) "governance": certificates of origin or other standards to ascertain the origin and history of the works used and thus avoid copyright issues. The basic difference between TDM and RAG is that the latter reproduces works or fragments by displaying retrieved pieces, while TDM does so for analysis. In the case of RAG, lawful access to the works is required through the appropriate licenses. TDM justifies a legal exception to the economic right of reproduction because the copy is technical and is made for analysis, without the user perceiving it.
- Rights or exceptions to rights. AI systems not only make copies in RAG or TDM; they also make other types of copies during the AI operating cycle. Some are for "sampling," especially in the acquisition or ingestion of works, and also in post-processing, dissemination, or distillation. Others are for analysis, traceability, or security; for example: i) curating, verifying, and clearing permissions or licenses; ii) tokenizing and normalizing; iii) annotating or labeling; iv) packaging or taking snapshots of datasets; v) adjusting weights; vi) checkpointing models; vii) evaluating (sometimes the evaluation is published); viii) aligning; ix) indexing, responding, or inferring, with or without RAG; and x) monitoring, security, or backups. These copies cannot be ignored when defining the legal rules, which diverge depending on whether the user of works must obtain authorization from copyright holders or whether a legal exception applies.
- What should the law say? International treaties and national copyright laws are structured on the principles of economic and moral rights. There are nuances depending on whether the country uses the droit d'auteur or copyright system.
However, all jurisdictions require at least a minimum of human creative activity, and therefore rights may vary. According to international treaties, there are four economic rights of use or exploitation: reproduction, distribution, public communication, and transformation. From these four pillars, a wide variety of modes of use or exploitation emerge, dictated by each particular industry. Only in rare exceptions do treaties or national laws specify concrete acts of exploitation. The reason for this is the general, illustrative, inclusive, and technologically neutral nature of copyright. AI has brought new forms of use or exploitation of works, especially reproduction; these are listed above. As with other industries or media, the law need not enumerate each form of reproduction involved in AI works. It is sufficient that the economic rights recognized in treaties and laws be understood to include all acts relating to AI. In any case, the law could be amended only to regulate exceptional situations such as TDM. Legislators around the world must use inclusive and neutral formulas in order to balance the interests of those who operate AI processes and those who hold rights. The purpose is to legislate with regulatory consistency and legal certainty.
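The technical sense of "learning" described in the conceptual framework above (adjusting parameters so that a loss function decreases) can be illustrated with a deliberately small sketch. It is not how any production model is trained: it fits a single weight to three hypothetical data points by gradient descent, and every name and value in it is an assumption made for illustration.

```python
# Toy illustration of "training": adjust one parameter (a weight) so that a
# loss function gets smaller. All names and data here are hypothetical.

def loss(w, data):
    """Mean squared error of the prediction w * x against the target y."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def gradient(w, data):
    """Derivative of the loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def train(data, w=0.0, learning_rate=0.05, steps=200):
    """Repeatedly move w against the gradient to reduce the loss."""
    for _ in range(steps):
        w -= learning_rate * gradient(w, data)
    return w

# Hypothetical "corpus": pairs that follow the pattern y = 3 * x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w_trained = train(data)
```

After training, `w_trained` is close to 3, the "pattern" hidden in the data. What the system keeps is the weight, not the data points. The data nevertheless had to be read, and therefore copied into memory, for the weight to be calculated, which is the point at which the reproduction right is engaged.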
III. AI COMPARED TO OTHER TECHNOLOGIES
A. What Makes Generative AI Different From Other Means of Using or Exploiting Works?
The key differences are:
- Autonomous works: AI reproduces works in the same way as typical technologies. However, it does so to generate new works, sometimes with little or no human intervention. This differs from technologies limited to reproducing or disseminating what has already been created by humans. AI is not a new technology for disseminating information, but for creating works or pseudo-creating them. The laws are based on the premise that a conscious human author is the sole source of artistic creativity. The copyright system qualifies this by referring to the author of technical works. AI has challenged the idea of the human author, questioning the basic principles and concepts of copyright.
- Works for mass training: The outputs of AI models (the results generated by AI) do not derive from the copying of specific works, but from a corpus made up of millions of works reproduced for TDM, among other processes. All these works are subject to reproduction. Moreover, AI does not transform original works from a recognizable or specific source, nor does it produce derivative works.
- Reproduction and beyond. In addition to reproducing works, AI reads, abstracts, synthesizes, and translates them. This expands the use or exploitation of works and shifts attention to other areas, including "diffuse" derivative works or market substitution without "identifiable" copying. AI cannot be defined as a simple tool for reproducing works. It offers something more.
- Multifaceted ambiguity. AI can create autonomous works without human intervention. This ability impacts traditional notions of work and authorship. The impact manifests itself ambiguously in creative, functional, legal, symbolic, and epistemic spheres. AI does not guarantee traceability in the generative processes it undertakes. New modes of access strain copyright in form and substance. In this regard, security and governance mechanisms have been developed to establish controls against multifaceted ambiguity.
- Embedding or TDM. AI differs from classic forms of exploitation in that it performs embedding or TDM. Both TDM and embedding involve indirect reproduction of works in a computer's intermediate memory. TDM is a process of reading data from temporary technical copies, which are obtained to extract patterns (rules, guidelines, or examples). Embedding, on the other hand, produces a non-"expressive" numerical vector: the process starts by copying works or data in order to read them and calculate the vector, which is then used to perform TDM. The reference to "expressive" alludes to the numerical vector's inability to create. Neither TDM nor embedding can copy a work, at least not directly. Copying or reproduction is rather prior and therefore indirect, because it occurs before embedding or TDM. In any case, under copyright law, indirect copying of works constitutes an act of reproduction, as if it were direct.
- Synthesis of deepfakes. AI can manipulate, simulate, or impersonate a person's voice, image, or other identifying features. It can also replace people with fake images or voices. Victims of deepfakes look real in photographs or videos, but their physical image, voice, or identity does not correspond to reality: they appear to do or say things that the real person never did or said.
- Algorithmic transparency. AI differs from other technologies in several respects: i) synthetic source information, model version, and AI auditing, as opposed to the DRM system of typical digital technologies; ii) AI processes, models, and metadata, as opposed to the physical or digital copies of other technologies; iii) in AI, transparency is diffuse because the link between source and result is unclear, and a label is what corrects it, as opposed to the obvious source (CD, signal, file) of other technologies; and iv) in AI, security review audits carried out by users and technical authorities, in addition to owners and platforms.
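The distinction drawn above between a non-"expressive" vector and the display of a retrieved fragment can be made concrete with a small sketch. It assumes a word-count vector as a stand-in for a learned embedding (real systems compute embeddings with neural networks), and the fragments, function names, and query are hypothetical.

```python
import math
from collections import Counter

def embed(text, vocabulary):
    """Map text to a vector of word counts over a fixed vocabulary.

    A stand-in for a learned embedding: the text must be read (copied into
    memory) first, and the resulting numbers no longer carry its wording.
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical fragments standing in for protected works.
fragments = [
    "the river runs to the sea",
    "the court ruled on the appeal",
]
vocabulary = sorted({word for f in fragments for word in f.lower().split()})
vectors = [embed(f, vocabulary) for f in fragments]  # analysis-only copies

def retrieve(query):
    """RAG-style step: return (display) the stored fragment closest to the query."""
    q = embed(query, vocabulary)
    scores = [cosine(q, v) for v in vectors]
    best = max(range(len(fragments)), key=scores.__getitem__)
    return fragments[best]
```

Calling `retrieve("appeal to the court")` returns the second fragment in full. The `vectors` list is the analysis side: numbers calculated from a prior copy. The return value of `retrieve` is the display side, where the fragment itself reaches the user. In the terms used in this paper, that is the side on which lawful access through licenses becomes relevant.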
B. Practical Taxonomy Posed by AI Because It Differs From Other Technologies
- It is difficult to determine who the author is when a work is produced jointly by a human and AI under verifiable human creative direction, as opposed to a work generated automatically, with no creative control. When is the human contribution sufficient in these cases?
- Works reproduced for AI model training must be lawful. Those responsible for the operation of AI systems must have the authorization of the relevant copyright holders.
- It is important that researchers and teachers be able to use TDM, without the obligation to obtain authorization from rights holders, to reproduce works in order to operate AI systems. This applies to the training or TDM stages and to inference with the already trained model. In this regard, it matters whether the reproduction is technical (such as RAM, cache, embedding) or expressive.
- There is no derivative AI work when TDM is done as part of a creative process, because no original work is recognizable in the result. It is not known whether a work was reproduced among millions of others.
- What happens if the voice of a singer is "simulated," "impersonated," or "cloned" (as they say in the industry, without the law recognizing equivalent meanings) to perform unreleased music? Or the image of a politician or celebrity? Or the identity of an ordinary person?
- Which laws should regulate algorithmic transparency, traceability, or governance: copyright law or AI standards? There is a similarity between AI and technologies such as digital or television: both involve the need for technical, temporary, or caching copies. These copies are not perceived by users and are made as part of a technological process. The legal exceptions adopted in the world of television or digital technology serve as a precedent for legislators regarding AI.
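Whatever body of law ends up regulating it, traceability ultimately depends on a record that can be audited. The following is a minimal sketch of such a record, assuming a simple hash-chained log. The field names (`source`, `license_basis`, `model_version`) and the sample entries are hypothetical and do not follow any particular standard or statute.

```python
import hashlib
import json

def fingerprint(content: bytes) -> str:
    """SHA-256 digest used to identify content without storing it."""
    return hashlib.sha256(content).hexdigest()

def append_entry(log, source, license_basis, content, model_version):
    """Add a provenance entry that is chained to the previous entry's hash."""
    entry = {
        "source": source,
        "license_basis": license_basis,
        "content_sha256": fingerprint(content),
        "model_version": model_version,
        "prev_hash": log[-1]["entry_hash"] if log else "",
    }
    serialized = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["entry_hash"] = fingerprint(serialized)
    log.append(entry)
    return entry

def verify_log(log):
    """Return True only if no entry was altered or reordered after the fact."""
    prev_hash = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        expected = fingerprint(json.dumps(body, sort_keys=True).encode("utf-8"))
        if entry["prev_hash"] != prev_hash or entry["entry_hash"] != expected:
            return False
        prev_hash = entry["entry_hash"]
    return True

# Hypothetical usage: two works ingested for training one model version.
log = []
append_entry(log, "archive/novel-001", "license", b"text of a novel", "model-v1")
append_entry(log, "archive/essay-002", "TDM exception", b"text of an essay", "model-v1")
```

Here `verify_log(log)` returns `True`. If an auditor later finds that an entry's `license_basis` was edited, `verify_log` returns `False`, because the stored hash no longer matches the entry. The log stores only fingerprints of the works, not the works themselves, so the audit trail does not become a further expressive copy.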
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.