The relationship between artificial intelligence (AI) and copyright law is increasingly drawing interest and concern. While it is essential to explore the possible extension of copyright protection to works generated by AI, a topic addressed in a previous article, it is also appropriate to consider the potential infringement of copyright by AI during its training and subsequently in the creation of its outputs.
This is the issue of the so-called “unfair or infringing training” of generative AI, which can produce texts, images, videos, music or other media in response to user requests known as “prompts”.
In the United States, several legal proceedings have been filed against companies that own generative AI systems, accused of having violated the rights of multiple authors during training. Many class actions have been filed by actors, playwrights, and writers against OpenAI and Meta to establish their legal responsibility for copyright infringement. To understand how these litigations will evolve, and the future decisions, which will undoubtedly acquire the status of legal precedents, it is necessary to analyze the legal issue underlying the entire matter.
1. Generative Artificial Intelligence Training
Firstly, it is relevant to note that the issue of copyright infringement arises exclusively in relation to generative AI, i.e., algorithms capable of creating new content – text, sound, images or art – upon user request. For AI to produce songs, texts, or images, it must be trained. To be trained, AI must have access to a multitude of data and sources: indeed, the more data it acquires, the more effective it becomes.
However, the works used to train AI are not stored; the AI simply learns specific patterns within them, without retaining a copy of their content in its dataset, except perhaps temporarily. This is why, once the algorithm has been fed, it no longer requires such information: machine learning mechanisms use the inputs to enhance their accuracy and refine themselves, and afterward they can continue to function without the original data.
2. The Core of the Issue: Does Generative AI training violate copyright?
For copyright infringement to occur, the work must be copied, reproduced, or published. In machine learning, during the initial phase, works must necessarily be sought on the internet, extracted from websites and, in essence, used to make the machine learn from the data. Therefore, there could be a violation if such operations were not previously authorized by the rights holders.
OpenAI itself has admitted to having used two public datasets, which also contained copyrighted works, to train ChatGPT, stating that the machine learning process necessarily requires copying these works for the AI to function at its best.
3. Authors vs. Big Tech: The Two Perspectives
From the authors’ perspective, the ability of AI to generate specific outputs inevitably stems from its prior training on their works. Consequently, according to the authors, through generative AI models, big tech companies benefit and profit from the unauthorized use of copyrighted works. For example, it is evident that if ChatGPT can compose a text in the style of a specific author when requested, it must have been previously trained on that author’s works. And if the author in question has not granted any authorization or license for the copying and use of their works, an infringement may exist.
On the other hand, big tech companies argue that, even if copyright infringement may theoretically occur, it would still fall within the exceptions to copyright recognized by U.S. law.
The first exception invoked is the temporary and non-commercial use of the work: it cannot be denied that the work is never stored by the software, except temporarily.
The second, more established exception is the so-called “fair use” doctrine, codified at 17 U.S.C. § 107. According to big tech companies, their activity constitutes a legitimate use of the work: the purpose of the use is not identical to that of the author, its nature is different, and, ultimately, the work is not made available to the public but is used solely to train the program. Big tech companies also justify this stance in light of the legal precedent set by The Authors Guild, Inc. v. Google, Inc.
Furthermore, they argue that, in practice, AI’s functioning cannot be deemed distinct from mere human interaction. For instance, humans constantly draw inspiration from art and from their surroundings. This does not imply that our works, if stemming from our ideas, do not deserve protection. Likewise, artificial intelligence transforms and reinterprets the information on which its knowledge is based in such a broad and imaginative manner that it cannot be considered responsible for any infringement. The works of authors would thus be simple initial frameworks, subsequently overturned, interconnected, and blended by the algorithm.
4. Potential Outcomes of the Litigation
Considering the foregoing, within the U.S. context, authors find themselves in a less favorable position than big tech companies. Unlike the European Union, the United States currently lacks a provision enabling authors to exclude their works from the training of generative AI developed for purposes other than pure scientific research. Therefore, in the United States, both statute and case law (which, as we know, plays a guiding role in common law systems) currently favor big tech companies.
While authors are driven mainly by ideals and a justified sense of justice that nonetheless lacks legal support, big tech companies enjoy the backing of the current legal framework: the decision in favor of either party will therefore depend largely on judicial will and, more realistically, on the resolution of what seems to be more a moral dilemma than a substantive legal issue. This is sometimes how the law evolves.
Currently, one can only reflect on the potential “injustice” of today’s AI training and contemplate possible trade-offs between the two parties, imagining innovative solutions that could lead to “fair training.” For example, following the European model, opt-out clauses could be put in place for artists, allowing them to decide whether or not to include their works in generative AI training. Legislative action to regulate this area could also be considered, as is expected to happen soon in the EU with the enactment of the “AI Act”. Moreover, the fact that the law currently seems to lean in favor of big tech does not imply that this outcome is morally right. Authors invest their souls, originality, and creativity into their works; works that, nowadays, can be easily usurped by new technologies and used as a basis for new creations, garnering neither recognition nor profit for their creators.
Only legislators and judges, if they understand the concerns of artists and deem a change appropriate, can modify the existing legal framework and thereby promote a comprehensive reform to regulate this practice.
This situation raises various questions and concerns. While artificial intelligence is still unexplored territory, it will acquire more significance and prominence in our society over time. And if no limits are imposed on its activity, this reality may significantly alter the concepts of art and entertainment as we understand them today. How could the law, whose primary purpose is to protect citizens, tolerate this? What are the possible solutions for deriving only benefits from AI development without harming people? We may have to wait through months of legal proceedings to obtain a definitive answer to this dilemma.
Seeking to understand the direction in which AI will lead the art world is the premise for acting.