
The University of Wisconsin Pre-Law Journal

Generative AI and Fair Use

By Ellie Moseman, edited by Alexandra Tapia

Vol. 1, Issue 2, July 2025

newspapers, music producers, authors, and artists. Artificial intelligence (AI) makes decisions based on the inputs it receives. Generative AI learns how to create content from social media posts, books, art, photos, and music, much of which is copyrighted material, and uses what it learns to create new text, art, and music [1]. A debate has arisen over when the use of copyrighted materials to train AI is infringement and when it is fair use.

[1] Sag 2024, 1891

Fair Use

There are four factors that must be considered when determining fair use; however, they need not all be given the same weight. The first factor is the purpose and character of the use: whether the work is used for educational purposes or for commercial gain, and how transformative the use is. A transformative use employs the work for a purpose different from that of the original and creates new expression [2]. Educational or more transformative uses weigh in favor of fair use. The second factor is the nature of the copyrighted work, meaning how creative or how factual it is. The third factor is the amount of the work used in relation to the whole; using a smaller proportion of the original work weighs in favor of fair use. The last factor is the effect of the use on the potential market for the copyrighted work. Those suing AI companies argue that AI platforms are violating their exclusive rights to reproduce, distribute, and sell their works. AI companies may argue that their use is fair because it is transformative [3] and because of its limited effect on the market. Whether courts determine that AI's use of copyrighted material is infringement or fair use will have long-term consequences for the treatment of intellectual property and the development of AI.

[2] Chandrakar 2024, 48

[3] Chandrakar 2024, 53

New York Times v. Microsoft

In December 2023, the New York Times sued Microsoft and OpenAI for copyright infringement. The Times’ complaint asserts that Microsoft and OpenAI use its work to create AI products that compete with the New York Times, undermining the Times’ ability to provide important journalism [4]. The Times claims that the AI can recite copied works verbatim and can provide summaries that mimic the style of Times articles and are significantly longer and more detailed than search engine summaries [5]. The Times argues that this deprives it of subscription, licensing, and advertising revenue [6]. It also claims that Microsoft and OpenAI profited from reproducing Times work, displaying that work publicly without authorization, and diverting traffic away from copyright holders like the Times.

By using New York Times articles to train their large language models (LLMs), Microsoft and OpenAI reproduced the Times’ copyrighted content, to which the Times holds the exclusive right. Microsoft could argue that its use is transformative enough to be permissible as fair use, since the AI systems supposedly learn from Times content to produce their own work rather than simply displaying training data. However, direct quotations and long summaries produced by the AI are reproductions, not transformations. The purpose of these outputs is no different from the purpose of the original news articles, and a difference in purpose is at the heart of the transformativeness requirement [7]. The copyrighted articles inform the AI models, which then use them to inform users about the news. The other fair use factors, including the effect of the use on the market, also weigh against fair use. Verbatim display of Times material and the AI systems’ in-depth summaries of copyrighted works divert traffic away from copyright holders, because users can go to AI systems for Times content instead of going directly to the Times, altering the market for New York Times articles.

The New York Times devotes much of its original complaint to describing the type and quality of its reporting and why it believes that reporting is important. According to the Times, protecting its intellectual property is crucial to its ability to produce world-class journalism. If it no longer has the exclusive right to reproduce, distribute, or sell its work, it will not be able to monetize that work as efficiently. With less revenue, it would have fewer journalists and fewer resources for in-depth, high-quality reporting, which, it argues, would harm society.

Alter v. OpenAI

In another infringement case, Alter v. OpenAI, authors of fiction and nonfiction books sued OpenAI for copyright infringement for using their works to train OpenAI’s LLMs. The LLMs can mimic, summarize, and paraphrase the authors’ works, harming the market for these creations. The plaintiffs’ complaint argues that OpenAI’s commercial success, which harms authors, was possible only because OpenAI copied the authors’ works. The defendants argue that their use of copyrighted works constitutes fair use under 17 U.S.C. § 107 and that Microsoft’s models have commercially significant non-infringing uses [8]. OpenAI explains that teaching an AI model to understand language and human knowledge requires having it analyze “trillions of words” [9]. It claims that using copyrighted works to extract information such as word frequency, patterns of syntax, and common themes [10] to train AI models is transformative enough to constitute fair use, because the protected expression of the books is not replicated. According to OpenAI, the purpose of the AI models is not to reproduce existing works and communicate their expression to a human audience, but to use an understanding of language learned from those works to create something new [11]. If OpenAI is truly using the copyrighted books as data for language training, its use could be considered educational, since the works are being used to teach LLMs.

The first fair use factor would then favor fair use, because the use is transformative and possibly educational. Because the copyrighted works are creative, and because entire books or large portions of them were used to train AI systems under the logic that “the more data the better,” the second and third factors would likely weigh against fair use. If, as OpenAI argues, LLMs require large amounts of data to learn language patterns, then entire works were used because of the quantity of examples needed to train the models, not because OpenAI was attempting to reproduce entire creative works. The second and third factors, then, matter less in this case, especially if the use is as transformative as OpenAI claims. The effect on the market is the biggest factor weighing against fair use. The plaintiffs argue that LLMs allow anyone to generate works that people would otherwise pay authors to create, affecting authors’ ability to make a living. Competition by itself is not a copyright infringement issue, since anyone may create works that compete with authors. However, the plaintiffs argue that the defendants could not compete without using copyrighted works. Writers have already reported losing income from copywriting, journalism, and content writing, work that can account for about half of a book author’s income [12].

Online publications are using generative AI, which was trained on authors’ copyrighted works, to write content instead of hiring authors, reducing the need for authors overall. These AI models could not exist without authors’ copyrighted works, and they are now affecting the general market for writers. However, the authors did not demonstrate that OpenAI was affecting the market for the specific works used in training, only the demand for authors in general, so their claim that OpenAI’s use of copyrighted material affects the market for their works is not fully supported. OpenAI’s models are no longer able to produce verbatim quotes of copyrighted works and can instead only offer summaries [13]. People who are interested in reading the plaintiffs’ books are unlikely to want a summary in place of the book, so OpenAI is affecting not the market for the copyrighted works but the market for authors in general. To determine AI’s effect on the market in Alter, the court would have to decide which market the authors are entitled to.

[4] The New York Times Company v. Microsoft Corporation, 1:23-cv-11195, (S.D.N.Y. Dec 27, 2023) ECF No. 1

[5] The New York Times Company v. Microsoft Corporation, 1:23-cv-11195, (S.D.N.Y. Dec 27, 2023) ECF No. 1

[6] The New York Times Company v. Microsoft Corporation, 1:23-cv-11195, (S.D.N.Y. Dec 27, 2023) ECF No. 1

[7] Chandrakar 2024, 54

[8] Alter v. OpenAI Inc., 1:23-cv-10211, (S.D.N.Y. Feb 16, 2024) ECF No. 51

[9] Alter v. OpenAI Inc., 1:23-cv-10211, (S.D.N.Y. Feb 06, 2024) ECF No. 47

[10] Alter v. OpenAI Inc., 1:23-cv-10211, (S.D.N.Y. Feb 16, 2024) ECF No. 51

[11] Alter v. OpenAI Inc., 1:23-cv-10211, (S.D.N.Y. Mar 04, 2024) ECF No. 67

[12] Alter v. OpenAI Inc., 1:23-cv-10211, (S.D.N.Y. Feb 06, 2024) ECF No. 47

[13] Alter v. OpenAI Inc., 1:23-cv-10211, (S.D.N.Y. Feb 06, 2024) ECF No. 47

UMG Recordings Inc. v. Uncharted Labs Inc.

UMG Recordings Inc. v. Uncharted Labs Inc. concerns the allegedly infringing use of music to train AI. Uncharted Labs offers a generative AI service called Udio, which creates digital music files that sound like human recordings in response to inputs [14]. To produce convincing, human-sounding recordings, Udio is trained on copyrighted music recordings, including music owned by UMG. UMG claims that the music Udio produces will compete with the copyrighted music used to train it. In addition, UMG argues that Udio can create convincing, high-quality music only by copying vast amounts of music from across all genres, so the service could not exist without copyrighted materials. UMG challenges the character of the use, arguing that its purpose is not to deconstruct music and learn patterns but to imitate music’s expressive features, since Udio’s stated goal is to create convincing, expressive, human-like music. UMG argues that the music Udio creates is not transformative enough to constitute fair use. Udio’s outputs often directly imitate copyrighted songs; for example, the Udio song “Prancing Monarch” has nearly identical harmonies and similar pitches and rhythms to the UMG-owned ABBA song “Dancing Queen.” Uncharted Labs argues that UMG cannot own entire genres of music and that Udio simply creates music in a certain style [15]. However, music can share a style without copying the exact elements of a particular song.

The use is not transformative: the songs generated by Udio closely resemble the copyrighted songs used to train it. Since Udio had no rationale for using copyrighted materials other than producing similar music files for commercial gain, the first factor would weigh against fair use. The second and third factors would also favor the plaintiffs, because the copyrighted works are creative and large amounts of them were used. UMG argues that “key elements of protectable expression,” such as melodies, harmonies, vocals, and rhythms, are closely resembled or directly reproduced in Udio’s outputs, which would weigh the third factor in favor of the plaintiffs. UMG also argues that Udio’s use has a great effect on the market, limiting UMG’s ability to license its copyrighted music, particularly to streaming services and content platforms, which are an important part of UMG’s business model, because potential licensees could instead use Udio to create similar music for little or no cost. Udio could also create music that competes with the copyrighted music used to train it, jeopardizing the jobs of songwriters and musicians who could be replaced by AI music [16]. Unlike the plaintiffs in Alter, UMG presents examples of full songs created by Udio that exist on streaming platforms and can compete with copyrighted songs. UMG further argues that the purpose of fair use is to ensure public availability of the arts so that other humans can use existing works to create new ones. Udio’s use of copyrighted music does not promote commentary, scholarship, or human authorship [17], so it does not serve the purpose of fair use.

[14] UMG Recordings, Inc. v. Uncharted Labs, Inc., 1:24-cv-04777, (S.D.N.Y. Jun 25, 2024) ECF No. 9

[15] UMG Recordings, Inc. v. Uncharted Labs, Inc., 1:24-cv-04777, (S.D.N.Y. Aug 01, 2024) ECF No. 26

[16] Sunray 2021, 3

[17] UMG Recordings, Inc. v. Uncharted Labs, Inc., 1:24-cv-04777, (S.D.N.Y. Jun 25, 2024) ECF No. 9

Fair Use in the Cases

In all three cases, the second and third factors weigh against fair use. When the copyrighted material is used merely as data to train LLMs and is not closely reproduced, however, the first and fourth factors weigh in favor of fair use, because the use is more transformative and has little effect on the market. In Alter, the copyrighted materials were used transformatively under the standard from Google v. Oracle [18]: the purpose and character of the use differed from the original purpose [19]. The original books’ purpose was to convey an author’s story, while the AI companies used the copyrighted material to train AI models. The models learned from the word choices and sentence structure of the books, but their outputs were not created to tell a story the way an author’s book does. This is very different from the music produced by Udio, which is meant to mimic the styles of certain artists; there, the purpose and character of the use are the same as the original, to convey a song with a certain message and emotion, so the use is less transformative. In addition to being transformative, the use in cases like Alter is educational rather than commercial, which pushes factor one further in favor of fair use. The copyrighted materials are used as data to teach AI models; although companies profit from those models, they do not make money directly from the copyrighted materials. Using the materials to teach systems that then generate revenue is different from profiting directly from the copyrighted materials themselves, especially when the outputs differ from the copyrighted materials, as in Alter. By contrast, in the New York Times and UMG cases, the AI companies generate outputs that closely resemble or directly use copyrighted materials, effectively profiting from the copyrighted content itself.

The effect on the market is diminished when the copyrighted materials are used merely as data and the use is highly transformative. All three cases focus on the negative effects of AI on their respective fields; however, Alter discusses how the advancement of AI will affect work for authors in general, not how the use of the authors’ novels specifically will affect the market for those novels. Unlike near-exact reproductions of songs, which people may listen to instead of the original copyrighted versions, or article summaries that convey the same news, a summary of a novel is not a substitute for reading the full novel. While people may turn to sources like ChatGPT instead of the New York Times for news, those interested in reading a novel are unlikely to be satisfied with a summary of it, so the effect on the market in a case like Alter would be smaller than in cases where the copyrighted material is reproduced more directly and the use is less transformative. Uses in which the material is more directly reproduced, including the uses in the New York Times and UMG cases, would not be considered fair use.

Michael Murray, in his article “Generative AI Art: Copyright Infringement and Fair Use,” argues that analysis of copyright infringement unfairly focuses on those who compile the training data rather than on users of the AI systems. He argues that users’ prompts can steer AI systems toward outputs that could be considered infringing [20], for example by asking the AI to generate a song with a specific theme in the style of a certain musician. However, if copyrighted materials were not included in the training data, the AI would not have the material to create a reproduction, even if users steered it toward infringing outputs. Therefore, the analysis must also consider the role of those who train the systems. As generative AI systems become more widespread and advanced, it is important that the law be defined in a way that both protects human authorship and promotes technological advancement. Using copyrighted materials to train AI should be considered fair use when the use is transformative and has limited effect on the market; otherwise, the use is infringing.

[18] Breyer 2021

[19] Chandrakar 2024, 50

[20] Murray 2023, 260