
License to Learn

Last year, Elon Musk proclaimed that AI was already smarter than the smartest person in the world and would soon be smarter than all of the smartest people in the world combined. No matter how one might feel about this statement, it is undeniable that AI has become deeply embedded in our daily routines.

To become “the smartest,” or even to be good enough to help us with our daily tasks, generative AI models needed to learn—using a lot of data. Large Language Models (LLMs), such as ChatGPT or Meta’s LLaMA, are trained on vast collections of text, learning to replicate human writing styles. But the same data that makes these systems fluent also makes them ethically and legally fragile.

Activists and experts have highlighted the biases embedded in these datasets, uncovered explicit and inappropriate material within them, and called attention to the exploitation of workers who label toxic or traumatic content for under $2 per hour in developing nations—among other problems. However, the issues of justice, fairness, and data representativeness remain largely unaddressed, in great part because they are so difficult to legislate.

The one well-established legal principle that governs how text can be used is copyright, a system that grants authors exclusive control over their original work. But copyright law, designed for a world of books and classrooms, is now being stretched to its limits by generative AI.

As courts attempt to apply outdated copyright doctrines to AI training, they are testing the boundaries of a system built for a pre-digital era. The stakes are high: These cases are not just about liability, but about creating the legal precedent that will determine how we balance innovation with creative rights for years to come.

Currently, more than 50 cases are pending in federal courts against GenAI companies for training their models on copyrighted materials without permission. Two cases illuminate the copyright debate raised by AI particularly well: Kadrey v. Meta and Bartz v. Anthropic. Both were brought by authors whose books appeared in the so-called “shadow libraries” used, without permission, in the early stages of LLM training.

In both cases, the courts ruled that the AI companies’ use of these texts qualifies as “fair use,” a doctrine within copyright law that allows limited use of copyrighted material without permission under certain conditions. AI companies have claimed that LLM training is fair use on the grounds of transformativeness—the idea that the use of copyrighted material serves a new purpose distinct from the original work, like parody, commentary, or education. In Bartz v. Anthropic, Judge Alsup held that the training of the Claude model was transformative enough to qualify as fair use, comparing it to teaching children how to write. However, according to the case record, Anthropic also built and kept a centralized library of pirated works—an action that, in Judge Alsup’s view, crossed the line. Storing stolen texts for “whatever future use,” Alsup held, is not transformative, so Anthropic still faces liability for piracy.

Judge Chhabria took a different approach in the Meta case, focusing on copyright law’s market-harm factor. His reasoning was that if AI trained on books undermines the market for those books—either by reducing demand for the originals or by bypassing licensing markets—that would constitute infringement. The plaintiffs in Kadrey could not show evidence of market harm: Meta’s LLaMA model was not reproducing whole books, and there was no established licensing market for “AI training rights,” so the claim failed. The judge stressed that the ruling was plaintiff-specific, not a blanket approval: Meta won because these authors made weak arguments, not because AI training is universally fair use. This explicitly narrow ruling brings us back to square one.

Together, these rulings highlight three key tensions: judges expressed uncertainty about the limits of fair use, public opinion remains overwhelmingly skeptical of AI companies’ ethics, and the law still lacks a clear, consistent precedent. While courts are not bound by public sentiment, this disconnect matters because it undermines the perceived legitimacy of legal outcomes in an area already plagued by mistrust of Big Tech.

Thus, both judges effectively left the door open for future plaintiffs with stronger records of harm. This uncertainty is echoed in the media: A 2025 Atlantic investigation into the books included in the shadow libraries, along with the commentary on copyright in the age of AI that followed it, noted that the rulings are “not straightforward” and reflect “totally different conceptual frames for the problem.”

Undeniably, these cases mark deep losses for creatives, revealing how legal and policy decisions increasingly align with the interests of Big Tech rather than individual creators. The cultural optics, however, remain on the side of the artists: Authors, artists, and media outlets have consistently described AI companies as “stealing” from them. The Atlantic captured the anger of writers who discovered their books inside Books3, a dataset of more than 183,000 books: “The future promised by AI is written with stolen words.” Many authors realized this only after their works had already been used to train LLMs, fueling resentment and fear that machines trained on their words will replace them.

The copyright battle with AI companies is being fought on fronts beyond writing: Hollywood is also fueling creatives’ negative sentiment toward GenAI. Disney and Universal described Midjourney, an AI image-generation tool, as a “bottomless pit of plagiarism” that functions like a “virtual vending machine” for unauthorized content. Even the BBC has threatened legal action against Perplexity—an AI-powered search engine—for scraping its journalism, accusing the company of reproducing its content verbatim.

The discrepancy between legal decisions and public dissatisfaction with Big Tech can be seen as a tech legitimacy crisis: Even if courts deem Meta’s and Anthropic’s training fair use, creators and the public still view it as unfair appropriation.

There is a fear that creatives will pull back from innovating out of anxiety that their work will feed the training datasets of generative AI. Such an outcome, however, seems unlikely. The greater risk is that Big Tech companies continue to rack up legislative and policy wins, consolidating their influence, while the cultural conflict over AI stays with a creative class that is losing authority over art in its widest sense.

Even unfavorable rulings would give creatives clearer ground to adapt: The greater problem is uncertainty, not loss. The two rulings did not clarify general rules; they simply muddied the waters. Alsup elevated transformativeness, while Chhabria elevated market harm, leaving no unified doctrine. Although the two cases attracted enormous public attention, the courts ultimately resolved them without meaningfully changing the AI legal landscape.

Meanwhile, global frameworks are diverging from the approach the American judges took. The EU’s AI Act builds on the text- and data-mining exemptions introduced by the 2019 Digital Single Market Directive, legal provisions allowing developers to copy and analyze large volumes of material for research or analytical purposes without explicit permission from its creators. The World Economic Forum warns that a “drastically uneven regulatory landscape” could destabilize both innovation and content creation, giving companies in more permissive jurisdictions unfair advantages and discouraging cross-border collaboration.

In September 2025, Anthropic agreed to pay $1.5 billion to settle a class action brought by authors, one of the largest copyright settlements in American history. Each author will receive around $3,000 per book, across roughly 500,000 covered works. The move reflected corporate pragmatism over legal clarity: Continuing the trial risked reputational damage, unpredictable rulings, and potential exposure in other cases. As one lawyer observed, “It’s not the end of AI, but the start of a more mature, sustainable ecosystem where creators are compensated, much like how the music industry adapted to digital distribution.” While the payout may ease tensions, it does little to define the underlying rights at stake. For Anthropic, the settlement was simply a manageable cost of doing business; just days later, the company raised a $13 billion Series F investment—a late-stage funding round for established startups—at a $183 billion valuation.

In the absence of clear precedent, the AI industry is pressing ahead, training on whatever data it can access while the lawsuits pile up. As the Wall Street Journal observed, “They’re fighting over who’s going to control and dictate the next generation of technological development.” The stakes are as high as they get: Courts are now adjudicating the balance between human creativity and machine innovation, an endeavor for which judges and laws are clearly not prepared.
