When AI Training Meets Copyright Law: What Three 2025 Rulings Actually Decided

6 min read
Key Takeaways
  • Three federal judges issued the first substantive AI copyright rulings between February and June 2025: Thomson Reuters won on market substitution; Alsup found Anthropic’s training transformative but piracy separately infringing; Chhabria found Meta’s piracy fair use but signalled a market-dilution theory might succeed.
  • The $1.5 billion Bartz v. Anthropic settlement — covering ~500,000 books at ~$3,000 each — establishes the first quantitative reference point for author-versus-AI negotiations and requires destruction of files from pirate repositories.
  • Chhabria’s explicit signal in Kadrey v. Meta: a market-dilution theory showing LLM outputs flood and devalue the market for original works might prevail — the Authors Guild v. OpenAI plaintiffs have had notice and time to build that evidentiary record.
  • Enterprise vendor indemnities cover output infringement only — they do not cover training-data claims or the market-dilution scenario Chhabria identified as potentially viable.

Key Claim: The first three federal copyright rulings on AI training data produced three different outcomes through three different legal paths — and the indemnification provisions in enterprise AI contracts were not written to address the gaps those rulings exposed.

In the four months between February and June 2025, three US federal judges issued the first substantive copyright rulings on AI training data. Two found for the AI companies; one did not. Each reached its conclusion through a different legal path. Together, they leave enterprise teams — and the vendors selling them indemnification — without the clear liability shield most assumed existed.

The Cases That Have Actually Been Decided

The first ruling landed on 11 February 2025, when Judge Stephanos Bibas of the District of Delaware issued partial summary judgment for Thomson Reuters in Thomson Reuters Enterprise Centre GmbH v. ROSS Intelligence. Thomson Reuters, owner of Westlaw, had refused to license its content to ROSS, a competing AI-driven legal research platform, and sued when ROSS used Westlaw headnotes to train its model. Bibas rejected the fair-use defence on the first and fourth factors: ROSS's use was commercial and substitutive, directly competing with the product from which it had copied. The Supreme Court's Andy Warhol Foundation v. Goldsmith precedent governed the purpose-and-character analysis: a use must have a meaningfully different purpose from the original, not merely a different medium. ROSS had none.

Source: Davis Wright Tremaine

Four months later, two Northern District of California judges reached opposite conclusions on nearly identical facts — and disagreed with each other.

On 23 June 2025, Judge William Alsup ruled in Bartz v. Anthropic that training a large language model on copyrighted books constitutes fair use; the training is, in his words, “transformative — spectacularly so.” But Alsup drew a bright line around the source of those books: acquiring pirated copies from shadow libraries such as Library Genesis is a separate act of infringement, not absolved by the transformative nature of what happens next. Anthropic settled the case three months later for $1.5 billion — preliminary approval granted by Alsup on 25 September 2025 — covering roughly 500,000 books at approximately $3,000 each and requiring Anthropic to destroy files sourced from pirate repositories.

Source: IPWatchdog | NPR

Two days after Alsup, Judge Vince Chhabria ruled in Kadrey v. Meta that Meta’s use of pirated books to train its Llama models was also fair use — treating the piracy and the training as a single, indivisible transformative act rather than Alsup’s two-step sequence. Chhabria was candid about the limits of his ruling: it “does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments.” He suggested a market-dilution theory — showing that LLM outputs flood and devalue the market for original works — might have prevailed if the plaintiffs had built the evidentiary record for it.

Source: Goodwin Law

The Cases Still in Discovery

The New York Times Co. v. Microsoft Corp. and OpenAI (S.D.N.Y.) survived a motion to dismiss in March 2025; Judge Sidney Stein allowed the core copyright claims to proceed. In January 2026, the court ordered OpenAI to produce all 20 million ChatGPT logs demanded by the Times, rejecting OpenAI’s argument that user privacy barred disclosure. Discovery is ongoing and no trial date has been set.

Source: Bloomberg Law | NPR

The Authors Guild v. OpenAI class action was transferred from California to S.D.N.Y. and consolidated with related suits in April 2025. Fact discovery closes 27 February 2026; expert discovery runs to July 2026; summary-judgment motions are due August 2026. No substantive fair-use ruling in this case is expected before late 2026.

Source: Publishers Weekly

In music, Universal Music Group settled its suit against AI platform Udio in October 2025, announcing a joint subscription service for licensed AI-generated music. Sony and Warner remain in litigation against Udio; all three labels continue litigating against Suno. No fair-use ruling has yet emerged from the music cases.

Source: Variety

The most significant international ruling came from the UK High Court. On 4 November 2025, Mrs Justice Smith rejected Getty Images’ secondary copyright infringement claim against Stability AI. Her reasoning: Stable Diffusion’s model weights “do not and have not stored or reproduced” Getty’s images — they encode learned patterns from training data that is not retained in the model. Without stored copies inside the model, secondary infringement liability cannot attach. Getty was granted permission to appeal in December 2025. The case produced one narrow win for Getty: the court found trademark infringement where earlier model versions generated synthetic images with visible Getty or iStock watermarks.

Source: Latham & Watkins

What the Copyright Office and the EU Said

On 9 May 2025, the US Copyright Office released Part 3 of its AI study — its first substantive guidance on training data. The Office concluded that “making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries.” It stopped short of recommending compulsory licensing, preferring market-based arrangements. The guidance carries no legal force but provides analytical support for litigants on both sides. The Register of Copyrights, Shira Perlmutter, was dismissed the following day by President Trump.

Source: Crowell & Moring

The EU took a structurally different path. General-purpose AI model compliance obligations under the EU AI Act entered into force on 2 August 2025, requiring GPAI providers to implement a copyright policy and publish a standardised summary of training content regardless of where the model was trained. The EU's text-and-data-mining (TDM) exception under the Copyright in the Digital Single Market Directive permits scraping unless a rightsholder has issued a machine-readable opt-out. In GEMA v. OpenAI, a Munich court accepted that training itself can fall within the TDM exception, while still finding infringement where the model reproduced memorised song lyrics in its outputs. Rights holders who fail to deploy machine-readable reservation protocols (not just robots.txt, but tagged metadata meeting the TDM Reservation Protocol standard) may find their opt-outs invalid.

Source: Clifford Chance | IAPP
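For rights holders wondering what "machine-readable reservation" means in practice, the W3C's TDM Reservation Protocol (TDMRep) community draft defines three delivery mechanisms: an HTTP response header, a per-page HTML meta tag, and a site-wide well-known file. The sketch below illustrates all three; the domain and policy URL are placeholders, and the exact field names should be checked against the current TDMRep draft before deployment.

```
# 1. HTTP response headers on any document served from the site
TDM-Reservation: 1
TDM-Policy: https://example.com/tdm-policy.json

# 2. Equivalent per-page HTML meta tag
<meta name="tdm-reservation" content="1">

# 3. Site-wide declaration at https://example.com/.well-known/tdmrep.json
[
  {
    "location": "/*",
    "tdm-reservation": 1,
    "tdm-policy": "https://example.com/tdm-policy.json"
  }
]
```

A value of 1 reserves TDM rights; the optional policy URL points to licensing terms a would-be miner can follow. Note the contrast with robots.txt: TDMRep expresses a legal reservation tied to the EU TDM exception, not merely a crawling preference.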

What Enterprise Teams Actually Have in Their Contracts

Beginning in late 2023, Microsoft, OpenAI, Google, and AWS each introduced copyright indemnification programs for enterprise customers. On paper, vendors will defend and pay damages if third parties sue customers over AI-generated outputs. In practice, the coverage is narrower than procurement teams typically realise.

Microsoft’s Customer Copyright Commitment applies when customers use Copilot products “unmodified as provided” and with all content filters active. GitHub Copilot’s indemnity applies only when code is generated without modification — yet customisation is the entire point of the integration. Google Workspace’s coverage excludes output the user “knew or should have known was likely infringing,” placing an undefined legal standard on non-lawyers. OpenAI’s enterprise agreement covers claims that customer use of outputs infringes third-party IP — but limits protection to customers who have not deliberately circumvented safety systems. No vendor has publicly explained what “defense” means procedurally: whether they accept notice immediately, who selects counsel, or what triggers termination of coverage.

Source: Runtime News | Proskauer

The key gap is scope. Vendor indemnities cover output infringement — the claim that a generated text or image reproduces a protected work. They do not cover training-data claims, nor do they cover the scenario in Chhabria’s dicta: that AI outputs flood and dilute the market for original works at scale. That theory, if litigated successfully in a future case, would rest on aggregate harm attributable to use patterns — including enterprise deployments — with no clear indemnification counterpart in current vendor agreements.

What to Watch

Bartz v. Anthropic final approval (April 2026). If Judge Alsup grants final approval at the fairness hearing, the $1.5 billion settlement becomes the baseline reference for all subsequent author-versus-AI negotiations. The claim deadline of 30 March 2026 has passed; the size of the actual payout will signal how broadly courts interpret class membership. Source: Copyright Alliance

Authors Guild v. OpenAI summary judgment (late 2026). This is the highest-stakes pending ruling. Unlike Kadrey, the Authors Guild plaintiffs have had notice of Chhabria’s market-dilution warning and will have had time to build an evidentiary record around it. A ruling against OpenAI would directly conflict with Kadrey, creating a circuit split that would force the issue to the appellate courts.

Getty v. Stability AI appeal (UK Court of Appeal). The UK High Court’s model-weights reasoning — that learned patterns are not stored copies — is the most consequential technical finding in any AI copyright case to date. If the Court of Appeal reverses it, image-generation model operators globally would face a materially different liability exposure. Source: Taylor Wessing

Thomson Reuters v. ROSS on appeal (Third Circuit). The Third Circuit's review of Bibas's ruling will determine whether his market-substitution test applies to AI training on copyrighted data generally, or is limited to the narrow facts of a direct competitor scraping a rival product. Source: IPWatchdog

For enterprise teams, the immediate action is not to wait for appellate resolution — it is to read the indemnification sections of every AI vendor contract against the specific use case and output type deployed, and to identify which claims those sections do not cover. The gap between “vendor promises to defend output infringement” and “full liability protection” is where the legal exposure currently lives.

This article was produced with AI assistance and reviewed by the editorial team.

Marcus Webb, policy and regulation correspondent at Next Waves Insight

About Marcus Webb

Marcus Webb covers AI policy, regulation, and geopolitics — from EU legislation to DARPA programmes to US-China technology competition. He has a background in technology law and previously worked as a policy analyst at a nonpartisan technology policy institute. He tracks standards bodies, government procurement signals, and legislative developments that others miss.
