
The Internet Archive Lost Its Lawsuit. Here's What That Means for AI and Your Library


In March 2023, a federal judge ruled against the Internet Archive in Hachette v. Internet Archive, finding that the Archive's "Controlled Digital Lending" program violated copyright law. The decision was upheld on appeal in September 2024. The Internet Archive chose not to petition the Supreme Court, and the certiorari deadline passed in December 2024.

TL;DR
  • Publishers vs. Internet Archive: publishers argued CDL violates copyright; the Archive contended fair use lets libraries lend one copy of each acquired book regardless of format.
  • Core legal question: does fair use extend library lending into digital formats? Lending rights developed for physical books; the courts declined to carry them over.
  • The publishers won, and the implications cascade: future library digital content requires licensing rather than acquisition, costs balloon, and rural and low-income patrons lose access (commercial e-lending services are hard to use without stable housing or a payment method).
  • What libraries lost: format-neutral lending rights, a stronger fair use doctrine, and broader public access to information.

If you haven't been following this, you might think: "Okay, but I'm not the Internet Archive. Why do I care?"

You should care because this case is about to collide headfirst with AI. And libraries are caught in the middle.

What Actually Happened

Quick background: The Internet Archive ran a program called Controlled Digital Lending (CDL). The idea was simple: They'd scan physical books they owned, lend digital copies on a one-to-one basis (one physical book = one digital loan at a time), and call it "digital lending" under fair use.
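The one-to-one constraint is easy to state precisely. Here's a minimal sketch, with invented names (this is not any real library system's API), of the invariant CDL tried to maintain: digital loans out at any moment never exceed physical copies owned.

```python
# Hypothetical sketch of the CDL invariant: a title with N physical
# copies may have at most N simultaneous digital loans.

class CdlTitle:
    def __init__(self, physical_copies: int):
        self.physical_copies = physical_copies
        self.active_loans = 0

    def checkout(self) -> bool:
        """Lend a digital copy only while a physical copy is 'on the shelf'."""
        if self.active_loans < self.physical_copies:
            self.active_loans += 1
            return True
        return False  # every owned copy is already lent out

    def checkin(self) -> None:
        """Return a digital loan, freeing a copy for the next patron."""
        if self.active_loans > 0:
            self.active_loans -= 1

book = CdlTitle(physical_copies=1)
assert book.checkout() is True   # the single copy goes out
assert book.checkout() is False  # a second simultaneous loan is refused
book.checkin()
assert book.checkout() is True   # available again after return
```

The publishers' objection, in these terms: the scan-and-lend step makes an unauthorized copy regardless of how strictly the loan counter is enforced.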

Publishers sued. They argued CDL was just piracy with extra steps, making unauthorized digital copies and distributing them without permission.

The court agreed with the publishers. Judge John G. Koeltl ruled that CDL wasn't fair use because:

  1. The Internet Archive was making copies of entire works (not just excerpts)
  2. The digital copies directly substituted for licensed e-book sales
  3. The market harm was significant (publishers lost e-book revenue)
  4. The nonprofit nature of the Archive didn't save them

The Second Circuit upheld this in September 2024. CDL is dead. The Internet Archive owes damages. And every library that was watching this case as a potential legal foundation for digital lending just lost.

Why This Matters for AI (And Your Library)

Here's where it gets uncomfortable.

AI companies are training their models on copyrighted content. Books, articles, images, code. Massive datasets scraped from the internet and used without permission or payment.

When publishers and authors sue (and they are, there are multiple active lawsuits), AI companies are using the same fair use arguments the Internet Archive tried to use.

They're saying: training is transformative, the output doesn't substitute for the original works, and the public benefits.

Sound familiar? The Internet Archive said: digitization for lending is transformative, one-to-one loans don't substitute for sales, and the public benefits.

The court didn't buy it. And there's a good chance courts won't buy the AI companies' version either.

Which brings us to libraries.

The Vendor Problem You're Not Thinking About

Your discovery systems, catalog tools, research databases, and recommendation engines? A lot of them use AI trained on copyrighted content.

Let's say EBSCO's AI research assistant was trained on journal articles. Or ProQuest's AI summarization tool learned from copyrighted book content. Or your library's AI chatbot was trained on datasets that include in-copyright text.

If the courts decide that's copyright infringement (not fair use), what happens?

Option 1: The vendor gets sued, loses, and passes the costs to you via higher subscription fees.

Option 2: The vendor loses access to the training data and has to rebuild the AI with licensed content only. That means the tool gets worse or more expensive.

Option 3: The vendor decides AI features aren't worth the legal risk and removes them entirely.

None of these are great for you.

The HathiTrust Exception (And Why It's Narrow)

"But wait," you might say, "what about the HathiTrust case? Libraries won that one!"

True. In Authors Guild v. HathiTrust (2014), libraries successfully argued that creating a searchable database of scanned books was fair use. The court said full-text search and accessibility features (like text-to-speech for the blind) were transformative uses that didn't harm the market for the original works.

That case is often cited as proof that libraries can use copyrighted content for digital tools. But here's the catch: HathiTrust was really narrow.

The court said full-text search was okay. Not full-text display. Not full-text lending. Just search. Users could find keywords and see snippets, but not read the whole book.
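The line the court drew can be illustrated with a toy function: a snippet search returns short windows around each keyword match, never the full text. This is purely illustrative, assuming nothing about HathiTrust's actual implementation.

```python
# Illustrative only: full-text *search* surfaces locations and snippets,
# not the whole work. Window size and names are invented for this sketch.

def snippet_search(text: str, query: str, context: int = 30) -> list[str]:
    """Return short windows of text around each match -- never the full work."""
    snippets = []
    lower, q = text.lower(), query.lower()
    start = 0
    while (idx := lower.find(q, start)) != -1:
        lo = max(0, idx - context)
        hi = min(len(text), idx + len(q) + context)
        snippets.append(text[lo:hi])
        start = idx + len(q)
    return snippets

page = ("Fair use is a defense to copyright infringement. "
        "Fair use is fact-specific.")
hits = snippet_search(page, "fair use")
assert len(hits) == 2                         # two keyword matches found
assert all(len(s) < len(page) for s in hits)  # snippets only, never the whole page
```

An AI tool that summarizes or answers from the full text is doing something this function deliberately refuses to do, which is exactly why HathiTrust doesn't cover it.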

Now apply that to AI.

If your AI tool is summarizing full articles, generating answers based on copyrighted content, or recommending resources by analyzing entire texts... that's not just search. That's reproducing and transforming the content in ways HathiTrust didn't cover.

And after the Internet Archive case, courts are clearly skeptical of "but we're a library" as a fair use defense.

What the AI Copyright Cases Tell Us

Multiple lawsuits are testing whether AI training is fair use, including The New York Times v. OpenAI and Microsoft, Authors Guild v. OpenAI, Getty Images v. Stability AI, Kadrey v. Meta, and Thomson Reuters v. Ross Intelligence.

One pattern is clear: courts are not accepting "AI training is fair use" as a blanket defense. They're asking hard questions about:

  • How transformative the use actually is
  • Whether entire works were copied
  • Whether the training data was lawfully acquired
  • Whether the AI's output substitutes for the original works in the market

As of early 2026, no US appellate court has definitively ruled that AI training is fair use, and the district court rulings so far point in different directions. Which means every library using AI tools built on copyrighted content is operating in legal uncertainty.

The Questions You Need to Ask Your Vendors Right Now

I'm not saying don't use AI tools. I'm saying it's worth knowing what legal risks come with them.

Next vendor meeting, ask:

  1. "What data did you use to train this AI?"
    • If they say "publicly available data" or "internet-scale datasets," that's a red flag. "Publicly available" doesn't mean "licensed for AI training."
    • If they can't or won't tell you, that's an even bigger red flag.
  2. "Do you have licenses for the training data, or are you relying on fair use?"
    • If it's fair use, they're gambling. And you're gambling with them.
    • If they're licensing content, ask to see proof (or at least confirmation that licenses are in place).
  3. "If a court rules that AI training isn't fair use, what happens to this tool?"
    • Do they have a plan? Will they rebuild the AI with licensed data? Will the tool disappear?
    • Will you get a refund if the tool becomes unusable due to legal issues?
  4. "What happens if you get sued over AI training data?"
    • Are you (the customer) indemnified? Or are you on your own?
    • What's their legal defense strategy?
  5. "Does your AI reproduce copyrighted content in its output?"
    • If it's generating full article summaries, reproducing text verbatim, or creating derivative works... that's a problem.
    • Even if the training data use is fair, the output might still violate copyright.

If your vendor can't answer these questions clearly, you need to decide: Is the AI feature worth the legal uncertainty?
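If you want to track vendor answers systematically, a structure like the following works. The field names and the "unanswered question = red flag" rule are my own invention for this sketch, not an established rubric.

```python
# Hypothetical vendor-assessment tracker for the five questions above.
# Vendor name and question wording are illustrative.

from dataclasses import dataclass, field

QUESTIONS = [
    "What data did you use to train this AI?",
    "Do you have licenses for the training data, or are you relying on fair use?",
    "If a court rules that AI training isn't fair use, what happens to this tool?",
    "What happens if you get sued over AI training data?",
    "Does your AI reproduce copyrighted content in its output?",
]

@dataclass
class VendorAssessment:
    vendor: str
    answers: dict[str, str] = field(default_factory=dict)

    def unanswered(self) -> list[str]:
        """Questions with no substantive answer yet -- each one is a red flag."""
        return [q for q in QUESTIONS if not self.answers.get(q, "").strip()]

mega = VendorAssessment("MegaSearch")
mega.answers[QUESTIONS[0]] = "Publicly available internet data"
assert len(mega.unanswered()) == 4  # one vague answer, four open questions
```

Note that a vague answer still counts as "answered" here; in practice you'd also want to flag answers like "publicly available data," as discussed above.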

What You Can Control (And What You Can't)

Things you can't control:

  • How courts rule on AI training and fair use
  • What data your vendors already used to train their AI
  • Whether your vendors get sued, or how those suits turn out

Things you can control:

  • Which AI tools you adopt, and when
  • What you ask vendors before signing, and what you document
  • Your contract terms: indemnification, refunds, exit clauses
  • Your fallback plan if a tool disappears

My Unpopular Opinion

I think libraries should stop pretending AI legal risks don't apply to them.

The Internet Archive case showed that "we're a nonprofit" and "we serve the public good" aren't magic words that make copyright law go away. Libraries are subject to the same rules as everyone else.

And when it comes to AI, the rules are murky. Fair use is a gray area. Training data sources are often undisclosed. Vendors are making bets on legal theories that might not hold up in court.

You don't have to avoid AI entirely. But you do need to go in with eyes open.

That means:

  • Asking vendors the five questions above before you sign
  • Documenting their answers (and their non-answers)
  • Deciding your risk tolerance deliberately, not by default
  • Having a fallback if an AI feature disappears

The Scenario You're Not Prepared For

Here's what keeps me up at night:

It's mid-2026. A court rules that AI training on copyrighted content without explicit licenses is infringement (the New York Times case could deliver this ruling any day). Your discovery system vendor (let's call them "MegaSearch") used unlicensed training data for their AI recommendation engine.

MegaSearch gets sued. They lose. They're ordered to stop using the AI and pay damages.

Suddenly, the AI features you've been relying on disappear overnight. Your patrons are confused. Your staff doesn't know how to explain it. Your administration is asking why you didn't see this coming.

And then MegaSearch sends you a bill. A "compliance surcharge" to cover their legal costs and rebuild the AI with licensed data. It's 40% more than your current subscription. Take it or leave it.

What do you do?

If you don't have an answer, you're not ready.

What You Should Do This Week

  1. Make a list of every AI tool you use or are considering.
  2. Email your vendors and ask the five questions above.
  3. Document their responses (or lack of response).
  4. Decide your risk tolerance: Are you okay using AI tools with unclear legal foundations? Or do you want to wait until the law is clearer?
  5. Create a backup plan: If your AI tools disappear tomorrow, what's your fallback?
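Steps 1 through 3 can start as something as simple as a spreadsheet. A sketch, with made-up tool and vendor names:

```python
# A plain CSV inventory of AI tools and vendor outreach status.
# Tool names, vendor names, and columns are invented for illustration.

import csv
import io

tools = [
    {"tool": "AI research assistant", "vendor": "ExampleVendorA",
     "asked_questions": "yes", "response_documented": "no",
     "fallback": "classic keyword search"},
    {"tool": "Patron chatbot", "vendor": "ExampleVendorB",
     "asked_questions": "no", "response_documented": "no",
     "fallback": "email the reference desk"},
]

# Write the inventory (to a string here; a file in practice).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(tools[0]))
writer.writeheader()
writer.writerows(tools)
inventory_csv = buf.getvalue()

# Step 2's to-do list falls out of the data: vendors not yet contacted.
todo = [t["tool"] for t in tools if t["asked_questions"] == "no"]
assert todo == ["Patron chatbot"]
```

The point isn't the tooling; it's that steps 1 through 3 produce a written record you can show your administration if a vendor's AI feature ever vanishes.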

This isn't hypothetical. The Internet Archive case is done. The AI lawsuits are active. Courts are deciding this stuff right now.

Don't wait until it's too late.



Need help evaluating your vendor's AI legal risks? Reach out.

Filed under: Copyright & AI

Want updates (or backup)?

Get new posts by email, or book a free 30-minute call if you're facing a contract, AI policy, or vendor decision.
