The Internet Archive Lost Its Lawsuit. Here's What That Means for AI and Your Library
In March 2023, a federal judge ruled against the Internet Archive in Hachette v. Internet Archive, finding that the Archive's "Controlled Digital Lending" program violated copyright law. The decision was upheld on appeal in September 2024. The Internet Archive chose not to petition the Supreme Court, and the certiorari deadline passed in December 2024.
- Publishers vs. Internet Archive: publishers argued CDL violates copyright. The Internet Archive contended that fair use permits libraries to lend one copy of each acquired book regardless of format.
- Core legal question: does fair use protection extend to digital lending? Historical library lending rights developed for physical books; the digital context was largely untested.
- With the publishers' win, the implications cascade: future library digital content requires licensing rather than acquisition, costs balloon, and rural and low-income patrons lose access (commercial e-lending services often require a stable address and payment method).
- Had the Internet Archive won, libraries would have kept lending rights across digital and physical formats, fair use doctrine would have strengthened, and public access to information would have increased.
If you haven't been following this, you might think: "Okay, but I'm not the Internet Archive. Why do I care?"
You should care because this case is about to collide headfirst with AI. And libraries are caught in the middle.
What Actually Happened
Quick background: The Internet Archive ran a program called Controlled Digital Lending (CDL). The idea was simple: They'd scan physical books they owned, lend digital copies on a one-to-one basis (one physical book = one digital loan at a time), and call it "digital lending" under fair use.
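The one-to-one constraint at the heart of CDL can be sketched as a simple invariant. This is a hypothetical illustration of the rule, not the Internet Archive's actual lending system:

```python
# Hypothetical sketch of CDL's "owned-to-loaned" ratio rule.
# Illustrative only -- not the Internet Archive's real software.

class CDLTitle:
    """One title in a controlled-digital-lending collection."""

    def __init__(self, title: str, physical_copies_owned: int):
        self.title = title
        self.owned = physical_copies_owned
        self.active_digital_loans = 0

    def check_out(self) -> bool:
        # The core CDL rule: never more digital loans out at once
        # than physical copies owned (and pulled from the shelves).
        if self.active_digital_loans < self.owned:
            self.active_digital_loans += 1
            return True
        return False  # patron joins a wait list instead

    def check_in(self) -> None:
        if self.active_digital_loans > 0:
            self.active_digital_loans -= 1

book = CDLTitle("Example Novel", physical_copies_owned=1)
print(book.check_out())  # True  -- the one copy is now on loan
print(book.check_out())  # False -- a second patron must wait
book.check_in()
print(book.check_out())  # True  -- the copy is available again
```

The court's problem was not with this bookkeeping, which the Archive did enforce, but with the copying itself: maintaining the ratio didn't make the scanned digital copies authorized.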
Publishers sued. They argued CDL was just piracy with extra steps, making unauthorized digital copies and distributing them without permission.
The court agreed with the publishers. Judge John G. Koeltl ruled that CDL wasn't fair use because:
- The Internet Archive was making copies of entire works (not just excerpts)
- The digital copies directly substituted for licensed e-book sales
- The market harm was significant (publishers lost e-book revenue)
- The nonprofit nature of the Archive didn't save them
The Second Circuit upheld this in September 2024. CDL is dead. The Internet Archive owes damages. And every library that was watching this case as a potential legal foundation for digital lending just lost.
Why This Matters for AI (And Your Library)
Here's where it gets uncomfortable.
AI companies are training their models on copyrighted content. Books, articles, images, code. Massive datasets scraped from the internet and used without permission or payment.
When publishers and authors sue (and they are; there are multiple active lawsuits), AI companies reach for the same fair use arguments the Internet Archive tried to use.
They're saying:
- "We're transforming the content, not redistributing it"
- "This serves the public interest"
- "The original works aren't being substituted"
- "This is how AI learns, just like humans learn"
Sound familiar? The Internet Archive said:
- "We're providing access, not selling copies"
- "This serves the public interest (libraries!)"
- "We're not substituting for the original, it's controlled lending"
- "This is how libraries have always worked, just digitally"
The court didn't buy it. And there's a good chance courts won't buy the AI companies' version either.
Which brings us to libraries.
The Vendor Problem You're Not Thinking About
Your discovery systems, catalog tools, research databases, and recommendation engines? A lot of them use AI trained on copyrighted content.
Let's say EBSCO's AI research assistant was trained on journal articles. Or ProQuest's AI summarization tool learned from copyrighted book content. Or your library's AI chatbot was trained on datasets that include in-copyright text.
If the courts decide that's copyright infringement (not fair use), what happens?
Option 1: The vendor gets sued, loses, and passes the costs to you via higher subscription fees.
Option 2: The vendor loses access to the training data and has to rebuild the AI with licensed content only. That means the tool gets worse or more expensive.
Option 3: The vendor decides AI features aren't worth the legal risk and removes them entirely.
None of these are great for you.
The HathiTrust Exception (And Why It's Narrow)
"But wait," you might say, "what about the HathiTrust case? Libraries won that one!"
True. In Authors Guild v. HathiTrust (2014), libraries successfully argued that creating a searchable database of scanned books was fair use. The court said full-text search and accessibility features (like text-to-speech for the blind) were transformative uses that didn't harm the market for the original works.
That case is often cited as proof that libraries can use copyrighted content for digital tools. But here's the catch: HathiTrust was really narrow.
The court said full-text search was okay. Not full-text display. Not full-text lending. Just search. Users could find keywords and see snippets, but not read the whole book.
Now apply that to AI.
If your AI tool is summarizing full articles, generating answers based on copyrighted content, or recommending resources by analyzing entire texts... that's not just search. That's reproducing and transforming the content in ways HathiTrust didn't cover.
And after the Internet Archive case, courts are clearly skeptical of "but we're a library" as a fair use defense.
What the AI Copyright Cases Tell Us
Multiple lawsuits tested whether AI training is fair use:
- Authors Guild, Sarah Silverman, and others v. OpenAI (filed 2023): Authors claimed OpenAI trained ChatGPT on pirated copies of their books without permission. Multiple cases have been consolidated into MDL 3143 in the Southern District of New York, with proceedings ongoing.
- Getty Images v. Stability AI (filed 2023): Getty claimed Stability AI scraped millions of copyrighted images to train AI without licensing them. The UK High Court ruled in November 2025; the US case has been refiled and is proceeding.
- New York Times v. OpenAI and Microsoft (filed 2023): The Times claimed ChatGPT reproduces Times articles verbatim, violating copyright. This case is moving forward and will likely be the landmark decision.
The pattern is clear: Courts are skeptical of the "AI training is fair use" argument. They're asking hard questions about:
- Whether AI output substitutes for the original works
- Whether AI companies are profiting off unlicensed content
- Whether "transformation" is enough when the original work is reproduced in output
As of early 2026, no US court has definitively ruled that AI training is fair use. Which means every library using AI tools built on copyrighted content is operating in legal uncertainty.
The Questions You Need to Ask Your Vendors Right Now
I'm not saying don't use AI tools. I'm saying it's worth knowing what legal risks come with them.
Next vendor meeting, ask:
- "What data did you use to train this AI?"
  - If they say "publicly available data" or "internet-scale datasets," that's a red flag. "Publicly available" doesn't mean "licensed for AI training."
  - If they can't or won't tell you, that's an even bigger red flag.
- "Do you have licenses for the training data, or are you relying on fair use?"
  - If it's fair use, they're gambling. And you're gambling with them.
  - If they're licensing content, ask to see proof (or at least confirmation that licenses are in place).
- "If a court rules that AI training isn't fair use, what happens to this tool?"
  - Do they have a plan? Will they rebuild the AI with licensed data? Will the tool disappear?
  - Will you get a refund if the tool becomes unusable due to legal issues?
- "What happens if you get sued over AI training data?"
  - Are you (the customer) indemnified? Or are you on your own?
  - What's their legal defense strategy?
- "Does your AI reproduce copyrighted content in its output?"
  - If it's generating full article summaries, reproducing text verbatim, or creating derivative works... that's a problem.
  - Even if the training data use is fair, the output might still violate copyright.
If your vendor can't answer these questions clearly, you need to decide: Is the AI feature worth the legal uncertainty?
What You Can Control (And What You Can't)
Things you can't control:
- Whether courts decide AI training is fair use
- Whether your vendors get sued
- Whether copyright law changes to accommodate AI
Things you can control:
- Which vendors you choose (pick ones that are transparent about training data)
- What contract terms you negotiate (indemnification clauses, exit clauses if tools become unavailable)
- How you document your decision-making (if someone asks "Did you do due diligence?" you need to say yes)
My Unpopular Opinion
I think libraries should stop pretending AI legal risks don't apply to them.
The Internet Archive case showed that "we're a nonprofit" and "we serve the public good" aren't magic words that make copyright law go away. Libraries are subject to the same rules as everyone else.
And when it comes to AI, the rules are murky. Fair use is a gray area. Training data sources are often undisclosed. Vendors are making bets on legal theories that might not hold up in court.
You don't have to avoid AI entirely. But you do need to go in with eyes open.
That means:
- Understanding what your AI tools actually do
- Knowing where the training data came from
- Having a plan for what happens if the legal landscape shifts
- Being ready to turn off AI features if they become legal liabilities
The Scenario You're Not Prepared For
Here's what keeps me up at night:
It's mid-2026. A court rules that AI training on copyrighted content without explicit licenses is infringement (the New York Times case could deliver this ruling any day). Your discovery system vendor (let's call them "MegaSearch") used unlicensed training data for their AI recommendation engine.
MegaSearch gets sued. They lose. They're ordered to stop using the AI and pay damages.
Suddenly, the AI features you've been relying on disappear overnight. Your patrons are confused. Your staff doesn't know how to explain it. Your administration is asking why you didn't see this coming.
And then MegaSearch sends you a bill. A "compliance surcharge" to cover their legal costs and rebuild the AI with licensed data. It's 40% more than your current subscription. Take it or leave it.
What do you do?
If you don't have an answer, you're not ready.
What You Should Do This Week
- Make a list of every AI tool you use or are considering.
- Email your vendors and ask the five questions above.
- Document their responses (or lack of response).
- Decide your risk tolerance: Are you okay using AI tools with unclear legal foundations? Or do you want to wait until the law is clearer?
- Create a backup plan: If your AI tools disappear tomorrow, what's your fallback?
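The checklist above can be turned into a lightweight audit record. Here's a minimal sketch; the vendor name, answers, and risk thresholds are hypothetical examples, so adapt them to your own tooling:

```python
# Minimal sketch of a vendor AI risk log for the five questions above.
# Vendor names, answers, and risk thresholds are hypothetical examples.

from dataclasses import dataclass, field

QUESTIONS = [
    "What data did you use to train this AI?",
    "Do you have licenses for the training data, or are you relying on fair use?",
    "If a court rules that AI training isn't fair use, what happens to this tool?",
    "What happens if you get sued over AI training data?",
    "Does your AI reproduce copyrighted content in its output?",
]

# Phrases that should raise an eyebrow if they appear in a vendor's answer.
RED_FLAGS = ("publicly available", "internet-scale", "fair use", "no response")

@dataclass
class VendorAudit:
    vendor: str
    tool: str
    answers: dict = field(default_factory=dict)

    def record(self, question: str, answer: str) -> None:
        self.answers[question] = answer

    def risk_level(self) -> str:
        """Crude triage: unanswered or red-flag answers raise the risk."""
        if len(self.answers) < len(QUESTIONS):
            return "high"  # silence on any question is itself a red flag
        flagged = sum(
            any(flag in a.lower() for flag in RED_FLAGS)
            for a in self.answers.values()
        )
        return "high" if flagged >= 2 else "medium" if flagged == 1 else "low"

audit = VendorAudit("MegaSearch", "AI recommendation engine")
for q in QUESTIONS:
    audit.record(q, "No response")
print(audit.risk_level())  # high
```

The point isn't the code; it's that documenting answers (including non-answers) in a consistent structure is what lets you show your administration you did due diligence.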
This isn't hypothetical. The Internet Archive case is done. The AI lawsuits are active. Courts are deciding this stuff right now.
Don't wait until it's too late.
See Also (Related Legal Timeline Posts)
- AI Clauses in Vendor Contracts - How this precedent shapes what vendors put in contracts now
- Your Digital Collections Are Training AI - The flip side: how AI companies are using library content without asking
Need help evaluating your vendor's AI legal risks? Reach out.
Want updates (or backup)?
Get new posts by email, or book a free 30-minute call if you're facing a contract, AI policy, or vendor decision.