The Internet Archive Lost Its Lawsuit. Here's What It Means for AI and Your Library
Why I wrote this: I briefed a library lawyer who hadn't read the ruling and was about to sign a risky vendor deal.
Thinking "we're not the Internet Archive" is denial; your contracts collide with the same precedent.
Let me be clear about what happened here, because the library press got it wrong.
- Internet Archive Controlled Digital Lending (CDL) lawsuit: publishers argued that the Archive's digital lending infringed copyright - and they won.
- What was at stake: libraries' legal right to lend without vendor middlemen, fair use interpretation for the digital era, and whether the "owned-to-loaned" model applies to digital materials.
- The consequence of the publishers' win: libraries must license digital lending through publishers and vendors (far more expensive than acquiring and lending physical books), reducing digital access for poor and rural patrons.
- Library action: advocate for legal protection of library lending rights in the digital context, and document current CDL usage to support legislative fixes.
In June 2020, publishers sued the Internet Archive over "Controlled Digital Lending" - a program where they scanned books they physically owned and lent digital copies on a one-to-one basis. One physical book = one person reading digitally at a time. That's it.
A federal judge ruled against them. The Second Circuit affirmed. The Internet Archive declined to take it further, and the ruling is now final. Suddenly, the one legal theory libraries had been counting on for digital lending just died.
Here's what I'm angry about: Publishers spent millions suing a nonprofit that was doing exactly what libraries have always done - lending books. Meanwhile, tech companies are training AI on millions of copyrighted works without asking permission or paying anyone, and courts are letting them. The rules aren't being applied equally. Libraries trying to do right by copyright law got punished. Tech companies ignoring it are moving forward.
And here's the part that should terrify you: The same legal arguments Internet Archive lost? Your vendors are betting their entire AI business on those same arguments winning in a different courtroom.
TL;DR
- Internet Archive lost: "Fair use" doesn't protect digital lending of copyrighted books, even with nonprofit status and one-to-one lending limits.
- AI companies are using identical fair use arguments to justify training on copyrighted content. Courts are skeptical (Authors Guild v. OpenAI has mixed rulings, heading to trial 2026).
- Your vendor AI tools might be built on unlicensed copyrighted content. If courts rule against AI companies, you absorb the cost: higher fees, degraded tools, or removal of AI features.
What Actually Happened
The Internet Archive tried something bold. They scanned books they physically owned - books sitting in their building, books they held legal title to. Then they digitized them and lent digital copies with a one-to-one lending model: one person could borrow a digital copy only when nobody else was borrowing it. Not selling copies. Not giving away unlimited access. Lending, just like every library does.
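The "owned-to-loaned" constraint is simple enough to state in a few lines of code. Here is an illustrative sketch - my own Python, not the Internet Archive's actual system - of the one-to-one rule: a digital checkout is allowed only while an owned physical copy is not already on loan.

```python
# Illustrative sketch of Controlled Digital Lending's one-to-one rule.
# Class and method names are my own, not any real library system's API.

class CDLTitle:
    """One title: digital checkouts may never exceed owned physical copies."""

    def __init__(self, title: str, physical_copies_owned: int):
        self.title = title
        self.owned = physical_copies_owned
        self.checked_out = 0

    def borrow(self) -> bool:
        # Enforce the owned-to-loaned ratio at the moment of checkout.
        if self.checked_out >= self.owned:
            return False  # every owned copy is already on loan
        self.checked_out += 1
        return True

    def return_copy(self) -> None:
        if self.checked_out > 0:
            self.checked_out -= 1


book = CDLTitle("Example Title", physical_copies_owned=1)
assert book.borrow() is True    # first patron borrows the digital copy
assert book.borrow() is False   # second patron must wait, just like a print hold
book.return_copy()
assert book.borrow() is True    # the copy is free again
```

That is the entire model the publishers sued over: a lending ledger, not unlimited distribution.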
Hachette, Penguin Random House, Simon & Schuster, and HarperCollins saw this and sued. They claimed it was piracy.
Let me be specific about what happened next, because this is the part libraries need to understand.
Judge John G. Koeltl sided with the publishers. Here's what he said mattered:
- The Internet Archive made complete digital copies, not excerpts
- Those copies substituted for e-book sales (libraries could borrow digitally instead of buying licenses)
- Publishers lost money - and he said the harm was real and measurable
- Being nonprofit didn't matter. The fair use exception doesn't have an asterisk for nonprofits
The Second Circuit Court of Appeals agreed in September 2024, affirming every part of Koeltl's decision. The Internet Archive declined to seek Supreme Court review, and the judgment is now final.
That's the final nail. CDL is dead as a legal strategy. The Internet Archive lost. And every library director who thought "maybe we can do this someday" just found out they can't.
Why This Matters for AI (And Your Library)
Here's the part that should make you furious.
OpenAI, Google, Meta, Anthropic - they're training their AI models on massive amounts of copyrighted content. Books without permission. Articles without licenses. Images without payment. They scraped it from the internet, ran it through their systems, and built a trillion-dollar industry on top of content they never licensed.
Now they're getting sued. Authors Guild v. OpenAI. New York Times v. OpenAI. Getty Images v. Stability AI. Multiple lawsuits, all asking the same question: Is training AI on copyrighted content copyright infringement?
And here's where the hypocrisy lives: AI companies are using the exact same "fair use" defense that just got the Internet Archive destroyed in court.
OpenAI says: "We're transforming the content into a new tool. We're serving the public interest. The AI output doesn't substitute for the original works. This is how AI learns."
Internet Archive said: "We're transforming physical lending into digital lending. We're serving the public interest. The digital copies don't substitute for the originals - people can only borrow one at a time. This is how libraries work."
Judge Koeltl listened to the Internet Archive's arguments and said: "This causes market harm. Publishers lose money when libraries lend digital copies instead of buying licenses. Fair use doesn't apply."
So now tell me - how is what OpenAI did different? They cause market harm. Authors and publishers lose money when people get answers from ChatGPT instead of reading books. Publishers and authors are suing for exactly that reason. They're losing revenue. And unlike Internet Archive, OpenAI is a for-profit company making billions.
I know this sounds like I'm being unfair to AI companies. But I'm not. The Internet Archive lent books it physically owned. OpenAI scraped, copied, and commercialized copyrighted work without asking. And they're using a fair use argument that courts already rejected when someone else tried it.
The difference? Tech companies have better lawyers and more money.
The Vendor Problem You're Not Thinking About
Your vendors are sitting on a time bomb and hoping nobody notices.
EBSCO, ProQuest, Innovative, Ex Libris - they all have AI features now. Discovery systems with AI-powered recommendations. Catalog tools with intelligent categorization. Research assistants that summarize journal articles. Chatbots that answer patron questions.
Ask yourself: Where did the training data come from?
EBSCO's research assistant? Trained on journal articles. Where did they get permission to use those articles for AI training? They probably didn't ask. ProQuest's AI summarization tool? Trained on copyrighted books and academic content. Innovative's discovery layer? Trained on library data and published content. In many cases the vendor licensed that content broadly for "search" and then reused it for "AI" - which is not the same thing.
Now imagine this: It's mid-2026. The New York Times wins its case against OpenAI. The court rules that training AI on copyrighted content without explicit permission is infringement, not fair use. Publishers and authors celebrate. Tech companies scramble.
Your vendors panic.
Because suddenly they're running AI tools that a court just declared illegal.
What happens next?
Scenario 1: Your vendor gets sued. They lose. They settle for damages. Those costs get passed to you as a "compliance fee." Your subscription goes up 30%, 40%, maybe more.
Scenario 2: Your vendor doesn't get sued (because publishers have limited resources), but they panic anyway. They rebuild the AI with only properly licensed training data. The tool gets worse. It's slower. Less accurate. Less useful. Or it disappears entirely because building AI with only licensed data is too expensive.
Scenario 3: Your vendor sees the legal risk and just removes the AI features. You lose the functionality you've been relying on. Patrons are confused. Staff has to relearn workflows.
And in every scenario, you - the library - are stuck with the consequences. You didn't build the AI. You didn't choose the training data. You just signed a contract and trusted your vendor.
The HathiTrust Exception (And Why It's Narrow)
"But wait," you might say, "what about the HathiTrust case? Libraries won that one!"
True. In Authors Guild v. HathiTrust (2014), libraries successfully argued that creating a searchable database of scanned books was fair use. The court said full-text search and accessibility features (like text-to-speech for the blind) were transformative uses that didn't harm the market for the original works.
That case is often cited as proof that libraries can use copyrighted content for digital tools. But here's the catch: HathiTrust was really narrow.
The court said full-text search was okay. Not full-text display. Not full-text lending. Just search - where users could find keywords and see snippets, but not read the whole book.
Now apply that to AI.
If your AI tool is summarizing full articles, generating answers based on copyrighted content, or recommending resources by analyzing entire texts... that's not just search. That's reproducing and transforming the content in ways HathiTrust didn't cover.
And after the Internet Archive case, courts are clearly skeptical of "but we're a library" as a fair use defense.
What the AI Copyright Cases Tell Us
Multiple lawsuits tested whether AI training is fair use:
Authors Guild, Sarah Silverman, and others v. OpenAI (filed 2023): Authors claimed OpenAI trained ChatGPT on pirated copies of their books without permission. In late 2025, courts issued mixed rulings - some claims survived, others were dismissed, and the case is heading toward trial in 2026.
Getty Images v. Stability AI (filed 2023): Getty claimed Stability AI scraped millions of copyrighted images to train AI without licensing them. Settlement negotiations are ongoing.
New York Times v. OpenAI and Microsoft (filed 2023): The Times claimed ChatGPT reproduces Times articles verbatim, violating copyright. This case is moving forward and will likely be the landmark decision.
The pattern is clear: Courts are skeptical of the "AI training is fair use" argument. They're asking hard questions about:
- Whether AI output substitutes for the original works
- Whether AI companies are profiting off unlicensed content
- Whether "transformation" is enough when the original work is reproduced in output
As of early 2026, no court has definitively ruled that AI training is fair use. Which means every library using AI tools built on copyrighted content is operating in legal uncertainty.
The Questions You Need to Ask Your Vendors Right Now
I'm not saying don't use AI tools. I'm saying you need to know what legal risks you're taking on.
Next vendor meeting, ask:
1. "What data did you use to train this AI?"
If they say "publicly available data" or "internet-scale datasets," that's a red flag. "Publicly available" doesn't mean "licensed for AI training."
If they can't or won't tell you, that's an even bigger red flag.
2. "Do you have licenses for the training data, or are you relying on fair use?"
If it's fair use, they're gambling. And you're gambling with them.
If they're licensing content, ask to see proof (or at least confirmation that licenses are in place).
3. "If a court rules that AI training isn't fair use, what happens to this tool?"
Do they have a plan? Will they rebuild the AI with licensed data? Will the tool disappear?
Will you get a refund if the tool becomes unusable due to legal issues?
4. "What happens if you get sued over AI training data?"
Are you (the customer) indemnified? Or are you on your own?
What's their legal defense strategy?
5. "Does your AI reproduce copyrighted content in its output?"
If it's generating full article summaries, reproducing text verbatim, or creating derivative works... that's a problem.
Even if the training data use is fair, the output might still violate copyright.
If your vendor can't answer these questions clearly, you need to decide: Is the AI feature worth the legal uncertainty?
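If you want to compare answers across vendors, the red-flag logic in those five questions can be turned into a trackable checklist. A minimal sketch follows; the question keys, answer strings, and vendor responses are entirely my own invention for illustration, not any vendor's actual wording.

```python
# Hypothetical checklist: map each of the five vendor questions to the
# answer patterns this post treats as red flags. All strings are invented.

RED_FLAG_ANSWERS = {
    "training_data_source": {"publicly available data", "internet-scale datasets", "undisclosed"},
    "licensing_basis": {"fair use", "undisclosed"},
    "court_ruling_plan": {"none", "undisclosed"},
    "indemnification": {"no", "undisclosed"},
    "reproduces_content": {"yes", "undisclosed"},
}

def red_flags(vendor_answers: dict) -> list:
    """Return the questions whose answers match a red-flag pattern."""
    return [q for q, a in vendor_answers.items()
            if a in RED_FLAG_ANSWERS.get(q, set())]

# Example responses from a fictional vendor:
example = {
    "training_data_source": "publicly available data",  # red flag
    "licensing_basis": "licensed",                      # ok
    "court_ruling_plan": "undisclosed",                 # red flag
    "indemnification": "yes",                           # ok
    "reproduces_content": "no",                         # ok
}
print(red_flags(example))  # → ['training_data_source', 'court_ruling_plan']
```

The point isn't the code; it's that "we can't say" should be scored the same as a bad answer.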
What You Can Control (And What You Can't)
Things you can't control:
- Whether courts decide AI training is fair use
- Whether your vendors get sued
- Whether copyright law changes to accommodate AI
Things you can control:
- Which vendors you choose (pick ones that are transparent about training data)
- What contract terms you negotiate (indemnification clauses, exit clauses if tools become unavailable)
- How you document your decision-making (if someone asks "Did you do due diligence?" you need to say yes)
My Unpopular Opinion
Libraries should be furious about the Internet Archive ruling, not just sad about it.
Here's what the court decided: If you're a nonprofit serving the public good, that still doesn't protect you from copyright law. "We're trying to help" is not a defense. "We're operating under fair use" needs to survive four legal tests, and lending digital books failed them all.
But somehow - somehow - OpenAI is still operating. Meta is still operating. These companies scraped copyrighted content at scale, trained billion-dollar AI models on it, and are making massive profits. Lawsuits are happening, but the tools are still there. The companies are still operating. They're betting that courts will let them keep doing this.
So here's what pisses me off: The nonprofit library trying to do lending responsibly got shut down by the courts. The for-profit tech company extracting value from creators has barely paused.
And libraries are now buying AI tools from these companies, paying for the privilege of using technology built on legal theories that lost in court.
I'm not saying you should never use AI. I'm saying: don't use it with your eyes closed. Don't accept your vendor's assurances at face value. Don't assume the legal risk won't materialize.
You need to know exactly:
- What data trained your AI tools - not vague descriptions, specific sources
- Whether your vendor licensed that data or is betting on fair use
- What your vendor will do if courts rule against them
- Whether you're indemnified if this blows up
Because the Internet Archive taught us something: Fair use arguments that sound reasonable to librarians might not sound reasonable to judges. And "but I thought it was legal" isn't a defense when you're the one paying damages.
The Scenario You're Not Prepared For
Let me walk you through what's actually going to happen. Because the courts are moving. The cases are active. And this isn't hypothetical anymore.
June 2026. The New York Times case against OpenAI reaches summary judgment. The judge rules: training large language models on copyrighted content without licenses is copyright infringement, not fair use. The opinion is devastating. It's specific. It names names. It says "transformative use" doesn't matter when the original work's commercial value is destroyed.
Every AI company starts panicking. Every vendor holding unlicensed training data knows what's coming.
July 2026. Your discovery system vendor - let's call them "MegaSearch" - realizes their AI recommendation engine was trained on copyrighted journal articles and book excerpts they never licensed for AI training. They had broad licenses for search, but not for machine learning. They're exposed.
They send you a notification: "We're temporarily disabling the AI features while we address legal concerns."
Your staff calls you. The feature your patrons depend on is gone. Teachers who were using the AI research assistant can't anymore. Patrons are frustrated. Your circulation drops.
August 2026. MegaSearch decides to rebuild the AI with properly licensed data. The licensing costs are enormous. They have two choices: (1) eat the cost and go bankrupt, or (2) pass it to you.
They choose option 2.
Your renewal notice arrives: "Due to enhanced copyright compliance and AI licensing requirements, your subscription will increase to $X per year." It's 45% more than last year. Your director is furious. Your board asks why you didn't anticipate this. Your answer? You signed a contract with a vendor who didn't disclose their training data sources.
Do you pay it? Your patrons are asking for the features back. Your staff has already incorporated the AI into workflows. Switching vendors means retraining everyone and losing institutional knowledge.
Or do you drop the AI entirely? That means telling your patrons and staff that the tools they've been relying on are gone. It means admitting you made a purchase decision without understanding the legal risks.
If you don't have a plan for this - a real plan, not wishful thinking - you're going to make a bad decision under pressure.
What You Should Do This Week
- Audit your vendor contracts right now. List every AI-enabled tool you use. EBSCO, ProQuest, Innovative, Follett, whatever. Download the contracts if you have them. Find the sections about training data and AI.
- Email your vendors and ask the five questions above. Don't let them dodge. If they say "we can't disclose training data for competitive reasons," that's them admitting they have something to hide.
- Document everything. Save their responses, their non-responses, their evasions. If something goes wrong later, you need to prove you did due diligence.
- Talk to your board now, not later. Tell them: "We're using AI tools whose legal status is unclear. Here's the scenario that could happen. Here's what I need from you if it does."
- Create a contingency plan. If your AI features disappear tomorrow due to legal action, what's your backup? Can you fall back to non-AI search? Can you redirect patrons to free tools? Know this before you need it.
- Ask your legal counsel (if you have one) to review your vendor contracts specifically for AI indemnification clauses. If your vendor isn't indemnifying you for AI-related copyright issues, that's a massive red flag.
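For the "document everything" step, even a minimal append-only log beats a folder of scattered emails. Here is a hypothetical sketch of one way to do it - the file name, columns, and example entries are my assumptions, not a standard:

```python
# Illustrative due-diligence log: one CSV row per vendor response, so you
# can later show you asked the questions and recorded the answers.
# File name and field names are invented for this example.

import csv
import os
from datetime import date

FIELDS = ["date", "vendor", "question", "response", "red_flag"]

def log_response(path: str, vendor: str, question: str,
                 response: str, red_flag: bool) -> None:
    """Append one response; write the header only if the file is new."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "date": date.today().isoformat(),
            "vendor": vendor,
            "question": question,
            "response": response,
            "red_flag": red_flag,
        })

# Example entry for a fictional vendor:
log_response("ai_vendor_audit.csv", "MegaSearch",
             "What data trained this AI?",
             "Declined to disclose", True)
```

A spreadsheet works just as well; what matters is that every answer, non-answer, and evasion gets a date and a paper trail.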
The Internet Archive case is final. The AI lawsuits are moving through federal courts right now. The New York Times case could produce a landmark ruling at any moment. This is not theoretical. This is happening.
Don't get caught unprepared. Move this week.
See Also (Related Legal Timeline Posts)
- Internet Archive Lawsuit Overview - Shorter summary of the CDL case and its implications
- AI Clauses in Vendor Contracts - How vendors are protecting themselves from AI liability while shifting risk to libraries
- Your Digital Collections Are Training AI - How your content may be used in AI training datasets without consent
Further Reading
- Hachette v. Internet Archive - Second Circuit Opinion (2024)
- Authors Guild v. HathiTrust - Second Circuit Opinion (2014)
- Active AI Copyright Lawsuits - Tracking List
Need help evaluating your vendor's AI legal risks? Reach out.