
The Data Gap: What Vendors Know That Publishers Don't

Publishers have almost no direct visibility into what patrons actually read through libraries. Their data comes from retail sales (Amazon, bookstores) and from vendors who sell library ebook licenses. The vendor data is heavily filtered:

  • No patron-level data: Publishers don't know how many individual people read each book
  • No demand data: Publishers don't know about unmet demand (hold queues for unavailable books)
  • No real-time data: Publishers receive aggregate quarterly data, months late
  • No comparative data: Publishers don't know if a book's popularity in one library is typical or anomalous

Vendors solve this problem for themselves. OverDrive, Hoopla, and other vendors collect real-time data on every checkout, every hold, every search. They aggregate this data and sell it back to publishers as "market intelligence."

Publishers pay vendors tens of thousands of dollars annually for access to circulation data that libraries generated but never monetized.

The Numbers: What Aggregate Library Data Reveals

U.S. Public Libraries by the Numbers:
~16,000 public libraries in the U.S. Most report circulation data to state agencies and national surveys (IMLS). But they don't publish their own digital spending or circulation patterns. The data is locked in isolated systems.
  • $3.2B: Estimated annual U.S. library digital spending (author estimate based on IMLS and vendor data; no single authoritative source)
  • 40-50%: Share of that spending that goes to ebook and database licensing
  • $2.10: Average cost per ebook checkout (aggregated)
  • ~90%: Estimated ebook vendor market share controlled by OverDrive

Cost Patterns Hiding in the Data

  • Simultaneous-user license costs vary by library size: Large libraries pay ~$1.50/checkout. Small libraries pay ~$3.50/checkout. This is hidden price discrimination.
  • Release delays reduce circulation: Libraries report that ebook embargo periods (delaying library availability of new releases) reduce demand by 30-40%. Publishers don't see this data.
  • Wait queues suppress demand: When patrons can access books instantly (as with Hoopla), circulation increases 40-60%. Libraries know this; vendors don't share it.
  • Hold ratios show unmet demand: In many libraries, hold ratios for popular ebooks exceed 4:1 (4 patrons waiting per copy). This represents massive unmet demand publishers can't see.
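The hold ratio described above is simply holds divided by licensed copies. A minimal sketch of that calculation follows, assuming a library can export per-title hold and copy counts; the titles and figures are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TitleHolds:
    title: str
    holds: int   # patrons currently waiting
    copies: int  # licensed copies the library holds


def hold_ratio(holds: int, copies: int) -> float:
    """Patrons waiting per licensed copy."""
    if copies <= 0:
        raise ValueError("copies must be positive")
    return holds / copies


def titles_over_threshold(titles, threshold=4.0):
    """Return (title, ratio) pairs whose hold ratio exceeds the threshold."""
    ratios = [(t.title, hold_ratio(t.holds, t.copies)) for t in titles]
    return [(name, ratio) for name, ratio in ratios if ratio > threshold]


# Hypothetical figures for illustration only.
sample = [
    TitleHolds("Title A", holds=28, copies=5),
    TitleHolds("Title B", holds=6, copies=3),
    TitleHolds("Title C", holds=12, copies=3),
]
flagged = titles_over_threshold(sample)
print(flagged)
```

Only titles strictly above the 4:1 threshold are flagged, so a title sitting at exactly 4:1 is not reported.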

Who's Profiting From Library Silence

The Data Value Chain

  1. Libraries generate the data: Every ebook checkout, hold, search, and download happens through a vendor platform
  2. Vendors collect and own the data: The vendor's contract prohibits libraries from accessing detailed analytics
  3. Vendors monetize the data: They sell aggregate data to publishers as "market intelligence"
  4. Publishers use the data for pricing: They set ebook prices and availability based on vendor data about demand
  5. Libraries pay more because of that data: Publishers price knowing library demand and can extract higher prices
  6. Libraries see none of this value: They pay for the platforms and generate the data, but profit from none of it

OverDrive makes hundreds of millions annually. A significant portion comes from selling library circulation data to publishers. Libraries pay to participate in this system and receive nothing in return.

The Panorama Project Example

OverDrive's Panorama Project is a research initiative claiming that library ebook lending increases bookstore sales. The research uses (you guessed it) OverDrive's library circulation data plus bookstore sales data from BookScan.

What libraries don't see: The research methodology, the actual circulation data, or the financial relationship between OverDrive and the publishers who benefit from this research.

What this accomplishes: It builds the narrative that libraries should accept higher licensing costs because ebook lending benefits authors and publishers through increased print sales. This narrative then justifies OverDrive's pricing power.

Libraries are unknowingly funding research that justifies their own higher prices.

What Libraries Should Be Publishing

Monthly Circulation Data

Every library should publish, at minimum:

  • Total ebook checkouts by month
  • Top 10 most-checked-out titles
  • Genres with highest demand
  • Average hold queue length for new releases
  • Cost per checkout by vendor

Format: CSV, publicly available, updated monthly. Takes 15 minutes.

Annual Spending Report

Once per year, publish:

  • Total digital collection spending
  • Breakdown by vendor
  • Cost per active user
  • Year-over-year cost changes
  • Unmet demand metrics (hold queues, wait times)

Sample Data: Little Schitt Creek Public Library (Fictional)

Monthly Circulation Report - January 2026
  • OverDrive ebook checkouts: 1,205 (average wait: 3.2 weeks)
  • Hoopla instant checkouts: 890 (average wait: 0 days)
  • Open Library CDL checkouts: 340
  • Total ebook circulation: 2,435
  • Total patrons with ebook access: 8,900
  • Top checked-out title: "Onyx Storm" by Rebecca Yarros (97 checkouts, 28-patron hold queue)
  • Patron unmet demand: ~1,200 patrons currently on hold for books they want
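The sample figures above can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers from the fictional report; the derived measures (vendor share and checkouts per patron with access) are illustrative additions, not metrics the report itself lists.

```python
# Figures from the fictional January 2026 sample report.
checkouts = {"OverDrive": 1205, "Hoopla": 890, "Open Library CDL": 340}
patrons_with_access = 8900

total = sum(checkouts.values())

# Each vendor's share of total ebook circulation, in percent.
share = {vendor: round(100 * n / total, 1) for vendor, n in checkouts.items()}

# Average checkouts per patron with ebook access for the month.
per_patron = round(total / patrons_with_access, 2)

print(total)
print(share)
print(per_patron)
```

The total matches the 2,435 stated in the report, which is the kind of consistency check worth running before publishing.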

What This Data Reveals at Scale

If every library published this data, the aggregate picture would show:

Demand exceeds supply by 40-60%. Across all libraries publishing data, we'd see that patron demand for ebooks far exceeds what libraries can purchase. This is the single biggest fact publishers don't acknowledge: library customers want books libraries can't afford.
Release embargoes reduce circulation. Comparing circulation of embargoed titles vs. non-embargoed titles across multiple libraries would show definitively that publisher embargo policies reduce total demand. Publishers need this data to understand their own policies' consequences.
Instant access drives adoption. Libraries offering instant-access ebook services (like Hoopla) show 40%+ higher adoption than libraries limited to one-copy/one-user models with hold queues. This data matters for vendor comparison.
Genre patterns shift seasonally. Aggregate data would show predictable patterns: romance checkouts peak in certain months, mysteries in others. Libraries could use this to budget more intelligently.

Why Libraries Don't Publish This Data

The Real Reasons

  • Vendor contracts prohibit it. Most licenses explicitly restrict libraries from sharing circulation data.
  • Privacy concerns. Libraries worry about disclosing patron reading habits (even when anonymized).
  • Competition. Libraries see each other as competitors and don't share benchmarking data.
  • Lack of awareness. Many library directors don't understand the value of their own data.
  • Technology barriers. Data is trapped in vendor systems with no easy export.

Why These Reasons Are Solvable

  • "Vendor contracts prohibit it": Renegotiate. Make data publication a requirement in new contracts.
  • "Privacy concerns": Publish aggregate, anonymized data. Individual patron privacy stays protected. Aggregate circulation data is not sensitive.
  • "Competition": Benchmark data creates better markets. Share to make vendors compete on price and features.
  • "Lack of awareness": This article is your awareness. Share it with your director and board.
  • "Technology barriers": Demand data export from vendors. It's your data.

Start Publishing Your Data

15-Minute Action: Publish Your First Monthly Report

  1. Pull your last 12 months of ebook circulation data from each vendor
  2. Create a simple CSV with: Month, Vendor, Checkouts, Cost, Cost-per-checkout
  3. Post it to your library website or a public GitHub repo
  4. Tell your state library association you're doing this. Ask others to join.
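The CSV from step 2 can be produced with Python's standard `csv` module. The following is a minimal sketch, assuming each vendor export has already been reduced to monthly checkout and cost totals; the vendor names, figures, and the output file name are hypothetical.

```python
import csv
import io

# Hypothetical monthly rows: (month, vendor, checkouts, total cost in USD).
rows = [
    ("2026-01", "Vendor A", 1200, 2400.00),
    ("2026-01", "Vendor B", 800, 1600.00),
    ("2026-02", "Vendor A", 0, 150.00),
]

FIELDS = ["Month", "Vendor", "Checkouts", "Cost", "Cost-per-checkout"]


def build_report(rows):
    """Return CSV text with a derived cost-per-checkout column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    for month, vendor, checkouts, cost in rows:
        # Leave the ratio blank for months with no checkouts
        # instead of dividing by zero.
        per_checkout = f"{cost / checkouts:.2f}" if checkouts else ""
        writer.writerow([month, vendor, checkouts, f"{cost:.2f}", per_checkout])
    return buffer.getvalue()


report = build_report(rows)
print(report)

# To publish, write the same text to a file, for example:
# with open("ebook_circulation.csv", "w", newline="") as f:
#     f.write(report)
```

Keeping the raw cost and checkout columns alongside the derived ratio lets other libraries recompute or re-aggregate the figures themselves.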

What Libraries Gain By Publishing Data

  • Visibility: Your spending data becomes part of the public record. Vendors know you're transparent. This changes negotiation dynamics.
  • Collective power: When libraries publish data showing unmet demand, publishers can't ignore it. Individual complaints are easy to dismiss. Aggregate data is undeniable.
  • Benchmarking: See how your costs compare to other libraries. If you're 30% higher, you know there's room to negotiate.
  • Evidence: When proposing CDL or alternative vendors, you have data showing why change is necessary.
  • Vendor accountability: Vendors care about their reputation. Publishing comparative data creates pressure for better behavior.
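The benchmarking comparison above reduces to a percentage difference against peer libraries. Here is a rough sketch, assuming the median of peers' published cost-per-checkout figures is the benchmark (one reasonable choice among several); all numbers are hypothetical.

```python
from statistics import median

# Hypothetical cost-per-checkout figures (USD) from peer libraries' published reports.
peer_costs = [1.80, 2.00, 2.10, 2.40, 2.20]
our_cost = 2.73


def percent_above_median(own, peers):
    """Percent by which `own` exceeds the median of `peers` (negative if below)."""
    mid = median(peers)
    return 100 * (own - mid) / mid


gap = percent_above_median(our_cost, peer_costs)
print(f"{gap:.0f}% relative to peer median")
```

The median is used here rather than the mean so that one unusually expensive or cheap peer does not skew the comparison.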

The Bigger Picture: Breaking the Vendor Data Monopoly

Right now, vendors control library data. They profit from it. They use it against libraries in negotiations. They sell it to publishers who use it to price gouge libraries.

Libraries can break this monopoly by publishing their own data. When libraries become the source of market intelligence instead of vendors, the power dynamic shifts.

Imagine: Every library publishes monthly circulation data. Within a year, you have transparent market intelligence showing:

  • What patrons actually read
  • What libraries are paying
  • Where unmet demand exists
  • Which vendors offer the best value

This data is powerful. Publishers use it for pricing strategy. Libraries should too.

Start today: Talk to your IT department. Pull your last month of circulation data. Clean it up. Anonymize it. Post it. Tell one other library to do the same. That's how this starts. That's how the monopoly breaks.

Templates and Tools

Coming soon: CSV templates for monthly circulation reporting, data cleaning scripts, and instructions for posting data to GitHub.

What Data Should Remain Private

Publish aggregate data: Total checkouts, top titles, cost per checkout, genre patterns

Do NOT publish: Individual patron reading histories, personal circulation data, library-specific patron demographics

The goal is market intelligence, not patron surveillance. Aggregated circulation data is safe to publish and valuable to the market.
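One way to keep that line between aggregate and patron-level data is to strip patron identifiers before anything leaves the library. The sketch below assumes a raw log of (patron ID, title) pairs and also suppresses titles with very few checkouts; the threshold of 5 is an assumption for illustration, not a standard from this article, and the log contents are hypothetical.

```python
from collections import Counter

# Hypothetical raw checkout log: (patron_id, title). Never publish this form.
raw_log = [
    ("p001", "Title A"), ("p002", "Title A"), ("p003", "Title A"),
    ("p004", "Title A"), ("p005", "Title A"),
    ("p001", "Title B"), ("p006", "Title B"),
    ("p007", "Title C"),
]

MIN_COUNT = 5  # assumed suppression threshold; choose one that fits local policy


def aggregate_checkouts(log, min_count=MIN_COUNT):
    """Count checkouts per title, dropping patron IDs and suppressing small counts."""
    counts = Counter(title for _patron_id, title in log)
    published = {t: n for t, n in counts.items() if n >= min_count}
    # Low-count titles are folded into one combined figure, not listed by name.
    suppressed_total = sum(n for n in counts.values() if n < min_count)
    return published, suppressed_total


published, suppressed_total = aggregate_checkouts(raw_log)
print(published)
print(suppressed_total)
```

Folding low-count titles into a single combined number keeps the published totals accurate while making it harder to tie a rarely borrowed title to an individual patron.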
