The Numbers: Why Library Digital Spending Data Is Valuable and Who's Profiting From It
Vendors collect and monetize library circulation data. They know how many patrons read which books at which institutions. They know what libraries want but can't get. They know your demand patterns and price accordingly. Libraries have all this data but share none of it. This is the biggest market intelligence failure in publishing. Here's what the numbers show, who's profiting from library silence, and why libraries need to start publishing their own market data.
The Data Gap: What Vendors Know That Publishers Don't
Publishers have no visibility into what patrons actually read through libraries. Their data comes from retail sales (Amazon, bookstores) and from vendors who sell library ebook licenses. The data from vendors is heavily filtered:
- No patron-level data: Publishers don't know how many individual people read each book
- No demand data: Publishers don't know about unmet demand (hold queues for unavailable books)
- No real-time data: Publishers receive aggregate quarterly data, months late
- No comparative data: Publishers don't know if a book's popularity in one library is typical or anomalous
Vendors solve this problem for themselves. OverDrive, Hoopla, and other vendors collect real-time data on every checkout, every hold, every search. They aggregate this data and sell it back to publishers as "market intelligence."
Publishers pay vendors tens of thousands of dollars annually for access to library circulation data that libraries generated but never monetized.
The Numbers: What Aggregate Library Data Reveals
There are ~16,000 public libraries in the U.S. Most report circulation data to state agencies and national surveys (IMLS), but they don't publish their own digital spending or circulation patterns. The data is locked in isolated systems.
Cost Patterns Hiding in the Data
- Simultaneous-user license costs vary by library size: Large libraries pay ~$1.50/checkout. Small libraries pay ~$3.50/checkout. This is hidden price discrimination.
- Release delays reduce circulation: Libraries report that ebook embargo periods (delaying library availability of new releases) reduce demand by 30-40%. Publishers don't see this data.
- Wait queues suppress circulation: When patrons can access books instantly (Hoopla), circulation increases 40-60%. Libraries know this; vendors don't share it.
- Hold ratios show unmet demand: In many libraries, hold ratios for popular ebooks exceed 4:1 (more than 4 patrons waiting per copy). This represents massive unmet demand publishers can't see.
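The two metrics behind these patterns are simple ratios. Here is a minimal sketch; the spend, checkout, and copy counts are hypothetical inputs chosen only to reproduce the ~$1.50, ~$3.50, and 4:1 figures quoted above, not real library data:

```python
def cost_per_checkout(total_spend: float, checkouts: int) -> float:
    """Total licensing spend divided by the checkouts it paid for."""
    if checkouts <= 0:
        raise ValueError("checkouts must be positive")
    return total_spend / checkouts


def hold_ratio(holds: int, copies: int) -> float:
    """Number of patrons waiting per licensed copy."""
    if copies <= 0:
        raise ValueError("copies must be positive")
    return holds / copies


# Hypothetical inputs for illustration only.
large_library = cost_per_checkout(total_spend=150_000.00, checkouts=100_000)
small_library = cost_per_checkout(total_spend=7_000.00, checkouts=2_000)
popular_title = hold_ratio(holds=28, copies=7)

print(f"Large library: ${large_library:.2f} per checkout")
print(f"Small library: ${small_library:.2f} per checkout")
print(f"Hold ratio: {popular_title:.0f}:1")
```

Running the same two functions on your own vendor invoices and hold reports gives you numbers you can compare directly with other libraries.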
Who's Profiting From Library Silence
The Data Value Chain
- Libraries generate the data: Every ebook checkout, hold, search, and download happens through a vendor platform
- Vendors collect and own the data: The vendor's contract prohibits libraries from accessing detailed analytics
- Vendors monetize the data: They sell aggregate data to publishers as "market intelligence"
- Publishers use the data for pricing: They set ebook prices and availability based on vendor data about demand
- Libraries pay more because of that data: Publishers price knowing library demand and can extract higher prices
- Libraries see none of this value: They pay for the platforms, generate the data, but profit from none of it
OverDrive makes hundreds of millions annually. A significant portion comes from selling library circulation data to publishers. Libraries pay to participate in this system and receive nothing in return.
The Panorama Project Example
OverDrive's Panorama Project is research claiming that library ebook lending increases bookstore sales. The research uses (you guessed it) OverDrive's library circulation data plus bookstore sales data from BookScan.
What libraries don't see: The research methodology, the actual circulation data, or the financial relationship between OverDrive and the publishers who benefit from this research.
What this accomplishes: It builds the narrative that libraries should accept higher licensing costs because ebook lending benefits authors and publishers through increased print sales. This narrative then justifies OverDrive's pricing power.
Libraries are unknowingly funding research that justifies their own higher prices.
What Libraries Should Be Publishing
Monthly Circulation Data
Every library should publish, at minimum:
- Total ebook checkouts by month
- Top 10 most-checked-out titles
- Genres with highest demand
- Average hold queue length for new releases
- Cost per checkout by vendor
Format: CSV, publicly available, updated monthly. Takes 15 minutes.
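As a rough illustration of how the monthly figures could be derived, the sketch below summarizes one month of title-level checkout records. The record layout, titles, and genres are hypothetical; a real vendor export will look different and may need cleaning first:

```python
from collections import Counter

# Hypothetical title-level records for one month: (title, genre).
# No patron identifiers are needed to build the report.
checkouts = [
    ("Title A", "Romance"),
    ("Title A", "Romance"),
    ("Title A", "Romance"),
    ("Title B", "Mystery"),
    ("Title B", "Mystery"),
    ("Title C", "Romance"),
]


def monthly_summary(records, top_n=10):
    """Return total checkouts, the top titles, and the top genres."""
    titles = Counter(title for title, _ in records)
    genres = Counter(genre for _, genre in records)
    return {
        "total_checkouts": len(records),
        "top_titles": titles.most_common(top_n),
        "top_genres": genres.most_common(3),
    }


summary = monthly_summary(checkouts)
print(summary)
```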
Annual Spending Report
Once per year, publish:
- Total digital collection spending
- Breakdown by vendor
- Cost per active user
- Year-over-year cost changes
- Unmet demand metrics (hold queues, wait times)
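The spending items in this list reduce to a few lines of arithmetic. The following sketch assumes hypothetical vendor names and dollar amounts, and `annual_report` is an illustrative helper, not part of any vendor tool:

```python
def annual_report(spend_by_vendor, prior_year_total, active_users):
    """Compute total spend, vendor shares, cost per user, and yearly change."""
    total = sum(spend_by_vendor.values())
    return {
        "total_spend": total,
        # Fraction of the digital budget going to each vendor.
        "vendor_share": {v: s / total for v, s in spend_by_vendor.items()},
        "cost_per_active_user": total / active_users,
        # Positive means spending rose compared with the prior year.
        "yoy_change_pct": (total - prior_year_total) / prior_year_total * 100,
    }


# Hypothetical figures for illustration only.
report = annual_report(
    spend_by_vendor={"Vendor A": 30_000, "Vendor B": 10_000},
    prior_year_total=32_000,
    active_users=8_000,
)
print(report)
```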
Sample Data: Little Schitt Creek Public Library (Fictional)
- OverDrive ebook checkouts: 1,205 (average wait: 3.2 weeks)
- Hoopla instant checkouts: 890 (average wait: 0 days)
- Open Library CDL checkouts: 340
- Total ebook circulation: 2,435
- Total patrons with ebook access: 8,900
- Top checked-out title: "Onyx Storm" by Rebecca Yarros (97 checkouts, 28-patron hold queue)
- Patron unmet demand: ~1,200 patrons currently on hold for books they want
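The sample figures above can be checked and extended with a short script. This sketch uses only the fictional numbers listed for this library; the derived metrics (platform share, checkouts per patron) are suggestions, not a standard reporting format:

```python
# Fictional sample data from the list above.
checkouts_by_platform = {
    "OverDrive": 1205,
    "Hoopla": 890,
    "Open Library CDL": 340,
}
patrons_with_access = 8900

total_circulation = sum(checkouts_by_platform.values())
checkouts_per_patron = total_circulation / patrons_with_access
platform_share = {
    name: count / total_circulation
    for name, count in checkouts_by_platform.items()
}

print(f"Total ebook circulation: {total_circulation}")
print(f"Checkouts per patron with access: {checkouts_per_patron:.2f}")
for name, share in platform_share.items():
    print(f"{name}: {share:.1%} of checkouts")
```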
What This Data Reveals at Scale
If every library published this data, the aggregate picture would show:
Why Libraries Don't Publish This Data
The Real Reasons
- Vendor contracts prohibit it. Most licenses explicitly restrict libraries from sharing circulation data.
- Privacy concerns. Libraries worry about disclosing patron reading habits (even anonymized).
- Competition. Libraries see each other as competitors and don't share benchmarking data.
- Lack of awareness. Many library directors don't understand the value of their own data.
- Technology barriers. Data is trapped in vendor systems with no easy export.
Why These Reasons Are Solvable
- "Vendor contracts prohibit it": Renegotiate. Make data publication a requirement in new contracts.
- "Privacy concerns": Publish aggregate, anonymized data. Individual patron privacy stays protected. Aggregate circulation data is not sensitive.
- "Competition": Benchmark data creates better markets. Share to make vendors compete on price and features.
- "Lack of awareness": This article is your awareness. Share it with your director and board.
- "Technology barriers": Demand data export from vendors. It's your data.
Start Publishing Your Data
15-Minute Action: Publish Your First Monthly Report
- Pull your last 12 months of ebook circulation data from each vendor
- Create a simple CSV with: Month, Vendor, Checkouts, Cost, Cost-per-checkout
- Post it to your library website or a public GitHub repo
- Tell your state library association you're doing this. Ask others to join.
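The CSV described in these steps can be produced with Python's standard library. In this minimal sketch the months, vendor names, and amounts are hypothetical placeholders; the column names follow the list above:

```python
import csv
import io

FIELDS = ["Month", "Vendor", "Checkouts", "Cost", "Cost-per-checkout"]

# Hypothetical rows; replace with figures pulled from each vendor.
rows = [
    {"Month": "2025-01", "Vendor": "Vendor A", "Checkouts": 1200, "Cost": 2400.00},
    {"Month": "2025-01", "Vendor": "Vendor B", "Checkouts": 800, "Cost": 2000.00},
]


def build_report_csv(rows):
    """Return the report as CSV text, deriving cost per checkout per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        checkouts = row["Checkouts"]
        # Leave the cell blank rather than divide by zero.
        per_checkout = f"{row['Cost'] / checkouts:.2f}" if checkouts else ""
        writer.writerow({
            "Month": row["Month"],
            "Vendor": row["Vendor"],
            "Checkouts": checkouts,
            "Cost": f"{row['Cost']:.2f}",
            "Cost-per-checkout": per_checkout,
        })
    return buffer.getvalue()


report_csv = build_report_csv(rows)
print(report_csv)
```

Saving `report_csv` to a file gives you something you can post to a website or commit to a public repository as is.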
What Libraries Gain By Publishing Data
- Visibility: Your spending data becomes part of the public record. Vendors know you're transparent. This changes negotiation dynamics.
- Collective power: When libraries publish data showing unmet demand, publishers can't ignore it. Individual complaints are easy to dismiss. Aggregate data is undeniable.
- Benchmarking: See how your costs compare to other libraries. If you're 30% higher, you know there's room to negotiate.
- Evidence: When proposing CDL or alternative vendors, you have data showing why change is necessary.
- Vendor accountability: Vendors care about their reputation. Publishing comparative data creates pressure for better behavior.
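The benchmarking point above can be made concrete once a few peer libraries publish their numbers. The sketch below compares one library's cost per checkout with the peer median; the peer figures are hypothetical, and using the median as the benchmark is an assumption, not an established standard:

```python
from statistics import median


def percent_above_median(own_cost, peer_costs):
    """Percentage by which own_cost exceeds the median of peer_costs."""
    benchmark = median(peer_costs)
    return (own_cost - benchmark) / benchmark * 100


# Hypothetical cost-per-checkout figures published by peer libraries.
peer_costs = [1.80, 2.00, 2.20]
gap = percent_above_median(own_cost=2.60, peer_costs=peer_costs)

print(f"Cost per checkout is {gap:.0f}% above the peer median")
```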
The Bigger Picture: Breaking the Vendor Data Monopoly
Right now, vendors control library data. They profit from it. They use it against libraries in negotiations. They sell it to publishers who use it to price-gouge libraries.
Libraries can break this monopoly by publishing their own data. When libraries become the source of market intelligence instead of vendors, the power dynamic shifts.
Imagine: Every library publishes monthly circulation data. Within a year, you have transparent market intelligence showing:
- What patrons actually read
- What libraries are paying
- Where unmet demand exists
- Which vendors offer the best value
This data is powerful. Publishers use it for pricing strategy. Libraries should too.
Templates and Tools
Coming soon: CSV templates for monthly circulation reporting, data cleaning scripts, and instructions for posting data to GitHub.
What Data Should Remain Private
Publish aggregate data: Total checkouts, top titles, cost per checkout, genre patterns
Do NOT publish: Individual patron reading histories, personal circulation data, library-specific patron demographics
The goal is market intelligence, not patron surveillance. Aggregated circulation data is safe to publish and valuable to the market.
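One way to keep that line clear in practice is to strip patron identifiers before anything leaves the library. The sketch below assumes a hypothetical internal export with `patron_id` and `title` fields. The optional `min_count` threshold, which drops titles with very few checkouts, is an added precaution and not something the guidance above requires:

```python
from collections import Counter

# Hypothetical patron-level records from an internal export.
raw_records = [
    {"patron_id": "P001", "title": "Title A"},
    {"patron_id": "P002", "title": "Title A"},
    {"patron_id": "P003", "title": "Title A"},
    {"patron_id": "P001", "title": "Title B"},
]


def publishable_counts(records, min_count=1):
    """Aggregate to per-title totals; patron IDs are never copied out."""
    counts = Counter(record["title"] for record in records)
    # Optionally drop titles with fewer than min_count checkouts.
    return {title: n for title, n in counts.items() if n >= min_count}


public_counts = publishable_counts(raw_records, min_count=2)
print(public_counts)
```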