book-search
npx machina-cli add skill KevinBatdorf/anna/book-search --openclaw
Book Search, Library & Reader
This skill provides access to a self-hosted REST API that indexes book records from Anna's Archive (Zlib3). It can optionally include Goodreads ratings and reviews from a static September 2024 snapshot, which is useful for ratings, genres, and vector search but is not regularly updated. Use it to search for books, get recommendations, download to a local library, read PDF content page by page, and search within books using semantic embeddings. Call /stats to see current record counts.
API Base URL
The API runs locally. Default: http://localhost:3100
Endpoints
Search & Discovery
| Endpoint | Description |
|---|---|
| GET /search?q=...&author=&publisher=&language=&year=&ext=&dedupe=true&limit=20&offset=0 | Search Zlib3 book records (FTS + filters) |
| GET /search/goodreads?q=...&author=&year=&genre=&search_type=&limit=20&offset=0 | Search Goodreads ratings & reviews |
| GET /similar?q=...&limit=10&min_rating=0&min_reviews=0 | Similar books via vector search |
| GET /lookup/md5?md5=... | Look up a book by MD5 hash |
| GET /lookup/isbn?isbn=... | Look up by ISBN (book file + Goodreads) |
| GET /download?md5=... | Get download URL (proxies Anna's Archive API) |
| GET /stats | Database stats |
Library (local collection)
| Endpoint | Description |
|---|---|
| GET /library | List all downloaded books |
| GET /library/search?q=... | Search within downloaded books |
| POST /library/download?md5=... | Download a book to the local library |
| GET /library/:md5/file | Serve a downloaded file |
| DELETE /library/:md5 | Remove a book from the library |
Reader (PDF content)
| Endpoint | Description |
|---|---|
| GET /reader/:md5/status | Book status: pages extracted/embedded, chapters, search readiness |
| POST /reader/:md5/index | Extract text from every PDF page (auto-runs on download) |
| GET /reader/:md5/page/:page | Get extracted text for a specific page |
| POST /reader/:md5/embed | Create vector embeddings for all pages (auto-runs on download) |
| GET /reader/:md5/search?q=...&limit=5 | Semantic search within a book |
| GET /reader/:md5/page/:page/image | Render a page as PNG image |
Search
GET /search?q=<query>&author=&publisher=&language=&year=&ext=pdf&dedupe=true&limit=20&offset=0
Returns Zlib3 records: title, author, publisher, language, year, extension, filesize, pages, md5, isbn, series.
Either q or at least one filter is required. All params are optional and can be combined:
- q: full-text search across title, author, publisher, description, ISBN
- author: filter by author (partial match, e.g. author=Tolkien)
- publisher: filter by publisher (partial match, e.g. publisher=No Starch)
- language: filter by language (exact match, e.g. language=english)
- year: filter by publication year (exact match, e.g. year=2024)
- ext: filter by file format (e.g. ext=pdf, ext=epub)
- dedupe: deduplicate results by title+author, keeping the best format (pdf > epub > other). Default: true.
When q is provided, results are sorted by relevance. Without q, sorted by newest first.
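The parameter rules above can be sketched as a small URL builder. This is only an illustration: the helper name build_search_url is hypothetical, and the only facts taken from this document are the parameter names, the defaults, and the rule that q or at least one filter must be present.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:3100"  # default base URL stated in this document


def build_search_url(q=None, *, author=None, publisher=None, language=None,
                     year=None, ext=None, dedupe=True, limit=20, offset=0,
                     base_url=BASE_URL):
    """Build a GET /search URL, leaving out filters that were not supplied.

    Hypothetical client-side helper; it does not contact the API.
    """
    candidates = {"q": q, "author": author, "publisher": publisher,
                  "language": language, "year": year, "ext": ext}
    params = {k: v for k, v in candidates.items() if v not in (None, "")}
    if not params:
        # The API requires q or at least one filter.
        raise ValueError("provide q or at least one filter")
    params["dedupe"] = "true" if dedupe else "false"
    params["limit"] = limit
    params["offset"] = offset
    return f"{base_url}/search?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("dune", author="Frank Herbert", ext="pdf"))
```

Passing dedupe=False produces dedupe=false in the query string, which is the setting to use when every available format should be listed.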
Search Goodreads (optional)
Goodreads data is a static snapshot from September 2024 — useful for ratings, genres, descriptions, and semantic vector search, but not regularly updated by Anna's Archive.
GET /search/goodreads?q=<query>&author=&year=&genre=&search_type=&limit=20&offset=0
Returns Goodreads entries: title, author, rating, ratings_count, description, genres, isbn, pages, year. Vector results also include similarity (0-1).
Either q or at least one filter is required:
- q: full-text search (uses vector search when available, otherwise FTS)
- author: filter by author (partial match)
- year: filter by publication year (exact match)
- genre: filter by genre (partial match, e.g. genre=fantasy)
- search_type: force the search method, either fts (full-text only) or vector (vector only). Default: auto (vector for plain q with no filters, FTS otherwise). Returns 400 if vector is requested but unavailable.
By default, vector search is used for plain q queries with no filters, falling back to FTS. When filters are present, FTS is always used. Without q, results are sorted by rating (highest first).
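As a rough client-side mental model of the default (auto) behaviour just described, the selection can be written as a few lines of Python. The function name auto_search_method and the vector_available flag are assumptions made for illustration; the server's real implementation is not shown in this document, and forced search_type values are deliberately left out.

```python
def auto_search_method(q=None, *, author=None, year=None, genre=None,
                       vector_available=True):
    """Predict which method /search/goodreads picks when search_type is unset.

    Hypothetical helper that mirrors the documented rules only:
    plain q with no filters -> vector (FTS fallback), anything else -> FTS.
    """
    has_filters = any(v not in (None, "") for v in (author, year, genre))
    if not q and not has_filters:
        # The endpoint requires q or at least one filter.
        raise ValueError("provide q or at least one filter")
    if q and not has_filters and vector_available:
        return "vector"
    return "fts"


if __name__ == "__main__":
    print(auto_search_method("stoicism"))
    print(auto_search_method("stoicism", genre="philosophy"))
```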
Similar
GET /similar?q=<isbn_or_title>&limit=10&min_rating=0&min_reviews=0
Find books similar to a given book using vector embeddings. The q parameter accepts an ISBN (preferred) or an exact book title.
How matching works:
- ISBN (10-13 digits): Direct lookup — fastest and most reliable
- Title: FTS search + strict word-level matching against the main title (before any ":" subtitle). The query words must cover at least 60% of the title words. Partial or vague titles will return found: false.
Best practice: Always pass an ISBN when available. Title matching is strict by design (this API is built for AI agents, not fuzzy human queries).
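To make the 60% rule concrete, here is a minimal sketch of how word-level coverage against the main title could be computed. It is an approximation written from the description above, not the server's code: the tokenisation, the hyphen stripping in looks_like_isbn, and both function names are assumptions.

```python
import re


def looks_like_isbn(q):
    """True when q is 10 to 13 digits; ignoring hyphens/spaces is an assumption."""
    digits = re.sub(r"[\s-]", "", q)
    return bool(re.fullmatch(r"\d{10,13}", digits))


def title_coverage(query, title):
    """Fraction of main-title words (text before any ':') found in the query."""
    main_title = title.split(":", 1)[0]
    title_words = re.findall(r"\w+", main_title.lower())
    query_words = set(re.findall(r"\w+", query.lower()))
    if not title_words:
        return 0.0
    hits = sum(1 for word in title_words if word in query_words)
    return hits / len(title_words)


def title_matches(query, title, threshold=0.6):
    """Apply the documented 60% coverage threshold."""
    return title_coverage(query, title) >= threshold


if __name__ == "__main__":
    print(title_matches("project hail mary", "Project Hail Mary: A Novel"))
    print(title_matches("hail", "Project Hail Mary"))
```

Under these assumptions a one-word query against a three-word title covers only a third of the title, which is why vague titles come back with found: false.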
Response shape:
- found: true means a Goodreads entry was matched. Returns source (the matched book) and results (similar books). Each result has similarity (0-1) and available (true/false for downloadable copies).
- found: false means no Goodreads match. May include a download object if the book exists in the books table (Zlib3).
Optional parameters:
- min_rating: minimum Goodreads rating (e.g. min_rating=3.5). Default: 0 (no filter).
- min_reviews: minimum number of ratings (e.g. min_reviews=100). Default: 0 (no filter).
Returns 503 if vector search is not configured.
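A client might branch on the /similar response shape roughly as follows. The keys found, source, results, similarity, available, and download come from the description above; the title key on each entry and the function name summarize_similar are assumptions for illustration.

```python
def summarize_similar(response):
    """Reduce a /similar response to what an agent would typically report.

    Hypothetical helper operating on an already-parsed JSON dict.
    """
    if not response.get("found"):
        # No Goodreads match; a download object may still be present.
        return {"matched": None, "download": response.get("download"), "picks": []}
    ranked = sorted(
        response.get("results", []),
        key=lambda r: r.get("similarity", 0.0),
        reverse=True,
    )
    return {
        "matched": response.get("source", {}).get("title"),
        "download": None,
        # Keep only results that have a downloadable copy.
        "picks": [r for r in ranked if r.get("available")],
    }


if __name__ == "__main__":
    sample = {  # made-up data, for illustration only
        "found": True,
        "source": {"title": "Example Source"},
        "results": [
            {"title": "Example A", "similarity": 0.7, "available": False},
            {"title": "Example B", "similarity": 0.9, "available": True},
        ],
    }
    print(summarize_similar(sample))
```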
Lookup
- GET /lookup/md5?md5=<hash>: returns the book record
- GET /lookup/isbn?isbn=<isbn>: returns { book, goodreads } (either may be null)
Download
GET /download?md5=<hash>
Proxies the Anna's Archive fast download API. Requires ANNAS_API_KEY in the server's .env. Returns 503 if not configured.
The response includes account_fast_download_info with downloads_left, downloads_per_day, and downloads_done_today. Check these to avoid exceeding the daily limit.
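A quota check along these lines could guard each download. Only the field names account_fast_download_info and downloads_left are taken from this document; the function name, the reserve parameter, and the choice to refuse when the quota is unknown are assumptions.

```python
def can_download(download_response, reserve=0):
    """Return True when more than `reserve` fast downloads remain today.

    Hypothetical helper; `download_response` is the parsed JSON from /download.
    """
    info = download_response.get("account_fast_download_info") or {}
    left = info.get("downloads_left")
    if left is None:
        # Quota unknown: be conservative rather than risk the daily limit.
        return False
    return left > reserve


if __name__ == "__main__":
    sample = {"account_fast_download_info": {"downloads_left": 3}}  # made-up values
    print(can_download(sample), can_download(sample, reserve=5))
```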
Library
The library stores downloaded books locally for offline access. Books are downloaded via Anna's Archive API and stored on disk.
GET /library?limit=20&offset=0
Returns downloaded books with metadata and download timestamps, ordered by most recently downloaded.
POST /library/download?md5=<hash>
Downloads a book file from Anna's Archive and stores it locally. For PDF books, this automatically triggers text extraction (indexing) and embedding creation in the background — no need to call /reader/:md5/index or /reader/:md5/embed manually.
Reader
The reader provides PDF content access: text extraction, page rendering, chapter detection, and semantic search within a book.
Status — GET /reader/:md5/status returns:
- pages_extracted / pages_embedded: progress counters
- ready_for_search: true when all pages are embedded
- chapters: table of contents extracted from PDF bookmarks (hierarchical, with title, page, and optional children)
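Because chapters is a nested tree, a client usually needs to walk it. The sketch below flattens the tree and finds the entry that contains a given page; it assumes each entry is a dict with title, page, and optional children as described above, and the function names are hypothetical.

```python
def flatten_chapters(chapters, depth=0):
    """Yield (depth, title, page) for every entry, parents before children."""
    for entry in chapters:
        yield depth, entry["title"], entry["page"]
        yield from flatten_chapters(entry.get("children", []), depth + 1)


def chapter_for_page(chapters, page):
    """Return the title of the entry with the latest start at or before `page`.

    When several entries start on the same page, the later (usually deeper)
    one wins. Returns None if the page precedes every entry.
    """
    best = None
    for _depth, title, start in flatten_chapters(chapters):
        if start <= page and (best is None or start >= best[0]):
            best = (start, title)
    return best[1] if best else None


if __name__ == "__main__":
    toc = [  # made-up table of contents, for illustration only
        {"title": "Part I", "page": 1, "children": [
            {"title": "Intro", "page": 1},
            {"title": "Basics", "page": 10},
        ]},
        {"title": "Part II", "page": 30},
    ]
    print(chapter_for_page(toc, 12))
```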
Text extraction — POST /reader/:md5/index extracts text from every page using pdftotext. Also extracts the PDF outline/bookmarks as the chapter tree. Auto-runs on download; returns a skip message if already indexed (use ?force=true to re-extract).
Page text — GET /reader/:md5/page/:page returns the extracted text for one page (1-based). Note: pages that are purely images (like covers) return only a form-feed character — use the image endpoint for those.
Page image — GET /reader/:md5/page/:page/image renders a page as a PNG at 150 DPI. Useful for cover pages, diagrams, or any page where text extraction is insufficient.
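The text-versus-image choice for a page can be expressed as a short check. This sketch treats a page whose extracted text is empty or whitespace-only (a lone form feed counts as whitespace in Python) as needing the image endpoint; the helper names are hypothetical and nothing here calls the API.

```python
BASE_URL = "http://localhost:3100"  # default base URL stated in this document


def needs_image_fallback(page_text):
    """True when extracted text is empty or only whitespace/form feeds."""
    return page_text.strip() == ""


def page_url(md5, page, page_text=None, base_url=BASE_URL):
    """Return the text endpoint, or the image endpoint for image-only pages."""
    path = f"/reader/{md5}/page/{page}"
    if page_text is not None and needs_image_fallback(page_text):
        path += "/image"
    return base_url + path


if __name__ == "__main__":
    print(page_url("<md5>", 1, page_text="\f"))
```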
Embeddings — POST /reader/:md5/embed creates vector embeddings for all extracted pages via Ollama. Required for semantic search. Auto-runs on download if Ollama is configured.
Search — GET /reader/:md5/search?q=...&limit=5 performs semantic search within a book. Returns the most relevant pages ranked by vector distance. The book must be fully indexed and embedded.
Choosing the Right Endpoint
| User intent | Endpoint | Why |
|---|---|---|
| Specific title/author ("Do you have Dune?") | /search?q=dune+frank+herbert | FTS keyword match, returns downloadable files |
| Publisher browsing ("No Starch Press books") | /search?publisher=no+starch | Direct publisher filter, no FTS needed |
| Author catalog ("books by Kernighan") | /search?author=kernighan | Direct author filter |
| Filtered search ("Python books in English") | /search?q=python&language=english | FTS + language filter |
| Topical discovery ("books about stoicism") | /search/goodreads?q=stoicism | Semantic vec search across Goodreads catalog |
| Genre browsing ("fantasy books") | /search/goodreads?genre=fantasy | Direct genre filter on Goodreads |
| Quality picks ("recommend a sci-fi book") | /similar with min_rating=3.5&min_reviews=100 | Vec search + rating filter |
| Similar books ("books like Project Hail Mary") | /similar?q=<isbn> | ISBN gives best match; falls back to title |
| Rating/metadata lookup | /lookup/isbn or /lookup/md5 | Direct lookup by identifier |
| Download a book | /library/download?md5=<hash> | Downloads file + auto-indexes PDF |
| List my downloaded books | /library | Shows local collection |
| Read a specific page | /reader/:md5/page/:page | Text content of one page |
| View a page visually | /reader/:md5/page/:page/image | PNG render at 150 DPI |
| See book chapters/TOC | /reader/:md5/status | Chapters in the response |
| Search inside a book | /reader/:md5/search?q=... | Semantic search across pages |
Agent Workflows
"Do you have Dune?"
1. Call /search?q=dune+frank+herbert, which returns one result per book (PDF preferred)
2. If ISBN is known, call /lookup/isbn?isbn=<isbn> for Goodreads data too
3. Report format, size, and rating
4. If user wants a different format, use dedupe=false to see all available formats
"Find me books like Project Hail Mary"
1. Call /lookup/isbn?isbn=<isbn> or /search/goodreads?q=project+hail+mary to get the ISBN
2. Call /similar?q=<isbn> (ISBN gives the most reliable match)
3. If no ISBN available, fall back to /similar?q=project+hail+mary (exact title match required)
4. Results include a source field (the matched book) and semantically similar books with available: true/false
5. If user wants a file and available is true, look up the ISBN via /lookup/isbn then /download?md5=<hash>
"Find me a good science fiction book"
1. Call /search/goodreads?q=science+fiction for semantic search across the catalog
2. Present top results with ratings
3. For similar-to recommendations, pick a book and call /similar?q=<isbn>
"Books about stoicism" (topical/vague query)
1. Call /search/goodreads?q=stoicism, which uses vector search for semantic matching
2. Present results with ratings and descriptions
3. Do NOT use /similar for vague queries: it requires an exact book title or ISBN
"What's the rating for this book?" (given an MD5)
1. Call /lookup/md5?md5=<hash> to get ISBN
2. Call /lookup/isbn?isbn=<isbn> for Goodreads rating
"Download this book and read it"
1. Call /reader/<md5>/status first. If downloaded is true, the book is already in the library (skip to step 3)
2. Call /library/download?md5=<hash>, which downloads the file, auto-indexes PDF pages, and creates embeddings
3. Call /reader/<md5>/status and check that indexing is complete (pages_extracted > 0)
4. Use chapters from the status response to navigate the book by chapter
5. Call /reader/<md5>/page/<page> for text or /reader/<md5>/page/<page>/image for visual rendering
"What does this book say about X?"
1. Call /reader/<md5>/status. If downloaded is false, download it first via /library/download
2. Verify ready_for_search is true (if not, wait for auto-indexing or trigger manually)
3. Call /reader/<md5>/search?q=X, which returns the most relevant pages ranked by semantic similarity
4. Read the returned page content, or fetch specific pages with /reader/<md5>/page/<page>
5. Use the chapters from status to provide context about which chapter the result is in
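The "wait for auto-indexing" part of this workflow can be sketched as a bounded polling loop. The HTTP call is injected as get_json so the sketch makes no assumption about the client library; that callable, the function name, and the attempts and delay defaults are all hypothetical choices, not values from the API.

```python
import time


def wait_until_searchable(md5, get_json, *, attempts=10, delay=2.0,
                          sleep=time.sleep):
    """Poll /reader/<md5>/status until ready_for_search is true.

    `get_json(path)` is a caller-supplied function that performs the GET
    against the API and returns the parsed JSON body as a dict.
    """
    for attempt in range(attempts):
        status = get_json(f"/reader/{md5}/status")
        if status.get("ready_for_search"):
            return status
        if attempt < attempts - 1:
            sleep(delay)
    raise TimeoutError(f"{md5} not ready for search after {attempts} checks")


if __name__ == "__main__":
    # Canned responses stand in for a real server.
    canned = iter([{"ready_for_search": False}, {"ready_for_search": True}])
    print(wait_until_searchable("<md5>", lambda _path: next(canned),
                                sleep=lambda _seconds: None))
```

Bounding the number of checks keeps an agent from waiting forever if embedding never finishes, for example when Ollama is not configured.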
"Show me the table of contents"
1. Call /reader/<md5>/status. If downloaded is false, download it first via /library/download
2. The chapters field contains the full hierarchical table of contents
3. Each entry has title, page, and optional children (sub-sections)
4. Use page numbers to navigate directly to specific sections
General: always check before downloading
Before calling /library/download, check if the book is already downloaded:
- Call /reader/<md5>/status. If downloaded is true, skip the download
- Or call /library to see the full library and check if the md5 is already there
This avoids re-downloading and wasting the daily download quota (shown in /download responses as downloads_left).
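The check-before-download rule might look like this in client code. The two HTTP helpers are injected, so get_json and post_json are placeholders for whatever client is in use, and ensure_downloaded is a hypothetical name; the downloaded field is the one referenced in the workflows above.

```python
def ensure_downloaded(md5, get_json, post_json):
    """Download a book only if the status endpoint says it is missing.

    `get_json(path)` and `post_json(path)` are caller-supplied functions that
    call the API and return parsed JSON. Returns True if a download was
    triggered, False if the book was already in the library.
    """
    status = get_json(f"/reader/{md5}/status")
    if status.get("downloaded"):
        return False  # already local: do not spend daily quota
    post_json(f"/library/download?md5={md5}")
    return True


if __name__ == "__main__":
    calls = []
    triggered = ensure_downloaded(
        "<md5>",
        get_json=lambda path: {"downloaded": True},  # canned response
        post_json=lambda path: calls.append(path),
    )
    print(triggered, calls)
```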
Handling Domain Errors
If /download returns a 502, or any Anna's Archive request fails, the configured domain is likely dead. Only in this case, resolve a new domain:
1. Run bun run scripts/resolve-domain.ts from the project root. It fetches the Anna's Archive Wikipedia page, extracts candidate annas-archive.* domains, tests each against /dyn/torrents.json, and prints the first working one
2. Update ANNAS_BASE_URL in the project's .env file to the working domain
3. Restart the API and updater containers to pick up the change
Do not proactively check the domain — only resolve when an actual request fails.
Error Codes
- 400: missing required parameter
- 404: no matching record
- 500: search/database error
- 502: upstream Anna's Archive request failed
- 503: ANNAS_API_KEY not configured, or vector search not available (for /similar)
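One way an agent could turn these status codes into next steps is a simple lookup table. The codes and their meanings come from the list above; the suggested actions are paraphrases of guidance elsewhere in this document, and the names NEXT_STEP and advise are hypothetical.

```python
NEXT_STEP = {
    400: "fix the request: a required parameter is missing",
    404: "no matching record: try another identifier or query",
    500: "search/database error: retry later or check the server logs",
    502: "upstream Anna's Archive request failed: resolve a new domain",
    503: "ANNAS_API_KEY or vector search is not configured on the server",
}


def advise(status_code):
    """Map an HTTP status code from the API to a suggested next step."""
    if 200 <= status_code < 300:
        return "ok"
    return NEXT_STEP.get(status_code, f"unexpected status {status_code}")


if __name__ == "__main__":
    for code in (200, 404, 502):
        print(code, advise(code))
```

Only the 502 entry points at domain resolution, in line with the rule that the domain is resolved only after an actual request fails.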
Additional Resources
- references/annas-archive-api.md: Full Anna's Archive API docs, dataset formats, and authentication details
Source
https://github.com/KevinBatdorf/anna/blob/main/skills/book-search/SKILL.md
Overview
This skill exposes a locally hosted REST API that indexes Anna's Archive (Zlib3) book records, enabling fast search, recommendations, and library management. It can optionally include Goodreads ratings from a static Sep 2024 snapshot and supports reading PDF content page-by-page with semantic search. Use /stats to view current record counts at the local base URL.
How This Skill Works
Run the API locally (default http://localhost:3100). Use endpoints like /search, /lookup/isbn, /lookup/md5, /download, /library, and /reader to locate, fetch, download, and read books; the reader endpoints provide page-level text, images, and embedding for semantic search. Goodreads data is a one-time snapshot and not continuously updated, while vector search supports similarity-based discovery and /stats reports counts.
When to Use It
- User asks to find or search for books, get recommendations, or browse by metadata (author, publisher, language, year).
- User wants to look up a book by ISBN or MD5 hash to fetch file details or Goodreads data.
- User wants to download a book to a local library or serve a downloaded file for reading.
- User wants to read a book, view a page image, see chapters, or search within a book (semantic or page-level).
- User references Anna's Archive, Goodreads ratings, library, or reader and needs integrated access to those data points.
Quick Start
- Step 1: Ensure the local API is running at http://localhost:3100
- Step 2: Locate books with /search or look up by ISBN/MD5 using /lookup/isbn or /lookup/md5
- Step 3: Download to library with /library/download?md5=... and read with /reader/:md5 (page, image, or search)
Best Practices
- Build precise queries with q plus filters (author, publisher, language, year, ext) to narrow results.
- Enable deduplication (dedupe=true) to keep the best format (PDF preferred).
- Use the /library endpoints to manage local copies and the /reader endpoints for page text, indexing, and embeddings.
- Leverage /search/goodreads for ratings when needed, noting it is a static snapshot as of Sep 2024.
- Check /stats regularly to understand record counts and data freshness of the local index.
Example Use Cases
- Search for a book: GET /search?q=The%20Pragmatic%20Programmer&author=Andrew%20Hunt
- Lookup by ISBN: GET /lookup/isbn?isbn=9780201616224
- Download to library: POST /library/download?md5=<md5hash>
- Read a page image: GET /reader/<md5>/page/4/image
- Find related titles: GET /similar?q=9780201616224&limit=5