initiative/.claude/skills/bundle-bestiary/SKILL.md
Lukas c343fd3cd0 Add bundled-bestiary mechanism for shipping creatures with the app
D&D creatures listed in data/bestiary/dnd-bundled.json are now merged into
the search index and pre-loaded into creatureMap, so they appear alongside
5etools creatures with no "Load source" step. Source codes are derived from
the JSON itself (each creature carries source + sourceDisplayName), so adding
a new book is a pure data change. Bundled sources are excluded from
getAllSourceCodes() so bulk-import skips them, and they never appear in the
source manager (which only lists cached sources).

Includes a reference extractor (scripts/extract-great-labors.py) for the
5.5e revised stat-block format and a /bundle-bestiary skill that future
agents can follow to add monsters from other PDF books.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 15:49:34 +02:00

---
name: bundle-bestiary
description: Bundle creatures from a third-party PDF into the app's D&D bestiary so they appear in search alongside 5etools creatures, with no "Load source" step. Use when the user asks to add monsters from a PDF book / adventure / supplement to the bundled bestiary.
---
## Instructions
Add the creatures from a PDF to `data/bestiary/dnd-bundled.json` so they appear in the D&D search index and render as normal stat blocks. Bundled creatures bypass the fetch/cache flow — they're shipped in the JS bundle and pre-loaded into `creatureMap` on startup.
### How the bundling works
- `data/bestiary/dnd-bundled.json` is an array of normalized `Creature` objects (the same shape produced by `bestiary-adapter.ts` for 5etools creatures).
- `apps/web/src/adapters/dnd-bundled-adapter.ts` static-imports the JSON and derives:
  - `loadBundledDndCreatures()`: full stat blocks for the in-memory creature map
  - `loadBundledDndIndexEntries()`: compact summaries for the search index
  - `getBundledDndSources()`: source code → display name map, **derived from the JSON itself** (each creature carries its own `source` + `sourceDisplayName`)
- `bestiary-index-adapter.ts` merges the bundled entries into the search index and excludes bundled sources from `getAllSourceCodes()` (so bulk-import skips them).
- `use-bestiary.ts` merges bundled full creatures into `creatureMap` on init/refresh.
This means **adding a new bundled book is purely a data change**: append creatures to `dnd-bundled.json` with the new source's code and display name. No adapter or index code needs editing.
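To make the "derived from the JSON itself" idea concrete, here is a minimal Python sketch of the same derivation. It is not the adapter's code (that lives in TypeScript); the function name `bundled_sources` and the second sample book are hypothetical, and only the `source` and `sourceDisplayName` fields come from this document.

```python
def bundled_sources(creatures):
    """Map each source code to its display name, using only the creature data."""
    sources = {}
    for creature in creatures:
        code = creature["source"]
        display = creature["sourceDisplayName"]
        # setdefault stores the first display name seen for a code; a later
        # creature with a different name for the same code is reported.
        if sources.setdefault(code, display) != display:
            raise ValueError(f"{code} has conflicting display names")
    return sources


# Illustrative data only: "XYZ" / "Example Book" is a made-up second source.
SAMPLE = [
    {"name": "Anarch Boar", "source": "TGL", "sourceDisplayName": "The Great Labors"},
    {"name": "Placeholder", "source": "XYZ", "sourceDisplayName": "Example Book"},
]

print(bundled_sources(SAMPLE))
```

Because the map is rebuilt from whatever the JSON contains, appending creatures with a new `source` is enough for the new book to show up.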
### Step 1 — Confirm scope and source code
Ask the user (don't guess):
1. **PDF path** and the **page range** containing the stat blocks. Many PDFs have hundreds of pages; only a slice has the bestiary.
2. **Source code abbreviation** — short uppercase letters, e.g., `TGL` for *The Great Labors*. Used in creature IDs and the index.
3. **Display name** — the human-readable book title shown in the source column.
4. **Edition / system** — confirm this is D&D (5e or 5.5e). Bundled creatures show in both 5e and 5.5e modes (the bestiary index only differentiates pf2e vs not). PF2e isn't currently supported by the bundled flow — if requested, this would need a parallel `pf2e-bundled-adapter.ts`.
5. **Licensing** — verify the user has the right to bundle the book's content. Don't make assumptions.
### Step 2 — Inspect the PDF
Check Python's PyPDF2 is available:
```bash
python3 -c "from PyPDF2 import PdfReader; print('ok')"
```
If it isn't, check `~/Nextcloud/dnd/D&D/PROMPT_prep.md`: the user has `pdftotext`-equivalent tooling configured there.
Then dump and skim the target pages to learn the stat-block format:
```bash
python3 - <<'EOF'
from PyPDF2 import PdfReader
import os
r = PdfReader(os.path.expanduser('PATH/TO/PDF'))
for i in range(START-1, END):
    print(f"\n===PAGE {i+1}===\n{r.pages[i].extract_text()}")
EOF
```
Look for the layout — the existing extractor (`scripts/extract-great-labors.py`) assumes the 5.5e/2024 revised format:
- `<Name>` line, then
- `<Size> <Type>(optional subtype), <Alignment>`, then
- `AC X Initiative ±Y (Z)`, then
- `HP N (NdN + N)`, then
- `Speed X ft., …`, then
- A `MOD SAVE MOD SAVE MOD SAVE` header followed by two ability-score rows, then
- Optional meta lines: `Skills`, `Saving Throws`, `Resistances`, `Immunities`, `Vulnerabilities`, `Senses`, `Languages`, then
- `Challenge X (NN XP; PB +N)`, then
- Section blocks: `Traits` / `Actions` / `Bonus Actions` / `Reactions` / `Legendary Actions`, each containing entries shaped like `Name. body...`.
If the PDF format matches, adapt the existing extractor. If it's a different format (5e 2014 with `STR DEX CON …` column layout, an older publisher's layout, a homebrew layout), expect to rework the parser more substantively.
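As a rough illustration of how the header lines in the layout above could be matched, the sketch below uses three regular expressions. It is not taken from `scripts/extract-great-labors.py`; the pattern names, the `parse_header` helper, the `initiative` key, and the sample lines are all assumptions made for the example. The `ac`, `hp.average`, `hp.formula`, and `cr` keys mirror the fields the audit script in Step 4 reads.

```python
import re

# Hypothetical patterns for the 5.5e-style header lines described above.
AC_RE = re.compile(r"^AC (\d+)\s+Initiative ([+\-−]\d+) \((\d+)\)$")
HP_RE = re.compile(r"^HP (\d+) \((.+)\)$")
CR_RE = re.compile(r"^Challenge (\S+) \(([\d,]+) XP; PB \+(\d+)\)$")


def parse_header(lines):
    """Pull AC, initiative, HP and CR out of already-split stat-block lines."""
    out = {}
    for line in lines:
        line = line.strip()
        if m := AC_RE.match(line):
            out["ac"] = int(m.group(1))
            # PDFs sometimes use a Unicode minus sign; normalise it for int().
            out["initiative"] = int(m.group(2).replace("−", "-"))
        elif m := HP_RE.match(line):
            out["hp"] = {"average": int(m.group(1)), "formula": m.group(2)}
        elif m := CR_RE.match(line):
            out["cr"] = m.group(1)  # kept as a string so "1/2" survives
    return out


# Made-up sample lines, not copied from any book.
SAMPLE_LINES = [
    "AC 15 Initiative +2 (12)",
    "HP 52 (8d10 + 8)",
    "Challenge 3 (700 XP; PB +2)",
]

print(parse_header(SAMPLE_LINES))
```

Anchoring each pattern with `^` and `$` keeps it from firing on prose that merely mentions "AC" or "HP"; a different layout would need its own patterns.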
### Step 3 — Adapt or extend the extractor
Copy `scripts/extract-great-labors.py` to a new script per book (e.g., `scripts/extract-<book-slug>.py`) and update:
- `SOURCE_CODE`, `SOURCE_DISPLAY`, `PAGE_START`, `PAGE_END` constants.
- The output path (`data/bestiary/dnd-bundled.json`). **Don't overwrite — merge.** The simplest pattern: read the existing file, drop any entries with the same `source`, then append the new ones.
- The `PROSE_TAIL_PATTERNS` list — every book has its own running headers (`<PageNumber>APPENDIX B … MONSTERS`-style), section-header phrases, and quote-attribution dashes. Run the extractor, audit the output (see Step 4), and add curated trim patterns for any prose tails that bleed in.
Run it:
```bash
python3 scripts/extract-<book-slug>.py PATH/TO/PDF
```
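The "read, drop same-source entries, append" merge described in the bullets above can be sketched as follows. The helper names `merge_bundled` and `write_merged` are hypothetical, and the two-space JSON indentation is an assumption; match whatever formatting the existing file uses.

```python
import json
from pathlib import Path


def merge_bundled(existing, new, source_code):
    """Drop any existing entries for source_code, then append the new ones."""
    kept = [c for c in existing if c.get("source") != source_code]
    return kept + list(new)


def write_merged(path, new, source_code):
    """Merge new creatures into the bundled JSON file instead of overwriting it."""
    path = Path(path)
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    merged = merge_bundled(existing, new, source_code)
    # ensure_ascii=False keeps characters such as "−" readable in the file.
    path.write_text(
        json.dumps(merged, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return len(merged)
```

Dropping by `source` before appending makes the extractor safe to re-run: a second run replaces that book's creatures rather than duplicating them, and other books' entries are left alone.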
### Step 4 — Audit the output
PyPDF2 text extraction is messy. Always audit before claiming done:
```bash
python3 - <<'EOF'
import json, re
data = json.load(open('data/bestiary/dnd-bundled.json'))
new = [c for c in data if c['source'] == 'XXX'] # replace XXX with your code
for c in new:
    print(f"{c['name']}: CR {c['cr']}, AC {c['ac']}, HP {c['hp']['average']} ({c['hp']['formula']})")
    abs_ = c['abilities']
    print(f" STR {abs_['str']} DEX {abs_['dex']} CON {abs_['con']} INT {abs_['int']} WIS {abs_['wis']} CHA {abs_['cha']}, PP {c['passive']}")
# Then audit bodies for prose-tail bleed and weird splits.
for c in new:
    for sec in ('traits', 'actions', 'bonusActions', 'reactions'):
        for e in c.get(sec, []):
            body = e['segments'][0]['value']
            issues = []
            if len(body) > 600: issues.append(f"long({len(body)})")
            if re.search(r'\.[A-Z][a-z]', body): issues.append("dot-Capital")
            if 'APPENDIX' in body: issues.append("APPENDIX")
            if re.search(r'—\s*[A-Z]\w+,\s', body): issues.append("attribution")
            if issues:
                print(f" {c['name']} [{sec}] {e['name']}: {', '.join(issues)}")
                print(f" ...{body[-200:]}")
EOF
```
Common PDF extraction problems to fix in the parser:
- **PDF kerning quirks**: multi-digit values rendered with spaces (e.g., "Passive Perception 1 1" → 11, "Wis 81 1" with no space before negative). The existing parser handles most; check for new ones.
- **Smushed section headers**: lines like `...plants.Actions` where the section header for the next block was concatenated. Handle via `SECTION_HEADER_SMUSH_RE` preprocessing.
- **Cross-page prose bleed**: text from the next page's flavor prose absorbed into the last entry's body. Catch via `PROSE_TAIL_PATTERNS` — add curated phrases observed in this specific book.
- **Sibling-entry inline smush**: `damage.Ram. Melee Attack Roll: …` where two entries got concatenated. Already handled by the mid-line entry boundary regex in the existing parser.
- **Title-cased false positives**: words like `Bloodied.`, `Restrained.`, `Frightened.` at sentence ends would otherwise match the entry-name pattern. Filtered via `NAME_FALSE_POSITIVES` — add to it if the new book uses condition names you haven't seen yet.
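Three of the fixes listed above can be sketched in a few lines each. These are illustrative stand-ins, not the definitions used in `scripts/extract-great-labors.py`: `SMUSH_RE` only plays the role of `SECTION_HEADER_SMUSH_RE`, `PROSE_TAILS` the role of `PROSE_TAIL_PATTERNS`, and the single tail pattern and all sample strings are assumptions.

```python
import re

SECTION_NAMES = ("Traits", "Actions", "Bonus Actions", "Reactions", "Legendary Actions")

# Hypothetical stand-in: a section name glued to the end of a sentence,
# e.g. "...plants.Actions". Longer names are tried first.
SMUSH_RE = re.compile(
    r"(?<=[a-z)]\.)("
    + "|".join(re.escape(n) for n in sorted(SECTION_NAMES, key=len, reverse=True))
    + r")$"
)

# Hypothetical stand-in: one running-header pattern; real books need their own.
PROSE_TAILS = [re.compile(r"\d+\s*APPENDIX\b.*$", re.S)]


def split_smushed_header(line):
    """Split 'sentence.Actions' into the sentence and the section header."""
    m = SMUSH_RE.search(line)
    if not m:
        return [line]
    return [line[: m.start()], m.group(1)]


def trim_prose_tail(body):
    """Cut text that bled in from a page header or the next page's prose."""
    for pattern in PROSE_TAILS:
        body = pattern.sub("", body)
    return body.rstrip()


def fix_passive_perception(text):
    """Rejoin kerned digits, e.g. 'Passive Perception 1 1' -> '... 11'."""
    return re.sub(
        r"(Passive Perception )(\d(?: \d)+)\b",
        lambda m: m.group(1) + m.group(2).replace(" ", ""),
        text,
    )
```

Each helper is deliberately narrow (one field, one header shape) so that a fix for one book is unlikely to corrupt text that was extracted correctly.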
### Step 5 — Verify in the app
```bash
pnpm check
```
Then start the dev server and search for one of the new creatures by name:
```bash
pnpm --filter web dev
```
Confirm in the browser:
1. Search finds the creature with the right book name as the source label.
2. Clicking it shows the full stat block immediately — **no "Load source" prompt**.
3. The source manager UI does **not** list the bundled book (it only shows cached sources).
4. Bulk import skips the bundled book.
### Notes for future agents
- **No need to edit `dnd-bundled-adapter.ts` or `bestiary-index-adapter.ts`** when adding a new book — the adapter derives source codes from the JSON.
- `data/bestiary/index.json` is regenerated from 5etools and should **not** be edited to add bundled entries. The merge happens at runtime in `bestiary-index-adapter.ts`.
- Each bundled creature must have:
  - A unique `id` like `<sourcecode>:<slug>` (e.g., `tgl:anarch-boar`).
  - `source` field matching the source code (e.g., `"TGL"`).
  - `sourceDisplayName` field matching the book's display name (e.g., `"The Great Labors"`).
  - All the required `Creature` fields from `packages/domain/src/creature-types.ts`.
- The script approach is preferred over hand-editing JSON for >5 creatures. For a single creature or two, hand-editing the JSON is reasonable; just match an existing entry's shape exactly.
- After any change to `dnd-bundled.json`, run `pnpm typecheck` — the static import in the adapter will catch shape mismatches at compile time.
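The per-creature requirements above can also be checked with a quick script before running the type checker. This is a sketch under stated assumptions: the `validate_bundled` name is hypothetical, the slug format (lowercase letters, digits, hyphens) and the lowercased-source prefix are inferred from the `tgl:anarch-boar` example, and the full `Creature` shape is still only verified by the TypeScript compile.

```python
import re
from collections import Counter

# Assumed id shape, inferred from the example `tgl:anarch-boar`.
ID_RE = re.compile(r"^[a-z0-9]+:[a-z0-9]+(?:-[a-z0-9]+)*$")


def validate_bundled(creatures):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    display_by_source = {}
    for c in creatures:
        cid = c.get("id")
        source = c.get("source")
        display = c.get("sourceDisplayName")
        label = cid or c.get("name") or "<unnamed>"
        if not cid or not ID_RE.match(cid):
            problems.append(f"{label}: id is missing or not '<sourcecode>:<slug>'")
        elif source and cid.split(":", 1)[0] != source.lower():
            problems.append(f"{label}: id prefix does not match source {source!r}")
        if not source:
            problems.append(f"{label}: missing source")
        if not display:
            problems.append(f"{label}: missing sourceDisplayName")
        elif source and display_by_source.setdefault(source, display) != display:
            problems.append(f"{label}: display name differs within {source!r}")
    # Report each duplicated id once, after the per-creature checks.
    counts = Counter(c.get("id") for c in creatures if c.get("id"))
    problems += [f"{cid}: duplicate id ({n} entries)" for cid, n in counts.items() if n > 1]
    return problems
```

Running it against the loaded contents of `data/bestiary/dnd-bundled.json` and printing the returned list gives a fast first pass; it complements the compile-time check rather than replacing it.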