initiative/.claude/skills/bundle-bestiary/SKILL.md at 3b2fb99b37220fa9ddb197b26f1f0cb4712e9eba

Lukas c343fd3cd0 Add bundled-bestiary mechanism for shipping creatures with the app

D&D creatures listed in data/bestiary/dnd-bundled.json are now merged into
the search index and pre-loaded into creatureMap, so they appear alongside
5etools creatures with no "Load source" step. Source codes are derived from
the JSON itself (each creature carries source + sourceDisplayName), so adding
a new book is a pure data change. Bundled sources are excluded from
getAllSourceCodes() so bulk-import skips them, and they never appear in the
source manager (which only lists cached sources).

Includes a reference extractor (scripts/extract-great-labors.py) for the
5.5e revised stat-block format and a /bundle-bestiary skill that future
agents can follow to add monsters from other PDF books.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-27 15:49:34 +02:00

name: bundle-bestiary
description: Bundle creatures from a third-party PDF into the app's D&D bestiary so they appear in search alongside 5etools creatures, with no "Load source" step. Use when the user asks to add monsters from a PDF book / adventure / supplement to the bundled bestiary.

Instructions

Add the creatures from a PDF to data/bestiary/dnd-bundled.json so they appear in the D&D search index and render as normal stat blocks. Bundled creatures bypass the fetch/cache flow — they're shipped in the JS bundle and pre-loaded into creatureMap on startup.

How the bundling works

data/bestiary/dnd-bundled.json is an array of normalized Creature objects (the same shape produced by bestiary-adapter.ts for 5etools creatures).
apps/web/src/adapters/dnd-bundled-adapter.ts static-imports the JSON and derives:
- loadBundledDndCreatures() — full stat blocks for the in-memory creature map
- loadBundledDndIndexEntries() — compact summaries for the search index
- getBundledDndSources() — source code → display name map, derived from the JSON itself (each creature carries its own source + sourceDisplayName)
bestiary-index-adapter.ts merges the bundled entries into the search index and excludes bundled sources from getAllSourceCodes() (so bulk-import skips them).
use-bestiary.ts merges bundled full creatures into creatureMap on init/refresh.

This means adding a new bundled book is purely a data change: append creatures to dnd-bundled.json with the new source's code and display name. No adapter or index code needs editing.
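
The derivation described above can be illustrated with a minimal Python sketch. It is not the adapter's code (the adapter is TypeScript); it only mirrors the idea that the source map comes from the creature data and that bundled sources are filtered out of the fetchable list. The function names and the `XYZ` / "Example Book" entry are hypothetical.

```python
from typing import Any


def derive_sources(creatures: list[dict[str, Any]]) -> dict[str, str]:
    """Map each source code to its display name, using only the creature data."""
    sources: dict[str, str] = {}
    for creature in creatures:
        # First occurrence wins; creatures from one book are expected to agree.
        sources.setdefault(creature["source"], creature["sourceDisplayName"])
    return sources


def fetchable_source_codes(all_codes: list[str], bundled: dict[str, str]) -> list[str]:
    """Drop bundled sources so a bulk import would never try to fetch them."""
    return [code for code in all_codes if code not in bundled]


# Hypothetical sample data: one real-looking entry and one made-up book.
bundled_creatures = [
    {"id": "tgl:anarch-boar", "source": "TGL", "sourceDisplayName": "The Great Labors"},
    {"id": "xyz:example-beast", "source": "XYZ", "sourceDisplayName": "Example Book"},
]
bundled_sources = derive_sources(bundled_creatures)
print(bundled_sources)
print(fetchable_source_codes(["MM", "TGL", "XYZ"], bundled_sources))
```

Because the map is rebuilt from the JSON each time, appending creatures with a new `source` is enough to register the book.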

Step 1 — Confirm scope and source code

Ask the user (don't guess):

PDF path and the page range containing the stat blocks. Many PDFs have hundreds of pages; only a slice has the bestiary.
Source code abbreviation — short uppercase letters, e.g., TGL for The Great Labors. Used in creature IDs and the index.
Display name — the human-readable book title shown in the source column.
Edition / system — confirm this is D&D (5e or 5.5e). Bundled creatures show in both 5e and 5.5e modes (the bestiary index only differentiates pf2e vs not). PF2e isn't currently supported by the bundled flow — if requested, this would need a parallel pf2e-bundled-adapter.ts.
Licensing — verify the user has the right to bundle the book's content. Don't make assumptions.

Step 2 — Inspect the PDF

Check that Python's PyPDF2 is available:

python3 -c "from PyPDF2 import PdfReader; print('ok')"

If it is not, check ~/Nextcloud/dnd/D&D/PROMPT_prep.md, where the user has pdftotext-equivalent tooling configured.

Then dump and skim the target pages to learn the stat-block format:

python3 - <<'EOF'
from PyPDF2 import PdfReader
import os
# Replace PATH/TO/PDF, START, and END (1-based, inclusive page numbers) before running.
r = PdfReader(os.path.expanduser('PATH/TO/PDF'))
for i in range(START-1, END):
    print(f"\n===PAGE {i+1}===\n{r.pages[i].extract_text()}")
EOF

Look for the layout — the existing extractor (scripts/extract-great-labors.py) assumes the 5.5e/2024 revised format:

<Name> line, then
<Size> <Type>(optional subtype), <Alignment>, then
AC X Initiative ±Y (Z), then
HP N (NdN + N), then
Speed X ft., …, then
A MOD SAVE MOD SAVE MOD SAVE header followed by two ability-score rows, then
Optional meta lines: Skills, Saving Throws, Resistances, Immunities, Vulnerabilities, Senses, Languages, then
Challenge X (NN XP; PB +N), then
Section blocks: Traits / Actions / Bonus Actions / Reactions / Legendary Actions, each containing entries shaped like Name. body....

If the PDF format matches, adapt the existing extractor. If it's a different format (5e 2014 with STR DEX CON … column layout, an older publisher's layout, a homebrew layout), expect to rework the parser more substantively.
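
As a rough illustration of the fixed-shape lines in the layout above, the sketch below parses the AC, HP, and Challenge lines with regular expressions. The patterns, the `initiative`, `xp`, and `pb` key names, and the sample values are assumptions made for this example; the regexes in scripts/extract-great-labors.py may differ.

```python
import re

# Hypothetical patterns for three header lines of the 5.5e/2024 layout.
AC_RE = re.compile(r"^AC (\d+) Initiative ([+\-−–]\d+) \((\d+)\)$")
HP_RE = re.compile(r"^HP (\d+) \((\d+d\d+(?: ?[+\-−–] ?\d+)?)\)$")
CR_RE = re.compile(r"^Challenge (\d+(?:/\d+)?) \(([\d,]+) XP; PB \+(\d+)\)$")


def signed_int(text: str) -> int:
    """Parse '+2' or '-1', also accepting a Unicode minus or en dash."""
    return int(text.replace("−", "-").replace("–", "-"))


def parse_header(lines: list[str]) -> dict:
    """Collect AC, initiative, HP, and challenge fields from stat-block lines."""
    out: dict = {}
    for raw in lines:
        line = raw.strip()
        if m := AC_RE.match(line):
            out["ac"] = int(m.group(1))
            out["initiative"] = signed_int(m.group(2))
        elif m := HP_RE.match(line):
            out["hp"] = {"average": int(m.group(1)), "formula": m.group(2)}
        elif m := CR_RE.match(line):
            out["cr"] = m.group(1)  # kept as text so fractional CRs like 1/2 survive
            out["xp"] = int(m.group(2).replace(",", ""))
            out["pb"] = int(m.group(3))
    return out


# Made-up sample lines in the documented shape.
sample = ["AC 15 Initiative +2 (12)", "HP 85 (10d10 + 30)", "Challenge 5 (1,800 XP; PB +3)"]
print(parse_header(sample))
```

Lines that match none of the patterns are silently skipped here; a real extractor would more likely report them, so that a format mismatch is noticed early.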

Step 3 — Adapt or extend the extractor

Copy scripts/extract-great-labors.py to a new script per book (e.g., scripts/extract-<book-slug>.py) and update:

SOURCE_CODE, SOURCE_DISPLAY, PAGE_START, PAGE_END constants.
The output path (data/bestiary/dnd-bundled.json). Don't overwrite — merge. The simplest pattern: read the existing file, drop any entries with the same source, then append the new ones.
The PROSE_TAIL_PATTERNS list — every book has its own running headers (<PageNumber>APPENDIX B … MONSTERS-style), section-header phrases, and quote-attribution dashes. Run the extractor, audit the output (see Step 4), and add curated trim patterns for any prose tails that bleed in.

Run it:

python3 scripts/extract-<book-slug>.py PATH/TO/PDF
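
The merge pattern from Step 3 (read the existing file, drop entries with the same source, append the new ones) might be sketched as below. The helper names are hypothetical, and the 2-space indentation and `ensure_ascii=False` are assumptions about how the JSON file is formatted; match whatever the existing file uses.

```python
import json
from pathlib import Path


def merge_bundled(existing: list[dict], new: list[dict], source_code: str) -> list[dict]:
    """Replace all entries for source_code with `new`, leaving other books untouched."""
    kept = [c for c in existing if c.get("source") != source_code]
    return kept + new


def write_merged(path: Path, new: list[dict], source_code: str) -> None:
    """Read the bundled file (if present), merge, and write it back."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    merged = merge_bundled(existing, new, source_code)
    path.write_text(json.dumps(merged, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")


# Example with in-memory data only; "XYZ" is a made-up second book.
old = [
    {"id": "tgl:anarch-boar", "source": "TGL"},
    {"id": "xyz:example-beast", "source": "XYZ"},
]
fresh = [{"id": "xyz:example-beast", "source": "XYZ"}, {"id": "xyz:other-beast", "source": "XYZ"}]
print([c["id"] for c in merge_bundled(old, fresh, "XYZ")])
```

Dropping by source before appending makes the extractor safe to re-run: a second run replaces that book's creatures instead of duplicating them.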

Step 4 — Audit the output

PyPDF2 text extraction is messy. Always audit before claiming done:

python3 - <<'EOF'
import json, re
with open('data/bestiary/dnd-bundled.json', encoding='utf-8') as f:
    data = json.load(f)
new = [c for c in data if c['source'] == 'XXX']  # replace XXX with your code
for c in new:
    print(f"{c['name']}: CR {c['cr']}, AC {c['ac']}, HP {c['hp']['average']} ({c['hp']['formula']})")
    abs_ = c['abilities']
    print(f"  STR {abs_['str']} DEX {abs_['dex']} CON {abs_['con']} INT {abs_['int']} WIS {abs_['wis']} CHA {abs_['cha']}, PP {c['passive']}")
# Then audit bodies for prose-tail bleed and weird splits.
for c in new:
    for sec in ('traits', 'actions', 'bonusActions', 'reactions'):
        for e in c.get(sec, []):
            body = e['segments'][0]['value']
            issues = []
            if len(body) > 600: issues.append(f"long({len(body)})")
            if re.search(r'\.[A-Z][a-z]', body): issues.append("dot-Capital")
            if 'APPENDIX' in body: issues.append("APPENDIX")
            if re.search(r'—\s*[A-Z]\w+,\s', body): issues.append("attribution")
            if issues:
                print(f"  {c['name']} [{sec}] {e['name']}: {', '.join(issues)}")
                print(f"    ...{body[-200:]}")
EOF

Common PDF extraction problems to fix in the parser:

PDF kerning quirks: multi-digit values rendered with spaces (e.g., "Passive Perception 1 1" → 11) and negative modifiers run together with the preceding value (e.g., "Wis 8–1 –1", with no space before the negative). The existing parser handles most; check for new ones.
Smushed section headers: lines like ...plants.Actions where the section header for the next block was concatenated. Handle via SECTION_HEADER_SMUSH_RE preprocessing.
Cross-page prose bleed: text from the next page's flavor prose absorbed into the last entry's body. Catch via PROSE_TAIL_PATTERNS — add curated phrases observed in this specific book.
Sibling-entry inline smush: damage.Ram. Melee Attack Roll: … where two entries got concatenated. Already handled by the mid-line entry boundary regex in the existing parser.
Title-cased false positives: words like Bloodied., Restrained., Frightened. at sentence ends would otherwise match the entry-name pattern. Filtered via NAME_FALSE_POSITIVES — add to it if the new book uses condition names you haven't seen yet.
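
Three of the fixes listed above can be sketched as small, independent cleanup passes. The patterns here are illustrative stand-ins written for this example; the real SECTION_HEADER_SMUSH_RE and PROSE_TAIL_PATTERNS in the existing extractor may be shaped differently, and the tail pattern in particular has to be curated per book.

```python
import re

# Longer names first so "Bonus Actions" is not matched as plain "Actions".
SECTION_NAMES = ("Legendary Actions", "Bonus Actions", "Reactions", "Actions", "Traits")
SMUSH_RE = re.compile(r"(?<=[.!?)])(" + "|".join(SECTION_NAMES) + r")$")
# Hypothetical running-header pattern: page number fused to "APPENDIX ...".
TAIL_PATTERNS = [re.compile(r"\s*\d+\s*APPENDIX\b.*$", re.DOTALL)]


def split_smushed_header(line: str) -> list[str]:
    """Turn '...plants.Actions' into ['...plants.', 'Actions']."""
    m = SMUSH_RE.search(line)
    if not m:
        return [line]
    return [line[: m.start()], m.group(1)]


def trim_prose_tail(body: str) -> str:
    """Remove everything from a known running-header pattern to the end."""
    for pattern in TAIL_PATTERNS:
        body = pattern.sub("", body)
    return body.rstrip()


def join_split_digits(line: str) -> str:
    """Collapse 'Passive Perception 1 1' into 'Passive Perception 11'."""
    # Scoped to one known label to avoid merging unrelated numbers.
    return re.sub(r"(?<=Passive Perception )(\d) (\d)\b", r"\1\2", line)


print(split_smushed_header("It ignores plants.Actions"))
print(trim_prose_tail("The boar charges. 212APPENDIX B | MONSTERS Long ago"))
print(join_split_digits("Senses Passive Perception 1 1"))
```

Keeping each pass narrow makes the audit in this step more useful: anything it still flags points at a pattern that needs to be added, not at an over-eager rule that needs to be loosened.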

Step 5 — Verify in the app

pnpm check

Then start the dev server and search for one of the new creatures by name:

pnpm --filter web dev

Confirm in the browser:

Search finds the creature with the right book name as the source label.
Clicking it shows the full stat block immediately — no "Load source" prompt.
The source manager UI does not list the bundled book (it only shows cached sources).
Bulk import skips the bundled book.

Notes for future agents

No need to edit dnd-bundled-adapter.ts or bestiary-index-adapter.ts when adding a new book — the adapter derives source codes from the JSON.
data/bestiary/index.json is regenerated from 5etools and should not be edited to add bundled entries. The merge happens at runtime in bestiary-index-adapter.ts.
Each bundled creature must have:
- A unique id like <sourcecode>:<slug> (e.g., tgl:anarch-boar).
- source field matching the source code (e.g., "TGL").
- sourceDisplayName field matching the book's display name (e.g., "The Great Labors").
- All the required Creature fields from packages/domain/src/creature-types.ts.
The script approach is preferred over hand-editing JSON for >5 creatures. For a single creature or two, hand-editing the JSON is reasonable; just match an existing entry's shape exactly.
After any change to dnd-bundled.json, run pnpm typecheck — the static import in the adapter will catch shape mismatches at compile time.
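
The id and source requirements above can also be checked mechanically before running the type checker. The following is a minimal sketch with hypothetical helper names; the slug rule (lowercase, alphanumeric runs joined by hyphens) is an assumption inferred from the tgl:anarch-boar example, and the check does not cover the remaining required Creature fields.

```python
import re

ID_RE = re.compile(r"^[a-z0-9]+:[a-z0-9]+(?:-[a-z0-9]+)*$")


def make_id(source_code: str, name: str) -> str:
    """Build '<sourcecode>:<slug>' from a source code and a creature name."""
    slug = "-".join(re.findall(r"[a-z0-9]+", name.lower()))
    return f"{source_code.lower()}:{slug}"


def check_bundled(creatures: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    problems: list[str] = []
    seen_ids: set[str] = set()
    display_by_source: dict[str, str] = {}
    for c in creatures:
        cid = c.get("id", "")
        label = cid or c.get("name", "<unnamed>")
        source = c.get("source", "")
        display = c.get("sourceDisplayName", "")
        if not ID_RE.match(cid):
            problems.append(f"{label}: id is not <sourcecode>:<slug>")
        elif cid.split(":", 1)[0] != source.lower():
            problems.append(f"{label}: id prefix does not match source {source!r}")
        if cid in seen_ids:
            problems.append(f"{label}: duplicate id")
        seen_ids.add(cid)
        if not source or not display:
            problems.append(f"{label}: missing source or sourceDisplayName")
        elif display_by_source.setdefault(source, display) != display:
            problems.append(f"{label}: sourceDisplayName differs within {source}")
    return problems


good = {
    "id": make_id("TGL", "Anarch Boar"),
    "name": "Anarch Boar",
    "source": "TGL",
    "sourceDisplayName": "The Great Labors",
}
print(good["id"])
print(check_bundled([good]))
```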
