# Research: Copy-Paste Detection Quality Gate

## jscpd Configuration Strategy

**Decision**: Use `.jscpd.json` at the repo root for configuration.

**Rationale**: jscpd supports a dedicated JSON config file (`.jscpd.json`), which is the conventional approach. This keeps configuration discoverable alongside other tool configs (`biome.json`, `knip.json`). Command-line flags would also work, but they are less maintainable and harder to share across scripts.

**Alternatives considered**:
- CLI flags in the `pnpm jscpd` script: Less discoverable, harder to maintain threshold changes.
- `package.json` `jscpd` key: Supported but clutters package.json; separate file preferred for consistency with project conventions.
## Threshold Configuration

**Decision**: Keep jscpd's default detection sizes (minimum 5 lines, minimum 50 tokens) and add a percentage threshold (e.g., 5% maximum duplication) that fails the check when exceeded.

**Rationale**: jscpd's defaults (5 lines, 50 tokens) are widely used settings that avoid flagging trivially similar code (imports, short utility patterns) while still catching meaningful copy-paste blocks. The percentage threshold provides a clear pass/fail gate.

**Alternatives considered**:
- Stricter thresholds (3 lines, 30 tokens): Too aggressive, would flag structural similarities common in TypeScript (type declarations, import blocks).
- No percentage threshold (fail on any duplicate): Too strict for an existing codebase that may have some acceptable duplication.
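
As a rough sketch, the decision above might translate into the following `.jscpd.json` fragment. It assumes jscpd's camelCase option names (`minLines`, `minTokens`, `threshold`) and that `threshold` is interpreted as a percentage of duplicated code. The value `5` is only the illustrative figure from the decision, not a tuned number.

```json
{
  "minLines": 5,
  "minTokens": 50,
  "threshold": 5
}
```

Because `minLines` and `minTokens` match the defaults, they could be omitted; spelling them out keeps the gate's behavior explicit if the defaults ever change.
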
## File Inclusion/Exclusion Strategy

**Decision**: Scan TypeScript and TSX files (`**/*.ts`, `**/*.tsx`). Exclude `node_modules`, `dist`, `build`, `coverage`, `.specify`, `specs`, and lock files via jscpd's `ignore` configuration.

**Rationale**: The project is TypeScript-only. Excluding build artifacts, vendored dependencies, and non-source files prevents false positives and keeps scan times fast.

**Alternatives considered**:
- Scan all file types: Unnecessary — the project contains no other source languages.
- Exclude test files: Rejected per spec assumption — test duplication is also worth catching.
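
One way the inclusion and exclusion rules could be expressed in `.jscpd.json` is sketched below. It assumes the `format` option accepts the names `typescript` and `tsx`, and that `ignore` takes glob patterns. The glob shapes are illustrative, and `pnpm-lock.yaml` is assumed to be the lock file in question since the project uses pnpm.

```json
{
  "format": ["typescript", "tsx"],
  "ignore": [
    "**/node_modules/**",
    "**/dist/**",
    "**/build/**",
    "**/coverage/**",
    "**/.specify/**",
    "**/specs/**",
    "**/pnpm-lock.yaml"
  ]
}
```

Test files are deliberately absent from `ignore`, in line with the rejected alternative above. If glob-based selection is preferred over format names, jscpd's `pattern` option may be an alternative worth checking against its documentation.
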
## Integration with pnpm check

**Decision**: Add `jscpd` to the `pnpm check` script chain in `package.json`, running it alongside knip, biome, typecheck, and vitest.

**Rationale**: The existing `pnpm check` script is already the pre-commit gate via Lefthook (`lefthook.yml` runs `pnpm check`). Adding jscpd to this chain automatically integrates it into the pre-commit workflow with no Lefthook config changes.

**Alternatives considered**:
- Separate Lefthook job for jscpd: Would work but deviates from the existing pattern where `pnpm check` is the single merge gate.
- Run jscpd only in CI: Misses the pre-commit enforcement requirement from the spec.
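
The wiring might look roughly like the `package.json` fragment below. The project's actual `check` script is not shown in this document, so the existing steps (`knip`, `biome check .`, `tsc --noEmit`, `vitest run`), their order, and the use of `&&` chaining are all assumptions for illustration. Only the `jscpd` script and its presence in the chain reflect the decision above.

```json
{
  "scripts": {
    "jscpd": "jscpd .",
    "check": "knip && biome check . && tsc --noEmit && vitest run && pnpm jscpd"
  }
}
```

With `.jscpd.json` at the repo root, the `jscpd` script needs no extra flags, and a threshold violation should surface as a non-zero exit that stops the chain.
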
## Knip Integration

**Decision**: Ensure Knip recognizes jscpd as a used dev dependency. The jscpd binary may need to be referenced in a `package.json` script so that Knip does not flag the package as unused.

**Rationale**: The project runs `knip` as part of `pnpm check`. Adding `jscpd` as a devDependency and referencing it in a `package.json` script ensures Knip won't report it as unused.

**Alternatives considered**: None. This is a necessary housekeeping step given the project's use of Knip.