Feature Specification: Copy-Paste Detection Quality Gate

Feature Branch: 015-add-jscpd-gate
Created: 2026-03-05
Status: Draft
Input: User description: "introduce jscpd to detect copy/pasted code and add it as a quality gate for pre commit"

User Scenarios & Testing (mandatory)

User Story 1 - Automated Copy-Paste Detection on Commit (Priority: P1)

A developer working on the codebase makes changes and attempts to commit. Before the commit completes, the system automatically scans the codebase for duplicated code blocks. If duplication exceeds the configured threshold, the commit is blocked and the developer sees a clear report identifying the duplicated sections, enabling them to refactor before committing.

Why this priority: This is the core value proposition — preventing code duplication from entering the codebase by catching it at the earliest possible point in the development workflow.

Independent Test: Can be fully tested by introducing a known duplicated code block and attempting to commit. The commit should be blocked with a report showing the duplication.

Acceptance Scenarios:

  1. Given a codebase with no duplicated code above threshold, When a developer commits changes that do not introduce duplication, Then the commit succeeds without copy-paste warnings.
  2. Given a developer has staged changes containing duplicated code blocks, When they attempt to commit, Then the commit is blocked and a report lists the duplicated files and line ranges.
  3. Given a developer has staged changes with duplication, When the pre-commit check fails, Then the output identifies each duplicated block together with the block it duplicates, giving the file path and line range of both.
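
The blocking rule in these scenarios can be sketched as a small pure function. This is a minimal illustration, not the project's implementation: the `ScanSummary` shape, the `evaluateGate` name, and the idea that the threshold is a maximum duplication percentage are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; not the detection tool's actual report format.
interface ClonePair {
  firstFile: string;
  firstLines: [number, number]; // inclusive start and end line
  secondFile: string;
  secondLines: [number, number]; // inclusive start and end line
}

interface ScanSummary {
  duplicatedPercent: number; // share of scanned lines that are duplicated
  clones: ClonePair[];
}

interface GateResult {
  blocked: boolean;
  exitCode: 0 | 1; // a non-zero exit code makes a pre-commit hook abort the commit
  message: string;
}

// Assumption: the configured threshold is a maximum allowed duplication percentage.
function evaluateGate(summary: ScanSummary, thresholdPercent: number): GateResult {
  // The spec blocks only when duplication exceeds the threshold, so equality passes.
  if (summary.duplicatedPercent > thresholdPercent) {
    return {
      blocked: true,
      exitCode: 1,
      message:
        `Duplication ${summary.duplicatedPercent}% exceeds threshold ${thresholdPercent}% ` +
        `(${summary.clones.length} duplicated block(s) found).`,
    };
  }
  return { blocked: false, exitCode: 0, message: "No duplication above threshold." };
}
```

Keeping the decision separate from the scan itself means the three scenarios above can be exercised with hand-built summaries, without staging real files.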

User Story 2 - On-Demand Duplication Scanning (Priority: P2)

A developer wants to proactively check the codebase for duplicated code without committing. They run a dedicated command that scans the project and produces a duplication report, allowing them to identify and address copy-paste issues at any time.

Why this priority: Supports proactive code quality improvement outside the commit workflow, but the pre-commit gate (P1) is the primary enforcement mechanism.

Independent Test: Can be tested by running the duplication scan command on a codebase with known duplicated sections and verifying the report output.

Acceptance Scenarios:

  1. Given a codebase with duplicated code, When the developer runs the duplication scan command, Then a report is produced listing all duplicated blocks with file paths and line numbers.
  2. Given a codebase with no duplication above threshold, When the developer runs the scan, Then the command exits successfully with a clean result.
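
One possible shape for the report these scenarios describe is sketched below. The `DuplicatedBlock` record and the output wording are hypothetical; a real detection tool ships its own reporters, so this only illustrates the "file paths and line numbers" requirement.

```typescript
// Hypothetical record of one duplicated block and the block it matches.
interface DuplicatedBlock {
  file: string;
  startLine: number;
  endLine: number; // inclusive
  matchFile: string;
  matchStartLine: number;
  matchEndLine: number; // inclusive
}

// Renders one line per duplicated block, or a clean-result message when there are none.
function formatDuplicationReport(blocks: DuplicatedBlock[]): string {
  if (blocks.length === 0) {
    return "No duplicated blocks found.";
  }
  const lines = blocks.map((b) => {
    const lineCount = b.endLine - b.startLine + 1;
    return (
      `${b.file}:${b.startLine}-${b.endLine} duplicates ` +
      `${b.matchFile}:${b.matchStartLine}-${b.matchEndLine} (${lineCount} lines)`
    );
  });
  return [`Found ${blocks.length} duplicated block(s):`, ...lines].join("\n");
}
```

The `path:start-end` form is chosen because many editors and terminals turn it into a clickable location.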

User Story 3 - Integration with Existing Quality Gate (Priority: P3)

The copy-paste detection check is integrated into the existing unified quality gate command so that it runs alongside linting, formatting, type checking, and tests as part of the single merge gate.

Why this priority: Ensures consistency with the existing developer workflow and CI expectations, but depends on the core detection (P1) being implemented first.

Independent Test: Can be tested by running the unified quality gate and verifying that duplication scanning is included in the output alongside other checks.

Acceptance Scenarios:

  1. Given the existing unified quality gate command, When a developer runs it, Then copy-paste detection runs as one of the checks.
  2. Given a codebase with duplication above threshold, When the unified quality gate is run, Then the overall command fails and the duplication report is visible in the output.
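
How the duplication check might sit beside the other checks is sketched below. The `Check` abstraction and `runQualityGate` are hypothetical names, and the sketch assumes the gate runs every check rather than stopping at the first failure, so that the duplication report stays visible even when another check also fails.

```typescript
// Hypothetical check abstraction: each check reports whether it passed and what to print.
interface CheckOutcome {
  passed: boolean;
  output: string;
}

interface Check {
  name: string;
  run: () => CheckOutcome;
}

interface GateSummary {
  passed: boolean; // true only when every check passed
  failed: string[]; // names of the failing checks, in run order
  output: string; // combined output of all checks
}

// Assumption: all checks run even after a failure, so every report appears in the output.
function runQualityGate(checks: Check[]): GateSummary {
  const failed: string[] = [];
  const sections: string[] = [];
  for (const check of checks) {
    const outcome = check.run();
    const status = outcome.passed ? "ok" : "FAILED";
    sections.push(`[${check.name}] ${status}\n${outcome.output}`.trimEnd());
    if (!outcome.passed) {
      failed.push(check.name);
    }
  }
  return { passed: failed.length === 0, failed, output: sections.join("\n") };
}
```

In a real gate each `run` would wrap a linter, formatter, type checker, test runner, or the duplication scan; here they are plain functions so the aggregation logic can be read on its own.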

Edge Cases

  • What happens when the codebase has zero source files matching the scan pattern? The scan should succeed with a clean result.
  • How does the system handle generated code or build artifacts that may contain legitimate repetition? These are excluded from scanning by default via configuration.
  • What happens if the detection tool is not installed? The pre-commit hook and quality gate should fail with a clear error message indicating the missing dependency.
  • How does the system behave when duplication exists only in test files? Test files are scanned by default, as duplication in tests also indicates refactoring opportunities.
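
The first and third edge cases can be handled before any scan starts. The sketch below is one way to express that; `Preflight`, `decidePreflight`, and the message wording are hypothetical, and how the tool's presence is detected is left out.

```typescript
// Hypothetical inputs gathered before the scan runs.
interface Preflight {
  toolInstalled: boolean; // whether the detection tool could be resolved
  matchedFiles: string[]; // files matching the scan pattern after exclusions
}

type PreflightDecision =
  | { action: "fail"; exitCode: 1; message: string }
  | { action: "pass"; exitCode: 0; message: string }
  | { action: "scan"; files: string[] };

function decidePreflight(input: Preflight): PreflightDecision {
  // A missing tool must fail loudly rather than silently skip the gate.
  if (!input.toolInstalled) {
    return {
      action: "fail",
      exitCode: 1,
      message:
        "Copy-paste detection tool not found. Install the project's development dependencies and retry.",
    };
  }
  // Zero matching files is a clean result, not an error.
  if (input.matchedFiles.length === 0) {
    return {
      action: "pass",
      exitCode: 0,
      message: "No source files matched the scan pattern; nothing to check.",
    };
  }
  return { action: "scan", files: input.matchedFiles };
}
```

Checking for the tool first matters: if the file check came first, an empty match list could mask a missing dependency.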

Requirements (mandatory)

Functional Requirements

  • FR-001: The system MUST scan source code files for duplicated code blocks before each commit via the pre-commit hook.
  • FR-002: The system MUST block commits when code duplication exceeds the configured threshold.
  • FR-003: The system MUST produce a human-readable report identifying duplicated code blocks, including file paths and line ranges.
  • FR-004: The system MUST provide a standalone command for running the duplication scan independently of the commit workflow.
  • FR-005: The system MUST integrate the duplication check into the existing unified quality gate command.
  • FR-006: The system MUST allow configuration of duplication detection thresholds (minimum token count and minimum number of lines to qualify as a duplicate).
  • FR-007: The system MUST support excluding specific files or directories from duplication scanning (e.g., generated code, build output, lock files).
  • FR-008: The system MUST scan project source files relevant to the codebase (TypeScript/TSX) and exclude non-source artifacts by default.
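
FR-006 to FR-008 together describe a configuration surface. A rough sketch of what it could contain follows. The `DuplicationConfig` shape, the `shouldScan` helper, and the specific directory and lock-file names are assumptions for illustration; the chosen tool's own configuration file would normally carry these settings.

```typescript
// Hypothetical configuration shape covering FR-006 to FR-008.
interface DuplicationConfig {
  minLines: number; // FR-006: minimum lines for a block to count as a duplicate
  minTokens: number; // FR-006: minimum tokens for a block to count as a duplicate
  extensions: string[]; // FR-008: source file types to scan
  ignoredDirs: string[]; // FR-007: directories excluded from scanning
  ignoredFiles: string[]; // FR-007: individual file names excluded from scanning
}

// Assumed defaults; the directory and lock-file names are examples only.
const defaultConfig: DuplicationConfig = {
  minLines: 5,
  minTokens: 50,
  extensions: [".ts", ".tsx"],
  ignoredDirs: ["node_modules", "dist", "build", "coverage"],
  ignoredFiles: ["package-lock.json", "pnpm-lock.yaml", "yarn.lock"],
};

// Decides whether a repository-relative path (using "/" separators) is scanned.
function shouldScan(path: string, config: DuplicationConfig = defaultConfig): boolean {
  const segments = path.split("/");
  const fileName = segments[segments.length - 1] ?? "";
  if (config.ignoredFiles.includes(fileName)) {
    return false;
  }
  const inIgnoredDir = segments.slice(0, -1).some((dir) => config.ignoredDirs.includes(dir));
  if (inIgnoredDir) {
    return false;
  }
  return config.extensions.some((ext) => fileName.endsWith(ext));
}
```

Test files such as `src/foo.test.ts` pass this filter, which matches the spec's decision to scan tests by default.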

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: 100% of commits pass through the copy-paste detection gate before being accepted.
  • SC-002: Developers receive duplication feedback within 10 seconds for a typical project-sized codebase.
  • SC-003: Zero false positives on import statements, type declarations, or other structurally necessary repetitions when using default thresholds.
  • SC-004: The standalone scan command completes successfully and reports all duplicated blocks in the codebase.
  • SC-005: The unified quality gate includes and enforces the duplication check alongside existing checks.
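
SC-002 is the one outcome here with a numeric budget, so it can be checked mechanically. The helper below is a hypothetical sketch: `timeScan` and its injectable clock are illustrative names, and the `scan` callback stands in for whatever actually runs the detection.

```typescript
// SC-002 budget from the spec: duplication feedback within 10 seconds.
const FEEDBACK_BUDGET_MS = 10_000;

interface TimedResult<T> {
  value: T;
  elapsedMs: number;
  withinBudget: boolean;
}

// Runs a scan and records how long it took. The clock is injectable so the
// budget check can be tested without waiting in real time.
function timeScan<T>(
  scan: () => T,
  budgetMs: number = FEEDBACK_BUDGET_MS,
  now: () => number = Date.now,
): TimedResult<T> {
  const start = now();
  const value = scan();
  const elapsedMs = now() - start;
  return { value, elapsedMs, withinBudget: elapsedMs <= budgetMs };
}
```

A measurement like this could be logged on each run to see whether the gate stays inside the budget as the codebase grows.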

Assumptions

  • The project already uses Lefthook for pre-commit hooks, so the new check will be added to the existing hook configuration.
  • The duplication detection tool will be added as a development dependency.
  • Default thresholds will follow the detection tool's own defaults (for jscpd, a minimum of 5 lines and 50 tokens per duplicated block) and can be adjusted via configuration.
  • Test files are included in the scan by default, as duplication in tests can also indicate refactoring opportunities.
  • Generated files (e.g., build output, lock files) are excluded by default.
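
The threshold assumption can be made concrete with a small predicate. This is a sketch only: the names are hypothetical, and it assumes a block must meet both minimums to count as a duplicate. If either minimum alone should suffice, the `&&` would become `||`.

```typescript
interface Thresholds {
  minLines: number;
  minTokens: number;
}

// Hypothetical measurement of one candidate block.
interface CandidateBlock {
  lines: number;
  tokens: number;
}

// Defaults taken from the assumption above: 5 lines and 50 tokens.
const DEFAULT_THRESHOLDS: Thresholds = { minLines: 5, minTokens: 50 };

// Assumption: both minimums must be met for a block to qualify as a duplicate.
function qualifiesAsDuplicate(
  block: CandidateBlock,
  thresholds: Thresholds = DEFAULT_THRESHOLDS,
): boolean {
  return block.lines >= thresholds.minLines && block.tokens >= thresholds.minTokens;
}
```

Under these defaults a repeated three-line import group never qualifies, which is the kind of structurally necessary repetition SC-003 wants left alone.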

Scope Boundaries

  • MVP baseline does not include: CI/CD pipeline integration beyond what the unified quality gate already provides.
  • MVP baseline does not include: Historical duplication trend tracking or reporting dashboards.
  • MVP baseline does not include: Auto-fix or refactoring suggestions for detected duplications.
  • MVP baseline does not include: Per-file or per-directory threshold overrides.