addon-header-footer-cleanup
Add-on: Header/Footer Cleanup (Repeated Lines)
Use this skill to implement the cleanup stage that removes repeated header/footer noise from LLM-facing text while preserving raw page content for traceability.
Inputs
Collect:
- REPEAT_THRESHOLD_RATIO: default 0.6 (a line must appear on >= 60% of pages to be a removal candidate).
- TOP_BAND_LINES: default 3 (number of lines at the top of each page treated as the “header band”).
- BOTTOM_BAND_LINES: default 3 (number of lines at the bottom of each page treated as the “footer band”).
- MIN_LINE_LEN: default 6 (lines shorter than this are ignored when counting repetitions).
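One way to keep these parameters explicit and reusable is a small immutable config object whose defaults mirror the list above. Recording it verbatim also covers the "thresholds used" part of the audit metadata. A minimal Python sketch; the class name CleanupConfig and the as_metadata helper are hypothetical, not part of the skill:

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CleanupConfig:
    """Hypothetical config object; defaults mirror the Inputs list."""

    repeat_threshold_ratio: float = 0.6  # share of pages a line must appear on
    top_band_lines: int = 3  # first N lines of a page form the header band
    bottom_band_lines: int = 3  # last N lines of a page form the footer band
    min_line_len: int = 6  # shorter normalized lines are not counted

    def __post_init__(self) -> None:
        # Reject values that would silently disable or distort the cleanup.
        if not 0.0 < self.repeat_threshold_ratio <= 1.0:
            raise ValueError("repeat_threshold_ratio must be in (0, 1]")
        for name in ("top_band_lines", "bottom_band_lines", "min_line_len"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be >= 0")

    def as_metadata(self) -> dict:
        """Thresholds as a plain dict, ready to record in the cleanup metadata."""
        return asdict(self)
```

Passing the same frozen config to every run, and storing as_metadata() next to the results, makes it straightforward to reproduce or audit a cleanup later.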
Algorithm (Deterministic)
- For each page, split the text into lines.
- Normalize each line for comparison:
  - NFKC normalize
  - lowercase
  - collapse whitespace
  - strip common page-number patterns (page 3 of 12, 3/12, etc.) for comparison only
- Count how many pages each normalized line appears on, counting a line at most once per page and skipping lines shorter than MIN_LINE_LEN.
- Mark a line as a removal candidate if:
  - it appears on >= REPEAT_THRESHOLD_RATIO of pages, and
  - it appears mostly within the top/bottom bands (the first TOP_BAND_LINES or last BOTTOM_BAND_LINES lines of a page).
- Remove candidates from the cleaned page text; the raw text is left untouched.
- Persist:
  - document_pages.clean_markdown
  - document_pages.metadata_jsonb.cleanup, with:
    - removed normalized lines
    - per-page removals (original text)
    - thresholds used
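The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the skill's reference implementation: the page-number regex covers only the "page 3", "page 3 of 12" and "3/12" forms, band positions are counted over non-blank lines, "mostly within the bands" is read as "more than half of the line's occurrences", candidates are removed only where they fall inside a band, and all function and metadata key names are hypothetical.

```python
import re
import unicodedata
from collections import Counter

# Assumed page-number forms: "page 3", "page 3 of 12", "3/12". Extend per corpus.
PAGE_NUMBER_RE = re.compile(r"\bpage\s+\d+(?:\s+of\s+\d+)?\b|\b\d+\s*/\s*\d+\b")


def normalize(line: str) -> str:
    """Comparison key for a line; the original text is never modified."""
    text = unicodedata.normalize("NFKC", line).lower()
    text = PAGE_NUMBER_RE.sub(" ", text)  # strip page numbers (comparison only)
    return " ".join(text.split())  # collapse whitespace and trim the ends


def band_indices(lines: list[str], top: int, bottom: int) -> set[int]:
    """Indices of the first `top` and last `bottom` non-blank lines (assumption)."""
    non_blank = [i for i, line in enumerate(lines) if line.strip()]
    head = non_blank[:top]
    tail = non_blank[max(0, len(non_blank) - bottom):]
    return set(head) | set(tail)


def find_candidates(pages, ratio=0.6, top=3, bottom=3, min_len=6):
    """Normalized lines that repeat across pages and sit mostly in the bands."""
    page_counts = Counter()  # number of pages each key appears on
    band_hits = Counter()  # occurrences inside a header/footer band
    all_hits = Counter()  # occurrences anywhere on a page
    for page in pages:
        lines = page.splitlines()
        bands = band_indices(lines, top, bottom)
        keys_on_page = set()
        for i, line in enumerate(lines):
            key = normalize(line)
            if len(key) < min_len:  # MIN_LINE_LEN: ignore tiny lines
                continue
            all_hits[key] += 1
            if i in bands:
                band_hits[key] += 1
            keys_on_page.add(key)
        page_counts.update(keys_on_page)  # at most one count per page
    min_pages = ratio * len(pages)
    # Assumption: "mostly within the bands" = more than half of all occurrences.
    return {
        key
        for key, n_pages in page_counts.items()
        if n_pages >= min_pages and band_hits[key] * 2 > all_hits[key]
    }


def clean_page(page, candidates, top=3, bottom=3):
    """Drop candidate lines that fall inside a band; return (clean_text, removed)."""
    lines = page.splitlines()
    bands = band_indices(lines, top, bottom)
    kept, removed = [], []
    for i, line in enumerate(lines):
        if i in bands and normalize(line) in candidates:
            removed.append(line)  # original text, for the audit record
        else:
            kept.append(line)
    return "\n".join(kept), removed


def cleanup_document(pages, ratio=0.6, top=3, bottom=3, min_len=6):
    """Return one (clean_markdown, cleanup_metadata) pair per page, in page order."""
    candidates = find_candidates(pages, ratio, top, bottom, min_len)
    thresholds = {"REPEAT_THRESHOLD_RATIO": ratio, "TOP_BAND_LINES": top,
                  "BOTTOM_BAND_LINES": bottom, "MIN_LINE_LEN": min_len}
    results = []
    for page in pages:
        clean_text, removed = clean_page(page, candidates, top, bottom)
        results.append((clean_text, {
            "removed_normalized_lines": sorted(candidates),  # sorted: deterministic
            "removed_lines": removed,  # this page's removals, original text
            "thresholds": thresholds,
        }))
    return results
```

Sorting the removed keys keeps the metadata identical across runs with the same inputs, and each returned pair maps onto one document_pages row. Restricting removal to the bands is a conservative reading of the guardrails: a repeated line that also appears mid-page in body text is kept there.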
Storage Contract
Never overwrite raw_markdown. Always store:
- raw_markdown
- clean_markdown
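A sketch of a write path that honors this contract: it updates clean_markdown and only the cleanup key inside metadata_jsonb, and never writes raw_markdown. It uses Python's built-in sqlite3 module as a stand-in for Postgres, storing metadata_jsonb as JSON text; the table layout in DEMO_SCHEMA and the persist_cleanup name are assumptions for illustration.

```python
import json
import sqlite3

# Stand-in schema for the demo; in SQLite, metadata_jsonb holds JSON text.
DEMO_SCHEMA = """
CREATE TABLE document_pages (
    id INTEGER PRIMARY KEY,
    raw_markdown TEXT NOT NULL,
    clean_markdown TEXT,
    metadata_jsonb TEXT
)
"""


def persist_cleanup(conn: sqlite3.Connection, page_id: int,
                    clean_markdown: str, cleanup: dict) -> None:
    """Store clean text and cleanup metadata; raw_markdown is never written."""
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT metadata_jsonb FROM document_pages WHERE id = ?", (page_id,)
        ).fetchone()
        if row is None:
            raise KeyError(f"no document_pages row with id={page_id}")
        metadata = json.loads(row[0]) if row[0] else {}
        metadata["cleanup"] = cleanup  # replace only the cleanup key
        conn.execute(
            "UPDATE document_pages SET clean_markdown = ?, metadata_jsonb = ? "
            "WHERE id = ?",
            (clean_markdown, json.dumps(metadata, sort_keys=True), page_id),
        )
```

In Postgres, the read-modify-write on the metadata could instead be a single UPDATE using the jsonb || concatenation operator (wrapping the existing value in coalesce(..., '{}'::jsonb) so a NULL column does not swallow the result), which also avoids a race between the SELECT and the UPDATE.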
Guardrails
- Avoid deleting content that looks like a section heading or clause text; if uncertain, keep it.
- Record exactly what was removed and why in metadata for auditing.
- Keep cleanup deterministic given the same inputs and parameters.
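The heading/clause guardrail needs some concrete test before a line is deleted. The sketch below is one hypothetical heuristic: it treats markdown headings, "Section/Article/Clause/Schedule N" labels, numbered clauses such as "4.2", lettered items such as "(a)", and any line longer than an assumed 12 words as protected. The patterns and the MAX_WORDS_FOR_NOISE threshold are illustrative assumptions to tune per corpus.

```python
import re

# Hypothetical patterns for lines that look like headings or clause numbering.
PROTECTED_PATTERNS = [
    re.compile(r"^\s*#{1,6}\s+\S"),  # markdown heading, e.g. "## Definitions"
    re.compile(r"^\s*(section|article|clause|schedule)\s+[\dIVXLC]+\b", re.IGNORECASE),
    re.compile(r"^\s*\d+(\.\d+)*[.)]?\s+\S"),  # numbered clause, e.g. "4.2 Payment"
    re.compile(r"^\s*\([a-z0-9]{1,3}\)\s+\S", re.IGNORECASE),  # "(a) the Supplier"
]
MAX_WORDS_FOR_NOISE = 12  # assumption: longer lines read like clause text


def looks_protected(line: str) -> bool:
    """True if removing this line could delete real content; keep it when in doubt."""
    if len(line.split()) > MAX_WORDS_FOR_NOISE:
        return True
    return any(pattern.search(line) for pattern in PROTECTED_PATTERNS)
```

The removal step can call it on each candidate line and keep the line whenever it returns True. The heuristic deliberately errs toward keeping text: a header that happens to start with a number (for example a year) is also protected, which matches the "if uncertain, keep it" rule.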
Decision Justification Rule
- Every non-trivial decision must include a concrete justification.
- Capture the alternatives considered and why they were rejected.
- State tradeoffs and residual risks for the chosen option.
- If justification is missing, treat the task as incomplete and surface it as a blocker.