Add-on: Header/Footer Cleanup (Repeated Lines)
Use this skill to implement the cleanup stage that removes repeated header/footer noise from LLM-facing text while preserving raw page content for traceability.
Inputs
Collect:
- `REPEAT_THRESHOLD_RATIO`: default `0.6` (a line must appear on >= 60% of pages to be a removal candidate).
- `TOP_BAND_LINES`: default `3` (number of lines at the top of each page considered the “header band”).
- `BOTTOM_BAND_LINES`: default `3` (number of lines at the bottom of each page considered the “footer band”).
- `MIN_LINE_LEN`: default `6` (lines shorter than this are ignored for repetition counting).
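One way to keep these four inputs together, so the exact thresholds can be recorded next to the output, is a small immutable config object. This is a minimal sketch; the class name `CleanupParams`, its snake_case field names and the `to_metadata` helper are hypothetical and not part of the skill.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CleanupParams:
    """Hypothetical container for the cleanup inputs, using the skill's defaults."""

    repeat_threshold_ratio: float = 0.6  # fraction of pages a line must appear on
    top_band_lines: int = 3  # first N lines of a page form the header band
    bottom_band_lines: int = 3  # last N lines of a page form the footer band
    min_line_len: int = 6  # shorter lines are skipped when counting repeats

    def __post_init__(self) -> None:
        # Reject values that would make the removal rules ill-defined.
        if not 0.0 < self.repeat_threshold_ratio <= 1.0:
            raise ValueError("repeat_threshold_ratio must be in (0, 1]")
        if min(self.top_band_lines, self.bottom_band_lines, self.min_line_len) < 0:
            raise ValueError("band sizes and min_line_len must be non-negative")

    def to_metadata(self) -> dict:
        # Plain dict suitable for the "thresholds used" entry in cleanup metadata.
        return asdict(self)
```

Freezing the object means the parameters cannot drift partway through a run, which supports the determinism requirement.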
Algorithm (Deterministic)
- For each page, split into lines.
- Normalize each line for comparison:
  - NFKC normalize
  - lowercase
  - collapse whitespace
  - strip common page-number patterns (`page 3 of 12`, `3/12`, etc.) for comparison only
- Count how many pages each normalized line appears on, skipping lines shorter than `MIN_LINE_LEN`.
- Mark a line as a removal candidate if:
  - it appears on >= `REPEAT_THRESHOLD_RATIO` of pages, and
  - it appears mostly within the top/bottom bands (the first `TOP_BAND_LINES` or last `BOTTOM_BAND_LINES` lines of a page).
- Remove candidates from the cleaned page text.
- Persist:
  - `document_pages.clean_markdown`
  - `document_pages.metadata_jsonb.cleanup`, containing:
    - removed normalized lines
    - per-page removals (original text)
    - thresholds used
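The steps above can be sketched in Python with the standard library only. This is an illustration under stated assumptions, not the skill's reference implementation: the function names, the two page-number regexes, the metadata key names, the reading of "mostly within the bands" as "more than half of a line's occurrences fall inside a band", and the choice to drop only the band copies of a candidate (keeping any copy in the page body, in the spirit of "if uncertain, keep it") are all assumptions.

```python
from __future__ import annotations

import re
import unicodedata
from collections import Counter

# Assumed page-number patterns; the skill names "page 3 of 12" and "3/12" as examples.
_PAGE_NUMBER_PATTERNS = [
    re.compile(r"\bpage\s+\d+\s+of\s+\d+\b"),
    re.compile(r"\b\d+\s*/\s*\d+\b"),
]


def normalize_line(line: str) -> str:
    """Comparison key only: NFKC, lowercase, page numbers stripped, whitespace collapsed."""
    text = unicodedata.normalize("NFKC", line).lower()
    for pattern in _PAGE_NUMBER_PATTERNS:
        text = pattern.sub("", text)
    # Collapsing last also removes gaps left behind by stripped page numbers.
    return " ".join(text.split())


def band_positions(n_lines: int, top: int, bottom: int) -> set[int]:
    """Indices in the header band (first `top` lines) or footer band (last `bottom`)."""
    header = range(min(top, n_lines))
    footer = range(max(n_lines - bottom, 0), n_lines)
    return set(header) | set(footer)


def clean_pages(
    pages: list[str],
    repeat_threshold_ratio: float = 0.6,
    top_band_lines: int = 3,
    bottom_band_lines: int = 3,
    min_line_len: int = 6,
) -> tuple[list[str], dict]:
    """Return (cleaned page texts, cleanup metadata). Input pages are left untouched."""
    split_pages = [page.splitlines() for page in pages]
    pages_with: Counter = Counter()  # number of pages on which each key appears
    total_hits: Counter = Counter()  # all occurrences of each key
    band_hits: Counter = Counter()  # occurrences inside the header/footer bands

    for lines in split_pages:
        band = band_positions(len(lines), top_band_lines, bottom_band_lines)
        seen: set[str] = set()
        for idx, line in enumerate(lines):
            key = normalize_line(line)
            if len(key) < min_line_len:
                continue  # tiny lines are ignored for repetition counting
            seen.add(key)
            total_hits[key] += 1
            if idx in band:
                band_hits[key] += 1
        pages_with.update(seen)  # count each key at most once per page

    n_pages = len(pages)
    # Assumption: "mostly within the bands" means more than half of all occurrences.
    candidates = {
        key
        for key, count in pages_with.items()
        if count / n_pages >= repeat_threshold_ratio
        and band_hits[key] / total_hits[key] > 0.5
    }

    cleaned, per_page_removals = [], []
    for lines in split_pages:
        band = band_positions(len(lines), top_band_lines, bottom_band_lines)
        kept, removed = [], []
        for idx, line in enumerate(lines):
            # Assumption: only band copies are removed; body copies are kept.
            if idx in band and normalize_line(line) in candidates:
                removed.append(line)
            else:
                kept.append(line)
        cleaned.append("\n".join(kept))
        per_page_removals.append(removed)

    metadata = {
        "removed_normalized_lines": sorted(candidates),  # sorted for determinism
        "per_page_removals": per_page_removals,  # original text, per page
        "thresholds": {
            "repeat_threshold_ratio": repeat_threshold_ratio,
            "top_band_lines": top_band_lines,
            "bottom_band_lines": bottom_band_lines,
            "min_line_len": min_line_len,
        },
    }
    return cleaned, metadata
```

Because page numbers are stripped before comparison, a footer such as `ACME Corp | Page 2 of 3` normalizes to the same key on every page and is caught, while the original text is what gets recorded under the per-page removals.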
Storage Contract
Never overwrite `raw_markdown`. Always store both:
- `raw_markdown`
- `clean_markdown`
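A minimal sketch of a write path that honours this contract: only `clean_markdown` and the `cleanup` key inside the page metadata are updated, and `raw_markdown` never appears in the `UPDATE`. SQLite (with JSON stored as TEXT) stands in here for whatever database backs `document_pages`; the `id` column, the schema and the `save_cleanup` function are assumptions for illustration.

```python
import json
import sqlite3

# Assumed schema for illustration; a TEXT column stands in for a jsonb column.
SCHEMA = """
CREATE TABLE IF NOT EXISTS document_pages (
    id INTEGER PRIMARY KEY,
    raw_markdown TEXT NOT NULL,
    clean_markdown TEXT,
    metadata_jsonb TEXT NOT NULL DEFAULT '{}'
)
"""


def save_cleanup(conn: sqlite3.Connection, page_id: int, clean_markdown: str, cleanup: dict) -> None:
    """Write cleanup output for one page without touching raw_markdown."""
    row = conn.execute(
        "SELECT metadata_jsonb FROM document_pages WHERE id = ?", (page_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no document_pages row with id={page_id}")
    metadata = json.loads(row[0])
    metadata["cleanup"] = cleanup  # other metadata keys are preserved
    # raw_markdown is deliberately absent from the SET list.
    conn.execute(
        "UPDATE document_pages SET clean_markdown = ?, metadata_jsonb = ? WHERE id = ?",
        (clean_markdown, json.dumps(metadata, sort_keys=True), page_id),
    )
```

Merging into the existing metadata (rather than replacing it) keeps any other keys a page already carries.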
Guardrails
- Avoid deleting content that looks like a section heading or clause text; if uncertain, keep it.
- Record exactly what was removed and why in metadata for auditing.
- Keep cleanup deterministic given the same inputs and parameters.
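The first two guardrails can be combined into a screening pass over proposed removals: anything that looks like a heading or clause is kept back, and every decision is logged with a reason. The patterns, the `is_protected` and `screen_removals` names, and the audit record fields below are hypothetical heuristics, not rules stated by the skill.

```python
import re

# Hypothetical patterns for lines that should never be auto-removed.
_PROTECTED_PATTERNS = [
    re.compile(r"^\s*(section|clause|article|schedule)\s+\d+", re.IGNORECASE),
    re.compile(r"^\s*\d+(\.\d+)*[.)]?\s+\S"),  # numbered headings/clauses: "1.", "2.3", "4)"
    re.compile(r"^\s*#{1,6}\s+\S"),  # markdown headings
]


def is_protected(line: str) -> bool:
    """True if the line looks like a section heading or clause text."""
    return any(pattern.search(line) for pattern in _PROTECTED_PATTERNS)


def screen_removals(proposed: list[str]) -> tuple[list[str], list[dict]]:
    """Return (lines still safe to remove, audit records for every decision)."""
    to_remove, audit = [], []
    for line in proposed:
        if is_protected(line):
            # When uncertain, keep the line and record why.
            audit.append({"line": line, "action": "kept", "reason": "looks like heading or clause"})
        else:
            to_remove.append(line)
            audit.append({"line": line, "action": "removed", "reason": "repeated header/footer"})
    return to_remove, audit
```

These patterns deliberately err toward keeping text: a footer that happens to start with a number (for example a year) will also be kept, which is the safer failure mode under the "if uncertain, keep it" rule.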
Decision Justification Rule
- Every non-trivial decision must include a concrete justification.
- Capture the alternatives considered and why they were rejected.
- State tradeoffs and residual risks for the chosen option.
- If justification is missing, treat the task as incomplete and surface it as a blocker.
Repository: ajrlewis/ai-skills