text-cleaner

Text Cleaner (text-cleaner)

This skill removes technical noise and clutter that get in the way of reading, while keeping the original content and every word of the author unchanged.

Mission

To help users get clean, readable text from "dirty" transcripts, subtitles, or text copied from web pages. The skill activates when a user asks to strip timestamps, noise, HTML tags, or other technical clutter from text.

Main Goal

To remove everything that is not part of the speech or the main content, without resorting to shortening, paraphrasing, or summarizing.

Text Processing Rules

  • Completeness of Text (CRITICAL): Keep the input text verbatim. It is FORBIDDEN to shorten, generalize, or drop any sentence. Every word of the author must remain in place. If the text is very long, process it in chunks to avoid hitting output limits, but NEVER omit content.
  • Noise Cleaning:
    • Remove timestamps in any format (e.g., 00:00:10, [12:34], 12:34.567).
    • Remove annotations of background sounds or non-verbal actions (e.g., [laughter], [music], (laughs), [applause], and non-English equivalents such as the Ukrainian [сміх], [музика]).
    • Remove HTML tags, their attributes, and HTML entities (e.g., <div>, <p class="...">, &nbsp;).
    • Remove promotional inserts only when they are clearly system-generated subtitle inserts rather than the author's own words (e.g., "Subscribe to the channel", "Subscribe").
    • Remove all emojis.
  • Paragraphs: If the text arrives as a single unbroken block, break it into logical paragraphs for readability.
  • Minimal Formatting:
    • Output the result in Markdown format (.md).
    • Header: ALWAYS start the text with a top-level header (#) that reflects the overall essence and topic of the text.
    • Use a minimal number of subheaders (##) only to separate very large blocks of text by meaning, if logically necessary. In other cases, paragraph separation is sufficient.
    • It is FORBIDDEN to add a TOC (table of contents), Summary, or YAML blocks unless the user explicitly asked for them. The output should be just clean text.
  • Quality Verification: After cleaning, you MUST perform a rigorous self-check. Compare the input and output text side-by-side to ensure that no sentences or key information were lost during the removal of technical noise. If you identify any unintended omissions or shortened sections, you MUST restore them immediately to match the original verbatim content. The integrity of the original content is paramount.
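
The noise-cleaning rules above can be sketched in Python as a handful of regular expressions. This is only an illustration under stated assumptions: the skill does not prescribe any implementation, and the pattern names, the annotation word list, and the emoji ranges are all hypothetical and deliberately narrow. A pattern this simple would also strip a clock time the author actually spoke, so real use still needs the side-by-side check described under Quality Verification.

```python
import html
import re

# ASSUMPTION: illustrative patterns, not part of the skill itself.
# Timestamps such as 00:00:10, [12:34], 12:34.567
TIMESTAMP = re.compile(r"\[?\b\d{1,2}:\d{2}(?::\d{2})?(?:[.,]\d{1,3})?\b\]?")
# Bracketed sound / non-verbal annotations (short, hypothetical word list)
ANNOTATION = re.compile(
    r"[\[(](?:laughter|laughs|music|applause|сміх|музика)[\])]",
    re.IGNORECASE,
)
# Opening and closing HTML tags, including their attributes
HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")
# A rough subset of emoji code points; not exhaustive
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")


def clean_text(raw: str) -> str:
    """Strip technical noise while leaving the author's words untouched."""
    text = HTML_TAG.sub(" ", raw)
    # Decode entities such as &nbsp; and turn non-breaking spaces into spaces
    text = html.unescape(text).replace("\u00a0", " ")
    text = TIMESTAMP.sub(" ", text)
    text = ANNOTATION.sub(" ", text)
    text = EMOJI.sub("", text)
    # Collapse leftover runs of spaces and drop lines that became empty
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

The function only deletes matches and collapses whitespace, so it cannot paraphrase or reorder anything. Paragraph splitting, the top-level header, and the judgment call about promotional inserts are left to the skill itself.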

File Name Format

If the user asks to save the result to a file, give it a meaningful name based on the text's topic, written as a slug in Latin letters. Format: YYYY-MM-DD-slug-name.md. Example: a text about the number pi becomes 2026-04-20-number-pi-square-circle.md.
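
A minimal sketch of how such a name could be built, assuming the topic is already available as a short phrase. The function name `build_filename` and the `untitled` fallback are hypothetical. Note that this version simply drops non-Latin characters, so a Ukrainian topic would need transliteration first.

```python
import datetime
import re
import unicodedata


def build_filename(topic: str, day: datetime.date) -> str:
    """Return a name in the YYYY-MM-DD-slug-name.md format."""
    # Strip accents, then discard anything that is not plain ASCII
    ascii_topic = (
        unicodedata.normalize("NFKD", topic)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_topic.lower()).strip("-")
    # ASSUMPTION: fall back to "untitled" when nothing survives slugging
    return f"{day.isoformat()}-{slug or 'untitled'}.md"
```

For example, `build_filename("Number Pi: Square & Circle", datetime.date(2026, 4, 20))` yields `2026-04-20-number-pi-square-circle.md`, matching the example above.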

How to Work with This Skill

  1. Auto-Update Check: ALWAYS run scripts/check_update.py before providing assistance.
    • If the script outputs UPDATE_AVAILABLE (exit code 1), immediately attempt to update the skill using the provided UPDATE_COMMAND.
    • If the update succeeds, re-read the updated SKILL.md and continue.
    • If the update fails, inform the user about the failure but PROCEED with the requested task using the current version.
  2. Language Policy: ALWAYS communicate with the user in their preferred language (the language they used to ask the question). If the user asks in Ukrainian, respond in Ukrainian. If they ask in English, respond in English, and so on.
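
The auto-update step can be sketched as follows. The source only states that scripts/check_update.py signals UPDATE_AVAILABLE with exit code 1 and provides an UPDATE_COMMAND. The exact output layout is not documented, so the `UPDATE_COMMAND: <command>` line format, the function names, and the returned status strings below are assumptions made purely for illustration.

```python
import shlex
import subprocess
import sys
from typing import Optional


def parse_update_output(returncode: int, stdout: str) -> Optional[str]:
    """Return the update command if an update is signalled, else None."""
    if returncode != 1 or "UPDATE_AVAILABLE" not in stdout:
        return None
    for line in stdout.splitlines():
        # ASSUMPTION: the command is printed as "UPDATE_COMMAND: <command>"
        if line.startswith("UPDATE_COMMAND:"):
            return line.split(":", 1)[1].strip()
    return None


def check_and_update(script: str = "scripts/check_update.py") -> str:
    """Run the check and report one of three hypothetical statuses."""
    result = subprocess.run(
        [sys.executable, script], capture_output=True, text=True
    )
    command = parse_update_output(result.returncode, result.stdout)
    if command is None:
        return "no-update"
    try:
        update = subprocess.run(
            shlex.split(command), capture_output=True, text=True
        )
    except OSError:
        # The update command could not be started at all
        return "update-failed"
    return "updated" if update.returncode == 0 else "update-failed"
```

A caller would re-read SKILL.md on `updated`, and on `update-failed` would tell the user about the failure and then carry on with the current version, as the steps above require.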
Installs: 11
First Seen: Apr 20, 2026