Text Cleaner (text-cleaner)
This skill specializes in removing technical noise and clutter that hinder reading, while preserving the original content and every word of the author unchanged.
Mission
To help users get clean, readable text from "dirty" transcripts, subtitles, or text copied from web pages. The skill activates on requests to strip timestamps, noise, HTML tags, or other technical clutter from text.
Main Goal
To remove everything that is not part of the speech or the main content, without resorting to shortening, paraphrasing, or summarizing.
Text Processing Rules
- Completeness of Text: CRITICAL: Keep the input text verbatim. It is FORBIDDEN to shorten, generalize, or throw out any sentences. Every word of the author must remain in place. If the text is very long, process it in chunks to avoid hitting output limits, but NEVER omit content.
- Noise Cleaning:
  - Remove timestamps in any format (e.g., `00:00:10`, `[12:34]`, `12:34.567`).
  - Remove comments about background sounds or non-verbal actions (e.g., `[laughter]`, `[music]`, `(laughs)`, `[applause]`, `[сміх]`, `[музика]`).
  - Remove HTML tags and unnecessary attributes (e.g., `<div>`, `<p class="...">`).
  - Remove promotional inserts if they are clearly technical (e.g., "Subscribe to the channel" or "Subscribe" when it is a system subtitle insert).
  - Remove all emojis.
- Paragraphs: If the text is a solid block, break it into logical paragraphs for readability.
- Minimal Formatting:
  - Output the result in Markdown format (`.md`).
  - Header: ALWAYS start the text with a top-level header (`#`) that reflects the overall essence and topic of the text.
  - Use a minimal number of subheaders (`##`) only to separate very large blocks of text by meaning, if logically necessary. In other cases, paragraph separation is sufficient.
  - It is FORBIDDEN to add a TOC (table of contents), Summary, or YAML blocks unless the user explicitly asked for them. The output should be just clean text.
- Quality Verification: After cleaning, you MUST perform a rigorous self-check. Compare the input and output text side-by-side to ensure that no sentences or key information were lost during the removal of technical noise. If you identify any unintended omissions or shortened sections, you MUST restore them immediately to match the original verbatim content. The integrity of the original content is paramount.
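The noise-cleaning rules and the self-check above can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: the function names `strip_noise` and `missing_words`, the cue-word list, and the regular expressions are all assumptions, and the emoji range is only an approximation. Promotional inserts are left out because detecting them needs judgment rather than a pattern.

```python
import difflib
import re

# Illustrative patterns only (assumptions); real transcripts may need more cases.
TIMESTAMP = re.compile(r"\[?\b\d{1,2}:\d{2}(?::\d{2})?(?:[.,]\d{1,3})?\b\]?")
CUES = ("laughter", "laughs", "music", "applause", "сміх", "музика")
SOUND_TAG = re.compile(r"[\[(](?:%s)[\])]" % "|".join(CUES), re.IGNORECASE)
HTML_TAG = re.compile(r"</?[a-zA-Z][^<>]*>")
# Approximate emoji coverage: main pictograph planes, misc symbols, variation selector.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")


def strip_noise(text):
    """Remove timestamps, sound cues, HTML tags, and emojis; keep every other word."""
    for pattern in (TIMESTAMP, SOUND_TAG, HTML_TAG, EMOJI):
        text = pattern.sub("", text)
    # Collapse the runs of spaces left behind, but keep line breaks.
    text = re.sub(r"[ \t]+", " ", text)
    return "\n".join(line.strip() for line in text.splitlines()).strip()


def missing_words(original, cleaned):
    """Return words of the noise-stripped original that are absent from cleaned."""
    expected = strip_noise(original).split()
    actual = cleaned.split()
    matcher = difflib.SequenceMatcher(a=expected, b=actual, autojunk=False)
    missing = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # Added words (a header, for instance) are fine; dropped or altered ones are not.
        if tag in ("delete", "replace"):
            missing.extend(expected[i1:i2])
    return missing
```

An empty list from `missing_words` suggests that nothing was dropped; any returned words point at the passage to restore. Added material such as the top-level header does not count as a loss.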
File Name Format
If the user asks to save the result to a file, the name should be meaningful, based on the text topic, and use Latin letters (slug).
Format: YYYY-MM-DD-slug-name.md
Example: for text about the number pi — 2026-04-20-number-pi-square-circle.md
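The naming format above can be sketched as a small helper. `build_filename` is a hypothetical name, and the sketch assumes the topic is already written in Latin letters; text in other scripts would need transliteration first, which is not shown here.

```python
import re
from datetime import date


def build_filename(topic, day=None):
    """Build a YYYY-MM-DD-slug-name.md file name from a Latin-script topic."""
    day = day or date.today()
    # Lowercase, then turn every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    if not slug:
        raise ValueError("topic has no Latin letters or digits to build a slug from")
    return f"{day.isoformat()}-{slug}.md"
```

For instance, `build_filename("Number Pi: square & circle", date(2026, 4, 20))` gives `2026-04-20-number-pi-square-circle.md`.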
How to Work with This Skill
- Auto-Update Check: ALWAYS run `scripts/check_update.py` before providing assistance.
  - If the script outputs `UPDATE_AVAILABLE` (exit code 1), immediately attempt to update the skill using the provided `UPDATE_COMMAND`.
  - If the update succeeds, re-read the updated `SKILL.md` and continue.
  - If the update fails, inform the user about the failure but PROCEED with the requested task using the current version.
- Language Policy: ALWAYS communicate with the user in their preferred language (the language they used to ask the question). If the user asks in Ukrainian, respond in Ukrainian. If they ask in English, respond in English, and so on.
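The auto-update flow described above might look roughly like this in Python. It is a sketch under stated assumptions: `check_for_update` and `prepare_skill` are hypothetical names, the returned action strings are invented for illustration, and `run_update` stands in for whatever executes the provided `UPDATE_COMMAND`, whose exact form the text does not specify.

```python
import subprocess
import sys


def check_for_update(script="scripts/check_update.py"):
    """Run the update check; return True when the script reports an update."""
    result = subprocess.run(
        [sys.executable, script], capture_output=True, text=True
    )
    return result.returncode == 1 and "UPDATE_AVAILABLE" in result.stdout


def prepare_skill(update_available, run_update):
    """Decide the next step. run_update is a hypothetical callable returning True on success."""
    if not update_available:
        return "proceed"
    try:
        succeeded = run_update()
    except Exception:
        # A crashing update counts as a failed update, never as a blocked task.
        succeeded = False
    # On success the updated SKILL.md is re-read; on failure the user is told
    # and the task continues with the current version.
    return "reread-skill" if succeeded else "warn-and-proceed"
```

Keeping the decision separate from the subprocess call means a failed update can never stop the requested cleaning task, which matches the rule that the skill proceeds with the current version.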