The Agent Skills Directory

[PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection because it transcribes untrusted audio and video content into text intended for AI agent consumption (e.g., for summarization).
Ingestion points: Untrusted media files are ingested through scripts/transcribe.py and processed by the TingwuClient in scripts/tingwu.py.
Boundary markers: The transcription output in scripts/format_output.py is structured using speaker identifiers and timestamps, which provides formatting but no explicit warning to the agent that the content is untrusted.
Capability inventory: The skill reads local media files, writes Markdown transcripts to the disk, and performs network operations to Aliyun APIs.
Sanitization: The transcription text is delivered as received from the API without sanitization or filtering for adversarial instructions.
[EXTERNAL_DOWNLOADS]: The skill communicates with Aliyun's well-known cloud services to perform its primary function.
It sends transcription requests and metadata to Aliyun's internal REST API at tingwu.aliyun.com.
It uploads media files to Aliyun OSS buckets using STS (Security Token Service) credentials obtained dynamically from the service.
It downloads generated PPT slide images from Aliyun servers to the local filesystem during video transcription.
[SAFE]: The code implementation follows standard practices for a transcription tool. Authentication cookies are stored locally in config/cookies.json, and the skill does not attempt to access sensitive system files (e.g., SSH keys) or execute unauthorized shell commands. All network activity is directed toward the legitimate provider infrastructure.

tingwu-asr