The Agent Skills Directory

[CREDENTIALS_UNSAFE] (HIGH): Hardcoded session cookie detected in scripts/crawlers/toutiao.py.
Evidence: The variable FIXED_COOKIE contains a large, valid-looking session string including passport_auth_status_ss, ssid_ucp_sso_v1, ttwid, and toutiao_sso_user_ss. Hardcoding credentials in source code exposes them to anyone with access to the skill files.
[DATA_EXPOSURE_AND_EXFILTRATION] (MEDIUM): The skill performs network requests to various Chinese news platforms using user-provided URLs.
Evidence: Crawlers in scripts/crawlers/ use RequestsFetcher and CurlCffiFetcher to fetch HTML from external domains. While the domains are consistent with the skill's purpose, there is a risk of SSRF or data leakage if the agent is coerced into fetching internal or sensitive URLs that match the platform regex patterns.
[INDIRECT_PROMPT_INJECTION] (MEDIUM): The skill ingests untrusted content from the web and formats it for agent consumption.
Evidence:
Ingestion points: extract_news.py and crawlers in scripts/crawlers/ ingest HTML content from external news URLs.
Boundary markers: No explicit boundary markers or instructions to the agent to ignore embedded commands are present in the output generation logic (formatter.py).
Capability inventory: The skill can write files to the local file system (save_as_json in base.py and extract_news.py).
Sanitization: Content is parsed for text and media but not sanitized for instruction-like patterns. If an attacker controls a news article, they could embed malicious instructions that the agent might execute when processing the extracted Markdown/JSON.

news-extractor