# data-base (SKILL.md)
## Mental Model
Data acquisition converts unstructured web content into structured data. Choose the tool based on page complexity: JS-heavy pages → chrome-devtools MCP; static pages → Python `requests`.
## Tool Selection
| Page Type | Tool | When to Use |
|---|---|---|
| Dynamic (JS-rendered, SPAs) | chrome-devtools MCP | React/Vue apps, infinite scroll, login gates |
| Static HTML | Python requests | Blogs, news sites, simple pages |
| Complex/reusable logic | Python script | Multi-step scraping, rate limiting, proxies |
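The static-HTML row of the table can be sketched with `requests` plus BeautifulSoup. This is a minimal sketch, not part of the skill itself: the CSS selector (`article h2 a`) and the bot identity string are hypothetical placeholders you would replace with the real page's markup and your own contact info.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical bot identity; replace with your own project and contact URL.
HEADERS = {"User-Agent": "data-base-bot/1.0 (+https://example.com/bot)"}


def fetch(url: str) -> str:
    """Fetch a static page with an identified bot and a sane timeout."""
    resp = requests.get(url, headers=HEADERS, timeout=10)
    resp.raise_for_status()
    return resp.text


def parse_articles(html: str) -> list[dict]:
    """Extract title/link pairs; the selector is a placeholder for real markup."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"title": a.get_text(strip=True), "url": a["href"]}
        for a in soup.select("article h2 a")
    ]
```

Keeping fetching and parsing separate means the parser can be tested against saved HTML without touching the network.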
## Anti-Patterns (NEVER)
- Don't scrape without checking robots.txt
- Don't overload servers (default: 1 req/sec)
- Don't scrape personal data without consent
- Don't use Chinese characters in output filenames (ASCII only)
- Don't omit a `User-Agent` header that identifies your bot
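The robots.txt and rate-limit rules above can be enforced with the standard library alone. A minimal sketch, assuming a hypothetical bot identity string:

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "data-base-bot/1.0 (+https://example.com/bot)"  # hypothetical identity


def allowed(robots_txt: str, url: str, agent: str = USER_AGENT) -> bool:
    """Check a robots.txt body before fetching (parsed locally, no network)."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)


class RateLimiter:
    """Sleep between requests to honor the 1 req/sec default."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self) -> None:
        # Block until at least min_interval has elapsed since the last request.
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()
```

Call `rl.wait()` immediately before each request; the limiter accounts for time already spent parsing, so it never sleeps longer than necessary.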
## Output Format
- JSON: Nested/hierarchical data
- CSV: Tabular data
- Filename: `{source}_{timestamp}.{ext}` (ASCII only, e.g., `news_20250115.csv`)
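The filename convention can be generated defensively with the standard library. `output_filename` is a hypothetical helper (not part of this skill) that slugifies the source to ASCII, dropping CJK and other non-ASCII characters, and stamps it with a UTC date:

```python
import re
from datetime import datetime, timezone


def output_filename(source: str, ext: str) -> str:
    """Build an ASCII-only {source}_{timestamp}.{ext} name, e.g. news_20250115.csv."""
    # Collapse anything outside A-Z/a-z/0-9 (including CJK characters) to "-".
    slug = re.sub(r"[^A-Za-z0-9]+", "-", source).strip("-").lower() or "data"
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d")
    return f"{slug}_{stamp}.{ext}"
```

The `or "data"` fallback guarantees a usable name even when the source title contains no ASCII characters at all.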
## Workflow
- Ask: What data? Which sites? How much?
- Select tool based on page type
- Extract and save structured data
- Deliver the file path to the user or pass it to the data-analysis skill
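The save-and-deliver step can be sketched with the standard library. `save_records` is a hypothetical helper that picks JSON (nested data) or CSV (tabular data) from the extension and returns the path to hand off:

```python
import csv
import json
from pathlib import Path


def save_records(records: list[dict], path: str) -> str:
    """Save records as JSON or CSV based on the extension; return the file path."""
    if not records:
        raise ValueError("no records to save")
    p = Path(path)
    if p.suffix == ".json":
        # ensure_ascii=False keeps non-ASCII *content* readable; only the
        # filename itself must be ASCII per the convention above.
        p.write_text(json.dumps(records, ensure_ascii=False, indent=2),
                     encoding="utf-8")
    elif p.suffix == ".csv":
        # Assumes homogeneous records: every dict has the same keys.
        with p.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(records[0]))
            writer.writeheader()
            writer.writerows(records)
    else:
        raise ValueError(f"unsupported extension: {p.suffix}")
    return str(p)
```

Returning the path (rather than the data) matches the workflow: the caller can print it for the user or forward it to a downstream skill.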
## Python Environment
Auto-initialize the virtual environment if needed, then execute:

```bash
cd skills/data-base
if [ ! -f ".venv/bin/python" ]; then
  echo "Creating Python environment..."
  ./setup.sh
fi
.venv/bin/python your_script.py
```
The setup script auto-installs `requests`, `beautifulsoup4`, `pandas`, and related web-scraping tools.
## References (load on demand)

For detailed APIs and templates, load `references/REFERENCE.md` and `references/templates.md`.