asu-discover
ASU Domain Discovery Skill (v2)
Semantic search across Arizona State University's GitHub organization using hybrid RAG (Retrieval-Augmented Generation).
Announce at start: "I'm using the asu-discover skill to search ASU repositories."
Architecture
This skill is a client to the GitHub RAG backend that indexes 760+ ASU repositories:
- Embedding Model: Jina v2 (768 dimensions) - runs locally
- Keyword Extraction: Server-side TF-IDF via `/keywords` endpoint
- Search: Hybrid RRF (70% semantic + 30% keyword)
- Backend: Lambda API with sqlite-vec + FTS5
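The 70/30 hybrid ranking listed above can be pictured as weighted Reciprocal Rank Fusion (RRF). The sketch below is one common formulation, not the backend's confirmed implementation: the function name `weightedRrf`, the chunk IDs, and the smoothing constant `k = 60` are assumptions made for illustration.

```typescript
// Weighted Reciprocal Rank Fusion: a sketch, not the backend's actual code.
// ASSUMPTION: k = 60 is a conventional RRF constant; the real value is not documented here.
interface FusedResult {
  id: string; // hypothetical chunk identifier
  score: number;
}

function weightedRrf(
  semantic: string[], // chunk IDs ranked by embedding similarity, best first
  keyword: string[], // chunk IDs ranked by keyword match, best first
  semanticWeight: number = 0.7, // "70% semantic"
  keywordWeight: number = 0.3, // "30% keyword"
  k: number = 60,
): FusedResult[] {
  const scores = new Map<string, number>();

  // Each list contributes weight / (k + rank) for every chunk it contains.
  const accumulate = (ranked: string[], weight: number): void => {
    ranked.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  };

  accumulate(semantic, semanticWeight);
  accumulate(keyword, keywordWeight);

  const fused: FusedResult[] = [];
  scores.forEach((score, id) => fused.push({ id, score }));
  return fused.sort((a, b) => b.score - a.score); // highest fused score first
}
```

With `semantic = ["a", "b", "c"]` and `keyword = ["c", "a"]`, the fused order is `a`, `c`, `b`: a chunk that appears in both lists outranks one that appears only in the semantic list.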
Quick Reference
| Command | Description |
|---|---|
| `ask "<question>"` | Natural language search |
| `search --query "<terms>"` | Structured search with filters |
| `health` | Check backend status |
| `clear-cache` | Clear cached results |
| `cache-stats` | Show cache statistics |
How to Use
Natural Language (Conversational)
cd {base_directory}
# Ask questions in natural language
./scripts/discover.sh ask "How do I publish events to EEL?"
./scripts/discover.sh ask "Show me EDNA authorization patterns in TypeScript"
./scripts/discover.sh ask "What's the pattern for PeopleSoft to DPL sync?"
Structured Search
# Search with filters
./scripts/discover.sh search --query "checkAccess" --type function
./scripts/discover.sh search --query "terraform aurora" --type config
./scripts/discover.sh search --query "kafka publisher" --repo evbr-enterprise-event-lake
# Output as JSON
./scripts/discover.sh ask "vault secrets" --json
Health Check
./scripts/discover.sh health
# Shows: repos indexed, chunks indexed, last update time, cache stats
Options
| Option | Description |
|---|---|
| `-l, --limit <n>` | Maximum results (default: 10, max: 50) |
| `-t, --type <types>` | Filter by chunk type: function, class, module, readme, terraform, config |
| `-r, --repo <repos>` | Filter to specific repositories |
| `--no-cache` | Skip result cache |
| `--json` | Output as JSON |
First-Time Setup
The embedding model (~500MB) downloads on first use. To pre-download:
./scripts/setup.sh
Domains Covered
| Domain | Examples |
|---|---|
| PeopleSoft | Integration Broker, ServiceOperation, IBRequest |
| EDNA | checkAccess, hasPermission, entitlements |
| DPL | Data Potluck, principal lookup, emplid |
| EEL | Kafka, Confluent, Avro, event publishing |
| Terraform | dco-terraform modules, vpc-core, aurora |
| Vault | hvac, secrets, AppRole, AWS IAM auth |
| CI/CD | Jenkins shared library, GitHub Actions |
Caching
Results are cached locally for 24 hours to improve repeat query performance:
- Cache location: `~/.cache/opencode/asu-discover/cache.json`
- Use `--no-cache` to bypass
- Use `clear-cache` command to clear
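The 24-hour expiry can be sketched as a small time-to-live cache. This is an in-memory illustration only and assumes nothing about the real `cache.ts`: the `TtlCache` class, its entry shape, and the injectable clock are hypothetical, and the sketch does not read or write `cache.json`.

```typescript
// TTL cache sketch. ASSUMPTION: class name, entry shape, and keying are hypothetical.
const TTL_MS = 24 * 60 * 60 * 1000; // 24 hours, as stated in the caching notes

interface CacheEntry<T> {
  storedAt: number; // epoch milliseconds when the value was cached
  value: T;
}

class TtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be exercised without waiting a day.
  constructor(ttlMs: number = TTL_MS, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { storedAt: this.now(), value });
  }

  // Returns undefined on a miss or when the entry has reached the TTL.
  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(key); // expired: drop it so the caller re-queries
      return undefined;
    }
    return entry.value;
  }

  // Rough analogue of the clear-cache command.
  clear(): void {
    this.entries.clear();
  }
}
```

A caller would check `get(query)` first and only contact the backend on `undefined`, which is the behavior `--no-cache` skips.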
Files
skills/asu-discover/
├── SKILL.md # This file
├── package.json # Dependencies
├── src/
│ ├── client/ # RAG client implementation
│ │ ├── api.ts # API client
│ │ ├── embedder.ts # Jina v2 embedding
│ │ ├── cache.ts # Result caching
│ │ └── config.ts # Config loader
│ └── cli.ts # Commander CLI
├── scripts/
│ ├── discover.sh # Entry point
│ └── setup.sh # Model download
├── config/
│ └── settings.yaml # API endpoint config
└── tests/
    └── smoke.sh # Smoke tests
Troubleshooting
"Model not found" or slow first query
The embedding model downloads on first use (~500MB). Run setup to pre-download:
./scripts/setup.sh
API errors
Check backend health:
./scripts/discover.sh health
Stale results
Clear the cache:
./scripts/discover.sh clear-cache
Network timeouts
The Lambda backend can be slow to respond on a cold start. Retry after a few seconds, or increase the timeout in config/settings.yaml.
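If retrying by hand gets tedious, the "try again after a few seconds" advice can be automated with a small backoff wrapper. This is a sketch under stated assumptions: `withRetry`, its defaults (3 attempts, 2-second base delay), and the injectable `sleep` are hypothetical and not part of the skill.

```typescript
// Retry-with-backoff sketch. ASSUMPTION: helper name and defaults are hypothetical.
async function withRetry<T>(
  operation: () => Promise<T>, // e.g. a function that calls the search API
  maxAttempts: number = 3,
  baseDelayMs: number = 2000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Double the wait after each failure: 2s, then 4s with the defaults.
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError; // every attempt failed; surface the last error
}
```

Doubling the delay gives a cold backend time to warm up without retrying indefinitely. Raising the timeout in config/settings.yaml remains the simpler fix when a single slow response is the only problem.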
Note: This skill provides semantic search across ASU's GitHub organization. For general GitHub operations (issues, PRs, etc.), use the github-ops skill.