# vstash CLI
Local document memory with instant semantic search. Drop any file, ask anything.
## Core Commands

```bash
# Ingest documents (PDF, DOCX, PPTX, XLSX, Markdown, code, URLs)
vstash add paper.pdf notes.md https://example.com/article
vstash add ./docs --collection research --project ml-survey

# Semantic search (free, no LLM needed)
vstash search "what's the main argument about X?"

# Ask with LLM (requires inference backend)
vstash ask "summarize the key findings"

# Interactive chat session
vstash chat

# Document management
vstash list                # Show all ingested documents
vstash stats               # DB statistics (docs, chunks, size)
vstash forget paper.pdf    # Remove document from memory
vstash reindex             # Switch embedding model without re-ingesting

# Auto-ingest on file changes
vstash watch ./docs

# Export for training data curation
vstash export --project ml-survey --format jsonl

# Show current configuration
vstash config
```
## Ingestion Pipeline

file/URL → parse → chunk → embed → store vectors → index text

Parsing:
- Non-code: markitdown (preserves structure)
- Code (`code_aware=true`): raw UTF-8 to preserve syntax
Chunking strategies:

| Mode | Files | Strategy |
|---|---|---|
| Semantic | MD, PDF, DOCX | Headers → paragraphs → fixed-window → merge small |
| Code-aware | Python, JS/TS, Go, Rust, Java | Split at top-level def/class/func/fn |
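The semantic strategy (split at headers, then merge undersized chunks) can be sketched as follows. This is an illustration of the idea, not vstash's actual code; the function name and the `min_size` threshold are our own assumptions.

```python
import re

def semantic_chunks(text: str, min_size: int = 200) -> list[str]:
    """Split markdown at headers, then merge chunks smaller than min_size
    into their predecessor (hypothetical sketch of the strategy above)."""
    # Split at lines starting with a markdown header, keeping the header
    parts = re.split(r"(?m)^(?=#{1,6} )", text)
    parts = [p.strip() for p in parts if p.strip()]

    merged: list[str] = []
    for part in parts:
        if merged and len(merged[-1]) < min_size:
            # Previous chunk is too small: fold this one into it
            merged[-1] = merged[-1] + "\n\n" + part
        else:
            merged.append(part)
    return merged

doc = "# Intro\nShort.\n# Methods\n" + "x" * 300 + "\n# Conclusion\nAlso short."
chunks = semantic_chunks(doc)
```

A real implementation would also apply the fixed-window fallback for header-less spans longer than the configured chunk size.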
## Search Pipeline

query → embed → vector search (top-k×10) → keyword search (top-k×10) → RRF fusion → memory scoring → dedup → relevance signal → top-k
**Reciprocal Rank Fusion (RRF):** merges the vector and keyword rankings without requiring their scores to be on a comparable scale.
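RRF works purely on ranks: each document earns 1/(k + rank) from every ranking it appears in, and the contributions are summed. A minimal sketch, assuming the conventional k = 60 (vstash's actual constant is not documented here):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs via Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each appearance contributes 1/(k + rank); sums across lists
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]
fused = rrf_fuse([vector_hits, keyword_hits])
```

Because only ranks matter, a document that appears high in both lists ("b" above) beats one that tops a single list, which is exactly why RRF needs no score normalization.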
**Relevance signal:** distance-based confidence tiers:
- ≤ 0.95: high (full confidence)
- 0.95–0.98: medium (uncertain)
- > 0.98: low (results may not match)
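The tiers above map directly to a threshold function. The thresholds come from the list; the function name is a hypothetical illustration, not vstash's API:

```python
def relevance_tier(distance: float) -> str:
    """Map an embedding distance to the documented confidence tiers."""
    if distance <= 0.95:
        return "high"    # full confidence
    if distance <= 0.98:
        return "medium"  # uncertain
    return "low"         # results may not match
```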
**Context expansion:** ±1 adjacent chunks are included for LLM answers (2.64× more context).
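The ±1 expansion amounts to widening each hit to its immediate neighbors, clamped to the document's bounds and deduplicated where windows overlap. A sketch (not vstash's internal code):

```python
def expand_context(hit_indices: list[int], n_chunks: int) -> list[int]:
    """Return chunk indices to send to the LLM: each hit plus its
    immediate neighbors, clamped to [0, n_chunks) and deduplicated."""
    expanded: set[int] = set()
    for i in hit_indices:
        for j in (i - 1, i, i + 1):
            if 0 <= j < n_chunks:
                expanded.add(j)
    return sorted(expanded)
```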
## Configuration

Create `vstash.toml` in the current directory or `~/.vstash/vstash.toml`:

```toml
[inference]
backend = "ollama"  # or "cerebras", "openai"

[ollama]
host = "http://localhost:11434"
model = "llama3.2"

[embeddings]
model = "BAAI/bge-small-en-v1.5"  # multilingual: "BAAI/bge-m3"

[chunking]
size = 1024
overlap = 128
code_aware = true

[scoring]
enabled = true  # frequency + temporal decay re-ranking
```
Run `vstash config` to see the active settings.
## Metadata Filtering

```bash
vstash add notes.md --collection research --project ml-survey --tags "attention,transformers"
vstash list --project ml-survey
vstash ask "what architectures were compared?" --project ml-survey
```
Documents with YAML frontmatter are parsed automatically.
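Frontmatter extraction boils down to splitting off a leading block delimited by `---` lines. The sketch below handles only flat `key: value` pairs for illustration; real YAML parsing needs a YAML library, and the function name is hypothetical:

```python
def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a document into (metadata, body). Metadata is a leading
    block delimited by '---' lines with flat key: value pairs."""
    if not text.startswith("---\n"):
        return {}, text
    header, _, body = text[4:].partition("\n---\n")
    meta: dict[str, str] = {}
    for line in header.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            meta[key.strip()] = value.strip()
    return meta, body

doc = "---\nproject: ml-survey\ntags: attention,transformers\n---\nActual notes here."
meta, body = split_frontmatter(doc)
```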
## Privacy

| Component | Local? |
|---|---|
| Embeddings (FastEmbed ONNX) | Yes |
| Vector store (sqlite-vec) | Yes |
| Semantic search | Yes |
| Inference (Ollama) | Yes |
| Inference (Cerebras/OpenAI) | No (chunks sent to API) |

For full privacy, use `backend = "ollama"`, or use `vstash search` instead of `vstash ask`.
## Reference

See `references/cli-reference.md` for full option details.