# Ollama Optimizer
Optimize Ollama configuration based on system hardware analysis.
## Workflow

### Phase 1: System Detection
Run the detection script to gather hardware information:
```bash
python3 scripts/detect_system.py
```
Parse the JSON output to identify:
- OS and version
- CPU model and core count
- Total RAM / unified memory
- GPU type, VRAM, and driver version
- Current Ollama installation and environment variables
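
The detection script's exact JSON schema isn't shown here; as a sketch, assuming field names like `os`, `cpu`, `ram_gb`, and `gpu` (all hypothetical), the parse step might look like:

```python
import json

# Hypothetical shape of the detect_system.py output -- the field names
# below are assumptions, not the script's documented schema.
sample = json.loads("""
{
  "os": "Darwin 23.5.0",
  "cpu": {"model": "Apple M2", "cores": 8},
  "ram_gb": 16,
  "gpu": {"type": "apple_silicon", "vram_gb": null, "driver": null},
  "ollama": {"installed": true, "env": {"OLLAMA_FLASH_ATTENTION": "1"}}
}
""")

def summarize(info: dict) -> str:
    """Condense the detection JSON into a one-line system summary."""
    gpu = info["gpu"]["type"] or "none"
    return f'{info["os"]} | {info["cpu"]["model"]} | {info["ram_gb"]}GB | GPU: {gpu}'

print(summarize(sample))
```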
### Phase 2: Analyze and Recommend
Based on detected hardware, determine the optimization profile:
**Hardware Tier Classification:**
| Tier | Criteria | Max Model | Key Optimizations |
|---|---|---|---|
| CPU-only | No GPU detected | 3B | num_thread tuning, Q4_K_M quant |
| Low VRAM | <6GB VRAM | 3B | Flash attention, KV cache q4_0 |
| Entry | 6-8GB VRAM | 8B | Flash attention, KV cache q8_0 |
| Prosumer | 10-12GB VRAM | 14B | Flash attention, full offload |
| Workstation | 16-24GB VRAM | 32B | Standard config, Q5_K_M option |
| High-end | 48GB+ VRAM | 70B+ | Multiple models, Q5/Q6 quants |
**Apple Silicon Special Case:**
- Unified memory = shared CPU/GPU RAM
- 8GB Mac → treat as 6GB VRAM tier
- 16GB Mac → treat as 12GB VRAM tier
- 32GB+ Mac → treat as workstation tier
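
The tier table and the Apple Silicon mapping above can be sketched as a pair of helpers. Note two assumptions: the table leaves the 24–48GB range unspecified, which is treated as workstation here, and the 32GB+ Mac case is given an effective 16GB budget to land in that tier.

```python
from typing import Optional

def effective_vram_gb(ram_gb: float, unified: bool) -> float:
    """Map unified memory to an effective VRAM budget (per the mapping above)."""
    if not unified:
        return ram_gb
    if ram_gb <= 8:
        return 6.0   # 8GB Mac -> treat as 6GB VRAM tier
    if ram_gb <= 16:
        return 12.0  # 16GB Mac -> treat as 12GB VRAM tier
    return 16.0      # 32GB+ Mac -> workstation tier (assumed budget)

def tier(vram_gb: Optional[float]) -> str:
    """Classify the hardware tier from available VRAM, mirroring the table."""
    if vram_gb is None:
        return "cpu-only"
    if vram_gb < 6:
        return "low-vram"
    if vram_gb <= 8:
        return "entry"
    if vram_gb <= 12:
        return "prosumer"
    if vram_gb < 48:
        return "workstation"  # table gap (24-48GB) folded in here
    return "high-end"
```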
### Phase 3: Generate Optimization Plan
Create a structured optimization guide with these sections:
#### 1. System Overview
Present the detected hardware specs and highlight constraints (e.g., "8GB unified memory limits practical model size to ~7B").
#### 2. Dependency Assessment
List what's needed based on the platform:
- macOS: Ollama only (Metal automatic)
- Linux NVIDIA: Ollama + NVIDIA driver 450+
- Linux AMD: Ollama + ROCm 5.0+
- Windows: Ollama + NVIDIA driver 452+
#### 3. Configuration Recommendations

Essential environment variables:

```bash
# Always recommended
export OLLAMA_FLASH_ATTENTION=1

# Memory-constrained systems (<12GB)
export OLLAMA_KV_CACHE_TYPE=q8_0  # or q4_0 for severe constraints
```
Model selection guidance:
- Recommend specific models based on `ollama list` output
- Suggest appropriate quantization (Q4_K_M by default, Q5_K_M if headroom exists)
- Warn if current models exceed hardware capacity
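
A rough capacity check can flag models that exceed the hardware: estimate weight size from parameter count and bits per weight, then add a fixed overhead for the KV cache and runtime buffers. The bits-per-weight figures and the 1.5GB overhead below are heuristics, not exact values.

```python
# Approximate bits per weight for common GGUF quantizations; actual
# file sizes vary slightly by model architecture.
BITS_PER_WEIGHT = {"q4_K_M": 4.8, "q5_K_M": 5.7, "q8_0": 8.5, "f16": 16.0}

def fits_in_vram(params_b: float, quant: str, vram_gb: float,
                 overhead_gb: float = 1.5) -> bool:
    """Heuristic: quantized weights plus a fixed overhead must fit in VRAM.

    params_b is the parameter count in billions; overhead_gb is a rough
    allowance for KV cache and buffers (an assumption -- tune per model).
    """
    weights_gb = params_b * BITS_PER_WEIGHT[quant] / 8
    return weights_gb + overhead_gb <= vram_gb

print(fits_in_vram(8, "q4_K_M", 8))    # 8B at Q4_K_M on an 8GB card
print(fits_in_vram(14, "q4_K_M", 8))   # 14B at Q4_K_M on an 8GB card
```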
Modelfile tuning (when needed):

```
PARAMETER num_gpu <layers>     # Partial offload for limited VRAM
PARAMETER num_thread <cores>   # CPU threads (physical cores, not hyperthreads)
PARAMETER num_ctx <size>       # Reduce context for memory savings
```
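
Rendering such a Modelfile can be sketched as below. Halving `os.cpu_count()` is a crude proxy for physical cores (it mispredicts on chips without hyperthreading or with efficiency cores), so treat it as a starting point, not a rule.

```python
import os
from typing import Optional

def physical_cores() -> int:
    """Best-effort physical core count: os.cpu_count() reports logical
    cores, so halve it as a rough proxy when hyperthreading is likely."""
    logical = os.cpu_count() or 4
    return max(1, logical // 2)

def modelfile(base: str, num_gpu: Optional[int] = None,
              num_ctx: Optional[int] = None) -> str:
    """Render a minimal Modelfile with the tuning parameters above."""
    lines = [f"FROM {base}", f"PARAMETER num_thread {physical_cores()}"]
    if num_gpu is not None:
        lines.append(f"PARAMETER num_gpu {num_gpu}")  # partial offload
    if num_ctx is not None:
        lines.append(f"PARAMETER num_ctx {num_ctx}")  # smaller context
    return "\n".join(lines) + "\n"

print(modelfile("llama3.2:3b", num_gpu=20, num_ctx=2048))
```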
#### 4. Execution Checklist
Provide copy-paste commands in order:
- Set environment variables
- Restart the Ollama service
- Pull recommended models
- Test with `ollama run <model> --verbose`
#### 5. Verification Commands

```bash
# Benchmark current performance
python3 scripts/benchmark_ollama.py --model <model>

# Check GPU memory usage (NVIDIA)
nvidia-smi

# Verify config is applied
ollama run <model> "test" --verbose 2>&1 | head -20
```
## Reference Files
- VRAM Requirements - Model sizing and quantization guide
- Environment Variables - Complete env var reference
- Platform-Specific Setup - OS-specific installation and configuration
## Output Format
Generate an `ollama-optimization-guide.md` file in the current directory with:

```markdown
# Ollama Optimization Guide

**Generated:** <timestamp>
**System:** <OS> | <CPU> | <RAM>GB RAM | <GPU>

## System Overview
<hardware summary and constraints>

## Current Configuration
<existing Ollama setup and env vars>

## Recommendations

### Environment Variables
<shell commands to set vars>

### Model Selection
<recommended models with rationale>

### Performance Tuning
<Modelfile adjustments if needed>

## Execution Checklist
- [ ] <step 1>
- [ ] <step 2>
...

## Verification
<benchmark commands and expected results>

## Rollback
<commands to revert changes if needed>
```
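
The guide's header block could be rendered like this minimal sketch (the `guide_header` helper and its timestamp format are illustrative, not part of the skill's scripts):

```python
from datetime import datetime, timezone

def guide_header(os_name: str, cpu: str, ram_gb: int, gpu: str) -> str:
    """Render the header of ollama-optimization-guide.md per the template."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return (
        "# Ollama Optimization Guide\n\n"
        f"**Generated:** {ts}\n"
        f"**System:** {os_name} | {cpu} | {ram_gb}GB RAM | {gpu}\n"
    )

print(guide_header("Ubuntu 22.04", "Ryzen 7 5800X", 32, "RTX 3080 10GB"))
```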
## Quick Optimization Commands
For users who want immediate results without full analysis:
**macOS (Apple Silicon):**

```bash
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0
ollama pull llama3.2:3b  # Safe for 8GB, fast
```
**Linux/Windows with 8GB NVIDIA GPU:**

```bash
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0
ollama pull llama3.1:8b-instruct-q4_K_M
```
**CPU-only systems:**

```bash
export CUDA_VISIBLE_DEVICES=-1
ollama pull llama3.2:3b
# Create a Modelfile with: PARAMETER num_thread 4
```