itemized-functions
Overview
This skill analyzes architecture files to identify all external integrations (APIs, services, databases, etc.), then generates:
- Function wrappers (`function_*.py`) — Clean, tested functions that use each integration exactly as specified in the architecture
- Individual test files (`test_*.py`) — Comprehensive tests for each function covering success cases, failure modes, edge cases, and diverse input types
- Heavy API test suites (when needed) — Separate test files for integrations requiring extensive data or API calls
- Integration test runner (`run_all_tests.py`) — Master test orchestrator that runs all tests, collects results, and generates the final report
- Debug log (`integrations.debug.log`) — Complete execution trace for troubleshooting
- Summary report (`ITEMIZED_FUNCTIONS_REPORT.md`) — Detailed findings: function signatures, actual API responses (sanitized), latency metrics, test results, learnings
Purpose: Understand the exact behavior of every 3rd-party integration before building the full project. No assumptions. Real execution. Real data.
Activation
Trigger phrases:
- "generate itemized functions"
- "create integration tests"
- "itemized functions from architecture"
- "test all integrations"
Required input: Architecture files must be provided or referenced. The skill reads the full architecture to understand all integrations, their purpose, and how they're used.
Output: All generated files are created in an integration_tests/ directory at the repository root.
Workflow
Phase 1: Architecture Analysis
- Read all provided architecture files (scan thoroughly, don't ask questions)
- Identify all external integrations:
- API services (Ollama, OpenAI, Anthropic, etc.)
- Database systems (PostgreSQL, MongoDB, Redis, etc.)
- File processing tools (GitHub Linguist, ImageMagick, etc.)
- Message queues, cache systems, external webhooks, etc.
- For each integration, extract:
- Primary purpose (what the architecture says it's used for)
- Specific use cases (e.g., "Ollama for tool calling" vs "Ollama for chat")
- Expected inputs/outputs
- Failure modes to test
- Performance constraints or requirements
- Determine test complexity:
- Standard test: ≤10 API calls, <50MB data, simple responses
- Heavy test: >10 API calls, >50MB data, streaming responses, or complex orchestration
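To make the standard/heavy split concrete, a minimal classification helper might look like the sketch below; the `IntegrationSpec` fields are hypothetical and only illustrate the rule, they are not part of the skill's required output.

```python
from dataclasses import dataclass

@dataclass
class IntegrationSpec:
    """Hypothetical summary of one integration extracted from the architecture."""
    name: str
    estimated_api_calls: int      # calls needed to cover the planned tests
    estimated_data_mb: float      # data volume the tests are expected to move
    uses_streaming: bool = False
    complex_orchestration: bool = False

def needs_heavy_suite(spec: IntegrationSpec) -> bool:
    """Apply the rule above: >10 API calls, >50MB data, streaming, or complex orchestration."""
    return (
        spec.estimated_api_calls > 10
        or spec.estimated_data_mb > 50
        or spec.uses_streaming
        or spec.complex_orchestration
    )
```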
Phase 2: Credential Setup
Generate .env.dev at repository root with template entries for all required credentials:
# Ollama
OLLAMA_API_URL=http://localhost:11434
OLLAMA_MODEL=neural-chat
# GitHub Linguist
GITHUB_LINGUIST_PATH=/path/to/github-linguist
# PostgreSQL
DB_HOST=localhost
DB_PORT=5432
DB_NAME=testdb
DB_USER=testuser
DB_PASSWORD=
# [Other integrations...]
Note: User must fill in actual values before running tests.
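As a sketch of how the generated code could pick up these values, assuming the `python-dotenv` package is available (any equivalent loader works):

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

# Load the template file from the repository root; values already present
# in the real environment are not overridden by default.
load_dotenv(".env.dev")

db_password = os.getenv("DB_PASSWORD")
if not db_password:
    raise ValueError("DB_PASSWORD not set in .env.dev or the environment")
```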
Phase 3: Generate Function Wrappers
For each integration, create integration_tests/function_[service].py:
Requirements:
- Clean, production-ready function signatures
- Proper type hints (Python 3.8+)
- Error handling with meaningful error messages
- Timeout handling (appropriate per service)
- Input validation where needed
- Logging to `integrations.debug.log`
- Return consistent, testable response objects
Example structure:
import os
import logging
from typing import Any, Dict, Optional
import requests
from datetime import datetime
logger = logging.getLogger(__name__)
def call_ollama_chat(prompt: str, model: Optional[str] = None, temperature: float = 0.7, timeout: int = 30) -> Dict[str, Any]:
    """
    Call Ollama API for chat completion.

    Args:
        prompt: The user prompt
        model: Model name (uses OLLAMA_MODEL env var if not provided)
        temperature: Sampling temperature (0.0-1.0)
        timeout: Request timeout in seconds

    Returns:
        Dict with keys: response, model, created_at, latency_ms

    Raises:
        ValueError: If credentials/config missing
        requests.Timeout: If request exceeds timeout
        requests.RequestException: For API errors
    """
    try:
        start_time = datetime.now()
        api_url = os.getenv("OLLAMA_API_URL", "http://localhost:11434")
        model = model or os.getenv("OLLAMA_MODEL")
        if not model:
            raise ValueError("OLLAMA_MODEL not set in environment")

        response = requests.post(
            f"{api_url}/api/chat",
            json={
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "temperature": temperature,
                "stream": False
            },
            timeout=timeout
        )
        response.raise_for_status()

        latency_ms = (datetime.now() - start_time).total_seconds() * 1000
        result = response.json()
        result["latency_ms"] = latency_ms
        logger.debug(f"Ollama chat call successful. Latency: {latency_ms}ms")
        return result

    except requests.Timeout:
        logger.error(f"Ollama API timeout after {timeout}s")
        raise
    except requests.RequestException as e:
        logger.error(f"Ollama API error: {str(e)}")
        raise
    except Exception as e:
        logger.error(f"Unexpected error calling Ollama: {str(e)}")
        raise
Phase 4: Generate Test Files
For each function, create integration_tests/test_[service].py:
Requirements:
- Use pytest framework
- Test success cases with typical inputs
- Test success cases with diverse/edge case inputs (smart per-service)
- Test failure modes: auth failure, timeout, malformed response, rate limiting, service down
- Capture actual API responses (sanitize credentials)
- Measure latency
- Each test logs to `integrations.debug.log`
- Each test validates response structure
Example structure:
import pytest
import os
from unittest.mock import patch, MagicMock
import requests
from function_ollama import call_ollama_chat
@pytest.fixture
def setup_env(monkeypatch):
    """Setup environment variables for testing."""
    monkeypatch.setenv("OLLAMA_API_URL", "http://localhost:11434")
    monkeypatch.setenv("OLLAMA_MODEL", "neural-chat")

class TestOllamaChatSuccess:
    """Test successful Ollama chat calls."""

    def test_basic_chat(self, setup_env):
        """Test basic chat completion."""
        response = call_ollama_chat("What is 2+2?")
        assert "response" in response
        assert response["model"] == "neural-chat"
        assert "latency_ms" in response
        assert response["latency_ms"] > 0

    def test_chat_with_temperature(self, setup_env):
        """Test chat with different temperature values."""
        for temp in [0.0, 0.5, 1.0]:
            response = call_ollama_chat("Tell a story", temperature=temp)
            assert "response" in response
            assert response["latency_ms"] > 0

    def test_long_prompt(self, setup_env):
        """Test with very long prompt."""
        long_prompt = "What is the meaning of life? " * 100
        response = call_ollama_chat(long_prompt)
        assert "response" in response

class TestOllamaChatFailures:
    """Test failure modes."""

    def test_auth_failure(self, setup_env, monkeypatch):
        """Test behavior when API authentication fails."""
        monkeypatch.setenv("OLLAMA_API_URL", "http://invalid-url:11434")
        with pytest.raises(requests.RequestException):
            call_ollama_chat("test")

    def test_timeout(self, setup_env, monkeypatch):
        """Test timeout handling."""
        with patch('requests.post') as mock_post:
            mock_post.side_effect = requests.Timeout()
            with pytest.raises(requests.Timeout):
                call_ollama_chat("test", timeout=1)

    def test_missing_credentials(self, monkeypatch):
        """Test when required env vars are missing."""
        monkeypatch.delenv("OLLAMA_MODEL", raising=False)
        with pytest.raises(ValueError):
            call_ollama_chat("test")
Phase 5: Heavy API Test Suites (When Applicable)
If an integration qualifies as "heavy" (>10 API calls, >50MB data, streaming, complex orchestration), create a separate integration_tests/heavy_test_[service].py:
Include in heavy tests:
- Large data processing (if applicable)
- Streaming response handling
- Multiple chained API calls
- Performance benchmarks
- Resource usage patterns
- Rate limiting behavior
Header with reasoning:
"""
Heavy API test suite for [service].
Reasoning:
- [Service] requires extensive testing due to [specific reason]:
- Streaming responses with large payloads
- Multiple chained API calls (25+ total)
- Data processing >50MB
- Complex state management across calls
- Critical performance path in architecture
These tests are separated from standard tests to avoid:
- Excessive API quota usage during CI/CD
- Extended test execution time
- Unnecessary load on rate-limited endpoints
"""
Phase 6: Integration Test Runner
Create integration_tests/run_all_tests.py:
Responsibilities:
- Import and run all test files using pytest
- Collect results: passed, failed, skipped
- For each test, capture:
- Test name
- Status (passed/failed/skipped)
- Execution time
- Error message (if failed)
- Sample response data (if success)
- Aggregate latency metrics per service
- Generate summary statistics
- Write all results to `ITEMIZED_FUNCTIONS_REPORT.md`
Example output structure:
Test Results Summary:
- Total: 42 tests
- Passed: 38
- Failed: 2
- Skipped: 2
Service Latency Metrics:
- Ollama Chat: avg 145ms, min 89ms, max 287ms (10 calls)
- GitHub Linguist: avg 234ms, min 156ms, max 412ms (8 calls)
- PostgreSQL: avg 12ms, min 8ms, max 31ms (10 calls)
[Detailed results for each service...]
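A minimal sketch of the runner, using pytest's plugin hook API to collect per-test results before the report is written (report writing itself is omitted here):

```python
import pytest

class ResultCollector:
    """Pytest plugin that records outcome and duration for every executed test."""

    def __init__(self):
        self.results = []

    def pytest_runtest_logreport(self, report):
        # The 'call' phase carries the test outcome; skips and setup/teardown
        # failures are reported in other phases and need extra handling in the
        # full runner.
        if report.when == "call":
            self.results.append({
                "name": report.nodeid,
                "status": report.outcome,  # passed / failed
                "duration_s": report.duration,
                "error": str(report.longrepr) if report.failed else None,
            })

def main() -> int:
    collector = ResultCollector()
    exit_code = pytest.main(["-q", "."], plugins=[collector])
    # write_report(collector.results)  # hypothetical helper -> ITEMIZED_FUNCTIONS_REPORT.md
    return exit_code

if __name__ == "__main__":
    raise SystemExit(main())
```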
Phase 7: Debug Logging
All generated code writes to integration_tests/integrations.debug.log:
- Timestamp, log level, service name, message
- Request/response bodies (sanitize credentials)
- Timing information
- Error stack traces
- Environment info (for debugging credential/setup issues)
Format:
[2024-01-15 14:32:15.342] DEBUG [ollama] Calling /api/chat with model=neural-chat
[2024-01-15 14:32:15.521] DEBUG [ollama] Response received: 145ms latency, 1250 chars
[2024-01-15 14:32:16.012] ERROR [github-linguist] FAILED_TO_TEST - Connection refused (auth_required, network_error, timeout, api_error, etc.)
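A logging configuration that approximates this format, with the service name taken from the logger name (e.g. `logging.getLogger("ollama")`):

```python
import logging

handler = logging.FileHandler("integration_tests/integrations.debug.log", encoding="utf-8")
handler.setFormatter(logging.Formatter(
    fmt="[%(asctime)s.%(msecs)03d] %(levelname)s [%(name)s] %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
))
logging.basicConfig(level=logging.DEBUG, handlers=[handler])

logging.getLogger("ollama").debug("Calling /api/chat with model=neural-chat")
```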
Phase 8: Summary Report
Create ITEMIZED_FUNCTIONS_REPORT.md:
Structure:
# Itemized Functions Report
**Generated:** [timestamp]
**Architecture Analyzed:** [list of architecture files]
**Total Integrations Tested:** [count]
**Test Success Rate:** [X%]
## Executive Summary
- [count] integrations identified and tested
- [X] tests passed, [Y] failed, [Z] skipped
- Key findings and blockers (if any)
## Integration Details
### [Service Name] (e.g., Ollama)
**Purpose (from architecture):** [extracted from architecture]
**Function Signature:**
\`\`\`python
def call_ollama_chat(prompt: str, model: str = None, temperature: float = 0.7, timeout: int = 30) -> Dict[str, Any]
\`\`\`
**Test Coverage:** [count tests, all passed/mixed/failed]
**Latency:** avg X ms, min Y ms, max Z ms (10 calls)
**Sample API Response (sanitized):**
\`\`\`json
{
"response": "2 + 2 = 4",
"model": "neural-chat",
"created_at": "2024-01-15T14:32:15Z",
"latency_ms": 145
}
\`\`\`
**Failure Modes Tested:**
- ✓ Timeout (handled correctly, raises Timeout exception)
- ✓ Auth failure (handled correctly, raises RequestException)
- ✓ Malformed response (handled correctly, raises JSONDecodeError)
- ✓ Service unavailable (raises ConnectionError)
**Key Learnings:**
- [Finding 1]: [detail]
- [Finding 2]: [detail]
- [Gotcha/quirk if discovered]: [detail]
**Heavy Tests:** None
(or if applicable: `heavy_test_ollama.py` — [reason])
---
### [Next Service...]
[Same structure as above]
---
## Failed Tests & Blockers
### [Service Name] - FAILED_TO_TEST
**Reason:** [auth_required, network_error, timeout, api_error, service_down, etc.]
**Error Message:** [exact error]
**Suggestion:** [how to resolve, e.g., "Set OLLAMA_API_URL in .env.dev and ensure Ollama service is running"]
---
## Cross-Service Insights
[Any patterns, dependencies, or interactions discovered across integrations]
---
## Recommendations
- [Any critical issues or setup requirements]
- [Performance or scaling considerations]
- [Dependencies between services]
---
## Test Execution Log
[Link to or excerpt from integrations.debug.log]
Standards & Requirements
Code Generation
- Python 3.8+ compatible — Type hints, f-strings, async/await support if needed
- Error messages are meaningful — Not generic "error occurred", but "OLLAMA_API_URL not set" or "Connection refused on localhost:11434"
- Timeouts are sensible per service — LLM APIs: 60s, Database: 10s, File processing: 30s, etc.
- No hardcoded values — All config from environment
- Logging is comprehensive — Every significant action logged
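To illustrate the per-service timeout and no-hardcoded-values rules together, a small helper might look like the following; the env var names and defaults are assumptions based on the guidance above.

```python
import os

# Suggested defaults per service class; override with LLM_TIMEOUT, DATABASE_TIMEOUT, etc.
DEFAULT_TIMEOUTS = {"llm": 60, "database": 10, "file_processing": 30}

def get_timeout(service_class: str) -> int:
    """Return the timeout for a service class, preferring the environment."""
    return int(os.getenv(f"{service_class.upper()}_TIMEOUT", DEFAULT_TIMEOUTS[service_class]))

def require_env(name: str) -> str:
    """Fetch a required env var, failing with a meaningful message."""
    value = os.getenv(name)
    if not value:
        raise ValueError(f"{name} not set; add it to .env.dev before running tests")
    return value
```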
Testing
- Pytest conventions — `test_*.py` files, fixtures, clear test names, assertions with messages
- Real execution — Actually call the APIs (not mocked), unless explicitly impossible
- Diverse inputs per service — Smart, not generic:
- LLM APIs: different prompt lengths, temperatures, contexts
- File processing: different file types/sizes
- Databases: different query patterns, edge cases
- Time-series: different time ranges, aggregations
- Failure mode coverage — Auth, network, timeout, rate limit, malformed data
- Response validation — Verify structure, types, required fields
- Latency tracking — Measure every call
Report Generation
- No credentials exposed — Sanitize all env vars, passwords, tokens, API keys in report
- Full response bodies — Show actual data (sanitized) so developer understands exact format
- Timestamps throughout — When was this run, when was each test executed
- Actionable findings — Not just "failed" but "why" and "how to fix"
- Report all failures and anomalies — Do not omit unexpected behavior because it seems minor. Every quirk discovered now prevents a production incident later
- Output goes under the user's name — The report becomes their reference document. Incomplete or softened findings waste the testing effort
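To support the "No credentials exposed" requirement, a minimal sanitizer sketch; the pattern handles NAME=value and NAME: value pairs, while quoted JSON keys would need an extended pattern.

```python
import re

# Illustrative pattern; extend the name list per architecture.
SECRET_PATTERN = re.compile(
    r"(?i)([\w-]*(?:password|passwd|secret|token|api[_-]?key))(\s*[:=]\s*)(\S+)"
)

def sanitize(text: str) -> str:
    """Replace anything that looks like a credential value with a placeholder."""
    return SECRET_PATTERN.sub(r"\g<1>\g<2>***REDACTED***", text)

# sanitize("DB_PASSWORD=hunter2") -> "DB_PASSWORD=***REDACTED***"
```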
Special Cases
Streaming APIs (e.g., LLM chat with stream=true):
- Test both streamed and non-streamed responses
- Measure latency from first token to completion
- Validate chunk format
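A minimal sketch of measuring first-chunk and total latency for a streamed response, assuming an endpoint that emits one JSON object per line (verify the exact chunk format against the actual service):

```python
import json
import time

import requests

def stream_chat(api_url: str, payload: dict, timeout: int = 60) -> dict:
    """Consume a line-delimited JSON stream and record first-chunk and total latency."""
    start = time.monotonic()
    first_chunk_ms = None
    chunks = []
    with requests.post(f"{api_url}/api/chat", json={**payload, "stream": True},
                       stream=True, timeout=timeout) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            if first_chunk_ms is None:
                first_chunk_ms = (time.monotonic() - start) * 1000
            chunks.append(json.loads(line))
    total_ms = (time.monotonic() - start) * 1000
    return {"chunks": chunks, "first_chunk_ms": first_chunk_ms, "total_ms": total_ms}
```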
Databases:
- Test CRUD operations
- Test connection pooling if applicable
- Test transaction handling
- Test query timeouts
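A sketch of a CRUD round-trip test for PostgreSQL, assuming the `psycopg2` driver and the `.env.dev` variables above; the temporary table and rollback keep the test database untouched.

```python
import os

import psycopg2  # assumed driver; any DB-API-compatible driver works similarly

def test_postgres_crud_roundtrip():
    """Create, read, update, and delete a row, then roll everything back."""
    conn = psycopg2.connect(
        host=os.getenv("DB_HOST", "localhost"),
        port=int(os.getenv("DB_PORT", "5432")),
        dbname=os.getenv("DB_NAME"),
        user=os.getenv("DB_USER"),
        password=os.getenv("DB_PASSWORD"),
        connect_timeout=10,
    )
    try:
        with conn.cursor() as cur:
            cur.execute("CREATE TEMP TABLE it_probe (id serial PRIMARY KEY, note text)")
            cur.execute("INSERT INTO it_probe (note) VALUES (%s) RETURNING id", ("hello",))
            row_id = cur.fetchone()[0]
            cur.execute("UPDATE it_probe SET note = %s WHERE id = %s", ("updated", row_id))
            cur.execute("SELECT note FROM it_probe WHERE id = %s", (row_id,))
            assert cur.fetchone()[0] == "updated"
            cur.execute("DELETE FROM it_probe WHERE id = %s", (row_id,))
    finally:
        conn.rollback()
        conn.close()
```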
File Processing APIs:
- Test with multiple file types
- Test error handling for unsupported formats
- Test large file handling
Rate-Limited APIs:
- Detect rate limit headers
- Test behavior when rate limited
- Suggest backoff strategies in report
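As a sketch of detecting and reacting to rate limiting, assuming the common `Retry-After` and `X-RateLimit-Remaining` headers (header names vary by provider and should be checked against the actual API):

```python
import time

import requests

def request_with_backoff(url: str, max_retries: int = 3, **kwargs) -> requests.Response:
    """Retry on HTTP 429, honoring Retry-After when the server provides it."""
    resp = None
    for attempt in range(max_retries):
        resp = requests.get(url, **kwargs)
        remaining = resp.headers.get("X-RateLimit-Remaining")
        if remaining is not None:
            print(f"Rate limit remaining: {remaining}")
        if resp.status_code != 429:
            return resp
        wait_s = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait_s)
    return resp
```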
Webhook/Async APIs:
- If applicable, test callback handling
- Test idempotency if needed
Directory Structure (Final)
integration_tests/
├── .env.dev # Template credentials file
├── integrations.debug.log # Debug log from test execution
├── ITEMIZED_FUNCTIONS_REPORT.md # Final summary report
├── run_all_tests.py # Master test runner
├── function_ollama.py # Function wrapper
├── test_ollama.py # Standard tests
├── heavy_test_ollama.py # (if needed) Heavy tests
├── function_github_linguist.py # Another wrapper
├── test_github_linguist.py # Standard tests
├── function_postgres.py # Another wrapper
├── test_postgres.py # Standard tests
└── [More function/test pairs...]
Execution
- User provides architecture files and triggers skill
- Skill generates all files in `integration_tests/`
- User fills in `.env.dev` with actual credentials
- User runs: `python run_all_tests.py`
- All tests execute, report generated
- Developer reviews `ITEMIZED_FUNCTIONS_REPORT.md` and `integrations.debug.log`
Quality Assurance
- No questions asked — Read architecture, infer integrations, generate tests
- LLM confidence in test design — Trust that generated tests cover the right scenarios
- Sanitization is rigorous — Scan all report content for credentials before writing
- File encoding is UTF-8 — Handle responses with special characters correctly
- All files importable — Generated Python is syntactically correct and runs