ai-engineer

AI Engineering

Build production LLM applications and AI systems.

When to Use

  • Integrating LLM APIs
  • Building RAG systems
  • Creating AI agents
  • Vector database setup
  • Token optimization

LLM Integration

API Setup

from anthropic import Anthropic

client = Anthropic()

def chat(messages: list[dict], system: str | None = None) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system=system or "You are a helpful assistant.",
        messages=messages
    )
    return response.content[0].text

# With retry and error handling
import logging

from tenacity import retry, stop_after_attempt, wait_exponential

logger = logging.getLogger(__name__)

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def safe_chat(messages, system=None):
    try:
        return chat(messages, system)
    except Exception as e:
        logger.error(f"LLM call failed: {e}")
        raise
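
A quick usage sketch (the message content here is illustrative):

messages = [{"role": "user", "content": "Explain vector search in one paragraph."}]
answer = safe_chat(messages, system="You are a concise technical assistant.")
print(answer)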

Structured Output

import json

def extract_structured(text: str, schema: dict) -> dict:
    prompt = f"""Extract information from the text according to this schema:
{json.dumps(schema, indent=2)}

Text: {text}

Return valid JSON only."""

    response = chat([{"role": "user", "content": prompt}])
    # Strip any markdown code fence the model may have added before parsing
    cleaned = response.strip().strip("`").removeprefix("json").strip()
    return json.loads(cleaned)
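
For example, with a hypothetical contact schema:

schema = {"name": "string", "email": "string", "company": "string"}
data = extract_structured("Jane Doe (jane@acme.io) works at Acme.", schema)
# data is roughly {"name": "Jane Doe", "email": "jane@acme.io", "company": "Acme"}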

RAG System

Document Processing

from langchain.text_splitter import RecursiveCharacterTextSplitter

def chunk_documents(docs: list[str], chunk_size=1000, overlap=200):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=overlap,
        separators=["\n\n", "\n", ". ", " "]
    )
    # create_documents accepts raw strings; split_documents expects Document objects
    return splitter.create_documents(docs)
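
The snippets below call an embed() helper that is not defined in this skill; a minimal sketch, assuming OpenAI's text-embedding-3-small (1536 dimensions, matching the collection config below):

from openai import OpenAI

embedder = OpenAI()

def embed(text: str) -> list[float]:
    # text-embedding-3-small returns 1536-dimensional vectors by default
    result = embedder.embeddings.create(model="text-embedding-3-small", input=text)
    return result.data[0].embedding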

Vector Store

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Named qdrant so it doesn't shadow the Anthropic client above
qdrant = QdrantClient(":memory:")  # or url="http://localhost:6333"

# Create collection (vector size must match the embedding model, e.g. 1536)
qdrant.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)

# Upsert vectors; chunks is a list of text strings
# (e.g. [d.page_content for d in chunk_documents(raw_docs)])
qdrant.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=i, vector=embed(chunk), payload={"text": chunk})
        for i, chunk in enumerate(chunks)
    ]
)

# Search
results = qdrant.search(
    collection_name="docs",
    query_vector=embed(query),
    limit=5
)

RAG Query

def rag_query(question: str, top_k=5) -> str:
    # Retrieve relevant chunks
    results = qdrant.search(
        collection_name="docs",
        query_vector=embed(question),
        limit=top_k
    )

    context = "\n\n".join([r.payload["text"] for r in results])

    prompt = f"""Answer based on the context below.

Context:
{context}

Question: {question}

Answer:"""

    return chat([{"role": "user", "content": prompt}])
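
A minimal end-to-end sketch tying the pieces together (the file path and question are illustrative; ingestion uses the upsert shown above):

raw_docs = [open("docs/guide.md").read()]  # hypothetical source file
chunks = [d.page_content for d in chunk_documents(raw_docs)]
# ... embed and upsert chunks into the "docs" collection as shown above ...
print(rag_query("How do I configure retries?"))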

Cost Optimization

  • Cache frequent queries (see the caching sketch after this list)
  • Use smaller models for simple tasks
  • Batch requests when possible
  • Track token usage per feature
  • Set max_tokens appropriately
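
A minimal sketch of the first point, an in-memory cache keyed on the full request (a production setup would more likely use Redis or similar):

import hashlib
import json

_cache: dict[str, str] = {}

def cached_chat(messages: list[dict], system: str | None = None) -> str:
    # Identical requests hit the cache instead of the API
    key = hashlib.sha256(json.dumps({"m": messages, "s": system}, sort_keys=True).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = chat(messages, system)
    return _cache[key]

For token tracking, the Anthropic response object exposes usage.input_tokens and usage.output_tokens, which can be logged per feature before the text is returned.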

Examples

Input: "Add AI chat to this app" Action: Set up LLM client, create chat endpoint, add error handling

Input: "Build RAG for documentation" Action: Chunk docs, create embeddings, set up vector store, implement search
