# Ollama Integration

Integrate Ollama for local LLM inference in TypeScript applications. Ollama provides a simple API for running language models locally.
## When to Apply
Use this skill when:
- Running LLMs locally without cloud APIs
- Generating text or embeddings with Ollama
- Building AI features that need to work offline
- Implementing RAG pipelines with local models
- Testing AI applications without API costs
## Prerequisites

### Install Ollama

```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Start the server
ollama serve
```
### Pull Required Models

```bash
# Embedding model (768 dimensions)
ollama pull nomic-embed-text

# Chat/generation model
ollama pull mistral

# Alternative models
ollama pull llama2
ollama pull codellama
```
## OllamaClient Implementation

A complete TypeScript client for the Ollama API:
```typescript
// src/utils/ollama.ts
import fetch from 'cross-fetch';

interface OllamaResponse {
  model: string;
  created_at: string;
  response: string;
  done: boolean;
}

interface OllamaEmbeddingResponse {
  embedding: number[];
}

export interface OllamaChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface OllamaChatResponse {
  model: string;
  created_at: string;
  message: OllamaChatMessage;
  done: boolean;
}

export class OllamaClient {
  private baseUrl: string;

  constructor(baseUrl?: string) {
    this.baseUrl = baseUrl || process.env.OLLAMA_HOST || 'http://localhost:11434';
  }

  // Generate embeddings for text
  async generateEmbedding(
    text: string,
    model: string = 'nomic-embed-text'
  ): Promise<number[]> {
    const response = await fetch(`${this.baseUrl}/api/embeddings`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, prompt: text }),
    });

    if (!response.ok) {
      throw new Error(`Failed to generate embedding: ${response.statusText}`);
    }

    const data: OllamaEmbeddingResponse = await response.json();
    return data.embedding;
  }

  // Generate text response (non-streaming)
  async generateResponse(
    prompt: string,
    context?: string,
    model: string = 'mistral'
  ): Promise<string> {
    const fullPrompt = context
      ? `Context: ${context}\n\nQuestion: ${prompt}\n\nAnswer:`
      : prompt;

    const response = await fetch(`${this.baseUrl}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        prompt: fullPrompt,
        stream: false,
      }),
    });

    if (!response.ok) {
      throw new Error(`Failed to generate response: ${response.statusText}`);
    }

    const data: OllamaResponse = await response.json();
    return data.response;
  }

  // Generate text response (streaming)
  async generateStreamingResponse(
    prompt: string,
    onChunk: (chunk: string) => void,
    context?: string,
    model: string = 'mistral'
  ): Promise<void> {
    const fullPrompt = context
      ? `Context: ${context}\n\nQuestion: ${prompt}\n\nAnswer:`
      : prompt;

    const response = await fetch(`${this.baseUrl}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        prompt: fullPrompt,
        stream: true,
      }),
    });

    if (!response.ok) {
      throw new Error(`Failed to generate streaming response: ${response.statusText}`);
    }

    // Streaming requires a WHATWG ReadableStream body
    // (browser fetch or Node 18+ native fetch).
    const reader = response.body?.getReader();
    if (!reader) {
      throw new Error('Failed to get response reader');
    }

    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      const chunk = decoder.decode(value);
      const lines = chunk.split('\n').filter(Boolean);

      for (const line of lines) {
        try {
          const data: OllamaResponse = JSON.parse(line);
          if (data.response) {
            onChunk(data.response);
          }
        } catch {
          // Skip malformed JSON lines
        }
      }
    }
  }

  // Chat completion API
  async chat(
    messages: OllamaChatMessage[],
    model: string = 'mistral'
  ): Promise<string> {
    const response = await fetch(`${this.baseUrl}/api/chat`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        messages,
        stream: false,
      }),
    });

    if (!response.ok) {
      throw new Error(`Failed to chat: ${response.statusText}`);
    }

    const data: OllamaChatResponse = await response.json();
    return data.message.content;
  }

  // List available models
  async listModels(): Promise<string[]> {
    const response = await fetch(`${this.baseUrl}/api/tags`);

    if (!response.ok) {
      throw new Error(`Failed to list models: ${response.statusText}`);
    }

    const data = await response.json();
    return data.models?.map((m: { name: string }) => m.name) || [];
  }

  // Check if Ollama is running
  async isHealthy(): Promise<boolean> {
    try {
      const response = await fetch(`${this.baseUrl}/api/tags`);
      return response.ok;
    } catch {
      return false;
    }
  }
}
```
## API Endpoints

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/embeddings` | POST | Generate embeddings |
| `/api/generate` | POST | Generate text completion |
| `/api/chat` | POST | Chat completion |
| `/api/tags` | GET | List available models |
| `/api/pull` | POST | Pull a model |
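The client above wraps every endpoint except `/api/pull`. A minimal sketch for pulling a model programmatically; the `pullModel` helper is illustrative and not part of the client, and it assumes a blocking pull is acceptable (`stream: false` makes the request wait until the download finishes):

```typescript
import fetch from 'cross-fetch';

// Illustrative helper (not part of OllamaClient): pull a model through /api/pull.
// With `stream: false`, Ollama responds once the pull has completed.
export async function pullModel(
  name: string,
  baseUrl: string = process.env.OLLAMA_HOST || 'http://localhost:11434'
): Promise<void> {
  const response = await fetch(`${baseUrl}/api/pull`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ name, stream: false }),
  });

  if (!response.ok) {
    throw new Error(`Failed to pull model ${name}: ${response.statusText}`);
  }
}
```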
## Usage Examples

### Basic Text Generation

```typescript
const ollama = new OllamaClient();

const response = await ollama.generateResponse(
  'Explain machine learning in simple terms'
);
console.log(response);
```
### With Context (RAG)

```typescript
const context = 'Our company was founded in 2020 and has 50 employees.';
const question = 'When was the company founded?';

const response = await ollama.generateResponse(question, context);
// "Based on the context, the company was founded in 2020."
```
### Streaming Response

```typescript
await ollama.generateStreamingResponse(
  'Write a short poem about coding',
  (chunk) => process.stdout.write(chunk)
);
```
### Chat Conversation

```typescript
const messages: OllamaChatMessage[] = [
  { role: 'system', content: 'You are a helpful coding assistant.' },
  { role: 'user', content: 'How do I reverse a string in JavaScript?' },
];

const response = await ollama.chat(messages);
console.log(response);
```
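To continue the conversation, append the assistant's reply and the next user turn to the same array before calling `chat` again (the follow-up question below is only an illustration):

```typescript
messages.push({ role: 'assistant', content: response });
messages.push({ role: 'user', content: 'Can you do it without built-in methods?' });

const followUp = await ollama.chat(messages);
console.log(followUp);
```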
### Generate Embeddings

```typescript
const text = 'Machine learning is a subset of artificial intelligence.';
const embedding = await ollama.generateEmbedding(text);
console.log(`Embedding dimensions: ${embedding.length}`); // 768 for nomic-embed-text
```
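For RAG-style retrieval, embeddings are usually compared with cosine similarity. A small sketch; `cosineSimilarity` is an illustrative helper (vector storage and search are covered by the `pgvector-embeddings` skill):

```typescript
// Illustrative helper: cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const query = await ollama.generateEmbedding('What is machine learning?');
console.log(`Similarity: ${cosineSimilarity(query, embedding).toFixed(3)}`);
```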
## Model Selection

### Embedding Models

| Model | Dimensions | Speed | Quality |
|---|---|---|---|
| `nomic-embed-text` | 768 | Fast | Good |
| `mxbai-embed-large` | 1024 | Medium | Better |
| `all-minilm` | 384 | Very Fast | Acceptable |
### Generation Models

| Model | Size | Speed | Use Case |
|---|---|---|---|
| `mistral` | 7B | Fast | General purpose |
| `llama2` | 7B | Fast | General purpose |
| `codellama` | 7B | Fast | Code generation |
| `mixtral` | 8x7B | Slow | High quality |
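Every client method takes the model name as its final parameter, so switching models is a per-call decision (the prompts below are illustrative):

```typescript
const ollama = new OllamaClient();

// Code generation with codellama
const code = await ollama.generateResponse(
  'Write a TypeScript function that deduplicates an array',
  undefined,
  'codellama'
);

// Larger embeddings with mxbai-embed-large (1024 dimensions)
const vector = await ollama.generateEmbedding('some text', 'mxbai-embed-large');
```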
## Environment Configuration

```bash
# Default Ollama host
export OLLAMA_HOST=http://localhost:11434

# For Docker/CI environments
export OLLAMA_HOST=http://ollama:11434
```
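The constructor shown earlier resolves the base URL in this order: explicit argument, then `OLLAMA_HOST`, then the localhost default:

```typescript
const fromEnv = new OllamaClient();                       // uses OLLAMA_HOST if set
const explicit = new OllamaClient('http://ollama:11434'); // overrides the environment
```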
## Testing with Ollama

The tests below use Jest-style globals and require a running Ollama server with the models already pulled:

```typescript
import { OllamaClient } from '../src/utils/ollama';

let ollama: OllamaClient;

beforeAll(() => {
  ollama = new OllamaClient();
});

test('should generate embedding', async () => {
  const embedding = await ollama.generateEmbedding('test text');
  expect(embedding).toHaveLength(768);
  expect(embedding.every(n => typeof n === 'number')).toBe(true);
});

test('should generate response', async () => {
  const response = await ollama.generateResponse('Say hello');
  expect(response).toBeTruthy();
  expect(typeof response).toBe('string');
});
```
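If the same suite also runs in environments where Ollama may be unavailable, one option is to probe `isHealthy()` once in `beforeAll` and bail out of each test early; a sketch under that assumption:

```typescript
let ollamaAvailable = false;

beforeAll(async () => {
  ollama = new OllamaClient();
  ollamaAvailable = await ollama.isHealthy();
});

test('should generate embedding', async () => {
  if (!ollamaAvailable) {
    console.warn('Skipping: Ollama is not running');
    return;
  }
  const embedding = await ollama.generateEmbedding('test text');
  expect(embedding).toHaveLength(768);
});
```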
## CI/CD Integration

In GitHub Actions, use the Ollama service container:

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - 11434:11434

env:
  OLLAMA_HOST: http://ollama:11434

steps:
  - name: Pull models
    run: |
      wget -q -O - --post-data='{"name": "nomic-embed-text"}' \
        --header='Content-Type: application/json' \
        http://ollama:11434/api/pull
      wget -q -O - --post-data='{"name": "mistral"}' \
        --header='Content-Type: application/json' \
        http://ollama:11434/api/pull
```
## Error Handling

```typescript
async function safeGenerate(prompt: string): Promise<string | null> {
  const ollama = new OllamaClient();

  // Check if Ollama is running
  if (!await ollama.isHealthy()) {
    console.error('Ollama is not running');
    return null;
  }

  try {
    return await ollama.generateResponse(prompt);
  } catch (error) {
    console.error('Generation failed:', error);
    return null;
  }
}
```
## Troubleshooting

| Issue | Solution |
|---|---|
| "Connection refused" | Start Ollama: `ollama serve` |
| "Model not found" | Pull model: `ollama pull <model>` |
| Slow responses | Use smaller model or reduce prompt length |
| Out of memory | Use quantized model or smaller context |
| Timeout errors | Increase timeout or use streaming |
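For the timeout case: the client does not accept an `AbortSignal`, so one workaround is to race the call against a timer. A sketch, with `withTimeout` as an illustrative helper:

```typescript
// Illustrative helper: reject if the model call takes longer than `ms` milliseconds.
async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

const ollama = new OllamaClient();
const response = await withTimeout(ollama.generateResponse('Say hello'), 30_000);
```

Note that the underlying request keeps running after the timeout fires; true cancellation would require threading an `AbortSignal` through to `fetch`.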
## Package Dependencies

`cross-fetch` provides a `fetch` implementation that works in both Node.js and browser environments:

```json
{
  "dependencies": {
    "cross-fetch": "^4.1.0"
  }
}
```
## References

- Related skill: `rag-pipeline` for complete RAG implementation
- Related skill: `pgvector-embeddings` for storing embeddings
- Related skill: `github-workflows-ollama` for CI/CD setup
- Ollama documentation
- Ollama API reference