api-vector-db-pinecone
Pinecone Patterns
Quick Guide: Use
@pinecone-database/pinecone(v7.x) for serverless vector database operations. Target indexes by host (pc.index({ host })), not by name. Use namespaces for multi-tenant isolation (physically separate, cheaper queries). Batch upserts at 200 records (max 1,000 or 2 MB). Metadata is limited to 40 KB per record with flat key-value pairs only (no nested objects). Pinecone is eventually consistent -- vectors may not appear in queries immediately after upsert. UsedescribeIndexStats()to verify indexing progress. For hybrid search, usedotproductmetric with sparse+dense vectors in a single index.
<critical_requirements>
CRITICAL: Before Using This Skill
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST target indexes by host URL, not by name -- pc.index({ host }) is the v7 API; pc.index('name') is deprecated)
(You MUST batch upserts to max 1,000 records or 2 MB per request -- exceeding either limit causes a 400 error)
(You MUST use flat key-value metadata only -- nested objects, null values, and keys starting with $ are rejected by Pinecone)
(You MUST handle eventual consistency -- vectors are not queryable immediately after upsert; use describeIndexStats() or retry logic for freshness-critical flows)
</critical_requirements>
Examples
- Core Patterns -- Client setup, index creation, upsert, query, fetch, update, delete
- Namespaces & Multi-Tenancy -- Namespace isolation, multi-tenant patterns, namespace management API
- Metadata Filtering -- Filter operators, compound filters, best practices
- Hybrid Search -- Sparse-dense vectors, hybrid index setup, alpha weighting
- Inference API -- Embedding generation, reranking, integrated inference indexes
- Batch Operations -- Chunked upserts, parallel ingestion, bulk import
Additional resources:
- reference.md -- API quick reference, filter operators, limits, decision frameworks, production checklist
Auto-detection: Pinecone, @pinecone-database/pinecone, createIndex, createIndexForModel, upsert, query, topK, includeMetadata, sparseValues, namespace, describeIndexStats, vector database, similarity search, embedding, cosine, dotproduct, euclidean, RAG retrieval, semantic search, pinecone-sparse-english, rerank, searchRecords, upsertRecords, fetchByMetadata
When to use:
- Semantic search over document embeddings (RAG retrieval)
- Similarity search for recommendations, deduplication, or classification
- Multi-tenant vector isolation using namespaces
- Hybrid semantic + keyword search using sparse-dense vectors
- Embedding generation and result reranking via Pinecone Inference API
Key patterns covered:
- Client setup and index management (serverless vs pod-based)
- Vector CRUD operations (upsert, query, fetch, update, delete)
- Metadata filtering with compound operators
- Namespace-based multi-tenancy
- Sparse-dense hybrid search
- Pinecone Inference API (embed, rerank)
- Batch ingestion with chunking and parallelism
- Integrated inference indexes (automatic embedding)
When NOT to use:
- Full-text search with complex boolean queries (use a dedicated search engine)
- Relational data with joins and transactions (use a relational database)
- Real-time streaming or pub/sub messaging (use a message broker)
- Storing large binary blobs or documents (use object storage; store only embeddings + metadata references)
Philosophy
Pinecone is a managed serverless vector database purpose-built for similarity search at scale. The core principle: store embeddings and metadata, query by vector similarity, filter by metadata.
Core principles:
- Vectors in, results out -- Pinecone stores high-dimensional vectors and returns the most similar ones. It is not a general-purpose database. Structure your data as embeddings + metadata references.
- Namespaces for isolation -- Use namespaces to physically separate tenant data. Queries scan only the target namespace, reducing cost and latency compared to metadata filtering across a shared namespace.
- Metadata is for filtering, not storage -- Keep metadata small (40 KB limit) and flat. Store document content in your primary database; store only filterable attributes (category, date, tenant ID) as Pinecone metadata.
- Batch for throughput -- Individual upserts are inefficient. Batch at 200 records for optimal throughput (max 1,000 or 2 MB per request).
- Eventual consistency is normal -- Freshly upserted vectors may not appear in query results immediately. Design your application to tolerate brief staleness or poll
describeIndexStats()before querying.
Core Patterns
Pattern 1: Client Initialization
Create a Pinecone client from an API key. See examples/core.md for full examples.
// Good Example
import { Pinecone } from "@pinecone-database/pinecone";
function createPineconeClient(): Pinecone {
const apiKey = process.env.PINECONE_API_KEY;
if (!apiKey) {
throw new Error("PINECONE_API_KEY environment variable is required");
}
return new Pinecone({ apiKey });
}
export { createPineconeClient };
Why good: API key from environment variable, validation before construction, named export
// Bad Example
import { Pinecone } from "@pinecone-database/pinecone";
const pc = new Pinecone({ apiKey: "sk-abc123..." });
// Hardcoded key leaks in version control
Why bad: Hardcoded API key is a security risk, no validation
Pattern 2: Index Targeting (v7 API)
Always target an index by its host URL, not its name. See examples/core.md.
// Good Example -- target by host
const indexModel = await pc.createIndex({
name: "products",
dimension: EMBEDDING_DIMENSION,
metric: "cosine",
spec: { serverless: { cloud: "aws", region: "us-east-1" } },
});
const index = pc.index({ host: indexModel.host });
Why good: pc.index({ host }) is the v7 API, avoids an extra API call to resolve the name to a host
// Bad Example -- target by name (deprecated)
const index = pc.index("products");
// Triggers an extra describeIndex call to resolve the host URL
Why bad: Targeting by name requires an extra network call and is deprecated in v7
Pattern 3: Upsert with Metadata
Upsert vectors with flat metadata for filtering. See examples/core.md for typed metadata.
// Good Example
interface DocumentMetadata {
title: string;
category: string;
createdAt: number; // Unix timestamp (numbers only, no Date objects)
}
const NAMESPACE = "articles";
await index.namespace(NAMESPACE).upsert({
records: [
{
id: "doc-1",
values: embedding, // number[] matching index dimension
metadata: { title: "Guide", category: "tutorial", createdAt: 1710000000 },
},
],
});
Why good: Typed metadata interface, flat key-value pairs, numeric timestamp (not Date), namespace isolation
Pattern 4: Query with Metadata Filter
Query for similar vectors with metadata filtering. See examples/metadata-filtering.md for all operators.
// Good Example
const TOP_K = 10;
const results = await index.namespace(NAMESPACE).query({
vector: queryEmbedding,
topK: TOP_K,
includeMetadata: true,
filter: {
$and: [
{ category: { $eq: "tutorial" } },
{ createdAt: { $gte: 1700000000 } },
],
},
});
for (const match of results.matches) {
console.log(match.id, match.score, match.metadata);
}
Why good: Named constant for topK, structured filter with $and, includes metadata in response
// Bad Example
const results = await index.query({
vector: queryEmbedding,
topK: 100,
includeMetadata: true,
filter: { tags: ["a", "b"] }, // INVALID: arrays are not valid filter values
});
Why bad: Missing namespace (queries default namespace), array filter syntax is invalid (use $in), no named constant for topK
Pattern 5: Namespace-Based Multi-Tenancy
Use namespaces for tenant isolation. See examples/namespaces.md.
// Good Example -- physically isolated tenant data
function getTenantIndex(pc: Pinecone, host: string, tenantId: string) {
return pc.index({ host }).namespace(`tenant-${tenantId}`);
}
// Each tenant's queries scan only their namespace
const tenantIndex = getTenantIndex(pc, INDEX_HOST, "acme-corp");
const results = await tenantIndex.query({ vector: embedding, topK: TOP_K });
Why good: Physical isolation per tenant, queries scan only the target namespace (lower cost and latency)
// Bad Example -- metadata filtering for multi-tenancy
await index.query({
vector: embedding,
topK: 10,
filter: { tenantId: { $eq: "acme-corp" } },
// Scans ENTIRE index, filters after -- expensive at scale
});
Why bad: Metadata filtering scans the full namespace regardless of filter selectivity, cost scales with total data not tenant data
Pattern 6: Pinecone Inference API
Generate embeddings and rerank results. See examples/inference.md.
// Good Example -- embed text
const embedResult = await pc.inference.embed({
model: "multilingual-e5-large",
inputs: [{ text: "What is machine learning?" }],
parameters: { inputType: "query", truncate: "END" },
});
const queryVector = embedResult.data[0].values;
Why good: Specifies inputType (query vs passage), handles truncation for long inputs
// Good Example -- rerank results
const rerankResult = await pc.inference.rerank({
model: "pinecone-rerank-v0",
query: "machine learning basics",
documents: results.matches.map((m) => ({
id: m.id,
text: m.metadata?.content as string,
})),
topN: 5,
returnDocuments: true,
});
Why good: Reranks query results for better relevance, limits output with topN
<decision_framework>
Decision Framework
Which Index Type?
Which Pinecone index type should I use?
|-- Serverless? (recommended for most use cases)
| |-- Variable or unpredictable traffic? -> Serverless (auto-scales, pay-per-use)
| |-- Starting a new project? -> Serverless (simpler, no capacity planning)
| '-- Need hybrid sparse-dense search? -> Serverless with dotproduct metric
|
'-- Pod-based? (legacy, specific needs)
|-- Need guaranteed low latency SLAs? -> Pod-based (dedicated compute)
'-- Using collections for snapshots? -> Pod-based (collections are pod-only)
Which Metric?
Which distance metric should I use?
|-- Using embeddings from a language model? -> cosine (normalized, most common)
|-- Need hybrid search (sparse + dense)? -> dotproduct (REQUIRED for hybrid)
|-- Comparing raw feature vectors? -> euclidean (absolute distance matters)
'-- Unsure? -> cosine (safe default for most embedding models)
Namespaces vs Metadata Filtering?
How should I isolate tenant data?
|-- Strict data isolation required? -> Namespaces (physical separation)
|-- Need to query across tenants? -> Metadata filtering (logical separation)
|-- Cost-sensitive at scale? -> Namespaces (query cost = tenant size, not total)
|-- Few tenants (< 10)? -> Either approach works
'-- Many tenants (100+)? -> Namespaces (metadata filtering scans everything)
Embedded Inference vs External Embeddings?
How should I generate embeddings?
|-- Want simplest architecture? -> Integrated inference (createIndexForModel)
| (Pinecone handles embedding automatically on upsert/query)
|
|-- Need a specific embedding model not hosted by Pinecone? -> External
| (Generate embeddings yourself, upsert raw vectors)
|
|-- Need hybrid search with sparse vectors? -> External sparse model
| (Use pinecone-sparse-english-v0 via inference API + your dense model)
|
'-- Need full control over embedding pipeline? -> External
(Custom preprocessing, chunking, model selection)
</decision_framework>
<red_flags>
RED FLAGS
High Priority Issues:
- Targeting index by name instead of host --
pc.index("name")is deprecated in v7; usepc.index({ host })to avoid an extra API call - Upserting vectors with wrong dimensions -- dimension mismatch causes a 400 error; verify your embedding model's output dimension matches the index
- Nested metadata objects -- Pinecone only supports flat key-value metadata; nested objects are silently ignored or rejected
- Using metadata filtering for multi-tenancy at scale -- scans the entire namespace regardless of filter selectivity; use namespaces instead
Medium Priority Issues:
- Missing
includeMetadata: truein queries -- metadata is NOT included by default; omitting this returns only IDs and scores - Upsert batches exceeding 1,000 records or 2 MB -- triggers a 400 error; chunk at 200 records for safety margin
- Using
Dateobjects in metadata -- Pinecone metadata supports strings, numbers, booleans, and string arrays only; convert dates to Unix timestamps - Not awaiting
createIndex()readiness -- index creation is async; the index is not ready for operations immediately aftercreateIndex()returns
Common Mistakes:
- Querying immediately after upsert and expecting results -- eventual consistency means freshly upserted vectors may not be queryable for seconds
- Using
topK> 1,000 withincludeMetadata: true-- the maxtopKis 1,000 when including metadata or values; without them, max is 10,000 - Passing array values as metadata filter values (
{ tags: ["a", "b"] }) -- use$inoperator instead:{ tags: { $in: ["a", "b"] } } - Forgetting that
deleteAll()without a namespace deletes from the default namespace only, not the entire index
Gotchas & Edge Cases:
describeIndexStats()returns approximate counts -- record counts are not exact in real-time, especially after recent upserts or deletes- Metadata values are always returned as their original types, but filter comparisons are type-strict --
$eq: "42"does not match numeric42 - Sparse vector indices must be positive 32-bit integers (uint32), and values must be non-zero floats -- zero values are silently dropped
listPaginated()returns vector IDs only (no values or metadata) -- usefetch()to get full vector data- The
$inand$ninoperators accept a maximum of 10,000 values each - Metadata keys cannot start with
$(reserved for operators) upsertis an upsert, not an insert -- upserting with an existing ID overwrites the previous vector and metadata entirely (no partial merge)update()merges metadata by default -- updating metadata replaces only the fields you specify, not the entire metadata object- The v7 SDK includes built-in automatic retry with exponential backoff for transient errors -- custom retry logic is only needed for fine-grained control or non-default retry policies
upsertRecords(integrated inference) accepts a direct array, not{ records: [...] }-- this differs from the regularupsertmethod which uses{ records: [...] }
</red_flags>
<critical_reminders>
CRITICAL REMINDERS
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST target indexes by host URL, not by name -- pc.index({ host }) is the v7 API; pc.index('name') is deprecated)
(You MUST batch upserts to max 1,000 records or 2 MB per request -- exceeding either limit causes a 400 error)
(You MUST use flat key-value metadata only -- nested objects, null values, and keys starting with $ are rejected by Pinecone)
(You MUST handle eventual consistency -- vectors are not queryable immediately after upsert; use describeIndexStats() or retry logic for freshness-critical flows)
Failure to follow these rules will cause index creation failures, rejected upserts, empty query results, and degraded multi-tenant performance.
</critical_reminders>
More from agents-inc/skills
web-animation-css-animations
CSS Animation patterns - transitions, keyframes, scroll-driven animations, @property, GPU-accelerated properties, accessibility with prefers-reduced-motion
20web-testing-playwright-e2e
Playwright E2E testing patterns - test structure, Page Object Model, locator strategies, assertions, network mocking, visual regression, parallel execution, fixtures, and configuration
18web-animation-framer-motion
Motion (formerly Framer Motion) animation patterns - motion components, variants, gestures, layout animations, scroll-linked animations, accessibility
17web-animation-view-transitions
View Transitions API patterns - same-document transitions, cross-document MPA transitions, shared element animations, pseudo-element styling, accessibility
16web-styling-cva
Class Variance Authority - type-safe component variant styling with cva(), compound variants, and VariantProps
16web-performance-web-performance
Bundle optimization, render performance, Core Web Vitals
16