# Cohere Best Practices Reference

## Official Resources

- Docs & Cookbooks: https://github.com/cohere-ai/cohere-developer-experience
- API Reference: https://docs.cohere.com/reference/about
## Model Selection Guide

| Use Case | Model | Notes |
|---|---|---|
| General chat/reasoning | command-a-03-2025 | Latest Command A model |
| RAG with citations | command-r-plus-08-2024 | Excellent grounded generation |
| Cost-sensitive tasks | command-r-08-2024 | Good balance of quality and cost |
| Embeddings (English) | embed-english-v3.0 | Best for English-only corpora |
| Embeddings (multilingual) | embed-multilingual-v3.0 | Supports 100+ languages |
| Reranking | rerank-v3.5 | Good quality/latency balance |
| Reranking (quality) | rerank-v4.0-pro | Best quality, slower |
| Reranking (speed) | rerank-v4.0-fast | Optimized for latency |
## API Configuration Best Practices

### Use ClientV2

```python
import cohere

# Correct: use ClientV2 for all new projects.
# It reads the API key from the CO_API_KEY environment variable
# if one is not passed explicitly.
co = cohere.ClientV2()

# Deprecated: don't use the old v1 client.
# co = cohere.Client()  # Avoid
```
### Temperature Settings

```python
# For agents/tool calling: lower temperature for reliability
co.chat(model="command-a-03-2025", temperature=0.3, ...)

# For creative tasks: higher temperature
co.chat(model="command-a-03-2025", temperature=0.7, ...)

# For the most repeatable outputs: zero temperature
co.chat(model="command-a-03-2025", temperature=0, ...)
```
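The guidance above can be captured in a small helper. This is a minimal sketch; the task labels and function name are illustrative defaults, not part of the Cohere API:

```python
# Suggested starting temperatures by task type. These labels are
# illustrative, not an official Cohere API concept.
TEMPERATURE_DEFAULTS = {
    "tool_calling": 0.3,   # reliability for agents/tool use
    "creative": 0.7,       # more varied generations
    "deterministic": 0.0,  # most repeatable outputs
}

def temperature_for(task: str) -> float:
    """Look up a starting temperature, defaulting to a middle-ground 0.3."""
    return TEMPERATURE_DEFAULTS.get(task, 0.3)
```

Treat these as starting points and tune per task against your own evaluations.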
## Embedding Best Practices

### Always Specify input_type

```python
# For documents being indexed
doc_embeddings = co.embed(
    texts=documents,
    model="embed-english-v3.0",
    input_type="search_document",  # Critical!
    embedding_types=["float"],
)

# For search queries
query_embedding = co.embed(
    texts=[query],
    model="embed-english-v3.0",
    input_type="search_query",  # Must match at query time
    embedding_types=["float"],
)
```
**Critical:** A mismatched `input_type` between indexing and querying will significantly degrade search quality.
## RAG Best Practices

### Two-Stage Retrieval Pattern

```python
# Stage 1: broad retrieval with embeddings
candidates = vectorstore.similarity_search(query, k=30)

# Stage 2: precise reranking
reranked = co.rerank(
    model="rerank-v3.5",
    query=query,
    documents=[doc.page_content for doc in candidates],
    top_n=5,
)

# Use the reranked results for generation
final_docs = [candidates[r.index] for r in reranked.results]
```
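The two-stage pattern generalizes beyond any particular vector store. As a sketch, `retrieve` and `rerank` below are caller-supplied functions standing in for the vector store and `co.rerank`; the function name and signatures are assumptions for illustration, not SDK APIs:

```python
from typing import Callable, Sequence

def two_stage_retrieve(
    query: str,
    retrieve: Callable[[str, int], Sequence[str]],
    rerank: Callable[[str, Sequence[str]], Sequence[int]],
    k: int = 30,
    top_n: int = 5,
) -> list:
    """Broad embedding retrieval followed by precise reranking.

    `retrieve(query, k)` returns up to k candidate documents;
    `rerank(query, docs)` returns candidate indices ordered by relevance.
    """
    candidates = list(retrieve(query, k))
    order = rerank(query, candidates)
    return [candidates[i] for i in order[:top_n]]
```

The key design choice is the asymmetry: cast a wide net cheaply (`k=30`), then spend the expensive reranking call only on that shortlist.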
### Grounded Generation with Citations

```python
response = co.chat(
    model="command-r-plus-08-2024",
    messages=[{"role": "user", "content": question}],
    documents=[
        {"id": f"doc_{i}", "data": {"text": doc}}
        for i, doc in enumerate(final_docs)
    ],
)

# Access citations
for citation in response.message.citations:
    print(f"'{citation.text}' from {citation.sources}")
```
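Citation spans can also be surfaced to end users, for example as footnote markers inside the generated text. A sketch assuming each citation carries integer `start`/`end` character offsets into the response text; the helper itself is illustrative, not part of the SDK:

```python
def annotate_citations(text: str, citations: list) -> str:
    """Insert [n] markers after each cited span of the generated text.

    Each citation is treated as a mapping with integer `start` and `end`
    character offsets into `text`. Markers are applied right-to-left so
    earlier insertions don't shift later offsets.
    """
    ordered = sorted(
        enumerate(citations, start=1), key=lambda c: c[1]["end"], reverse=True
    )
    for n, cite in ordered:
        text = text[: cite["end"]] + f"[{n}]" + text[cite["end"]:]
    return text
```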
## Error Handling

```python
import time

from cohere.core import ApiError

def safe_chat(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return co.chat(
                model="command-a-03-2025",
                messages=messages,
            )
        except ApiError as e:
            if e.status_code == 429:  # Rate limited: back off exponentially
                time.sleep(2 ** attempt)
                continue
            elif e.status_code >= 500:  # Server error: brief pause, retry
                time.sleep(1)
                continue
            else:
                raise
    raise RuntimeError("Max retries exceeded")
```
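The same retry logic can be factored into a reusable helper that wraps any callable. A sketch with an injectable `sleep` so the backoff is testable; the `retryable` predicate and helper name are assumptions for illustration, not SDK features:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    retryable: Callable[[Exception], bool] = lambda e: True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying retryable failures with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            if attempt == max_retries - 1 or not retryable(e):
                raise
            sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("unreachable")
```

In production you would pass a `retryable` predicate that inspects `ApiError.status_code` exactly as `safe_chat` does above.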
## Cost Optimization

- **Use appropriate models:** Don't use Command A for simple tasks
- **Batch embeddings:** Embed multiple texts in one call (up to 96 texts)
- **Cache embeddings:** Store computed embeddings in a vector database
- **Use reranking wisely:** Only rerank when quality matters
- **Stream for UX:** Streaming doesn't cost more, but it improves perceived latency
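For the batching tip, a small helper can split a corpus into chunks of at most 96 texts (the per-call limit quoted above) before embedding each chunk; the function name is illustrative:

```python
def batched(texts: list, batch_size: int = 96) -> list:
    """Split texts into consecutive batches of at most batch_size items."""
    return [texts[i : i + batch_size] for i in range(0, len(texts), batch_size)]

# Usage sketch (co is an existing cohere.ClientV2 instance):
# for batch in batched(documents):
#     co.embed(texts=batch, model="embed-english-v3.0",
#              input_type="search_document", embedding_types=["float"])
```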
## Production Checklist

- Use `ClientV2` for all API calls
- Set an appropriate `temperature` for your use case
- Always specify `input_type` for embeddings
- Implement retry logic with exponential backoff
- Use two-stage retrieval for RAG
- Cache embeddings to reduce API calls
- Monitor token usage and costs
- Handle rate limits gracefully