together-embeddings

Together Embeddings & Reranking

Overview

Use this skill for semantic retrieval components:

  • create embeddings
  • batch embeddings
  • build retrieval or RAG pipelines
  • rerank retrieved candidates

This skill is for retrieval plumbing, not for the final language-model response itself.

When This Skill Wins

  • Build vector search or semantic similarity features
  • Add embedding generation to a data pipeline
  • Improve retrieval quality with reranking
  • Assemble a retrieval stage before calling a chat model

Hand Off To Another Skill

  • Use together-chat-completions for the final answer-generation step
  • Use together-batch-inference for very large offline embedding backfills
  • Use together-dedicated-endpoints when reranking requires a dedicated deployment

Quick Routing

Workflow

  1. Confirm that the user needs vectors or retrieval, not direct generation.
  2. Choose the embedding model and batch shape.
  3. Generate embeddings for the corpus and for queries with the same model and the same preprocessing.
  4. Retrieve candidates. An in-memory cosine-similarity store works for prototyping and small corpora (see semantic_search.py). Use a dedicated vector database for production scale.
  5. Rerank only when the extra latency and endpoint requirement are justified. When no dedicated rerank endpoint is available, cosine-similarity ranking is a reasonable fallback.
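Steps 3 to 5 can be sketched as a small brute-force store. This is an illustrative sketch only, not the repo's semantic_search.py: `InMemoryStore`, `rerank_or_fallback`, `toy_embed`, and the `embed_fn` / `rerank_fn` callables are hypothetical names, and the toy embedder stands in for a real embedding call so the example runs offline.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
EmbedFn = Callable[[List[str]], List[Vector]]        # texts -> one vector per text
RerankFn = Callable[[str, List[str]], List[float]]   # (query, docs) -> one score per doc


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryStore:
    """Brute-force vector store for prototyping and small corpora (hypothetical helper)."""

    def __init__(self, embed_fn: EmbedFn, batch_size: int = 32) -> None:
        self.embed_fn = embed_fn
        self.batch_size = batch_size
        self.texts: List[str] = []
        self.vectors: List[Vector] = []

    def add(self, texts: List[str]) -> None:
        # Embed the corpus in fixed-size batches.
        for start in range(0, len(texts), self.batch_size):
            batch = texts[start:start + self.batch_size]
            self.vectors.extend(self.embed_fn(batch))
            self.texts.extend(batch)

    def search(self, query: str, top_k: int = 3) -> List[Tuple[float, str]]:
        # The query goes through the same embed_fn as the corpus.
        query_vec = self.embed_fn([query])[0]
        scored = [(cosine(query_vec, v), t) for v, t in zip(self.vectors, self.texts)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


def rerank_or_fallback(
    query: str,
    candidates: List[Tuple[float, str]],
    rerank_fn: Optional[RerankFn] = None,
) -> List[Tuple[float, str]]:
    """Re-order candidates with rerank_fn if given; otherwise keep the cosine order."""
    if rerank_fn is None:
        return candidates  # cosine-similarity ranking is the fallback
    docs = [text for _, text in candidates]
    scores = rerank_fn(query, docs)
    return sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)


def toy_embed(texts: List[str]) -> List[Vector]:
    """Stand-in embedder (keyword counts). NOT a real model.

    With the Together SDK, embed_fn would presumably wrap a call along the lines of
    client.embeddings.create(model=..., input=batch); treat that call shape as an
    assumption and confirm it against the SDK docs for your version.
    """
    vocab = ["cat", "dog", "stock"]
    return [[float(t.lower().count(w)) for w in vocab] for t in texts]
```

A quick usage pass: build the store with `InMemoryStore(toy_embed)`, call `add([...])` on the corpus, then `search("cat", top_k=2)` and hand the result to `rerank_or_fallback`. Passing both embedder and reranker in as plain callables keeps the retrieval logic independent of whether a dedicated rerank endpoint exists.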

High-Signal Rules

  • Python scripts require the Together v2 SDK (together>=2.0.0). If the user is on an older version, they must upgrade first: uv pip install --upgrade "together>=2.0.0".
  • Keep embeddings and reranking conceptually separate; rerank is a second-stage precision step.
  • Reranking in this repo assumes a dedicated endpoint. Do not promise serverless rerank unless the product changes. When no endpoint is available, fall back to cosine-similarity ranking.
  • The embedding model has a 514-token context limit, so chunk longer documents into pieces that fit within it before embedding.
  • The rag_pipeline.py example demonstrates retrieval plus generation; treat generation as a hand-off to chat completions.
  • Use the same embedding model for indexing and querying; vectors produced by different models are not comparable.
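The chunking rule above can be sketched as follows. This is a rough illustration under stated assumptions: `chunk_by_token_budget` is a hypothetical helper, and it counts whitespace-separated words rather than model tokens. Because a real tokenizer usually produces more tokens than words, the sketch applies a conservative `safety_factor` (an arbitrary default, not a documented value); for exact limits, count tokens with the model's own tokenizer.

```python
from typing import List


def chunk_by_token_budget(
    text: str,
    max_tokens: int = 514,
    overlap: int = 32,
    safety_factor: float = 0.5,
) -> List[str]:
    """Split text into overlapping chunks of whitespace-separated words.

    Words only approximate model tokens (assumption), so the per-chunk word
    budget is max_tokens * safety_factor rather than max_tokens itself.
    """
    budget = max(1, int(max_tokens * safety_factor))
    if not 0 <= overlap < budget:
        raise ValueError("overlap must be >= 0 and smaller than the word budget")

    words = text.split()
    step = budget - overlap  # how far each chunk's start advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + budget]))
        if start + budget >= len(words):
            break  # this chunk already reached the end of the text
    return chunks
```

The overlap repeats a few words across neighbouring chunks so that a sentence split at a boundary still appears intact in at least one chunk. Whatever chunking is used at indexing time should be kept stable, since changing it means re-embedding the corpus.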

Resource Map

Official Docs

First Seen
Feb 27, 2026