vector-db
Purpose
This skill enables the management of vector databases for storing, indexing, and querying high-dimensional vectors, optimizing AI/ML workflows for tasks like similarity searches and embeddings.
When to Use
Use this skill for AI/ML applications requiring fast vector similarity queries, such as building recommendation engines, semantic search in NLP, or image retrieval systems. Apply it when dealing with large-scale vector data (e.g., embeddings from models like BERT) to avoid brute-force comparisons.
Key Capabilities
- Store vectors with metadata and perform efficient nearest-neighbor searches using indexes.
- Support distance metrics like cosine, Euclidean, and dot product for similarity calculations.
- Handle vector dimensions up to 2048 and scale to millions of entries.
- Integrate with embedding models for real-time vector generation and querying.
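To make the three distance metrics listed above concrete, here is a minimal pure-Python sketch of each one, plus the brute-force scan that an index is meant to replace. The function names are illustrative and are not part of the vector-db CLI or API.

```python
import math


def _check_same_length(a, b):
    # Vectors of different dimensions cannot be compared.
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")


def dot(a, b):
    """Dot product: larger means more similar."""
    _check_same_length(a, b)
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance: smaller means more similar."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b, in [-1, 1]."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot(a, b) / (norm_a * norm_b)


def brute_force_top_k(query, vectors, k):
    """Rank every stored vector against the query (O(n) per query).

    `vectors` maps an ID to its vector; returns the k most similar IDs.
    """
    ranked = sorted(
        vectors,
        key=lambda vid: cosine_similarity(query, vectors[vid]),
        reverse=True,
    )
    return ranked[:k]
```

The brute-force scan touches every stored vector on every query, which is the cost that nearest-neighbor indexes are designed to avoid at scale.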
Usage Patterns
Invoke this skill via CLI for quick operations or through API calls in code. Always set the environment variable $VECTOR_DB_API_KEY for authentication before use. For CLI, prefix commands with vector-db and use JSON config files for complex setups (e.g., config.json with { "dimension": 768, "metric": "cosine" }). In code, use HTTP requests to the API endpoint, ensuring error checking on responses. Pattern: First, create an index; then, insert vectors; finally, query them.
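As a hedged illustration of the config.json mentioned above, the sketch below parses such a file and checks it before an index is created. The keys "dimension" and "metric" come from the example in this section and the 2048 limit from Key Capabilities; the metric spellings "euclidean" and "dot" and the function name parse_index_config are assumptions, since only "cosine" appears in this document.

```python
import json

MAX_DIMENSION = 2048  # upper bound stated under Key Capabilities
# Only "cosine" is shown in this document; the other two spellings are assumed.
KNOWN_METRICS = {"cosine", "euclidean", "dot"}


def parse_index_config(text):
    """Parse a JSON index config and reject obviously invalid values."""
    config = json.loads(text)
    dimension = config.get("dimension")
    metric = config.get("metric")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(dimension, bool) or not isinstance(dimension, int):
        raise ValueError("dimension must be an integer")
    if not 1 <= dimension <= MAX_DIMENSION:
        raise ValueError(f"dimension must be between 1 and {MAX_DIMENSION}")
    if metric not in KNOWN_METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    return config


config = parse_index_config('{ "dimension": 768, "metric": "cosine" }')
```

Checking the config locally catches mistakes before the create, insert, query sequence starts.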
Common Commands/API
Use the CLI tool vector-db or the API at https://api.openclaw.com/vector-db/v1. Authentication requires $VECTOR_DB_API_KEY in headers.
- CLI Command: Create an index
  vector-db create index --name myindex --dimension 768 --metric cosine --file config.json
  This initializes a new index; ensure config.json specifies additional options like shards.
- CLI Command: Insert vectors
  vector-db insert --index myindex --vectors "[0.1, 0.2, 0.3]" --id vec1
  Vectors must be in JSON array format; use the --batch flag for multiple inserts.
- API Endpoint: Query vectors
  POST https://api.openclaw.com/vector-db/v1/indexes/myindex/query
  Body: { "vector": [0.1, 0.2, 0.3], "top_k": 5 }
  Response: JSON array of nearest neighbors.
- API Endpoint: Delete index
  DELETE https://api.openclaw.com/vector-db/v1/indexes/myindex
  Include header: Authorization: Bearer $VECTOR_DB_API_KEY
Config format: Use JSON files like { "index_name": "myindex", "vector_size": 768, "distance": "cosine" } for CLI operations.
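The query endpoint above can be sketched in Python using only the standard library. This example builds the request without sending it, so the URL, headers, and body can be inspected first. It assumes the query endpoint accepts the same Bearer header shown for the delete endpoint; build_query_request is a hypothetical helper name, and "example-key" is a placeholder for the value of $VECTOR_DB_API_KEY.

```python
import json
import urllib.request

BASE_URL = "https://api.openclaw.com/vector-db/v1"


def build_query_request(index, vector, top_k, api_key):
    """Build (but do not send) a POST request for a nearest-neighbor query."""
    body = json.dumps({"vector": vector, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/indexes/{index}/query",
        data=body,
        headers={
            # Assumed to match the header format shown for the delete endpoint.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "example-key" is a placeholder; read the real key from the environment.
request = build_query_request("myindex", [0.1, 0.2, 0.3], 5, "example-key")
```

To send it, pass the request to urllib.request.urlopen with a timeout and decode the JSON response.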
Integration Notes
Integrate with AI/ML tools by exporting vectors from models and using this skill for storage. Set $VECTOR_DB_API_KEY in your environment or .env file. For Python integration, use the requests library:
import os
import requests

# A missing key raises KeyError here instead of silently sending "Bearer None".
headers = {"Authorization": f"Bearer {os.environ['VECTOR_DB_API_KEY']}"}
response = requests.post("https://api.openclaw.com/vector-db/v1/indexes/myindex/insert", json={"vectors": [[0.1, 0.2]]}, headers=headers, timeout=10)
response.raise_for_status()  # surface HTTP errors such as 401
Ensure the API base URL matches your deployment; handle rate limits by adding retries. For clustering with aimlops, link via shared IDs (e.g., use skill ID "vector-db" in workflows).
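One way to add the retries mentioned above is exponential backoff. The sketch below is independent of any HTTP library: it takes a callable that performs one request and returns a (status_code, body) pair. It assumes rate limiting is signaled by HTTP 429, which this document does not confirm; call_with_retries and its parameters are illustrative names.

```python
import time

RATE_LIMITED = 429  # assumed status code for rate limiting


def call_with_retries(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it is not rate limited or attempts run out.

    send() must return a (status_code, body) tuple. The wait doubles after
    each rate-limited attempt: base_delay, 2 * base_delay, 4 * base_delay...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != RATE_LIMITED:
            return status, body
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    # Every attempt was rate limited; return the last response to the caller.
    return status, body
```

The sleep parameter is injectable so the backoff schedule can be checked in tests without actually waiting.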
Error Handling
Common errors include authentication failures (HTTP 401) from missing $VECTOR_DB_API_KEY, invalid vector dimensions (e.g., mismatch with index), or network issues. To handle:
- Check for 401 errors and prompt the user to set $VECTOR_DB_API_KEY.
- For invalid inputs, use try-except in code:
  try:
      response = requests.post(url, json=data)
      response.raise_for_status()
  except requests.exceptions.HTTPError as e:
      print(f"Error: {e} - Check vector dimensions.")
- CLI errors show as "Error: Invalid metric specified"; fix by verifying command flags.
Always validate inputs before sending requests.
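Validating inputs before sending can be as simple as checking the vector locally against the index dimension. The following is a minimal sketch; validate_vector is a hypothetical helper, not part of the vector-db tooling, and the expected dimension must be supplied by the caller.

```python
import math


def validate_vector(vector, expected_dimension):
    """Return the vector as a list of floats, or raise ValueError."""
    if len(vector) != expected_dimension:
        raise ValueError(
            f"expected {expected_dimension} components, got {len(vector)}"
        )
    for i, value in enumerate(vector):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"component {i} is not a number: {value!r}")
        if not math.isfinite(value):
            raise ValueError(f"component {i} is not finite: {value!r}")
    return [float(value) for value in vector]
```

Running this check first turns a dimension mismatch into a local error with a precise message instead of a failed request.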
Concrete Usage Examples
- Example: Building a simple search engine
  First, create an index:
  vector-db create index --name searchindex --dimension 512
  Insert embeddings:
  vector-db insert --index searchindex --vectors '[[0.5, 0.6], [0.7, 0.8]]' --ids 'doc1,doc2'
  Query for similarities: Use API POST to /indexes/searchindex/query with body { "vector": [0.5, 0.6], "top_k": 3 }.
  The vectors here are shortened to two components for readability; real vectors must have 512 components to match the index dimension.
  This pattern is ideal for NLP, e.g., searching similar documents based on embeddings.
- Example: Image similarity in ML pipeline
  Generate image embeddings with a model, then store them:
  vector-db insert --index imageindex --vectors '[[0.1, 0.2, 0.3]]' --metadata '{"url": "image1.jpg"}'
  Query for similar images from the CLI (quote the vector so the shell passes it as one argument):
  vector-db query --index imageindex --vector '[0.1, 0.2, 0.3]' --top_k 5
  Integrate in code by fetching results and filtering by metadata, useful for recommendation systems.
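Filtering query results by metadata might look like the sketch below. This document only says the response is a JSON array of nearest neighbors, so the per-item fields used here ("id", "score", "metadata") are an assumed shape, and the sample data is made up for illustration.

```python
def filter_by_metadata(neighbors, **required):
    """Keep neighbors whose metadata matches every required key/value pair."""
    return [
        neighbor
        for neighbor in neighbors
        if all(
            neighbor.get("metadata", {}).get(key) == value
            for key, value in required.items()
        )
    ]


# Hypothetical response items; the real response schema may differ.
neighbors = [
    {"id": "img1", "score": 0.98, "metadata": {"url": "image1.jpg", "category": "shoes"}},
    {"id": "img2", "score": 0.91, "metadata": {"url": "image2.jpg", "category": "hats"}},
    {"id": "img3", "score": 0.87, "metadata": {"url": "image3.jpg", "category": "shoes"}},
]

shoes = filter_by_metadata(neighbors, category="shoes")
```

Because the filter preserves the input order, results stay ranked by similarity after filtering.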
Graph Relationships
- Connected to cluster: aimlops (e.g., shares data pipelines with data-processing skills).
- Relates to: embedding-generation skills (for vector creation) and query-optimization tools (for enhancing searches).
- Links with: ai skills for ML model integration and ml skills for training data storage.