skills/smithery.ai/openclaw-dify-kb-search

openclaw-dify-kb-search

SKILL.md

Dify Knowledge Base Search Skill

🔍 Search your Dify Knowledge Base to get accurate, contextual answers

This skill enables AI agents to query Dify datasets for RAG (Retrieval-Augmented Generation) context retrieval. Perfect for knowledge base Q&A, documentation search, and contextual AI responses.

Dify Knowledge Base

✨ Features

  • List Knowledge Bases - Discover all available Dify datasets
  • Smart Search - Query datasets with hybrid, semantic, or keyword search
  • Auto-Discovery - Automatically find available datasets if ID not provided
  • Configurable Results - Adjust top-k, search method, and reranking
  • Error Handling - Graceful error messages for debugging
  • Zero Hardcoding - All configuration via environment variables

🚀 Quick Start

1. Configure Environment Variables

Set up in openclaw.json:

{
  "env": {
    "vars": {
      "DIFY_API_KEY": "${DIFY_API_KEY}",
      "DIFY_BASE_URL": "https://dify.example.com/v1"
    }
  }
}

Environment Variables:

Variable Required Default Description
DIFY_API_KEY ✅ Yes - Your Dify API Key (from Settings → API)
DIFY_BASE_URL ❌ No http://localhost/v1 Your Dify instance base URL

2. Install Dependencies

pip3 install requests

🛠️ Tools

dify_list

Lists all available knowledge bases (datasets) in your Dify instance.

Invocation: dify_list tool

Example Response:

{
  "status": "success",
  "count": 2,
  "datasets": [
    {
      "id": "dataset-abc123",
      "name": "Product Documentation",
      "doc_count": 42,
      "description": "All product guides and tutorials"
    },
    {
      "id": "dataset-xyz789",
      "name": "API Reference",
      "doc_count": 156,
      "description": "REST API documentation"
    }
  ]
}

Usage:

{}

dify_search

Searches a Dify Dataset for relevant context chunks.

Invocation: dify_search tool (mapped to python3 scripts/search.py)

Parameters:

Parameter Type Required Default Description
query string ✅ Yes - Search query or question
dataset_id string ❌ No Auto-discover Specific dataset ID to search
top_k integer ❌ No 3 Number of results to return
search_method string ❌ No hybrid_search Search strategy
reranking_enable boolean ❌ No false Enable reranking for better results

Search Methods:

  • hybrid_search - Combine semantic + keyword search (recommended)
  • semantic_search - Meaning-based similarity search
  • keyword_search - Exact keyword matching

Example Usage:

{
  "query": "How do I configure OpenClaw?",
  "top_k": 5
}
{
  "query": "API authentication methods",
  "dataset_id": "dataset-xyz789",
  "search_method": "semantic_search",
  "reranking_enable": true
}

Example Response:

{
  "status": "success",
  "query": "How do I configure OpenClaw?",
  "dataset_id": "dataset-abc123",
  "count": 3,
  "results": [
    {
      "content": "To configure OpenClaw, edit the openclaw.json file...",
      "score": 0.8923,
      "title": "Installation Guide",
      "document_id": "doc-001"
    },
    {
      "content": "OpenClaw supports environment variables via...",
      "score": 0.8451,
      "title": "Configuration Options",
      "document_id": "doc-002"
    }
  ]
}

📋 Complete Workflow Example

[
  {
    "tool": "dify_list",
    "parameters": {}
  },
  {
    "tool": "dify_search",
    "parameters": {
      "query": "What are the system requirements?",
      "top_k": 5,
      "search_method": "hybrid_search"
    }
  }
]

🔧 Troubleshooting

Common Errors

Error Solution
Missing DIFY_API_KEY Set DIFY_API_KEY in environment variables
Connection refused Check DIFY_BASE_URL is correct and accessible
No datasets found Verify dataset exists in your Dify workspace
API request failed Check network connectivity and API key permissions

Debug Mode

Run manually to see detailed errors:

DIFY_API_KEY=your-key python3 scripts/search.py <<< '{"query":"test"}'

📚 Integration Tips

RAG Pipeline Integration

# Example: Use search results in AI response
results = dify_search(query, top_k=5)
context = "\n".join([r["content"] for r in results["results"]])
final_prompt = f"Answer based on context:\n\n{context}\n\nQuestion: {query}"

Multiple Datasets

For searching across multiple datasets, loop through them:

{
  "query": "Find information about authentication",
  "dataset_id": "dataset-api-docs"
}

Then query another dataset separately.

🔒 Security

  • Never commit API keys - Use environment variables or .env files
  • Rotate keys regularly - Generate new keys in Dify Settings
  • Restrict access - Limit API key permissions where possible

📖 Implementation Details

This skill uses the Dify Dataset API:

  • List Datasets: GET /v1/datasets
  • Search: POST /v1/datasets/{id}/retrieve

For API documentation, see: https://docs.dify.ai/reference/api-reference

📝 Changelog

v1.1.0 (2026-02-08):

  • ✅ Added search method selection (hybrid/semantic/keyword)
  • ✅ Added reranking support
  • ✅ Auto-discovery of datasets
  • ✅ Improved error handling
  • ✅ Removed hardcoded URLs (fully configurable)
  • ✅ Added detailed logging

v1.0.0 (2026-02-06):

  • Initial release
  • Basic list and search functionality
Weekly Installs
1
First Seen
Feb 28, 2026
Installed on
cursor1