skills/tiangong-ai/skills/dify-knowledge-base-upload

dify-knowledge-base-upload

SKILL.md

Dify Knowledge Base Upload

Use this skill when

  • The target Dify knowledge base is backed by a published pipeline datasource.
  • UI uploads succeed but document/create-by-file or create-by-text yields completed with 0 chunks.
  • You need to upload one local file and optionally write existing metadata fields afterward.

Prepare caller-side inputs

  • Provide a local file path to upload.
  • For pure text, save it as .txt or .md first and upload that file.
  • Set caller env values:
    • DIFY_API_BASE_URL
    • DIFY_DATASET_ID
    • DIFY_API_KEY
  • Optionally provide pipeline inputs JSON if the published pipeline exposes user input variables.
  • Optionally provide metadata JSON:
    • a flat object mapping existing Dify metadata field names to values, or
    • a list of { "name": ..., "value": ... } / { "id": ..., "value": ... } items.
    • The skill does not hardcode your metadata schema. Pass the current field names or ids in the JSON you provide at runtime.
  • Read references/env.md only if you need env or debug details.

Run workflow

  • Use scripts/upload_to_dataset.py to:
    • discover the published datasource plugin from ${DIFY_API_BASE_URL}/datasets/${DIFY_DATASET_ID}/pipeline/datasource-plugins?is_published=true
    • upload the file to ${DIFY_API_BASE_URL}/datasets/pipeline/file-upload
    • run the published pipeline through ${DIFY_API_BASE_URL}/datasets/${DIFY_DATASET_ID}/pipeline/run
    • poll ${DIFY_API_BASE_URL}/datasets/${DIFY_DATASET_ID}/documents/{batch}/indexing-status
    • fetch ${DIFY_API_BASE_URL}/datasets/${DIFY_DATASET_ID}/documents/{document_id}?metadata=without
    • optionally resolve metadata field names through ${DIFY_API_BASE_URL}/datasets/${DIFY_DATASET_ID}/metadata
    • optionally update metadata through ${DIFY_API_BASE_URL}/datasets/${DIFY_DATASET_ID}/documents/metadata
  • Upload without metadata:
python3 scripts/upload_to_dataset.py \
  --file /path/to/document.pdf
  • Upload with pipeline inputs:
python3 scripts/upload_to_dataset.py \
  --file /path/to/document.pdf \
  --inputs-json assets/example-pipeline-inputs.json
  • Upload and then apply metadata:
python3 scripts/upload_to_dataset.py \
  --file /path/to/document.pdf \
  --metadata-json assets/example-metadata.json

Interpret response

  • Success prints JSON with batch, document_id, file_upload_response, pipeline_run_response, indexing_status_response, document_response, optional metadata_response, and validation.
  • validation.ok is only true when:
    • indexing_status = "completed"
    • total_segments > 0
    • segment_count > 0
  • If tokens is returned by the document detail API, it should also be greater than 0. Some pipeline deployments leave tokens = null; the script reports that as a warning instead of a hard failure.
  • The script exits non-zero if the required checks fail.
  • If --metadata-json is omitted, the script uploads the file only and skips metadata API calls.
  • Metadata keys must already exist in the target dataset. Unknown names fail fast before the metadata write request.
  • assets/example-metadata.json is only a template. Replace its keys with your own existing Dify metadata field names before live use.
  • Use --dry-run to validate local files and request shape without calling Dify.

Troubleshoot quickly

  • If auth fails, verify DIFY_API_KEY and Authorization header format.
  • If datasource discovery fails, verify the dataset has a published pipeline datasource and the key belongs to that dataset.
  • If the pipeline run fails with missing input errors, inspect user_input_variables from discovery_response and pass the missing keys in --inputs-json.
  • If the published datasource is not local_file, this skill is not the right uploader.
  • If metadata resolution fails, rename keys in the metadata JSON to the exact field names defined in Dify.
  • If you only have raw text, save it to .txt or .md and upload that file.
  • In the current tested deployment, .md uploaded successfully while .txt hit a Dify-side indexing 400. If plain text fails as .txt, retry as .md.

References

  • references/env.md
  • references/request-response.md
  • references/testing.md

Assets

  • assets/example-pipeline-inputs.json
  • assets/example-metadata.json
Weekly Installs
5
GitHub Stars
4
First Seen
1 day ago
Installed on
openclaw5
gemini-cli4
github-copilot4
codex4
kimi-cli4
cursor4