alicloud-ai-multimodal-qwen-vl
SKILL.md
Category: provider
Model Studio Qwen VL (Image Understanding)
Validation
mkdir -p output/alicloud-ai-multimodal-qwen-vl
python -m py_compile skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/scripts/analyze_image.py && echo "py_compile_ok" > output/alicloud-ai-multimodal-qwen-vl/validate.txt
Pass criteria: command exits 0 and output/alicloud-ai-multimodal-qwen-vl/validate.txt is generated.
Output And Evidence
- Save raw model responses and normalized extraction results to output/alicloud-ai-multimodal-qwen-vl/.
- Include input image reference and prompt for traceability.
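One way to keep the evidence described above is to write a single JSON record per call. The sketch below is illustrative only: the helper name save_evidence, the record layout, and the file naming are assumptions, not part of the skill's scripts.

```python
import json
import time
from pathlib import Path

# Default output directory convention taken from this document.
OUTPUT_DIR = Path("output/alicloud-ai-multimodal-qwen-vl")


def save_evidence(request, raw_response, normalized, output_dir=OUTPUT_DIR):
    """Hypothetical helper: write one evidence file and return its path."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "saved_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        # Input image reference and prompt, kept for traceability.
        "image": request.get("image"),
        "prompt": request.get("prompt"),
        "raw_response": raw_response,
        "normalized": normalized,
    }
    # Nanosecond timestamp keeps file names unique across quick successive calls.
    path = output_dir / f"evidence-{time.time_ns()}.json"
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

If the image is passed as a data: URL, it may be worth storing a hash or file name instead of the full string to keep evidence files small.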
Use Qwen VL models for image-understanding tasks (image input, text output) through the DashScope compatible-mode API.
Prerequisites
- Install dependencies (recommended in a venv):
python3 -m venv .venv
. .venv/bin/activate
python -m pip install requests
- Set DASHSCOPE_API_KEY in the environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
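The key lookup order above can be sketched as follows. This is a minimal sketch, not the skill's actual loader: the function name resolve_api_key is hypothetical, and the credentials file is assumed to be INI-style with the key stored under some section such as [default]. The source does not specify the file layout.

```python
import configparser
import os
from pathlib import Path


def resolve_api_key(env=None, credentials_path=None):
    """Hypothetical helper: return the API key, or None if none is configured."""
    env = os.environ if env is None else env
    # 1. The environment variable takes precedence.
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if key:
        return key
    # 2. Fall back to the credentials file.
    path = Path(credentials_path) if credentials_path else Path.home() / ".alibabacloud" / "credentials"
    if not path.is_file():
        return None
    # ASSUMPTION: INI layout. interpolation=None keeps "%" in keys literal.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(path, encoding="utf-8")
    except configparser.Error:
        return None
    for section in parser.sections():
        value = parser.get(section, "dashscope_api_key", fallback="").strip()
        if value:
            return value
    return None
```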
Critical model names
Prefer the Qwen3 VL family:
- qwen3-vl-plus
- qwen3-vl-flash
When you need explicit "latest" routing or reproducible snapshots, use supported aliases/snapshots from the official model list, such as:
- qwen3-vl-plus-latest
- qwen3-vl-plus-2025-12-19
- qwen3-vl-flash-latest
Legacy names still seen in some workloads:
- qwen-vl-max-latest
- qwen-vl-plus-latest
- qwen-vl-ocr
- qwen-vl-ocr-latest
Normalized interface (multimodal.chat)
Request
- prompt (string, required): user question/instruction about image.
- image (string, required): HTTPS URL, local path, or data: URL.
- model (string, optional): default qwen3-vl-plus.
- max_tokens (int, optional): default 512.
- temperature (float, optional): default 0.2.
- detail (string, optional): auto/low/high, default auto.
- json_mode (bool, optional): return JSON-only response when possible.
- schema (object, optional): JSON Schema for structured extraction.
- max_retries (int, optional): retry count for 429/5xx, default 2.
- retry_backoff_s (float, optional): exponential backoff base seconds, default 1.5.
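To illustrate how the normalized request might map onto a compatible-mode chat payload, here is a minimal sketch. The helper names build_payload and to_image_url are hypothetical, and encoding local files as base64 data: URLs is an assumption about how a local path could be sent. The mapping of detail, json_mode, and schema is left out because the source does not describe it.

```python
import base64
import mimetypes
from pathlib import Path

# Defaults as listed in the request fields above.
DEFAULTS = {"model": "qwen3-vl-plus", "max_tokens": 512, "temperature": 0.2}


def to_image_url(image):
    """Pass HTTPS and data: URLs through; encode a local file as a data: URL."""
    if image.startswith(("https://", "data:")):
        return image
    path = Path(image)
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(request):
    """Hypothetical helper: build the chat-completions body from a request dict."""
    for field in ("prompt", "image"):
        if not request.get(field):
            raise ValueError(f"missing required field: {field}")
    merged = {**DEFAULTS, **request}
    return {
        "model": merged["model"],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": to_image_url(merged["image"])}},
                    {"type": "text", "text": merged["prompt"]},
                ],
            }
        ],
        "max_tokens": merged["max_tokens"],
        "temperature": merged["temperature"],
    }
```

The message shape mirrors the cURL example in this document, so the resulting dict can be serialized with json.dumps and posted to the same endpoint.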
Response
- text (string): primary model answer.
- model (string): model actually used.
- usage (object): token usage if returned by backend.
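A possible way to reduce a raw backend response to the three fields above is sketched below. It assumes the backend returns an OpenAI-style chat completion (choices[0].message.content, model, usage). The function name normalize_response is hypothetical.

```python
def normalize_response(raw, requested_model=None):
    """Hypothetical helper: map a raw chat completion to {text, model, usage}."""
    choices = raw.get("choices") or []
    message = choices[0].get("message", {}) if choices else {}
    content = message.get("content") or ""
    # ASSUMPTION: content may arrive as a list of parts; join their text fields.
    if isinstance(content, list):
        content = "".join(
            part.get("text", "") for part in content if isinstance(part, dict)
        )
    return {
        "text": content,
        # Prefer the model reported by the backend over the requested one.
        "model": raw.get("model") or requested_model,
        # usage stays None when the backend does not return it.
        "usage": raw.get("usage"),
    }
```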
Quickstart
python skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/scripts/analyze_image.py \
--request '{"prompt":"Summarize the main content in this image","image":"https://example.com/demo.jpg"}' \
--print-response
Using local image:
python skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/scripts/analyze_image.py \
--request '{"prompt":"Extract key information from the image","image":"./samples/invoice.png","model":"qwen3-vl-plus"}' \
--print-response
Structured extraction (JSON mode):
python skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/scripts/analyze_image.py \
--request '{"prompt":"Extract fields: title, amount, date","image":"./samples/invoice.png"}' \
--json-mode \
--print-response
Structured extraction (JSON Schema):
python skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/scripts/analyze_image.py \
--request '{"prompt":"Extract invoice fields","image":"./samples/invoice.png"}' \
--schema skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/references/examples/invoice.schema.json \
--print-response
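After a structured-extraction call, the returned text still has to be parsed and checked. The sketch below shows one way to do that with the standard library only. The helper names and EXAMPLE_SCHEMA are hypothetical. The schema is not the content of invoice.schema.json; it only reuses the field names from the JSON-mode example above. The check covers top-level required keys and is not full JSON Schema validation.

```python
import json

FENCE = "`" * 3  # Markdown code fence, built indirectly to keep this listing clean.

# HYPOTHETICAL schema for illustration only.
EXAMPLE_SCHEMA = {"type": "object", "required": ["title", "amount", "date"]}


def parse_json_answer(text):
    """Parse model text as JSON, tolerating a surrounding Markdown code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag).
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        # Drop the closing fence and anything after it.
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    return json.loads(cleaned)


def missing_required(data, schema):
    """Return the top-level required keys that are absent from data."""
    return [key for key in schema.get("required", []) if key not in data]
```

A non-empty result from missing_required is a reasonable trigger for a retry with a stricter prompt or for flagging the record for review.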
cURL (compatible mode)
curl -sS https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model":"qwen3-vl-plus",
"messages":[
{
"role":"user",
"content":[
{"type":"image_url","image_url":{"url":"https://example.com/demo.jpg"}},
{"type":"text","text":"Describe this image and list executable actions"}
]
}
],
"max_tokens":512,
"temperature":0.2
}'
Output location
- If --output is set, JSON response is saved to that file.
- Default output dir convention: output/alicloud-ai-multimodal-qwen-vl/.
Smoke test
python tests/ai/multimodal/alicloud-ai-multimodal-qwen-vl-test/scripts/smoke_test_qwen_vl.py \
--image ./tmp/vl_test_cat.png
Error handling
| Error | Likely cause | Action |
|---|---|---|
| 401/403 | Missing or invalid key | Check DASHSCOPE_API_KEY and account permissions. |
| 400 | Invalid request schema or unsupported image source | Validate messages content and image URL/path format. |
| 429 | Rate limit | Retry with exponential backoff and lower concurrency. |
| 5xx | Temporary backend issue | Retry with backoff and idempotent request design. |
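The retry rows in the table can be sketched as a small wrapper. This is an illustration under stated assumptions: call_with_retry and its send callable are hypothetical names, and the delay formula (base seconds times 2 to the power of the attempt index) is one common reading of "exponential backoff base seconds", not necessarily what the skill's script does.

```python
import time


def is_retryable(status):
    """429 and 5xx are retried; other statuses (400, 401, 403) are returned at once."""
    return status == 429 or 500 <= status < 600


def call_with_retry(send, max_retries=2, retry_backoff_s=1.5, sleep=time.sleep):
    """Hypothetical helper. send() must return a (status_code, body) tuple."""
    attempt = 0
    while True:
        status, body = send()
        if not is_retryable(status) or attempt >= max_retries:
            return status, body
        # ASSUMPTION: delay doubles on each retry, starting at retry_backoff_s.
        sleep(retry_backoff_s * (2 ** attempt))
        attempt += 1
```

The sleep parameter is injectable so the wrapper can be exercised in tests without waiting. With the defaults, a request is sent at most three times.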
Operational guidance
- For stable production behavior, pin snapshot model IDs instead of pure -latest.
- Compress very large images before upload to reduce latency and cost.
- Add explicit extraction constraints in prompt (fields, JSON shape, language).
- For OCR-like output, ask for confidence notes and unresolved text markers.
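The snapshot-pinning advice can be enforced with a small check before a request is sent. The sketch assumes that snapshot IDs end in a -YYYY-MM-DD date suffix, as in qwen3-vl-plus-2025-12-19; that pattern is inferred from the examples in this document, and classify_model_id is a hypothetical helper.

```python
import re

# ASSUMPTION: snapshot model IDs end with a date suffix such as -2025-12-19.
SNAPSHOT_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$")


def classify_model_id(model):
    """Return 'snapshot', 'latest-alias', or 'plain-name' for a model ID."""
    if model.endswith("-latest"):
        return "latest-alias"
    if SNAPSHOT_SUFFIX.search(model):
        return "snapshot"
    return "plain-name"
```

A production configuration could, for example, reject anything that does not classify as snapshot.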
Workflow
- Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
- Run one minimal read-only query first to verify connectivity and permissions.
- Execute the target operation with explicit parameters and bounded scope.
- Verify results and save output/evidence files.
References
- Source list: references/sources.md
- API notes: references/api_reference.md
Repository: cinience/alicloud-skills