byted-data-label
Seederive 非结构化打标平台
你是 Seederive 平台的操作助手。所有 Seederive 操作从这里开始。
什么是 Seederive
Seederive 用 LLM 对文本/语音/图片数据做情感分析、标签分类、观点提取等批量处理。
认证配置
使用前需要设置 AK/SK 环境变量:
| 环境变量 | 说明 | 必填 |
|---|---|---|
VOLCENGINE_ACCESS_KEY |
Access Key | 是 |
VOLCENGINE_SECRET_KEY |
Secret Key | 是 |
验证连通性
设置好环境变量后,执行以下命令验证:
python3 ${SKILL_DIR}/scripts/seederive.py task list --page-size 1
如果返回 "code": 0 表示连通成功。如果返回认证错误,请检查 AK/SK 是否正确。
执行命令的方式
python3 ${SKILL_DIR}/scripts/seederive.py <子命令和参数>
第一步:判断用户意图
阅读用户的需求,对照下表确定属于哪个场景:
| 场景 | 用户说了什么(示例) | 下一步 |
|---|---|---|
| A. 快速试效果 | "帮我分析这几条评论" / "试一下情感分析" / "看看这些文本的标签" | → 直接用 quick-preview,见下方「场景 A」 |
| B. 创建批量任务 | "帮我对这个数据表做情感分析" / "建一个打标任务" | → 读取 ${SKILL_DIR}/references/task.md 获取详细指引 |
| C. 需要标签体系 | "按我们的标签分类" / "建一个标签库" / "主体识别" | → 读取 ${SKILL_DIR}/references/tag-base.md 获取详细指引 |
| D. 优化效果 | "效果不好" / "帮我优化" / "上传错题" / "换个模型" | → 读取 ${SKILL_DIR}/references/optimize.md 获取详细指引 |
| E. 不确定 | "我有一批数据想处理" / "能做什么" | → 先问用户数据是什么、想得到什么结果,再回到本表判断 |
重要:场景 B/C/D 的具体操作步骤、参数说明、JSON 格式都在对应的参考文件中。你必须用 Read 工具读取对应文件后再执行,本文件不包含这些细节。
场景 A:快速试效果(唯一可以直接执行的场景)
这是最轻量的路径,无需创建任务,传几条文本就能看结果。
支持的分析类型
| 分析类型 | nodeType 值 | 输出 | 额外参数 |
|---|---|---|---|
| 情感分析 | EMOTION_DETECTION |
正面/负面/中性 + 原因 | 无 |
| 营销水军识别 | SHILL_DETECTION |
是/否 + 原因 | 无 |
| 观点提取 | OPINION_SUMMARY |
核心观点 + 理由 | 无 |
| 内容评分 | CONTENT_SCORING |
质量/原创/有用/合规评分 | 无 |
| 翻译 | TRANSLATION |
翻译结果 | --target-language |
| 标签分类 | TAG_DETECTION |
多级标签 | --tag-base-id(需要先建标签库,见场景 C) |
| 主体识别 | SUBJECT_DETECTION |
多级主体 | --tag-base-id(需要先建标签库,见场景 C) |
| 自定义分析 | CUSTOM_APPLICATION |
自定义 | --prompt + --output-fields |
执行方式
方式一:直接传文本(推荐,最快)
python3 ${SKILL_DIR}/scripts/seederive.py task quick-preview \
--raw-data '["文本1", "文本2", "文本3"]' \
--node-type EMOTION_DETECTION \
--input-column "评论内容"
方式二:上传文件
python3 ${SKILL_DIR}/scripts/seederive.py task quick-preview \
--file data.csv \
--node-type EMOTION_DETECTION \
--input-column "评论内容"
方式三:导出结果为 CSV 文件
python3 ${SKILL_DIR}/scripts/seederive.py task quick-preview \
--raw-data '["文本1", "文本2"]' \
--node-type EMOTION_DETECTION \
--input-column "评论内容" \
--response-format csv --output result.csv
自定义分析示例:
python3 ${SKILL_DIR}/scripts/seederive.py task quick-preview \
--raw-data '["今天天气真好", "堵车堵了两小时"]' \
--node-type CUSTOM_APPLICATION \
--input-column "内容" \
--prompt "提取关键词和情绪强度" \
--output-fields '[{"fieldName":"keywords","fieldType":"String"},{"fieldName":"intensity","fieldType":"String"}]'
quick-preview 全部参数
| 参数 | 必填 | 说明 |
|---|---|---|
--raw-data |
与 file 二选一 | JSON 字符串数组或对象数组 |
--raw-data-file |
与上二选一 | JSON 文件路径 |
--file |
与 raw-data 二选一 | CSV / Excel 文件 |
--node-type |
是 | 分析类型,见上表 |
--input-column |
是 | 待处理文本的列名 |
--max-rows |
否 | 最大处理行数(默认 10,上限 50) |
--tag-base-id |
TAG/SUBJECT 需要 | 标签库 ID |
--prompt |
CUSTOM 需要 | 自定义提示词 |
--output-fields |
CUSTOM 需要 | 输出字段 JSON 数组 |
--target-language |
TRANSLATION 用 | 目标语言(默认"中文") |
--response-format |
否 | json(默认)或 csv |
--output |
否 | CSV 输出文件路径 |
场景之间的流转
场景 A(试效果)
│
├─ 效果满意 + 数据量大 → 场景 B(建正式任务批量跑)
│ → 读取 ${SKILL_DIR}/references/task.md
│
├─ 需要标签分类 → 场景 C(先建标签库)→ 回到 A 或 B
│ → 读取 ${SKILL_DIR}/references/tag-base.md
│
└─ 效果不满意 → 场景 D(优化提示词/换模型)→ 回到 A 验证
→ 读取 ${SKILL_DIR}/references/optimize.md
关键原则
- 先试后建:建议用户先用 quick-preview 试效果,满意后再创建正式任务
- 渐进披露:不要一次给用户灌输所有概念,按需引导到对应参考文件
- 按需加载:只有需要执行场景 B/C/D 时才去读取对应参考文件
More from bytedance/agentkit-samples
byted-las-video-edit
Extracts and clips video segments from long videos using natural language descriptions. AI-powered smart video editing, video trimming, and video cutting powered by Volcengine LAS. Describe what you want — scenes, people, objects, actions, events — and get trimmed clips automatically. Video search and video content retrieval: find and locate specific people, objects, or scenes in footage. Supports reference images for person matching and object matching (search video by image). Two modes: simple (fast) and detail (thorough, optional ASR). Use this skill when the user wants to edit/clip/cut videos using natural language descriptions, extract highlights or key moments from videos, find specific people/objects/scenes in video footage (by text or reference image), compile highlight reels from long videos, trim video segments, or do AI-powered smart video editing.
163byted-las-pdf-parse-doubao
Parses and reads PDF documents into structured Markdown text using Volcengine LAS Doubao AI models. PDF parsing, PDF OCR, and document recognition — extracts text, headings, paragraphs, tables, charts, and layout structure from PDF files with high fidelity. Performs layout analysis including multi-column recognition and complex table extraction. Two modes: normal (fast, cost-effective everyday parsing) and detail (deep analysis for complex tables, charts, and multi-column layouts). Converts PDF to Markdown, PDF to text, and structured data. Digitizes scanned PDF documents and scanned images via OCR. Supports TOS paths, HTTP URLs, and local file upload. Async submit-poll workflow with batch processing support. Use this skill when the user wants to parse PDF files into Markdown/text, extract text/tables/charts from PDFs, convert PDF to Markdown format, do OCR on scanned documents, recognize PDF layout structure, digitize paper documents, process PDFs in batch, or extract structured data from PDF documents.
129byted-seedance-video-generate
Generate videos using Seedance models. Invoke when user wants to create videos from text prompts, images, or reference materials.
109byted-data-search
|
106byted-las-vlm-video
Analyzes and understands video content using Volcengine LAS Doubao vision-language models (VLM). Multimodal AI video analysis, video comprehension, and visual understanding of video clips and footage. Performs video question answering (video Q&A) — ask questions about what happens in a video and get AI answers. Scene recognition and scene description, object recognition and object detection, action recognition and action detection from video frames. Generates video descriptions, video captions, video summaries, video annotations, and content summarization. Visual frame analysis for identifying people, objects, actions, and events in video. Auto-compresses video to 50MB before inference. Synchronous single-call processing. Use this skill when the user wants to analyze or understand video content using VLM/AI, do video Q&A (ask questions about a video), describe what happens in a video, recognize objects/actions/scenes in video frames, generate video captions/descriptions/summaries, annotate or label video content, get AI-powered visual understanding of video clips, or perform multimodal video analysis with vision-language models.
97byted-text-to-speech
将文本合成为语音(TTS)。使用火山引擎豆包语音合成 API,支持流式合成、多种音色、语速/音调/音量调节、Markdown 过滤和 LaTeX 公式播报。当用户需要把文字转成语音、生成朗读音频、配音、旁白、播报,或提到「文字转语音」「TTS」「语音合成」「朗读」「配音」时使用本技能。
93