skills-download
No SKILL.md available for this skill.
More from bytedance/agentkit-samples
byted-las-pdf-parse-doubao
Parses PDF documents into structured Markdown text using Volcengine LAS Doubao AI models. PDF parsing, PDF OCR, and document recognition: extracts text, headings, paragraphs, tables, charts, and layout structure from PDF files with high fidelity. Performs layout analysis, including multi-column recognition and complex table extraction. Two modes: normal (fast, cost-effective everyday parsing) and detail (deep analysis for complex tables, charts, and multi-column layouts). Converts PDFs to Markdown, plain text, and structured data. Digitizes scanned PDF documents and scanned images via OCR. Supports TOS paths, HTTP URLs, and local file upload. Uses an async submit-poll workflow and supports batch processing. Use this skill when the user wants to parse PDF files into Markdown/text, extract text/tables/charts from PDFs, convert PDF to Markdown format, do OCR on scanned documents, recognize PDF layout structure, digitize paper documents, process PDFs in batch, or extract structured data from PDF documents.
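The async submit-poll workflow can be pictured with the minimal Python sketch below. It assumes nothing about the real LAS API: `submit` and `poll` are hypothetical callables you would wire to the actual service (for example, choosing the normal or detail mode inside `submit`), and the `"state"` key with its `succeeded`/`failed` values is an illustrative placeholder. All inputs are submitted first, then pending jobs are polled at a fixed interval until each reaches a terminal state or a timeout expires.

```python
import time
from typing import Callable, Dict, List

# Illustrative terminal states; the real LAS status values may differ.
TERMINAL_STATES = {"succeeded", "failed"}


def submit_and_poll(
    inputs: List[str],
    submit: Callable[[str], str],
    poll: Callable[[str], Dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> Dict[str, Dict]:
    """Submit every input, then poll pending jobs until all finish or the timeout expires.

    `submit` returns a job ID for one input (a TOS path, URL, or uploaded file);
    `poll` returns a status dict for a job ID. Both are hypothetical hooks.
    """
    job_ids = {src: submit(src) for src in inputs}  # batch submission
    results: Dict[str, Dict] = {}
    deadline = time.monotonic() + timeout_s
    while len(results) < len(job_ids):
        for src, job_id in job_ids.items():
            if src in results:
                continue  # already finished; skip further polling
            status = poll(job_id)
            if status.get("state") in TERMINAL_STATES:
                results[src] = status
        if len(results) < len(job_ids):
            if time.monotonic() >= deadline:
                pending = len(job_ids) - len(results)
                raise TimeoutError(f"{pending} job(s) still pending after {timeout_s}s")
            time.sleep(interval_s)
    return results
```

A failed job still counts as finished here, so callers should check each result's state before reading the parsed Markdown. Submitting the whole batch before polling presumably lets the service work on the jobs concurrently instead of one after another.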
129
byted-data-search
106
byted-las-vlm-video
Analyzes and understands video content using Volcengine LAS Doubao vision-language models (VLM). Multimodal AI video analysis, video comprehension, and visual understanding of video clips and footage. Performs video question answering (video Q&A): ask questions about what happens in a video and get AI answers. Scene recognition and scene description, object recognition and object detection, action recognition and action detection from video frames. Generates video descriptions, video captions, video summaries, video annotations, and content summarization. Visual frame analysis for identifying people, objects, actions, and events in video. Automatically compresses videos to 50 MB before inference. Synchronous, single-call processing. Use this skill when the user wants to analyze or understand video content using VLM/AI, do video Q&A (ask questions about a video), describe what happens in a video, recognize objects/actions/scenes in video frames, generate video captions/descriptions/summaries, annotate or label video content, get AI-powered visual understanding of video clips, or perform multimodal video analysis with vision-language models.
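The description does not say how the 50 MB compression is done. One common approach, shown here purely as an assumption, is to derive a target video bitrate from the clip duration so the encoded file fits under the cap. The helper name, the 128 kbit/s audio allowance, the 5% headroom for container overhead, and reading "MB" as 10**6 bytes are all illustrative choices, not LAS settings.

```python
def target_video_bitrate_kbps(
    duration_s: float,
    size_cap_mb: float = 50.0,
    audio_kbps: float = 128.0,
    headroom: float = 0.95,
) -> float:
    """Return the video bitrate (kbit/s) that keeps the whole file under size_cap_mb.

    Assumptions (hypothetical, not LAS settings): MB = 10**6 bytes, a constant
    audio bitrate, and a headroom factor reserved for container overhead.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    # Total bit budget in kilobits, scaled down by the safety headroom.
    total_kbits = size_cap_mb * 1_000_000 * 8 / 1000 * headroom
    # Spread the budget over the duration, then subtract the audio share.
    video_kbps = total_kbits / duration_s - audio_kbps
    if video_kbps <= 0:
        raise ValueError("clip too long to fit under the size cap at this audio bitrate")
    return video_kbps
```

The returned value could then be handed to an encoder, for example as ffmpeg's `-b:v` option; for a 10-minute clip the defaults give roughly 505 kbit/s.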
97
byted-las-asr-pro
ASR / STT / speech recognition / voice recognition engine powered by Volcengine LAS. Transcribes speech to text from audio and video files, extracting the spoken words of any recording as a text transcription. Supports dictation and subtitle and caption generation. Handles meeting recordings, meeting notes, meeting minutes, meeting summaries, interview transcription, podcast transcription, lecture transcription, customer service call center audio, phone call recordings, and other recorded audio files. Features speaker diarization and speaker identification (detect who said what), emotion recognition, sentiment detection, gender recognition, and multilingual support with automatic language detection. Accepts WAV, MP3, and M4A files and uses an async submit-poll workflow, with batch processing for large-scale transcription jobs. Use this skill when the user wants to transcribe audio or video to text (ASR/speech-to-text), generate subtitles or captions from recordings, do speaker diarization or emotion analysis on meeting/interview/podcast/lecture recordings, or extract spoken content from any audio/video media file.
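Speaker diarization output is typically a list of timed utterances, each tagged with a speaker label. The sketch below assumes a hypothetical `Segment` shape (the real LAS response schema is not described here) and shows one way to turn such a list into a "who said what" transcript by merging consecutive utterances from the same speaker.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One diarized utterance. This shape is hypothetical; the real response schema may differ."""
    speaker: str
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str


def merge_turns(segments: List[Segment]) -> List[Segment]:
    """Collapse consecutive segments from the same speaker into one turn, in time order."""
    turns: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            # Build a new Segment so the caller's input objects are never mutated.
            turns[-1] = Segment(last.speaker, last.start, max(last.end, seg.end),
                                f"{last.text} {seg.text}")
        else:
            turns.append(seg)
    return turns


def render(turns: List[Segment]) -> str:
    """Format turns as a 'who said what' transcript, one line per turn."""
    return "\n".join(f"[{t.start:.1f}s-{t.end:.1f}s] {t.speaker}: {t.text}" for t in turns)
```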
64
byted-marketing-agent-trending-list
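Use this skill when the user wants to learn about industry hot topics, check topic-challenge rankings, see which trending search events have come up recently, or follow trends on public platforms. Supports two dimensions: topic challenges and trending-list events. Manual trigger: /trending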
62
byted-voice-to-text
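Speech-to-text (ASR). Recognizes speech with Volcengine BigModel ASR in two modes: an express edition (≤2 h / 100 MB, fast synchronous response) and a standard edition (≤5 h, asynchronous recognition). Supports Feishu voice messages, local audio files, and audio URLs. Use this skill when a voice message or audio attachment (.ogg/.mp3/.wav) is received.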
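The two limits in the description imply a simple routing rule: use the express edition when the audio is within 2 hours and 100 MB, fall back to the standard edition for audio up to 5 hours, and reject anything longer. A minimal sketch follows; the function names and the `"express"`/`"standard"` labels are hypothetical, and reading "100MB" as 100,000,000 bytes is an assumption. The attachment check mirrors the .ogg/.mp3/.wav trigger list.

```python
from pathlib import Path
from typing import Optional

# Limits from the skill description. Reading "100MB" as 100 * 10**6 bytes is an assumption.
EXPRESS_MAX_SECONDS = 2 * 3600
EXPRESS_MAX_BYTES = 100 * 10**6
STANDARD_MAX_SECONDS = 5 * 3600

# Attachment types that trigger the skill, per the description.
AUDIO_SUFFIXES = {".ogg", ".mp3", ".wav"}


def is_audio_attachment(filename: str) -> bool:
    """Return True if the file extension is one of the trigger formats (case-insensitive)."""
    return Path(filename).suffix.lower() in AUDIO_SUFFIXES


def pick_asr_mode(duration_s: float, size_bytes: int) -> Optional[str]:
    """Choose a hypothetical mode label, or None if the audio exceeds both limits."""
    if duration_s <= EXPRESS_MAX_SECONDS and size_bytes <= EXPRESS_MAX_BYTES:
        return "express"   # synchronous, fast response
    if duration_s <= STANDARD_MAX_SECONDS:
        return "standard"  # asynchronous recognition
    return None
```

The description states no size limit for the standard edition, so the sketch checks only duration on that path.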
59