Data Analysis
Uses the DataVoyager agent (hosted on Cloud Run, accessed via asta-gateway) to run a multi-agent data-science pipeline: the agent writes and executes code against the caller's dataset(s) in a sandboxed notebook and answers a research question. Authenticate with `asta auth login`.
Step 1 — Draft a tightened query
Before asking the user anything, analyze their request and the surrounding context (current project, conversation history, files they've been working on) to produce a tightened analytical question that:
- Names the specific dataset(s) that will be analyzed
- States what decision or insight the user is after, not just "analyze X"
- Is phrased as a question DataVoyager can actually answer with code
Examples:
- User says "look at this CSV" → inspect the file path, sample the top rows if possible, produce a concrete query like "Which columns in `sales_q3.csv` have the strongest correlation with quarterly revenue, controlling for region?"
- User says "what's in the titanic data" → "What features best predict survival in `titanic.csv`, and how do survival rates differ across passenger class and sex?"
- User gives a specific question with a specific dataset → echo it verbatim
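When the user hands over a raw file, a quick peek at the header helps ground the tightened query. A minimal sketch — the sample CSV here is fabricated purely for illustration:

```shell
# Peek at a CSV before drafting the tightened query.
# /tmp/sales_q3.csv is a fabricated example file, not part of the skill.
printf 'region,quarter,revenue\nwest,Q3,1200\neast,Q3,950\n' > /tmp/sales_q3.csv
head -n 1 /tmp/sales_q3.csv | tr ',' '\n'   # column names, one per line
head -n 3 /tmp/sales_q3.csv                 # header plus a couple of sample rows
```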
Step 2 — Confirm with one chat question
In chat (not AskUserQuestion), present:
Proposed analysis:
- Dataset(s): `<path>` (will be uploaded to your DataVoyager workspace)
- Question: `<tightened>`

You can:
- Reply yes / go to run as-is
- Reply with edits (e.g. "focus on just Q3", "ignore missing values") and I'll revise the question
- (Only if AskUserQuestion is available) Reply interview to refine the question through a form

Only include the interview bullet when the AskUserQuestion tool is available.
Wait for the user's response. Paths:
- Affirmative ("yes", "go", "proceed", "looks good") → Step 4.
- Natural-language edit → update the question, re-show the same prompt.
- "Interview" / "refine" → Step 3 (AskUserQuestion required; if unavailable, ask in chat instead).
Never ask the user to pre-upload the dataset. The skill handles the upload — see Step 4 — and the user just supplies the local file path.
Step 3 — Optional interview (only when requested)
Fire one AskUserQuestion with:
- Question — `<tightened>` vs `<alternative reframe>` (if you have one)
- Scope — "focused (single hypothesis)" / "exploratory (multiple angles)"
- Extra context — free-text "anything DataVoyager should know about the data?" (field meanings, known caveats)
Fold the answers back into the query string.
Step 4 — Submit
A single submit call mints a session UUID, uploads the file(s) under `context/<uuid>/`, and starts the analysis. The response carries the task ID (`id`) and the session UUID (`contextId`); capture both for polling and any follow-ups.
asta analyze-data submit \
--output "/tmp/analyze-data-$$.json" \
"<confirmed question>" ./sales.csv ./regions.csv
TID=$(jq -r .id /tmp/analyze-data-$$.json)
CTX=$(jq -r .contextId /tmp/analyze-data-$$.json)
Resumability. `$CTX` identifies the DataVoyager session. To ask a follow-up against the same workspace, run `asta analyze-data submit --context-id "$CTX" '<follow-up question>' [<new-files>...]`. New files (if any) attach to the existing context; if no files are passed, the agent reuses what's already there. To start a clean session over the same datasets, omit `--context-id` and pass the files again.
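A follow-up invocation can be sketched as below. The session ID `ctx-abc` and the question are placeholders; the command and flags are the ones documented above:

```shell
# Sketch: follow-up submit against the same DataVoyager workspace.
# CTX is a placeholder here; normally it comes from the first submit's contextId.
CTX="ctx-abc"
followup="asta analyze-data submit --context-id $CTX 'Did the Q3 trend persist into Q4?' ./q4.csv"
echo "$followup"   # inspect, then run it once CTX holds a real session UUID
```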
Step 5 — Poll
Don't poll in a foreground loop (it blocks the session), and don't issue individual `sleep 60`-then-check turns (the harness blocks long leading sleeps). Instead, run the `poll` subcommand backgrounded — it exits on a terminal state and the harness will notify you when it finishes.
asta analyze-data poll "$TID" --output "/tmp/analyze-data-$TID.json"
Run with `run_in_background: true`. Status ticks go to stderr; the final Task JSON is written to `--output`. When the completion notification fires, read `/tmp/analyze-data-$TID.json` for the final payload.
While it's running, do not proactively check. Work on other things or wait — the notification is authoritative. If the user asks for a status check before the notification, only then tail the background task's stderr.
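The backgrounded-poll shape can be sketched with a stand-in function so it runs anywhere; in the harness the real call is `asta analyze-data poll "$TID" --output ...` launched with `run_in_background: true` rather than `&`, and the `.status.state` field is an assumption about the final Task JSON:

```shell
# fake_poll stands in for `asta analyze-data poll "$TID" --output <file>`:
# it blocks, then writes the final Task JSON on reaching a terminal state.
fake_poll() { sleep 1; printf '{"status":{"state":"completed"}}\n' > "$1"; }
fake_poll /tmp/analyze-data-demo.json &   # harness: run_in_background: true
wait $!                                    # harness: completion notification
jq -r .status.state /tmp/analyze-data-demo.json
```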
Terminal states:
- completed → Step 6
- failed → report `status.message` and stop
- input-required → relay to user, then `asta analyze-data send-message --task-id <ID> --context-id "$CTX" '<reply>'` and re-kick the polling loop
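Once the notification fires, dispatch on the state found in the output file. A sketch with a fabricated payload — the `.status.state` and `.status.message` field names are assumptions about the Task JSON shape, inferred from the terminal states above:

```shell
# Fabricated final payload for illustration.
cat > /tmp/analyze-data-final.json <<'EOF'
{"id":"task-123","contextId":"ctx-abc","status":{"state":"completed"}}
EOF
# Branch on the terminal state.
case "$(jq -r .status.state /tmp/analyze-data-final.json)" in
  completed)      echo "export artifacts (Step 6)" ;;
  failed)         jq -r .status.message /tmp/analyze-data-final.json ;;   # report and stop
  input-required) echo "relay the agent's question, then send-message" ;;
esac
```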
Runtime: highly variable (simple EDAs finish in a few minutes; multi-step modeling runs can take 20–40 min). Don't hard-fail before ~2 hours.
Step 6 — Export and index
Hand off to the Asta Artifacts skill to export the task output (tables, plots, the notebook, any written analysis) and register each artifact with asta-documents. Pass analyze-data as the invoking skill and a slug derived from the analytical question; Asta Artifacts handles the path convention, manifest, and index.yaml.
Step 7 — Summarize for the user
Present, in this order:
- Indexing + exploration paths — one short block naming both ways to browse. Always include BOTH (the skill path for semantic search, the filesystem path for direct reading):

  Indexed N artifacts in `.asta/analyze-data/<slug>/index.yaml`. Explore via `asta documents search --summary='<concept>' --root=.asta/analyze-data/<slug>` or open the directory directly: `open <absolute-path-to-slug-dir>`

  Use the absolute path (e.g. `/Users/.../project/.asta/analyze-data/2026-04-23-…/`). Pick `<concept>` from a term central to the analysis (concrete, not generic — e.g. a column name, a model type).
- One-paragraph synthesis — 2–4 sentences written fresh for this run. What's the headline finding? What did the data say vs. what did the user expect? Surface surprises, caveats, and whether the analysis answered the original question. This is discretionary — don't template it, read the output and synthesize.
- Table of key findings / charts — one row per notable insight or figure: finding + 1–2 sentence detail + (if applicable) chart filename.
Don't dump raw JSON. Don't repeat every step the agent took. Don't add a trailing "let me know if you'd like…" summary — the exploration block already tells the user how to keep going.