data-sync
Data Sync
Run or inspect data synchronization workflows for: $ARGUMENTS
Overview
- Implementation status: workflow-only
- Current backing path: use runtime artifacts, manifests, bundle diagnostics, and host-provided sync steps
- Primary purpose: define the data-refresh, source-health, and freshness-audit workflow without overstating local orchestration support
- Research layer: data infrastructure (Stage 2: Data Collection & Quality Assurance - Data refresh and validation)
- Workflow stage: stage 2, Data Collection & Quality Assurance
Use When
- The user wants to refresh source data.
- The user wants to compare provider behavior or diagnose stale records.
- The caller needs an explicit audit of whether current artifacts are fresh enough for research, screening, or backtesting.
- The user wants to know what must happen before a clean rerun.
- The user wants to verify data freshness before critical decisions.
- The user wants to diagnose data quality issues or provider failures.
Do Not Use When
- The user wants immediate one-symbol analysis and the existing data is already sufficient.
- The user only wants runtime context locations. Use session-status.
- The user wants normalization or feature enrichment of an already available dataset. Use market-data.
- The user wants to analyze data quality after collection. Use market-data for quality assessment.
Inputs
- A symbol, a provider name, a date window, or a sync scope.
- Optional host-side parameters such as freshness threshold, source priority, or required artifact types.
- Optional user requirement such as:
  - current-day screening freshness
  - backtest-grade historical coverage
  - point-in-time fundamental correctness
  - explicit provider comparison
Execution
Step 1: Classify the sync request
Determine which type of sync operation is needed:
Sync request types:
- Freshness audit: Check if existing data is recent enough for intended use
- Provider comparison: Compare data from multiple sources for consistency
- Missing history diagnosis: Identify gaps in historical data coverage
- Symbol-level refresh: Update data for specific symbols
- Artifact lineage check: Verify data provenance and transformation chain
- Pre-backtest readiness check: Validate data quality for backtesting
- Bulk refresh: Update entire universe or watchlist
- Incremental update: Add only new data since last sync
Use case classification:
- Descriptive analysis: Lower freshness requirement (1-2 days stale acceptable)
- Screening: Medium freshness requirement (same-day data preferred)
- Decision support: High freshness requirement (intraday data needed)
- Backtesting: Historical completeness required (no gaps, point-in-time correct)
- Live trading: Real-time or near-real-time data required
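The classification above can be captured in a small data structure so that later steps know what was asked and for which use case. This is a minimal sketch: the names `RequestType`, `UseCase`, `SyncRequest`, and `MUTATING` are hypothetical, and the split between requests that write data and requests that only inspect it is an assumption, not something the workflow defines.

```python
from dataclasses import dataclass
from enum import Enum


class RequestType(Enum):
    FRESHNESS_AUDIT = "freshness audit"
    PROVIDER_COMPARISON = "provider comparison"
    MISSING_HISTORY = "missing history diagnosis"
    SYMBOL_REFRESH = "symbol-level refresh"
    LINEAGE_CHECK = "artifact lineage check"
    PRE_BACKTEST = "pre-backtest readiness check"
    BULK_REFRESH = "bulk refresh"
    INCREMENTAL_UPDATE = "incremental update"


class UseCase(Enum):
    DESCRIPTIVE = "descriptive analysis"
    SCREENING = "screening"
    DECISION_SUPPORT = "decision support"
    BACKTESTING = "backtesting"
    LIVE_TRADING = "live trading"


# Assumption: these request types write new data; all others only inspect.
MUTATING = {
    RequestType.SYMBOL_REFRESH,
    RequestType.BULK_REFRESH,
    RequestType.INCREMENTAL_UPDATE,
}


@dataclass(frozen=True)
class SyncRequest:
    request_type: RequestType
    use_case: UseCase
    symbols: tuple = ()

    @property
    def mutates_data(self) -> bool:
        """True when the request is expected to change stored data."""
        return self.request_type in MUTATING


# "AAA" is a placeholder symbol.
audit = SyncRequest(RequestType.FRESHNESS_AUDIT, UseCase.SCREENING, ("AAA",))
refresh = SyncRequest(RequestType.SYMBOL_REFRESH, UseCase.DECISION_SUPPORT, ("AAA",))
```

Keeping the request type and the use case as separate fields reflects that the same operation (for example a freshness audit) can be held to different standards depending on what the data will be used for.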
Step 2: Inspect existing data state
Audit current data availability and quality:
Data inventory:
- Which symbols have data available
- Date range for each symbol (start date, end date)
- Data freshness (latest date, days since last update)
- Data source (provider, file, cache)
- Data completeness (% of expected dates present)
- Data quality score (from market-data if available)
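One way to derive the inventory fields above for a single symbol is sketched below. It assumes the caller already has the list of dates present for the symbol and a list of expected dates (for example from a trading calendar); `inventory_entry` and its field names are hypothetical.

```python
from datetime import date, timedelta


def inventory_entry(symbol, dates, today, expected_dates):
    """Summarise date range, staleness, and completeness for one symbol."""
    present = set(dates)
    expected = set(expected_dates)
    if not present:
        # No data at all: the symbol is "missing" in freshness terms.
        return {"symbol": symbol, "start": None, "end": None,
                "days_since_update": None, "completeness_pct": 0.0}
    latest = max(present)
    # Completeness is the share of expected dates that are actually present.
    pct = 100.0 * len(present & expected) / len(expected) if expected else None
    return {
        "symbol": symbol,
        "start": min(present),
        "end": latest,
        "days_since_update": (today - latest).days,
        "completeness_pct": pct,
    }


# Five expected dates, one of which is absent from the stored data.
expected = [date(2024, 1, 1) + timedelta(days=i) for i in range(5)]
stored = [expected[0], expected[1], expected[3], expected[4]]
entry = inventory_entry("AAA", stored, date(2024, 1, 8), expected)
```

Here `entry["days_since_update"]` is 3 and `entry["completeness_pct"]` is 80.0, since four of the five expected dates are present.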
Artifact inspection:
- Runtime artifacts (state.json, report.md, metadata.json)
- Manifests (data source declarations, version info)
- Bundle diagnostics (error logs, sync logs)
- File timestamps (when files were last modified)
- Cache status (what is cached, cache age)
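File timestamps are often the cheapest evidence available. The sketch below reports the age in hours of each runtime artifact in a directory, or `None` when the file is absent. The function name `artifact_ages` and the assumption that all artifacts sit in one directory are illustrative only; real layouts may differ.

```python
import time
from pathlib import Path


def artifact_ages(root, names=("state.json", "report.md", "metadata.json"), now=None):
    """Return {file name: age in hours, or None if the file does not exist}."""
    now = time.time() if now is None else now
    ages = {}
    for name in names:
        path = Path(root) / name
        if path.is_file():
            # st_mtime is the last-modified time in seconds since the epoch.
            ages[name] = (now - path.stat().st_mtime) / 3600.0
        else:
            ages[name] = None
    return ages
```

A modification time only shows when a file was last written, not how recent the data inside it is, so this check complements the per-symbol latest date rather than replacing it.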
Provider status:
- Which providers are configured (akshare, tushare, etc.)
- Provider health (last successful fetch, error rate)
- Provider rate limits (requests remaining, reset time)
- Provider coverage (which symbols, which fields)
- Provider latency (typical fetch time)
Step 3: Evaluate data readiness
Assess whether existing data meets requirements:
Freshness evaluation:
- Fresh (< 1 day old): Suitable for all use cases
- Recent (1-3 days old): Suitable for descriptive analysis, screening
- Stale (more than 3 and up to 7 days old): Suitable for historical analysis only
- Very stale (> 7 days old): Requires refresh before any use
- Missing: No data available, must fetch
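The freshness bands above translate directly into a small classifier. This is a sketch: `freshness_tier` is a hypothetical name, and assigning boundary values to the fresher band (exactly 3 days counts as recent, exactly 7 days as stale) is an assumption.

```python
def freshness_tier(age_days):
    """Map data age in days (None when no data exists) to a freshness band."""
    if age_days is None:
        return "missing"
    if age_days < 1:
        return "fresh"
    if age_days <= 3:
        return "recent"
    if age_days <= 7:
        return "stale"
    return "very stale"
```

For example, `freshness_tier(0.5)` gives `"fresh"` and `freshness_tier(10)` gives `"very stale"`.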
Completeness evaluation:
- Complete (100%): All expected dates present
- Mostly complete (90-99%): Minor gaps, acceptable for most uses
- Incomplete (70-89%): Significant gaps, use with caution
- Sparse (< 70%): Major gaps, unsuitable for analysis
Quality evaluation:
- High quality (score 90-100): Suitable for all uses including backtesting
- Good quality (score 70-89): Suitable for descriptive analysis, screening
- Fair quality (score 50-69): Use with caution, validate results
- Poor quality (score < 50): Requires refresh or alternative source
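The completeness and quality bands above can be encoded the same way. In this sketch the function names are hypothetical, and fractional values that fall between the listed integer ranges (for example 89.5) are assigned to the lower band, which is an assumption.

```python
def completeness_tier(pct):
    """Map the percentage of expected dates present to a completeness band."""
    if pct >= 100:
        return "complete"
    if pct >= 90:
        return "mostly complete"
    if pct >= 70:
        return "incomplete"
    return "sparse"


def quality_tier(score):
    """Map a 0-100 quality score to a quality band."""
    if score >= 90:
        return "high"
    if score >= 70:
        return "good"
    if score >= 50:
        return "fair"
    return "poor"
```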
Point-in-time correctness (for backtesting):
- Corporate actions properly adjusted (splits, dividends)
- No look-ahead bias (data as-of historical dates)
- Survivorship bias addressed (delisted symbols included)
- Restatements handled (as-reported vs. as-restated)
- Suspension periods identified
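To make the look-ahead and restatement points concrete, the sketch below selects, for each symbol, the most recent record that had already been published on a given as-of date. The record layout (`symbol`, `published`, `value`) and the function name `as_of` are hypothetical; the sketch assumes each record carries its own publication date.

```python
from datetime import date


def as_of(records, as_of_date):
    """Return the latest record per symbol that was public on as_of_date."""
    latest = {}
    for rec in records:
        if rec["published"] > as_of_date:
            # Not yet public on that date: using it would be look-ahead bias.
            continue
        current = latest.get(rec["symbol"])
        if current is None or rec["published"] > current["published"]:
            latest[rec["symbol"]] = rec
    return latest


# The same figure as first reported and as later restated.
records = [
    {"symbol": "AAA", "published": date(2024, 1, 10), "value": 1.0},
    {"symbol": "AAA", "published": date(2024, 4, 10), "value": 2.0},
]
before_restatement = as_of(records, date(2024, 3, 1))
after_restatement = as_of(records, date(2024, 5, 1))
```

A backtest dated 1 March sees the as-reported value 1.0, while one dated 1 May sees the restated value 2.0. If the stored data keeps only the restated figure, this distinction cannot be recovered, and that limitation should be reported.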
Step 4: Diagnose data issues
Identify specific problems requiring attention:
Common data issues:
- Stale data: Latest date is too old for intended use
- Missing symbols: Symbols in watchlist have no data
- Incomplete history: Gaps in date coverage
- Provider failures: Recent fetch attempts failed
- Quality degradation: Data quality score declining over time
- Inconsistent sources: Different providers show different values
- Corporate action errors: Splits/dividends not properly adjusted
- Suspension gaps: Suspension periods not identified
- Delisting issues: Delisted symbols missing from historical data
For each issue:
- Issue type and severity (critical, high, medium, low)
- Affected symbols (count and list)
- Impact on downstream workflows (which skills affected)
- Recommended action (refresh, validate, alternative source)
- Automation status (can be fixed automatically or requires manual intervention)
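The per-issue fields above fit naturally into a record type, which also makes it easy to sort issues so the most severe come first. `DataIssue`, `SEVERITY_ORDER`, and `triage` are hypothetical names for this sketch, and breaking ties by the number of affected symbols is an assumption.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class DataIssue:
    issue_type: str    # e.g. "stale data", "incomplete history"
    severity: str      # one of the SEVERITY_ORDER keys
    symbols: list      # affected symbols
    impact: str        # downstream workflows affected
    action: str        # refresh, validate, alternative source
    automatable: bool  # False means manual intervention is required


def triage(issues):
    """Sort by severity, then by number of affected symbols (largest first)."""
    return sorted(issues, key=lambda i: (SEVERITY_ORDER[i.severity], -len(i.symbols)))


issues = [
    DataIssue("incomplete history", "medium", ["AAA"], "backtesting", "backfill", True),
    DataIssue("stale data", "critical", ["AAA", "BBB"], "screening", "refresh", True),
    DataIssue("provider failures", "high", ["CCC"], "all", "alternative source", False),
]
ordered = triage(issues)
```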
Step 5: Generate sync plan
Create actionable plan to address issues:
Sync plan components:
Part 1: Immediate actions (critical issues):
- Symbols requiring immediate refresh (stale data blocking decisions)
- Provider health checks (if recent failures detected)
- Cache invalidation (if corrupted data suspected)
- Manual interventions (if automation unavailable)
Part 2: Scheduled actions (high priority):
- Bulk refresh for watchlist (if many symbols stale)
- Historical backfill (if gaps detected)
- Provider comparison (if inconsistencies suspected)
- Quality validation (if quality scores declining)
Part 3: Maintenance actions (medium priority):
- Routine refresh schedule (daily, weekly)
- Cache cleanup (remove old artifacts)
- Provider rotation (switch to backup if primary failing)
- Monitoring setup (alerts for future staleness)
Part 4: Validation actions (before critical use):
- Pre-backtest validation (point-in-time correctness)
- Pre-decision validation (freshness and quality)
- Cross-provider validation (consistency check)
- Manual spot checks (sample verification)
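A plan with the four parts above can be assembled by routing each diagnosed issue to a bucket by severity. This is a sketch under stated assumptions: the issue dictionaries, `BUCKET_BY_SEVERITY`, and `build_plan` are hypothetical, low-severity issues are placed with maintenance actions, and validation actions are added only when the caller flags a critical use such as backtesting or decision support.

```python
BUCKET_BY_SEVERITY = {
    "critical": "immediate",
    "high": "scheduled",
    "medium": "maintenance",
    "low": "maintenance",  # assumption: the source lists no bucket for low severity
}


def build_plan(issues, critical_use=False):
    """Group recommended actions into the four plan parts."""
    plan = {"immediate": [], "scheduled": [], "maintenance": [], "validation": []}
    for issue in issues:
        plan[BUCKET_BY_SEVERITY[issue["severity"]]].append(issue["action"])
    if critical_use:
        plan["validation"] += [
            "freshness and quality check",
            "cross-provider consistency check",
            "manual spot check",
        ]
    return plan


plan = build_plan(
    [
        {"severity": "critical", "action": "refresh AAA"},
        {"severity": "high", "action": "backfill BBB"},
        {"severity": "low", "action": "clean old cache"},
    ],
    critical_use=True,
)
```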
Step 6: Execute or delegate sync operations
Perform sync based on automation availability:
If local automation available:
- Execute refresh commands directly
- Monitor progress and handle errors
- Validate results after sync
- Update artifacts and manifests
If host-framework automation available:
- Delegate to host-framework sync API
- Provide sync parameters (symbols, date range, providers)
- Monitor sync status
- Validate results after sync
If manual intervention required:
- Provide explicit instructions for manual refresh
- List specific commands to run
- Specify validation steps after manual sync
- Document what was done for audit trail
Sync execution priorities (highest first):
- Critical symbols (blocking immediate decisions)
- High-priority watchlist (screening, monitoring)
- Historical backfill (backtesting preparation)
- Routine maintenance (scheduled updates)
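The choice of execution path and the ordering of work can be sketched as two small helpers. The names and the priority labels are hypothetical, and preferring local automation over host delegation over manual steps is an assumption; the workflow only describes what to do in each case.

```python
PRIORITY = ["critical", "watchlist", "backfill", "maintenance"]


def execution_path(local_available, host_available):
    """Pick how a sync operation will be carried out."""
    if local_available:
        return "local"
    if host_available:
        return "host"
    return "manual"


def order_tasks(tasks):
    """Order tasks by execution priority; ties keep their input order."""
    return sorted(tasks, key=lambda t: PRIORITY.index(t["priority"]))


tasks = [
    {"name": "weekly refresh", "priority": "maintenance"},
    {"name": "refresh AAA", "priority": "critical"},
    {"name": "backfill BBB", "priority": "backfill"},
]
ordered_tasks = order_tasks(tasks)
```

Whichever path is chosen, the result should be stated to the caller, since this skill does not guarantee a local orchestration layer.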
Step 7: Validate sync results
After sync, verify data quality:
Post-sync validation:
- Freshness improved (latest date is now recent)
- Completeness improved (gaps filled)
- Quality maintained or improved (quality score)
- No new issues introduced (consistency checks)
- Artifacts updated (manifests, metadata)
Validation checks:
- Compare before/after freshness
- Compare before/after completeness
- Compare before/after quality scores
- Check for new gaps or inconsistencies
- Verify corporate action adjustments
If validation fails:
- Identify specific failures
- Recommend retry with different provider
- Recommend manual intervention
- Document issue for future investigation
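The before/after comparison can be made quantitative with a snapshot of a few metrics taken before and after the sync. In this sketch the metric names and `compare_snapshots` are hypothetical, and a check passes when the metric has not regressed, which is a looser reading of "improved" that the caller may want to tighten.

```python
METRICS = ("age_days", "completeness_pct", "quality_score", "gap_count")


def compare_snapshots(before, after):
    """Compare pre- and post-sync metrics and list any regressions."""
    checks = {
        "freshness": after["age_days"] <= before["age_days"],
        "completeness": after["completeness_pct"] >= before["completeness_pct"],
        "quality": after["quality_score"] >= before["quality_score"],
        "no_new_gaps": after["gap_count"] <= before["gap_count"],
    }
    failures = [name for name, ok in checks.items() if not ok]
    deltas = {m: after[m] - before[m] for m in METRICS}
    return {"passed": not failures, "failures": failures, "deltas": deltas}


before = {"age_days": 5, "completeness_pct": 88.0, "quality_score": 72, "gap_count": 4}
after = {"age_days": 0, "completeness_pct": 100.0, "quality_score": 72, "gap_count": 0}
result = compare_snapshots(before, after)
```

The `deltas` entry gives the numbers to report, for example a change of -5 in `age_days` and +12.0 in `completeness_pct`, and the `failures` list names which checks to investigate or retry.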
Step 8: Generate sync report
Organize findings into structured report:
Part 1: Sync Summary
- Sync scope (symbols, date range, providers)
- Sync timestamp
- Sync status (success, partial, failed)
- Symbols refreshed (count and list)
- Issues resolved (count and list)
Part 2: Data Inventory
- Total symbols with data
- Freshness distribution (fresh, recent, stale, very stale, missing)
- Completeness distribution (complete, mostly complete, incomplete, sparse)
- Quality distribution (high, good, fair, poor)
- Provider distribution (which providers used)
Part 3: Issues Diagnosed
For each issue:
- Issue type and severity
- Affected symbols
- Impact on workflows
- Recommended action
- Automation status
Part 4: Sync Plan
- Immediate actions (critical)
- Scheduled actions (high priority)
- Maintenance actions (medium priority)
- Validation actions (before critical use)
Part 5: Sync Execution Results
- Actions taken (refresh, backfill, validation)
- Success rate (% of actions successful)
- Failures (count and reasons)
- Manual interventions required
Part 6: Post-Sync Validation
- Freshness improvement (before/after)
- Completeness improvement (before/after)
- Quality improvement (before/after)
- Remaining issues (if any)
Part 7: Readiness Assessment
- Ready for descriptive analysis? (Yes/No/Conditional)
- Ready for screening? (Yes/No/Conditional)
- Ready for decision support? (Yes/No/Conditional)
- Ready for backtesting? (Yes/No/Conditional)
- Conditions or caveats (if conditional)
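One possible way to derive the Yes/No/Conditional answers from the Step 3 bands is sketched below. The rules are an interpretation, not a specification: the function name `readiness`, the band labels passed as strings, and the `pit_verified` flag are all hypothetical, and the sketch treats unverified point-in-time integrity as "Conditional" for backtesting.

```python
def readiness(use_case, freshness, quality, pit_verified=False):
    """Return "Yes", "No", or "Conditional" for one use case."""
    # Very stale or missing data, or poor quality, blocks every use.
    if freshness in ("very stale", "missing") or quality == "poor":
        return "No"
    if use_case == "backtesting":
        # Backtesting needs high quality; data age matters less than history.
        if quality != "high":
            return "No"
        return "Yes" if pit_verified else "Conditional"
    # Decision support needs fresh data; analysis and screening accept recent.
    allowed = ("fresh",) if use_case == "decision support" else ("fresh", "recent")
    if freshness not in allowed:
        return "No"
    # Fair quality is usable only with caution.
    return "Conditional" if quality == "fair" else "Yes"
```

For instance, `readiness("screening", "recent", "good")` returns `"Yes"`, while `readiness("backtesting", "stale", "high")` returns `"Conditional"` until point-in-time correctness has been verified.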
Part 8: Next Steps
- Downstream workflows ready to proceed
- Workflows blocked pending further sync
- Monitoring and alerting recommendations
- Scheduled maintenance recommendations
Step 9: Return explicit sync diagnosis
When delivering results, maintain proper framing:
Sync interpretation:
- This is data infrastructure status, not investment recommendation
- Freshness and quality assessments are technical, not fundamental
- Readiness evaluation is for workflow purposes, not market timing
- Sync recommendations are operational, not strategic
Automation transparency:
- State which sync operations are locally automated
- State which operations require host-framework support
- State which operations require manual intervention
- State limitations of current automation
Readiness framing:
- Separate readiness for different use cases (analysis, screening, backtesting)
- State specific conditions or caveats for conditional readiness
- Recommend validation steps before critical use
- Tie readiness to next workflow stage
Limitation disclosure:
- State what sync can and cannot guarantee
- State point-in-time correctness limitations
- State survivorship bias limitations
- State provider coverage limitations
Output Contract
- Expected result: sync status, stale-data diagnosis, provider comparison, or an explicit next-action plan.
- Caller-facing delivery standard:
  - Eight-part structure: Sync summary, data inventory, issues diagnosed, sync plan, execution results, post-sync validation, readiness assessment, next steps
  - Scope identification: State what is being audited (symbols, date range, providers)
  - Evidence basis: Identify evidence used (manifests, timestamps, runtime status, file inspection)
  - Automation transparency: State whether locally automated, partially automated, or host-dependent
  - Readiness assessment: Tie sync recommendation to next workflow stage (screening, analysis, backtesting, trading)
  - Issue diagnosis: Specific issues with severity, affected symbols, impact, recommended actions
  - Sync plan: Immediate, scheduled, maintenance, and validation actions
  - Validation results: Before/after comparison of freshness, completeness, quality
  - Limitation disclosure: State what sync can and cannot guarantee (point-in-time, survivorship, etc.)
- Local limitation:
  - The skill does not currently guarantee a full local provider-orchestration layer
  - The workflow may depend on external scheduling or host-side refresh behavior
  - Some sync operations may require manual intervention
Failure Handling
- If the requested sync scope is not supported locally, state that explicitly.
- If available evidence is insufficient to diagnose freshness, say what is missing.
- If the user asks for backtest-grade readiness and the skill cannot verify survivorship, corporate-action, or point-in-time integrity, say so directly.
- If provider is unavailable or failing, recommend alternative sources or manual intervention.
- If sync execution fails, provide specific error details and retry recommendations.
Key Rules
- Do not imply a full provider-orchestration layer where none exists locally.
- Separate source-health diagnostics from market-analysis claims.
- Prefer explicit freshness and readiness language over vague advice to “rerun.”
- Treat descriptive-analysis readiness as a lower bar than backtest or live-verification readiness.
- Automation transparency is mandatory. State what is automated vs. manual.
- Readiness must be use-case specific. Different standards for analysis, screening, backtesting, trading.
- Issue diagnosis must be specific. Type, severity, affected symbols, impact, actions.
- Sync plan must be actionable. Specific commands, parameters, validation steps.
- Validation must be quantitative. Before/after metrics, not just “looks better.”
- Limitation disclosure is mandatory. State what sync cannot guarantee.
Composition
- Often paired with session-status, market-data, stock-data, and backtest-evaluator.
- Serves as the stage-2 control surface before broader research or validation workflows.
- Should be run before market-screen or decision-dashboard if data freshness is uncertain.
- Should be run before backtest-evaluator to ensure historical data quality.
- Can be triggered by market-data if quality issues detected.
- Results feed into readiness decisions for all downstream skills.