skills/dglowacki/mega-agent-v2/data-aggregation

data-aggregation

SKILL.md

Data Aggregation

Tools for aggregating, transforming, and merging data from multiple sources.

Quick Start

Aggregate App Store sales:

python scripts/aggregate_sales.py --input sales_reports/ --output aggregated.json

Aggregate GitHub commits:

python scripts/aggregate_commits.py --input commits.json --period week --output summary.json

Merge multiple sources:

python scripts/merge_sources.py --sources app_store.json github.json skillz.json --output combined.json

Aggregation Types

1. Time-Based Aggregation

Group data by time periods (day, week, month).

Example: Daily sales totals

from aggregate_sales import aggregate_by_time

# Input: List of sales records
sales = [
    {"date": "2026-01-14", "revenue": 123.45, "units": 5},
    {"date": "2026-01-14", "revenue": 67.89, "units": 3},
    {"date": "2026-01-15", "revenue": 234.56, "units": 8}
]

# Output: Aggregated by day
result = aggregate_by_time(sales, period='day')
# {
#     "2026-01-14": {"revenue": 191.34, "units": 8},
#     "2026-01-15": {"revenue": 234.56, "units": 8}
# }

2. Entity-Based Aggregation

Group data by entities (apps, users, repos, etc.).

Example: Per-app metrics

from aggregate_sales import aggregate_by_entity

sales = [
    {"app": "App A", "revenue": 100, "units": 5},
    {"app": "App A", "revenue": 50, "units": 2},
    {"app": "App B", "revenue": 200, "units": 10}
]

result = aggregate_by_entity(sales, entity_field='app')
# {
#     "App A": {"revenue": 150, "units": 7},
#     "App B": {"revenue": 200, "units": 10}
# }

3. Statistical Aggregation

Calculate statistics (sum, avg, min, max, percentiles).

Example: Commit statistics

from aggregate_commits import calculate_stats

commits = [
    {"author": "John", "lines": 125},
    {"author": "Jane", "lines": 87},
    {"author": "John", "lines": 43}
]

result = calculate_stats(commits, group_by='author', metric='lines')
# {
#     "John": {"sum": 168, "avg": 84, "min": 43, "max": 125, "count": 2},
#     "Jane": {"sum": 87, "avg": 87, "min": 87, "max": 87, "count": 1}
# }

Data Sources

App Store Sales

Input format (TSV from App Store Connect):

Provider	Provider Country	SKU	Developer	Title	Version	Product Type Identifier	Units	Developer Proceeds	Begin Date	End Date	Customer Currency	Country Code	Currency of Proceeds	Apple Identifier	Customer Price	Promo Code	Parent Identifier	Subscription	Period	Category	CMB	Device	Supported Platforms	Proceeds Reason	Preserved Pricing	Client

Aggregated output:

{
  "period": "2026-01-14",
  "apps": {
    "com.example.app": {
      "name": "My App",
      "downloads": 1234,
      "revenue": 567.89,
      "updates": 45,
      "countries": ["US", "CA", "UK"]
    }
  },
  "totals": {
    "total_downloads": 5678,
    "total_revenue": 2345.67,
    "total_apps": 5
  }
}

GitHub Commits

Input format (from GitHub API):

[
  {
    "sha": "abc123",
    "author": {"name": "John Doe", "email": "john@example.com"},
    "commit": {
      "message": "Add feature X",
      "author": {"date": "2026-01-14T10:30:00Z"}
    },
    "stats": {"additions": 125, "deletions": 45}
  }
]

Aggregated output:

{
  "period": "week",
  "date_range": "2026-01-07 to 2026-01-14",
  "summary": {
    "total_commits": 45,
    "total_contributors": 5,
    "total_lines": 2345,
    "total_files": 123
  },
  "by_author": {
    "John Doe": {
      "commits": 15,
      "lines_added": 1234,
      "lines_deleted": 456,
      "files_changed": 45
    }
  },
  "by_day": {
    "2026-01-14": {"commits": 8, "lines": 567}
  }
}

Skillz Events

Input format (from Skillz Developer Portal):

{
  "event_id": "888831",
  "name": "Winter Tournament",
  "status": "active",
  "start_date": "2026-01-10",
  "end_date": "2026-01-20",
  "prize_pool": 1000,
  "entries": 234
}

Aggregated output:

{
  "period": "active",
  "summary": {
    "total_events": 8,
    "total_prize_pool": 8000,
    "total_entries": 1234
  },
  "by_status": {
    "active": {"count": 5, "prize_pool": 5000},
    "completed": {"count": 3, "prize_pool": 3000}
  }
}

Aggregation Scripts

aggregate_sales.py

Aggregate App Store sales data.

Usage:

python scripts/aggregate_sales.py \
    --input sales_reports/ \
    --period week \
    --group-by app \
    --output aggregated.json

Arguments:

  • --input: Input directory or file (TSV/JSON)
  • --period: Time period (day, week, month)
  • --group-by: Grouping field (app, country, category)
  • --output: Output JSON file

aggregate_commits.py

Aggregate GitHub commit data.

Usage:

python scripts/aggregate_commits.py \
    --input commits.json \
    --period week \
    --metrics lines,files,commits \
    --output summary.json

Arguments:

  • --input: Input JSON file (commits array)
  • --period: Time period (day, week, month)
  • --metrics: Metrics to calculate (comma-separated)
  • --output: Output JSON file

aggregate_events.py

Aggregate Skillz event data.

Usage:

python scripts/aggregate_events.py \
    --input events/ \
    --status active,completed \
    --output summary.json

Arguments:

  • --input: Input directory with event JSON files
  • --status: Filter by status (comma-separated)
  • --output: Output JSON file

merge_sources.py

Merge data from multiple sources.

Usage:

python scripts/merge_sources.py \
    --sources app_store.json github.json skillz.json \
    --strategy combine \
    --output combined.json

Arguments:

  • --sources: Space-separated list of JSON files
  • --strategy: Merge strategy (combine, average, latest)
  • --output: Output JSON file

Merge strategies:

  • combine: Combine all data (keep all fields)
  • average: Average numeric fields
  • latest: Keep latest values (by timestamp)

Data Transformations

Filtering

from aggregate_sales import filter_data

sales = [...]

# Filter by country
us_sales = filter_data(sales, country='US')

# Filter by date range
recent_sales = filter_data(sales, start_date='2026-01-01', end_date='2026-01-14')

# Filter by value
high_revenue = filter_data(sales, min_revenue=100)

Grouping

from aggregate_commits import group_data

commits = [...]

# Group by author
by_author = group_data(commits, group_by='author')

# Group by repository
by_repo = group_data(commits, group_by='repository')

# Group by date
by_date = group_data(commits, group_by='date', period='day')

Sorting

from merge_sources import sort_data

data = [...]

# Sort by revenue (descending)
sorted_data = sort_data(data, field='revenue', reverse=True)

# Sort by date (ascending)
sorted_data = sort_data(data, field='date')

Integration with Agents

Reporting Agent

# Aggregate App Store sales
from aggregate_sales import aggregate_sales

sales_data = appstore_client.get_sales_report(days=7)
aggregated = aggregate_sales(sales_data, period='day', group_by='app')

# Use for report
html = render_template('appstore-metrics', aggregated)

Automation Agent

# Aggregate GitHub commits
from aggregate_commits import aggregate_commits

commits = github_client.get_commits(repo='owner/repo', days=7)
summary = aggregate_commits(commits, period='week')

# Create ClickUp task if high activity
if summary['total_commits'] > 50:
    clickup_client.create_task(
        title='High GitHub Activity',
        description=f"Total commits: {summary['total_commits']}"
    )

Examples

See examples/ directory for:

  • sample_sales_aggregation.json - App Store sales example
  • sample_commit_aggregation.json - GitHub commits example
  • sample_multi_source_merge.json - Multi-source merge example
Weekly Installs
1
First Seen
9 days ago
Installed on
zencoder1
amp1
cline1
openclaw1
opencode1
cursor1