data-analytics


Data Analytics Diagram Generator

Quick Start: Define data sources → Declare ingestion/ETL icons → Connect to storage/warehouse → Add BI/visualization → Wrap in a ```plantuml fence.

⚠️ IMPORTANT: Always use ```plantuml or ```puml code fence. NEVER use ```text — it will NOT render as a diagram.

Critical Rules

  • Every diagram starts with @startuml and ends with @enduml
  • Use left to right direction for data pipelines (Source → Ingest → Transform → Store → Visualize)
  • Use mxgraph.aws4.* stencil syntax for analytics, database, and storage icons
  • Default colors are applied automatically — you do NOT need to specify fillColor or strokeColor
  • Use rectangle "Zone" { ... } or package "Layer" { ... } for grouping pipeline stages
  • Batch and synchronous flows use -->; async, streaming, and CDC flows use ..> (dashed)
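Here is a minimal sketch of how these rules fit together. Pipeline stages are grouped with `rectangle` and `package`. The hop between two streaming services uses a dashed `..>` arrow, and batch hops use `-->`.

The zone names, labels, and aliases are illustrative. The sketch also assumes the renderer accepts stencil declarations inside group blocks the same way it accepts them at top level.

```plantuml
@startuml
left to right direction

rectangle "Ingestion Zone" {
  mxgraph.aws4.kinesis_data_streams "Clickstream\n(Kinesis)" as ks
  mxgraph.aws4.kinesis_data_firehose "Firehose" as fh
}

package "Storage Layer" {
  mxgraph.aws4.s3 "Raw Zone\n(S3)" as raw
}

package "Analytics Layer" {
  mxgraph.aws4.athena "Athena" as ath
  mxgraph.aws4.quicksight "QuickSight" as qs
}

' Streaming hop between two streaming services: dashed
ks ..> fh
' Firehose buffers records and delivers them to S3 in batches: solid
fh --> raw
' Batch/query flows: solid
raw --> ath
ath --> qs
@enduml
```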

Full stencil reference: See stencils/README.md for 9500+ available icons.

Mxgraph Stencil Syntax

mxgraph.aws4.<icon> "Label" as <alias>

Analytics & ETL Stencils

| Category | Stencils | Purpose |
|---|---|---|
| Query Engine | athena, athena_data_source_connectors | Serverless SQL on S3 data |
| ETL | glue, glue_crawlers, glue_data_catalog, aws_glue_data_quality, aws_glue_for_ray | Data integration & cataloging |
| Streaming | kinesis, kinesis_data_streams, kinesis_data_firehose, kinesis_data_analytics, kinesis_video_streams | Real-time data streaming |
| MapReduce | emr, emr_engine, emr_engine_mapr_m3, emr_engine_mapr_m5 | Big data processing (Spark, Hive) |
| Data Warehouse | redshift, redshift_ra3, redshift_streaming_ingestion, redshift_ml | Columnar analytics warehouse |
| Search | opensearch_service_data_node, opensearch_ingestion, cloudsearch | Full-text search & log analytics |
| BI | quicksight | Dashboards & visualizations |
| Data Lake | lake_formation, s3, glacier, glacier_deep_archive | Governed data lake storage |
| Catalog | datazone_custom_asset_type, data_exchange | Data governance & sharing |
| Streaming (Kafka) | msk, msk_connect | Managed Kafka streaming |
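For example, a governed data lake drawn only with stencils from the analytics table might look like the following sketch. The labels and the direction of the Lake Formation arrow are illustrative choices, not a prescribed layout.

```plantuml
@startuml
left to right direction

mxgraph.aws4.s3 "Raw Data\n(S3)" as raw
mxgraph.aws4.glue_crawlers "Glue\nCrawler" as crawler
mxgraph.aws4.glue_data_catalog "Data\nCatalog" as catalog
mxgraph.aws4.lake_formation "Lake\nFormation" as lf
mxgraph.aws4.athena "Athena" as ath
mxgraph.aws4.quicksight "QuickSight" as qs

' The crawler scans S3 and registers table schemas in the catalog
raw --> crawler
crawler --> catalog : "schemas"
' Lake Formation attaches access permissions to catalog tables
lf --> catalog : "permissions"
' Athena looks up table metadata in the catalog and scans the data in S3
catalog --> ath : "metadata"
raw --> ath : "data"
ath --> qs
@enduml
```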

Database Stencils

| Category | Stencils | Purpose |
|---|---|---|
| Relational | aurora, aurora_instance, rds, rds_instance, rds_mysql_instance, rds_postgresql_instance | Transactional databases |
| NoSQL | dynamodb, dynamodb_table, dynamodb_global_secondary_index, dynamodb_stream | Key-value & document store |
| Graph | neptune | Graph database |
| In-Memory | elasticache, elasticache_for_redis, elasticache_for_memcached | Cache & session store |
| Document | documentdb, documentdb_with_mongodb_compatibility | Document database |
| Ledger | quantum_ledger_database | Immutable transaction log |
| Wide-Column | keyspaces | Cassandra-compatible |
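Database stencils usually sit at the source end of a pipeline. The sketch below draws a change-data-capture path out of DynamoDB, with dashed arrows for the change stream as the Critical Rules suggest.

It assumes that `mxgraph.aws4.lambda_function` resolves in stencils/README.md. The CDC Pipeline row of the architecture table names that stencil, but the database table does not list it. The table and handler labels are hypothetical.

```plantuml
@startuml
left to right direction

mxgraph.aws4.dynamodb_table "Orders\nTable" as tbl
mxgraph.aws4.dynamodb_stream "DynamoDB\nStream" as ddbs
' Assumed to exist in stencils/README.md (not listed in the database stencil table)
mxgraph.aws4.lambda_function "CDC\nHandler" as fn
mxgraph.aws4.redshift "Redshift" as rs

' Change capture and stream consumption are async: dashed
tbl ..> ddbs : "item changes"
ddbs ..> fn
' The handler applies changes to the warehouse: solid
fn --> rs : "upserts"
@enduml
```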

Connection Types

| Syntax | Meaning | Use Case |
|---|---|---|
| `A --> B` | Solid arrow | Batch data flow / API call |
| `A ..> B` | Dashed arrow | Streaming / async / CDC |
| `A -- B` | Solid line | Bidirectional sync |
| `A --> B : "label"` | Labeled connection | Describe data format or volume |
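The following sketch uses all four connection styles in one diagram. The services and labels are illustrative assumptions, chosen only so that each arrow type reads plausibly; this is not a recommended architecture. The quoted label syntax follows the connection table.

```plantuml
@startuml
left to right direction

mxgraph.aws4.dynamodb_table "Orders\nTable" as tbl
mxgraph.aws4.kinesis_data_streams "Kinesis\nStreams" as ks
mxgraph.aws4.kinesis_data_firehose "Firehose" as fh
mxgraph.aws4.s3 "Curated\n(S3)" as s3
mxgraph.aws4.glue_data_catalog "Data\nCatalog" as cat
mxgraph.aws4.redshift "Redshift" as rs

' Dashed: change data capture and streaming hops
tbl ..> ks : "CDC"
ks ..> fh
' Solid + label: batch delivery, annotated with the file format
fh --> s3 : "Parquet"
s3 --> rs : "COPY"
' Plain line: bidirectional sync (schema kept aligned)
cat -- rs
@enduml
```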

Quick Example

```plantuml
@startuml
left to right direction
mxgraph.aws4.s3 "Data Lake\n(S3)" as s3
mxgraph.aws4.glue "Glue\nETL" as glue
mxgraph.aws4.redshift "Redshift" as rs
mxgraph.aws4.quicksight "QuickSight" as qs

s3 --> glue
glue --> rs
rs --> qs
@enduml
```

Data Analytics Architecture Types

| Type | Purpose | Key Stencils | Example |
|---|---|---|---|
| Data Lake | Centralized raw data store | s3, lake_formation, glue, athena | data-lake.md |
| Real-time Streaming | Event stream processing | kinesis, msk, lambda_function, opensearch_service | real-time-streaming.md |
| Data Warehouse | Star-schema analytics | redshift, glue, quicksight | data-warehouse.md |
| ETL Pipeline | Extract-transform-load | glue, glue_crawlers, glue_data_catalog, s3 | etl-pipeline.md |
| Log Analytics | Centralized logging | kinesis_data_firehose, opensearch_service, lambda_function | log-analytics.md |
| ML Feature Store | Feature engineering pipeline | glue, s3, athena, emr | ml-feature-pipeline.md |
| CDC Pipeline | Database change capture | dynamodb_streams, kinesis, lambda_function, redshift | cdc-pipeline.md |
| Multi-source BI | Cross-database reporting | aurora, dynamodb, redshift, quicksight | multi-source-bi.md |
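Here is a rough sketch of the Real-time Streaming type. Two stream sources feed a processing function, which writes to a search sink and an archive sink.

The stencil names come from the Real-time Streaming row of the architecture table. The stencil tables do not list `opensearch_service` or `lambda_function` under exactly those names, so both are assumed to resolve in stencils/README.md. Zone names, labels, and aliases are hypothetical.

```plantuml
@startuml
left to right direction

rectangle "Streams" {
  mxgraph.aws4.kinesis_data_streams "App Events\n(Kinesis)" as ks
  mxgraph.aws4.msk "Orders Topic\n(MSK)" as msk
}

rectangle "Processing" {
  ' Assumed stencil name, taken from the architecture table
  mxgraph.aws4.lambda_function "Enrich\nFunction" as fn
}

rectangle "Sinks" {
  ' Assumed stencil name, taken from the architecture table
  mxgraph.aws4.opensearch_service "OpenSearch" as os
  mxgraph.aws4.s3 "Archive\n(S3)" as s3
}

' Dashed: both streaming sources feed the function asynchronously
ks ..> fn
msk ..> fn
' Solid: enriched events are indexed; raw copies are archived
fn --> os : "JSON docs"
fn --> s3 : "raw events"
@enduml
```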