exploratory-data-analysis
Exploratory Data Analysis
This skill enables an AI agent to perform structured exploratory data analysis (EDA) on any tabular dataset. The agent systematically profiles the data's shape and types, examines distributions, computes correlations, detects outliers, and produces a summary of findings. EDA is the critical first step before any modeling or reporting: it reveals what the data actually contains, as opposed to what it is assumed to contain.
Workflow
- Load and inspect basic structure. Read the dataset and immediately report its shape (rows, columns), column names, data types, and memory footprint. Display the first 5 and last 5 rows to catch header issues, trailing garbage rows, or encoding artifacts. This check is cheap compared with the rest of the analysis and catches problems that would otherwise cause confusion downstream.
- Assess data quality. Count nulls per column as both absolute counts and percentages. Identify columns with zero variance (constant values), high-cardinality categoricals (e.g., a "notes" field with a unique value in every row), and mixed-type columns. Build a concise quality scorecard: columns with >5% missing values, columns with suspicious types, and the number of duplicate rows.
- Analyze distributions of individual variables. For numeric columns, compute mean, median, standard deviation, skewness, and kurtosis. Plot histograms or KDE plots. For categorical columns, show value counts and proportions for the top 10 categories. Flag highly imbalanced distributions (e.g., a binary target where one class is under 5%).
- Explore relationships between variables. Compute the full correlation matrix for numeric columns and visualize it as a heatmap. For categorical-vs-numeric relationships, use grouped box plots or violin plots. For categorical-vs-categorical relationships, use contingency tables or mosaic plots. Highlight pairs whose correlation is above 0.7 or below -0.7.
- Detect outliers and anomalies. Apply the IQR method to every numeric column (by the usual convention, values more than 1.5 × IQR below the first quartile or above the third) and report the count and percentage of outlier values. Visualize outliers with box plots. Cross-reference outliers across columns: a row that is an outlier in several columns at once often represents a data entry error or a genuinely unusual observation.
- Synthesize findings into an EDA report. Write a structured summary covering: dataset overview, quality issues found, key distribution characteristics, notable correlations, outlier summary, and recommended next steps (e.g., columns to drop, transformations to apply, features likely to be predictive).
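The quality, correlation, and outlier steps above can be sketched in a few lines. This is a minimal illustration, not the skill's actual implementation: it assumes pandas and NumPy are available, the function names (`quality_scorecard`, `iqr_outlier_mask`, `strong_correlations`) are hypothetical, the sample data is invented, and the 1.5 × IQR multiplier is the common convention rather than something the workflow specifies. The 5% missing-value and 0.7 correlation thresholds come from the workflow text.

```python
import numpy as np
import pandas as pd


def quality_scorecard(df: pd.DataFrame, missing_threshold: float = 5.0) -> dict:
    """Summarize nulls, constant columns, and duplicate rows (hypothetical helper)."""
    null_pct = df.isna().mean() * 100  # percentage of nulls per column
    return {
        "shape": df.shape,
        "null_counts": df.isna().sum().to_dict(),
        "high_missing": sorted(null_pct[null_pct > missing_threshold].index),
        "constant_columns": [c for c in df.columns if df[c].nunique(dropna=True) <= 1],
        "duplicate_rows": int(df.duplicated().sum()),
    }


def iqr_outlier_mask(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Boolean frame: True where a numeric value lies outside the k * IQR fences."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparisons against NaN are False, so missing values are never flagged.
    return (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)


def strong_correlations(df: pd.DataFrame, threshold: float = 0.7) -> list:
    """List (col_a, col_b, r) for numeric pairs with |Pearson r| above threshold."""
    corr = df.select_dtypes(include="number").corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:  # upper triangle only, so each pair appears once
            r = corr.loc[a, b]
            if pd.notna(r) and abs(r) > threshold:
                pairs.append((a, b, round(float(r), 3)))
    return pairs


# Invented sample data: the last row is an implausible entry in two columns.
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190, 175, 165, 400],
    "weight_kg": [50, 58, 66, 74, 82, 70, 62, 300],
    "country": ["US"] * 8,
    "score": [3.0, np.nan, 2.5, 4.0, 3.5, 1.0, 2.0, 4.5],
})

print(quality_scorecard(df))

mask = iqr_outlier_mask(df)
print(mask.sum())  # outlier count per numeric column

# Rows flagged in two or more columns are the ones worth inspecting first.
print(df[mask.sum(axis=1) >= 2])

print(strong_correlations(df))
```

Returning a boolean mask rather than a filtered frame keeps the per-column counts and the cross-column check as two cheap reductions over the same object. Whether flagged rows are dropped, capped, or kept is left to the report's recommended next steps.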
Supported Technologies