Streamlit Data Application Skill

Build beautiful, interactive data applications with pure Python. Transform data scripts into shareable web apps in minutes with widgets, charts, and layouts.

When to Use This Skill

USE Streamlit when:

  • Rapid prototyping - Need to build a data app quickly
  • Internal tools - Creating tools for your team
  • Data exploration - Interactive exploration of datasets
  • Demo applications - Showcasing data science projects
  • ML model demos - Building interfaces for model inference
  • Simple dashboards - Quick insights without complex setup
  • Python-only development - No JavaScript/frontend knowledge required

DON'T USE Streamlit when:

  • Complex interactivity - Need fine-grained callback control (use Dash)
  • Enterprise deployment - Require advanced authentication/scaling (use Dash Enterprise)
  • Custom components - Heavy custom JavaScript requirements
  • High-traffic production - Thousands of concurrent users
  • Real-time streaming - Sub-second update requirements

Prerequisites

# Basic installation
pip install streamlit

# With common extras
pip install streamlit plotly pandas polars

# Using uv (recommended)
uv pip install streamlit plotly pandas polars altair

# Verify installation
streamlit hello

Core Capabilities

1. Basic Application Structure

Minimal App (app.py):

import streamlit as st
import pandas as pd
import polars as pl

# Page configuration (must be first Streamlit command)
st.set_page_config(
    page_title="My Data App",
    page_icon="📊",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Title and header
st.title("My Data Application")
st.header("Welcome to the Dashboard")
st.subheader("Data Analysis Section")

# Text elements
st.text("This is plain text")
st.markdown("**Bold** and *italic* text with [links](https://streamlit.io)")
st.caption("This is a caption for additional context")
st.code("print('Hello, Streamlit!')", language="python")

# Display data
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["NYC", "LA", "Chicago"]
})

st.dataframe(df)  # Interactive table
st.table(df)       # Static table
st.json({"key": "value", "list": [1, 2, 3]})

# Metrics
col1, col2, col3 = st.columns(3)
col1.metric("Revenue", "$1.2M", "+12%")
col2.metric("Users", "10,234", "-2%")
col3.metric("Conversion", "3.2%", "+0.5%")

Run the app:

streamlit run app.py

2. Widgets and User Input

Input Widgets:

import streamlit as st
import pandas as pd
from datetime import date

# Text inputs
name = st.text_input("Enter your name", value="User")
bio = st.text_area("Tell us about yourself", height=100)
password = st.text_input("Password", type="password")

# Numeric inputs
age = st.number_input("Age", min_value=0, max_value=120, value=25, step=1)
price = st.slider("Price Range", 0.0, 100.0, (25.0, 75.0))  # Range slider
rating = st.slider("Rating", 1, 5, 3)

# Selection widgets
option = st.selectbox("Choose an option", ["Option A", "Option B", "Option C"])
options = st.multiselect("Select multiple", ["Red", "Green", "Blue"], default=["Red"])
radio_choice = st.radio("Pick one", ["Small", "Medium", "Large"], horizontal=True)

# Boolean inputs
agree = st.checkbox("I agree to the terms")
toggle = st.toggle("Enable feature")

# Date and time
selected_date = st.date_input("Select a date", value=date.today())
date_range = st.date_input(
    "Date range",
    value=(date(2025, 1, 1), date.today()),
    format="YYYY-MM-DD"
)
selected_time = st.time_input("Select a time")

# File upload
uploaded_file = st.file_uploader("Upload a CSV file", type=["csv", "xlsx"])
if uploaded_file is not None:
    df = pd.read_csv(uploaded_file)
    st.write(f"Loaded {len(df)} rows")

# Color picker
color = st.color_picker("Pick a color", "#00FF00")

# Buttons
if st.button("Click me"):
    st.write("Button clicked!")

# Download button
@st.cache_data
def get_data():
    return pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

csv = get_data().to_csv(index=False)
st.download_button(
    label="Download CSV",
    data=csv,
    file_name="data.csv",
    mime="text/csv"
)

3. Layout and Organization

Columns:

import streamlit as st

# Equal columns
col1, col2, col3 = st.columns(3)

with col1:
    st.header("Column 1")
    st.write("Content for column 1")

with col2:
    st.header("Column 2")
    st.metric("Metric", "100")

with col3:
    st.header("Column 3")
    st.button("Action")

# Unequal columns
left, right = st.columns([2, 1])  # 2:1 ratio

with left:
    st.write("Wider column")

with right:
    st.write("Narrower column")

Sidebar:

import streamlit as st

# Sidebar content
st.sidebar.title("Navigation")
st.sidebar.header("Filters")

# Sidebar widgets
category = st.sidebar.selectbox("Category", ["All", "A", "B", "C"])
min_value = st.sidebar.slider("Minimum Value", 0, 100, 25)
show_raw = st.sidebar.checkbox("Show raw data")

# Using 'with' syntax
with st.sidebar:
    st.header("Settings")
    theme = st.radio("Theme", ["Light", "Dark"])
    st.divider()
    st.caption("App v1.0.0")

Tabs:

import streamlit as st

tab1, tab2, tab3 = st.tabs(["📈 Chart", "📊 Data", "⚙️ Settings"])

with tab1:
    st.header("Chart View")
    # Add chart here

with tab2:
    st.header("Data View")
    # Add dataframe here

with tab3:
    st.header("Settings")
    # Add settings here

Expanders and Containers:

import streamlit as st

# Expander (collapsible section)
with st.expander("Click to expand"):
    st.write("Hidden content revealed!")
    st.code("print('Hello')")

# Container (grouping elements)
with st.container():
    st.write("This is inside a container")
    col1, col2 = st.columns(2)
    col1.write("Left")
    col2.write("Right")

# Container with border
with st.container(border=True):
    st.write("Content with border")

# Empty placeholder (for dynamic updates)
placeholder = st.empty()
placeholder.text("Initial text")
# Later: placeholder.text("Updated text")
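
The placeholder is what enables in-place live updates; a minimal sketch (the loop values and sleep interval are purely illustrative):

import streamlit as st
import time

# Reserve a slot, then overwrite its content on each iteration
slot = st.empty()
for i in range(5):
    slot.metric("Progress", f"{(i + 1) * 20}%")
    time.sleep(0.5)
slot.success("Finished!")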

4. Data Visualization

Plotly Integration:

import streamlit as st
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd

# Sample data
df = pd.DataFrame({
    "date": pd.date_range("2025-01-01", periods=100),
    "value": [i + (i % 7) * 5 for i in range(100)],
    "category": ["A", "B", "C", "D"] * 25
})

# Plotly Express charts
fig = px.line(df, x="date", y="value", color="category", title="Time Series")
st.plotly_chart(fig, use_container_width=True)

# Scatter plot
fig_scatter = px.scatter(
    df, x="date", y="value",
    color="category", size="value",
    hover_data=["category"]
)
st.plotly_chart(fig_scatter, use_container_width=True)

# Bar chart
category_totals = df.groupby("category")["value"].sum().reset_index()
fig_bar = px.bar(category_totals, x="category", y="value", title="Category Totals")
st.plotly_chart(fig_bar, use_container_width=True)

# Graph Objects for more control
fig_go = go.Figure()
fig_go.add_trace(go.Scatter(
    x=df["date"],
    y=df["value"],
    mode="lines+markers",
    name="Values"
))
fig_go.update_layout(title="Custom Plotly Chart", hovermode="x unified")
st.plotly_chart(fig_go, use_container_width=True)

Built-in Charts:

import streamlit as st
import pandas as pd
import numpy as np

# Sample data
chart_data = pd.DataFrame(
    np.random.randn(20, 3),
    columns=["A", "B", "C"]
)

# Simple line chart
st.line_chart(chart_data)

# Area chart
st.area_chart(chart_data)

# Bar chart
st.bar_chart(chart_data)

# Scatter chart (Streamlit 1.26+)
scatter_data = pd.DataFrame({
    "x": np.random.randn(100),
    "y": np.random.randn(100),
    "size": np.random.rand(100) * 100
})
st.scatter_chart(scatter_data, x="x", y="y", size="size")

# Map
map_data = pd.DataFrame({
    "lat": np.random.randn(100) / 50 + 37.76,
    "lon": np.random.randn(100) / 50 - 122.4
})
st.map(map_data)

Matplotlib Integration:

import streamlit as st
import matplotlib.pyplot as plt
import numpy as np

# Create matplotlib figure
fig, ax = plt.subplots(figsize=(10, 6))
x = np.linspace(0, 10, 100)
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.legend()
ax.set_title("Matplotlib Chart")

# Display in Streamlit
st.pyplot(fig)
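
Altair Integration:

Altair is included in the prerequisites above and plugs in the same way via st.altair_chart; a brief sketch with synthetic data (column names are illustrative):

import streamlit as st
import altair as alt
import pandas as pd
import numpy as np

# Synthetic data for the example
df = pd.DataFrame({
    "x": np.tile(np.arange(50), 2),
    "y": np.random.randn(100).cumsum(),
    "group": ["A"] * 50 + ["B"] * 50
})

chart = (
    alt.Chart(df)
    .mark_line(point=True)
    .encode(x="x", y="y", color="group", tooltip=["x", "y", "group"])
    .interactive()
)
st.altair_chart(chart, use_container_width=True)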

5. Caching for Performance

Cache Data (for expensive data operations):

import streamlit as st
import pandas as pd
import polars as pl
import time

@st.cache_data
def load_data(file_path: str) -> pd.DataFrame:
    """Load and cache data. Cache key: file_path."""
    time.sleep(2)  # Simulate slow load
    return pd.read_csv(file_path)

@st.cache_data(ttl=3600)  # Cache expires after 1 hour
def fetch_api_data(endpoint: str) -> dict:
    """Fetch data from API with time-based cache."""
    import requests
    response = requests.get(endpoint)
    return response.json()

@st.cache_data(show_spinner="Loading data...")
def load_with_spinner(path: str) -> pl.DataFrame:
    """Show custom spinner while loading."""
    return pl.read_parquet(path)

# Using cached functions
df = load_data("data/sales.csv")  # First call: slow
df = load_data("data/sales.csv")  # Second call: instant (cached)

# Clear cache programmatically
if st.button("Clear cache"):
    st.cache_data.clear()

Cache Resources (for global resources):

import streamlit as st
from sqlalchemy import create_engine

@st.cache_resource
def get_database_connection():
    """Cache database connection (singleton pattern)."""
    return create_engine("postgresql://user:pass@localhost/db")

@st.cache_resource
def load_ml_model():
    """Cache ML model (loaded once per session)."""
    import joblib
    return joblib.load("model.pkl")

# Use cached resources
engine = get_database_connection()
model = load_ml_model()

6. Session State

Managing State:

import streamlit as st

# Initialize state
if "counter" not in st.session_state:
    st.session_state.counter = 0

if "messages" not in st.session_state:
    st.session_state.messages = []

# Display current state
st.write(f"Counter: {st.session_state.counter}")

# Update state with buttons
col1, col2, col3 = st.columns(3)

if col1.button("Increment"):
    st.session_state.counter += 1
    st.rerun()

if col2.button("Decrement"):
    st.session_state.counter -= 1
    st.rerun()

if col3.button("Reset"):
    st.session_state.counter = 0
    st.rerun()

# State with widgets
st.text_input("Name", key="user_name")
st.write(f"Hello, {st.session_state.user_name}!")

# State callback
def on_change():
    st.session_state.processed = st.session_state.raw_input.upper()

st.text_input("Raw input", key="raw_input", on_change=on_change)
if "processed" in st.session_state:
    st.write(f"Processed: {st.session_state.processed}")

Form State:

import streamlit as st

# Forms prevent rerunning on every widget change
with st.form("my_form"):
    st.write("Submit all at once:")
    name = st.text_input("Name")
    age = st.number_input("Age", min_value=0, max_value=120)
    color = st.selectbox("Favorite color", ["Red", "Green", "Blue"])

    # Every form needs a submit button
    submitted = st.form_submit_button("Submit")

    if submitted:
        st.success(f"Thanks {name}! You're {age} and like {color}.")

7. Multi-Page Applications

Directory Structure:

my_app/
├── app.py              # Main entry point (home page)
├── pages/
│   ├── 1_📊_Dashboard.py
│   ├── 2_📈_Analytics.py
│   └── 3_⚙️_Settings.py
└── utils/
    └── helpers.py

Main App (app.py):

import streamlit as st

st.set_page_config(
    page_title="Multi-Page App",
    page_icon="🏠",
    layout="wide"
)

st.title("Welcome to My App")
st.write("Use the sidebar to navigate between pages.")

# Shared state initialization
if "user" not in st.session_state:
    st.session_state.user = None

Page 1 (pages/1_Dashboard.py):

import streamlit as st

st.set_page_config(page_title="Dashboard", page_icon="📊")

st.title("📊 Dashboard")
st.write("This is the dashboard page")

# Access shared state
if st.session_state.get("user"):
    st.write(f"Welcome back, {st.session_state.user}!")

Page 2 (pages/2_Analytics.py):

import streamlit as st

st.set_page_config(page_title="Analytics", page_icon="📈")

st.title("📈 Analytics")
st.write("This is the analytics page")

# Add analytics content
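
Programmatic Navigation (st.navigation):

Recent Streamlit releases (1.36+) can also declare pages programmatically, which gives explicit control over order, titles, and icons. A sketch assuming two hypothetical page scripts, dashboard.py and analytics.py:

import streamlit as st

# app.py: declare the pages explicitly instead of relying on the pages/ folder
dashboard = st.Page("dashboard.py", title="Dashboard", icon="📊")
analytics = st.Page("analytics.py", title="Analytics", icon="📈")

pg = st.navigation([dashboard, analytics])
pg.run()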

8. Advanced Features

Status and Progress:

import streamlit as st
import time

# Progress bar
progress = st.progress(0, text="Processing...")
for i in range(100):
    time.sleep(0.01)
    progress.progress(i + 1, text=f"Processing... {i+1}%")

# Spinner
with st.spinner("Loading data..."):
    time.sleep(2)
st.success("Done!")

# Status messages
st.success("Operation successful!")
st.info("This is informational")
st.warning("Warning: Check your inputs")
st.error("An error occurred")
st.exception(ValueError("Example exception"))

# Toast notifications
st.toast("Data saved!", icon="✅")

# Balloons and snow
st.balloons()
st.snow()
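
For multi-step tasks, st.status (available in recent releases) combines a spinner, log lines, and a final state in one collapsible block; a sketch with illustrative steps:

import streamlit as st
import time

with st.status("Running pipeline...", expanded=True) as status:
    st.write("Downloading data...")
    time.sleep(1)
    st.write("Transforming records...")
    time.sleep(1)
    status.update(label="Pipeline complete", state="complete", expanded=False)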

Chat Interface:

import streamlit as st
import time

st.title("Chat Demo")

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Chat input
if prompt := st.chat_input("What's on your mind?"):
    # Add user message
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # Generate response
    with st.chat_message("assistant"):
        response = f"You said: {prompt}"
        st.markdown(response)
    st.session_state.messages.append({"role": "assistant", "content": response})

Data Editor:

import streamlit as st
import pandas as pd

# Editable dataframe
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Active": [True, False, True]
})

edited_df = st.data_editor(
    df,
    num_rows="dynamic",  # Allow adding/deleting rows
    column_config={
        "Name": st.column_config.TextColumn("Name", required=True),
        "Age": st.column_config.NumberColumn("Age", min_value=0, max_value=120),
        "Active": st.column_config.CheckboxColumn("Active")
    }
)

if st.button("Save changes"):
    st.write("Saved:", edited_df)

Complete Examples

Example 1: Sales Dashboard

import streamlit as st
import pandas as pd
import polars as pl
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime, timedelta

# Page config
st.set_page_config(
    page_title="Sales Dashboard",
    page_icon="📊",
    layout="wide"
)

# Custom CSS
st.markdown("""
<style>
    .metric-card {
        background-color: #f0f2f6;
        padding: 20px;
        border-radius: 10px;
        text-align: center;
    }
</style>
""", unsafe_allow_html=True)

# Title
st.title("📊 Sales Analytics Dashboard")

# Sidebar filters
st.sidebar.header("Filters")

# Date range filter
date_range = st.sidebar.date_input(
    "Date Range",
    value=(datetime.now() - timedelta(days=30), datetime.now()),
    format="YYYY-MM-DD"
)

# Category filter
categories = st.sidebar.multiselect(
    "Categories",
    options=["Electronics", "Clothing", "Food", "Home", "Sports"],
    default=["Electronics", "Clothing", "Food"]
)

# Region filter
regions = st.sidebar.multiselect(
    "Regions",
    options=["North", "South", "East", "West"],
    default=["North", "South", "East", "West"]
)

# Load and filter data
@st.cache_data
def load_sales_data():
    """Generate sample sales data."""
    import numpy as np
    np.random.seed(42)

    dates = pd.date_range(start="2024-01-01", end="2025-12-31", freq="D")
    n = len(dates) * 10  # Multiple records per day

    return pd.DataFrame({
        "date": np.random.choice(dates, n),
        "category": np.random.choice(
            ["Electronics", "Clothing", "Food", "Home", "Sports"], n
        ),
        "region": np.random.choice(["North", "South", "East", "West"], n),
        "revenue": np.random.uniform(100, 5000, n),
        "units": np.random.randint(1, 50, n),
        "customer_id": np.random.randint(1000, 9999, n)
    })

# Load data
df = load_sales_data()

# Apply filters
filtered_df = df[
    (df["date"] >= pd.Timestamp(date_range[0])) &
    (df["date"] <= pd.Timestamp(date_range[1])) &
    (df["category"].isin(categories)) &
    (df["region"].isin(regions))
]

# KPI Metrics Row
st.subheader("Key Performance Indicators")
col1, col2, col3, col4 = st.columns(4)

total_revenue = filtered_df["revenue"].sum()
total_units = filtered_df["units"].sum()
total_orders = len(filtered_df)
unique_customers = filtered_df["customer_id"].nunique()

col1.metric(
    "Total Revenue",
    f"${total_revenue:,.0f}",
    delta=f"+{(total_revenue * 0.12):,.0f} vs last period"
)
col2.metric(
    "Units Sold",
    f"{total_units:,}",
    delta=f"+{int(total_units * 0.08):,}"
)
col3.metric(
    "Orders",
    f"{total_orders:,}",
    delta=f"+{int(total_orders * 0.05):,}"
)
col4.metric(
    "Unique Customers",
    f"{unique_customers:,}",
    delta=f"+{int(unique_customers * 0.03):,}"
)

# Charts Row
st.subheader("Revenue Analysis")

col1, col2 = st.columns(2)

with col1:
    # Revenue trend
    daily_revenue = filtered_df.groupby("date")["revenue"].sum().reset_index()
    fig_trend = px.line(
        daily_revenue,
        x="date",
        y="revenue",
        title="Daily Revenue Trend"
    )
    fig_trend.update_layout(hovermode="x unified")
    st.plotly_chart(fig_trend, use_container_width=True)

with col2:
    # Revenue by category
    category_revenue = filtered_df.groupby("category")["revenue"].sum().reset_index()
    fig_category = px.pie(
        category_revenue,
        values="revenue",
        names="category",
        title="Revenue by Category"
    )
    st.plotly_chart(fig_category, use_container_width=True)

# Second charts row
col1, col2 = st.columns(2)

with col1:
    # Regional comparison
    regional_data = filtered_df.groupby("region").agg({
        "revenue": "sum",
        "units": "sum"
    }).reset_index()

    fig_region = px.bar(
        regional_data,
        x="region",
        y="revenue",
        color="region",
        title="Revenue by Region"
    )
    st.plotly_chart(fig_region, use_container_width=True)

with col2:
    # Category by region heatmap
    pivot_data = filtered_df.pivot_table(
        values="revenue",
        index="category",
        columns="region",
        aggfunc="sum"
    )

    fig_heatmap = px.imshow(
        pivot_data,
        title="Revenue Heatmap: Category vs Region",
        color_continuous_scale="Blues",
        text_auto=".0f"
    )
    st.plotly_chart(fig_heatmap, use_container_width=True)

# Data Table
st.subheader("Detailed Data")

with st.expander("View Raw Data"):
    # Aggregated summary
    summary = filtered_df.groupby(["date", "category", "region"]).agg({
        "revenue": "sum",
        "units": "sum",
        "customer_id": "nunique"
    }).reset_index()
    summary.columns = ["Date", "Category", "Region", "Revenue", "Units", "Customers"]

    st.dataframe(
        summary.sort_values("Date", ascending=False),
        use_container_width=True,
        height=400
    )

    # Download button
    csv = summary.to_csv(index=False)
    st.download_button(
        label="Download CSV",
        data=csv,
        file_name=f"sales_data_{datetime.now().strftime('%Y%m%d')}.csv",
        mime="text/csv"
    )

# Footer
st.markdown("---")
st.caption(f"Last updated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

Example 2: Data Explorer Tool

import streamlit as st
import pandas as pd
import polars as pl
import plotly.express as px

st.set_page_config(page_title="Data Explorer", page_icon="🔍", layout="wide")

st.title("🔍 Interactive Data Explorer")

# File upload
uploaded_file = st.file_uploader(
    "Upload your data file",
    type=["csv", "xlsx", "parquet"],
    help="Supported formats: CSV, Excel, Parquet"
)

@st.cache_data
def load_uploaded_file(file, file_type):
    """Load uploaded file based on type."""
    if file_type == "csv":
        return pd.read_csv(file)
    elif file_type == "xlsx":
        return pd.read_excel(file)
    elif file_type == "parquet":
        return pd.read_parquet(file)

if uploaded_file is not None:
    # Determine file type
    file_type = uploaded_file.name.split(".")[-1].lower()

    # Load data
    with st.spinner("Loading data..."):
        df = load_uploaded_file(uploaded_file, file_type)

    st.success(f"Loaded {len(df):,} rows and {len(df.columns)} columns")

    # Data overview tabs
    tab1, tab2, tab3, tab4 = st.tabs([
        "📋 Overview",
        "📊 Visualize",
        "🔢 Statistics",
        "🔍 Filter & Export"
    ])

    with tab1:
        st.subheader("Data Preview")
        st.dataframe(df.head(100), use_container_width=True)

        col1, col2 = st.columns(2)
        with col1:
            st.write("**Column Types:**")
            type_df = pd.DataFrame({
                "Column": df.columns,
                "Type": df.dtypes.astype(str),
                "Non-Null": df.notna().sum(),
                "Null %": (df.isna().sum() / len(df) * 100).round(2)
            })
            st.dataframe(type_df, use_container_width=True)

        with col2:
            st.write("**Data Shape:**")
            st.write(f"- Rows: {len(df):,}")
            st.write(f"- Columns: {len(df.columns)}")
            st.write(f"- Memory: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB")

    with tab2:
        st.subheader("Quick Visualizations")

        # Get column types
        numeric_cols = df.select_dtypes(include=["number"]).columns.tolist()
        categorical_cols = df.select_dtypes(include=["object", "category"]).columns.tolist()

        chart_type = st.selectbox(
            "Chart Type",
            ["Histogram", "Scatter", "Line", "Bar", "Box Plot"]
        )

        if chart_type == "Histogram":
            col = st.selectbox("Select column", numeric_cols)
            bins = st.slider("Number of bins", 10, 100, 30)
            fig = px.histogram(df, x=col, nbins=bins, title=f"Distribution of {col}")
            st.plotly_chart(fig, use_container_width=True)

        elif chart_type == "Scatter":
            col1, col2 = st.columns(2)
            x_col = col1.selectbox("X axis", numeric_cols)
            y_col = col2.selectbox("Y axis", numeric_cols, index=min(1, len(numeric_cols)-1))
            color_col = st.selectbox("Color by (optional)", ["None"] + categorical_cols)

            fig = px.scatter(
                df, x=x_col, y=y_col,
                color=None if color_col == "None" else color_col,
                title=f"{y_col} vs {x_col}"
            )
            st.plotly_chart(fig, use_container_width=True)

        elif chart_type == "Line":
            x_col = st.selectbox("X axis", df.columns.tolist())
            y_cols = st.multiselect("Y axis (select multiple)", numeric_cols)
            if y_cols:
                fig = px.line(df, x=x_col, y=y_cols, title="Line Chart")
                st.plotly_chart(fig, use_container_width=True)

        elif chart_type == "Bar":
            cat_col = st.selectbox("Category", categorical_cols if categorical_cols else df.columns.tolist())
            val_col = st.selectbox("Value", numeric_cols)
            agg_func = st.selectbox("Aggregation", ["sum", "mean", "count", "median"])

            agg_data = df.groupby(cat_col)[val_col].agg(agg_func).reset_index()
            fig = px.bar(agg_data, x=cat_col, y=val_col, title=f"{agg_func.title()} of {val_col} by {cat_col}")
            st.plotly_chart(fig, use_container_width=True)

        elif chart_type == "Box Plot":
            val_col = st.selectbox("Value", numeric_cols)
            group_col = st.selectbox("Group by (optional)", ["None"] + categorical_cols)

            fig = px.box(
                df, y=val_col,
                x=None if group_col == "None" else group_col,
                title=f"Distribution of {val_col}"
            )
            st.plotly_chart(fig, use_container_width=True)

    with tab3:
        st.subheader("Statistical Summary")

        # Numeric statistics
        if numeric_cols:
            st.write("**Numeric Columns:**")
            st.dataframe(df[numeric_cols].describe(), use_container_width=True)

        # Categorical statistics
        if categorical_cols:
            st.write("**Categorical Columns:**")
            for col in categorical_cols[:5]:  # Limit to first 5
                with st.expander(f"{col} value counts"):
                    st.dataframe(
                        df[col].value_counts().head(20).reset_index(),
                        use_container_width=True
                    )

        # Correlation matrix
        if len(numeric_cols) > 1:
            st.write("**Correlation Matrix:**")
            corr = df[numeric_cols].corr()
            fig = px.imshow(
                corr,
                text_auto=".2f",
                color_continuous_scale="RdBu_r",
                aspect="auto"
            )
            st.plotly_chart(fig, use_container_width=True)

    with tab4:
        st.subheader("Filter & Export")

        # Dynamic filtering
        st.write("**Apply Filters:**")

        filtered_df = df.copy()

        for col in df.columns[:10]:  # Limit columns for UI
            if df[col].dtype in ["int64", "float64"]:
                min_val, max_val = float(df[col].min()), float(df[col].max())
                if min_val < max_val:
                    range_val = st.slider(
                        f"{col} range",
                        min_val, max_val, (min_val, max_val),
                        key=f"filter_{col}"
                    )
                    filtered_df = filtered_df[
                        (filtered_df[col] >= range_val[0]) &
                        (filtered_df[col] <= range_val[1])
                    ]
            elif df[col].dtype == "object" and df[col].nunique() < 20:
                selected = st.multiselect(
                    f"{col}",
                    options=df[col].unique().tolist(),
                    default=df[col].unique().tolist(),
                    key=f"filter_{col}"
                )
                filtered_df = filtered_df[filtered_df[col].isin(selected)]

        st.write(f"**Filtered data: {len(filtered_df):,} rows**")
        st.dataframe(filtered_df.head(100), use_container_width=True)

        # Export
        col1, col2 = st.columns(2)
        with col1:
            csv = filtered_df.to_csv(index=False)
            st.download_button(
                "Download as CSV",
                csv,
                "filtered_data.csv",
                "text/csv"
            )
        with col2:
            # Build the Excel file in memory so it can be offered for download
            import io
            excel_buffer = io.BytesIO()
            filtered_df.to_excel(excel_buffer, index=False, engine="openpyxl")
            st.download_button(
                "Download as Excel",
                excel_buffer.getvalue(),
                "filtered_data.xlsx",
                "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
            )

else:
    st.info("Please upload a data file to get started.")

    # Sample data option
    if st.button("Load sample data"):
        import numpy as np
        np.random.seed(42)

        sample_df = pd.DataFrame({
            "date": pd.date_range("2025-01-01", periods=100),
            "category": np.random.choice(["A", "B", "C"], 100),
            "value": np.random.randn(100) * 100 + 500,
            "count": np.random.randint(1, 100, 100)
        })
        sample_df.to_csv("/tmp/sample_data.csv", index=False)
        st.success("Sample data created! Upload '/tmp/sample_data.csv'")

Example 3: ML Model Demo

import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

st.set_page_config(page_title="ML Demo", page_icon="🤖", layout="wide")

st.title("🤖 Machine Learning Model Demo")

# Sidebar
st.sidebar.header("Model Configuration")

# Model parameters
n_estimators = st.sidebar.slider("Number of trees", 10, 200, 100)
max_depth = st.sidebar.slider("Max depth", 1, 20, 5)
test_size = st.sidebar.slider("Test size", 0.1, 0.5, 0.2)

# Generate sample data
@st.cache_data
def generate_sample_data(n_samples=1000):
    np.random.seed(42)

    # Features
    X = pd.DataFrame({
        "feature_1": np.random.randn(n_samples),
        "feature_2": np.random.randn(n_samples),
        "feature_3": np.random.uniform(0, 100, n_samples),
        "feature_4": np.random.choice(["A", "B", "C"], n_samples)
    })

    # Target (based on features with some noise)
    y = (
        (X["feature_1"] > 0).astype(int) +
        (X["feature_3"] > 50).astype(int)
    ) >= 1
    y = y.astype(int)

    return X, y

X, y = generate_sample_data()

# Convert categorical
X_encoded = pd.get_dummies(X, columns=["feature_4"])

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y, test_size=test_size, random_state=42
)

# Train model (args prefixed with "_" are excluded from the cache key,
# so test_size is passed explicitly to retrain when the split changes)
@st.cache_resource
def train_model(n_est, depth, test_frac, _X_train, _y_train):
    model = RandomForestClassifier(
        n_estimators=n_est,
        max_depth=depth,
        random_state=42
    )
    model.fit(_X_train, _y_train)
    return model

# Training
with st.spinner("Training model..."):
    model = train_model(n_estimators, max_depth, test_size, X_train, y_train)

# Predictions
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# Results
st.subheader("Model Performance")

col1, col2, col3 = st.columns(3)
col1.metric("Accuracy", f"{accuracy:.2%}")
col2.metric("Training Samples", f"{len(X_train):,}")
col3.metric("Test Samples", f"{len(X_test):,}")

# Feature importance
st.subheader("Feature Importance")
importance_df = pd.DataFrame({
    "Feature": X_encoded.columns,
    "Importance": model.feature_importances_
}).sort_values("Importance", ascending=True)

fig = px.bar(
    importance_df,
    x="Importance",
    y="Feature",
    orientation="h",
    title="Feature Importance"
)
st.plotly_chart(fig, use_container_width=True)

# Interactive prediction
st.subheader("Make a Prediction")

with st.form("prediction_form"):
    col1, col2 = st.columns(2)

    with col1:
        f1 = st.number_input("Feature 1", value=0.0)
        f2 = st.number_input("Feature 2", value=0.0)

    with col2:
        f3 = st.number_input("Feature 3", min_value=0.0, max_value=100.0, value=50.0)
        f4 = st.selectbox("Feature 4", ["A", "B", "C"])

    submitted = st.form_submit_button("Predict")

    if submitted:
        # Prepare input
        input_data = pd.DataFrame({
            "feature_1": [f1],
            "feature_2": [f2],
            "feature_3": [f3],
            "feature_4": [f4]
        })
        input_encoded = pd.get_dummies(input_data, columns=["feature_4"])

        # Align columns
        for col in X_encoded.columns:
            if col not in input_encoded.columns:
                input_encoded[col] = 0
        input_encoded = input_encoded[X_encoded.columns]

        # Predict
        prediction = model.predict(input_encoded)[0]
        proba = model.predict_proba(input_encoded)[0]

        st.success(f"Prediction: **{'Positive' if prediction == 1 else 'Negative'}**")
        st.write(f"Confidence: {max(proba):.2%}")

Deployment Patterns

Streamlit Cloud Deployment

# requirements.txt
streamlit>=1.32.0
pandas>=2.0.0
polars>=0.20.0
plotly>=5.18.0
numpy>=1.24.0

# .streamlit/config.toml
[theme]
primaryColor="#1f77b4"
backgroundColor="#ffffff"
secondaryBackgroundColor="#f0f2f6"
textColor="#262730"
font="sans serif"

[server]
maxUploadSize=200
enableXsrfProtection=true
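
Secrets (API keys, database credentials) belong in .streamlit/secrets.toml rather than in the config above; Streamlit Cloud provides a secrets UI, and local runs read the file directly via st.secrets. The section and key names below are illustrative:

import streamlit as st

# .streamlit/secrets.toml (keep out of version control) might contain:
# [database]
# user = "analyst"
# password = "..."

db_user = st.secrets["database"]["user"]
db_password = st.secrets["database"]["password"]
st.write(f"Connecting as {db_user}")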

Docker Deployment

# Dockerfile
FROM python:3.11-slim

# curl is needed for the healthcheck (not included in the slim base image)
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8501

HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health

ENTRYPOINT ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]

# Build and run
docker build -t my-streamlit-app .
docker run -p 8501:8501 my-streamlit-app

Best Practices

1. Use Caching Appropriately

# GOOD: Cache data loading
@st.cache_data
def load_data():
    return pd.read_csv("data.csv")

# GOOD: Cache resources (DB connections, models)
@st.cache_resource
def get_model():
    return load_model("model.pkl")

# AVOID: Caching with unhashable arguments
# Use _arg prefix to skip hashing
@st.cache_data
def process_data(_db_connection, query):
    return _db_connection.execute(query)

2. Organize Large Apps

# utils/data.py
def load_data():
    pass

# utils/charts.py
def create_chart(df):
    pass

# app.py
from utils.data import load_data
from utils.charts import create_chart

3. Handle State Carefully

# GOOD: Initialize state at the top
if "data" not in st.session_state:
    st.session_state.data = None

# GOOD: Use callbacks for complex updates
def on_filter_change():
    st.session_state.filtered_data = apply_filter(st.session_state.data)

st.selectbox("Filter", options, on_change=on_filter_change)

4. Optimize Performance

# Use containers for layout stability
placeholder = st.empty()

# Batch widget updates in forms
with st.form("filters"):
    # Multiple widgets
    st.form_submit_button()

# Use columns for responsive layout
cols = st.columns([1, 2, 1])
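
On newer releases (1.37+), st.fragment limits reruns to a single function when only its own widgets change, which can noticeably speed up heavy pages; a small sketch:

import streamlit as st

@st.fragment
def filter_panel():
    # Interacting with these widgets reruns only this function, not the whole script
    choice = st.selectbox("Category", ["All", "A", "B"])
    st.write(f"Showing: {choice}")

filter_panel()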

Troubleshooting

Common Issues

Issue: App reruns on every interaction

# Use forms to batch inputs
with st.form("my_form"):
    input1 = st.text_input("Input")
    submit = st.form_submit_button()

Issue: Slow data loading

# Add caching
@st.cache_data(ttl=3600)
def load_data():
    return pd.read_csv("large_file.csv")

Issue: Memory issues with large files

# Limit rows, or read the file in chunks with pd.read_csv(..., chunksize=...)
@st.cache_data
def load_large_file(path, nrows=10000):
    return pd.read_csv(path, nrows=nrows)

Issue: Widget state lost on rerun

# Persist in session state
if "value" not in st.session_state:
    st.session_state.value = default_value

# Use key parameter
st.text_input("Name", key="user_name")

Version History

  • 1.0.0 (2026-01-17): Initial release
    • Basic app structure and widgets
    • Layout and organization patterns
    • Data visualization integration
    • Caching strategies
    • Session state management
    • Multi-page applications
    • Complete dashboard examples
    • Deployment patterns
    • Best practices and troubleshooting

Resources

  • Official documentation: https://docs.streamlit.io
  • Community forum: https://discuss.streamlit.io

Build beautiful data apps with pure Python - no frontend experience required!
