Deephaven Development

Mandatory Reference Reading

BEFORE writing ANY code involving these topics, you MUST read the reference file:

Topic	Reference File	Read BEFORE writing code that...
Joins	`references/joins.md`	uses natural_join, aj, raj, exact_join, range_join
Aggregations	`references/aggregations.md`	uses agg_by, sum_by, avg_by, count_by, etc.
Update-by	`references/updateby.md`	uses rolling ops, cumulative ops, EMAs, forward fill
Time	`references/time-operations.md`	parses, bins, or manipulates timestamps
Kafka	`references/kafka.md`	consumes from or produces to Kafka
Iceberg	`references/iceberg.md`	reads/writes Iceberg tables
UI	`references/ui.md`	creates dashboards, components, or uses hooks
Plotting	`references/plotting.md`	creates charts or visualizations with dx
CSV	`references/csv.md`	imports/exports CSV files, fixes column types, parses dates from CSVs
Sitemap	`references/sitemap.md`	needs to look up additional documentation URLs

If you read Aggregations you must also read update-by, as they often go hand-in-hand.

Do NOT guess or rely on memory. Deephaven APIs have specific patterns that differ from similar libraries. Reading the reference ensures correct usage.

When debugging errors:

Re-read the relevant reference file for the operation that failed
Compare your code against the reference examples
Check for common mistakes noted in the reference

YOU NEVER ADD PRINT STATEMENTS TO PRINT TABLES, DO NOT CONVERT TO PANDAS JUST TO PRINT

Fetch online docs when needed:

Read the sitemap reference if you need to look up documentation URLs for specific operations or classes.

Table Operations

Core Principles

Do as much in-engine as possible. Deephaven's Java engine is highly optimized. Prefer built-in operations over Python UDFs.

Avoid Python UDFs in query strings. Each Python call crosses the Python-Java boundary (~30x slower). Use built-in functions from java.lang.Math, time functions, and auto-imported query language functions instead.

Use the right column operation:

select/update: Materialize in RAM - use for expensive formulas accessed frequently
view/update_view: On-demand calculation - use for fast formulas or infrequent access
lazy_update: Cache results - use for repeated values

Filter early. Place partition/grouping column filters first in where() to exclude data early.

Table Creation

from deephaven import empty_table, new_table, ring_table, time_table
from deephaven.column import double_col, string_col

# Empty table with formulas
t = empty_table(100).update(["X = i", "Y = X * 2"])

# New table from columns
t = new_table(
    [string_col("Sym", ["AAPL", "GOOG"]), double_col("Price", [150.0, 140.0])]
)

# Ticking time table (real-time)
t = time_table("PT1s")  # ticks every second

# Ring table (bounded size, keeps last N rows)
source = time_table("PT1s")
t = ring_table(source, capacity=1000)

Table Types

Type	Memory	Use Case
Static	Fixed	Imported files, snapshots
Append-only	Unbounded	Complete history from streams
Blink	Bounded	Current cycle only (Kafka default)
Ring	Bounded	Last N rows only

Extracting Scalar Values

print(table) only shows the object reference. To get actual values for a specific cell:

from deephaven import empty_table

table = empty_table(1).update(["ColName = 42.5"])

# Get a scalar from a 1-row result table
value = table.j_table.getColumnSource("ColName").get(
    table.j_table.getRowSet().firstRowKey()
)
print(f"Result: {value:,.2f}")

Column Operations

from deephaven import empty_table

t = empty_table(100).update(["A = i", "B = i * 2", "OldName = i", "Unwanted = i"])

# select - new table with specified columns (materialized)
t.select(["A", "B", "C = A + B"])

# view - formula table (calculated on access)
t.view(["A", "B", "C = A + B"])

# update - add columns to existing (materialized)
t.update(["C = A + B", "D = sqrt(C)"])

# update_view - add columns (calculated on access)
t.update_view(["C = A + B"])

# drop_columns - remove columns
t.drop_columns(["Unwanted"])

# rename_columns - rename columns
t.rename_columns(["NewName = OldName"])

Filtering

import datetime

from deephaven import new_table
from deephaven.column import datetime_col, double_col, string_col

t = new_table(
    [
        string_col("Sym", ["AAPL", "GOOG", "MSFT", "AMZN"]),
        double_col("Price", [150.0, 140.0, 200.0, 50.0]),
        string_col("Description", ["no error", "has error", "fine", "ok"]),
        datetime_col(
            "Timestamp",
            [
                datetime.datetime(2024, 6, 1, tzinfo=datetime.timezone.utc),
                datetime.datetime(2024, 6, 2, tzinfo=datetime.timezone.utc),
                datetime.datetime(2024, 6, 3, tzinfo=datetime.timezone.utc),
                datetime.datetime(2024, 6, 4, tzinfo=datetime.timezone.utc),
            ],
        ),
    ]
)

filter_table = new_table([string_col("Sym", ["AAPL", "GOOG"])])

# Basic where
t.where("Price > 100")
t.where(["Sym = `AAPL`", "Price > 100"])  # AND logic

# String matching (Java String methods)
t.where("Sym.startsWith(`A`)")
t.where("Description.contains(`error`)")

# Set membership (fast)
t.where("Sym in `AAPL`, `GOOG`, `MSFT`")
t.where_in(filter_table, "Sym")
t.where_not_in(filter_table, "Sym")

# Time filtering
t.where("Timestamp > parseInstant(`2024-01-01T00:00:00 America/New_York`)")

Joins Overview

Read references/joins.md before using joins.

Join	Use Case	Match Type
`natural_join`	Add columns from right, NULL if no match	Exact
`exact_join`	Add columns, error if not exactly one match	Exact
`join`	All matching combinations	Exact
`aj`	As-of join: find closest <= timestamp	Time-series
`raj`	Reverse as-of: find closest >= timestamp	Time-series
`range_join`	Match within ranges, aggregate results	Range

Aggregations Overview

Read references/aggregations.md before using aggregations.

Dedicated (single operation):

from deephaven import new_table
from deephaven.column import double_col, int_col, string_col

t = new_table(
    [
        string_col("Sym", ["AAPL", "AAPL", "GOOG", "GOOG"]),
        double_col("Price", [150.0, 152.0, 140.0, 142.0]),
        int_col("Qty", [100, 200, 150, 250]),
    ]
)

t.sum_by("Sym")  # Sum all numeric columns
t.avg_by("Sym")  # Average
t.count_by("Count", "Sym")  # Count rows
t.first_by("Sym")  # First row per group
t.last_by("Sym")  # Last row per group

Combined (multiple operations):

from deephaven import agg, new_table
from deephaven.column import double_col, int_col, string_col

t = new_table(
    [
        string_col("Sym", ["AAPL", "AAPL", "GOOG", "GOOG"]),
        double_col("Price", [150.0, 152.0, 140.0, 142.0]),
        int_col("Qty", [100, 200, 150, 250]),
    ]
)

t.agg_by(
    [
        agg.avg(cols=["AvgPrice = Price"]),
        agg.sum_(cols=["TotalQty = Qty"]),
        agg.count_(col="Count"),
    ],
    by=["Sym"],
)

Query String Syntax

Literals:

Boolean: true, false (lowercase)
Int: 42, Long: 42L, Double: 3.14 (underscores like 1_000L are NOT supported)
String: backticks `hello`
DateTime: parseInstant(\2024-01-01T12:00:00 America/New_York`)(short aliases likeNY` do NOT work)
Duration: 'PT1h30m', Period: 'P1y2m3d'

Built-in variables:

i - row index (0-based)
ii / k - row key (stable identifier)

Ternary operator:

from deephaven import empty_table

t = empty_table(10).update(["Price = i * 25.0"])
t.update(["Category = Price > 100 ? `High` : `Low`"])

Null handling:

from deephaven import empty_table

t = empty_table(10).update(["Value = i % 2 == 0 ? NULL_INT : i"])
t.update(["Safe = isNull(Value) ? 0 : Value"])

Data Import/Export

from deephaven import empty_table, read_csv, write_csv
from deephaven.parquet import read, write

t = empty_table(10).update(["X = i", "Y = X * 2"])

# CSV
write_csv(t, "/tmp/output.csv")
t_csv = read_csv("/tmp/output.csv")

# Parquet
write(t, "/tmp/output.parquet")
t_parquet = read("/tmp/output.parquet")

Reference Documentation

Reference	Content
`references/joins.md`	All 6 join types with examples, match syntax, performance tips
`references/aggregations.md`	Dedicated and combined aggregations, 20+ aggregators, common patterns
`references/updateby.md`	Rolling/cumulative ops, EMAs, MACD/Bollinger patterns
`references/time-operations.md`	Time literals, parsing, binning, calendars, timezone conversion
`references/kafka.md`	Kafka consumption/production, table types, key/value specs
`references/iceberg.md`	Catalog types, reading/writing tables, partitioned writes
`references/ui.md`	Dashboard structure, hooks, components, ui.table, styling
`references/plotting.md`	Deephaven Express (dx), all plot types, subplots, interactivity
`references/csv.md`	CSV import/export, column types, renaming, date parsing, messy files
`references/sitemap.md`	Full documentation URL lookup

deephaven-core-query-writing