neo4j-aura-graph-analytics-skill by neo4j-contrib/neo4j-skills

When to Use

Running GDS algorithms on Aura Business Critical (BC) or Virtual Dedicated Cloud (VDC)
Processing graph data from non-Neo4j sources (Pandas, Spark, CSV)
On-demand / pipeline workloads — ephemeral sessions, pay per session-minute
Full isolation from the live database during analytics

When NOT to Use

Aura Pro with embedded GDS plugin → neo4j-gds-skill
Self-managed Neo4j with embedded GDS plugin → neo4j-gds-skill
Writing Cypher queries → neo4j-cypher-skill
Snowflake Graph Analytics → neo4j-snowflake-graph-analytics-skill

Deployment Decision Table

Deployment	Skill
Aura Free	❌ AGA not available
Aura Pro	`neo4j-gds-skill` (embedded plugin)
Aura Business Critical	this skill
Aura Virtual Dedicated Cloud	this skill
Non-Neo4j data (Pandas, Spark)	this skill (standalone mode)

Defaults

graphdatascience >= 1.15 required; >= 1.18 for Spark
Always call gds.verify_connectivity() after session creation
Always estimate memory before creating a session for large graphs
Always set TTL; default is 1 hour idle, max 7 days
Close session when done — gds.delete() or sessions.delete(name) stops billing
Use AuraAPICredentials.from_env() — never hardcode credentials
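The TTL rule above can be enforced with a small guard before any session call is made. This is a minimal sketch: `check_ttl` and `MAX_SESSION_TTL` are hypothetical names, and the 7-day ceiling is copied from this list rather than queried from Aura.

```python
from datetime import timedelta

# Ceiling taken from the Defaults list above (assumption, not queried from Aura)
MAX_SESSION_TTL = timedelta(days=7)


def check_ttl(ttl: timedelta) -> timedelta:
    """Hypothetical guard: reject non-positive TTLs and TTLs above the 7-day maximum."""
    if ttl <= timedelta(0):
        raise ValueError(f"TTL must be positive, got {ttl}")
    if ttl > MAX_SESSION_TTL:
        raise ValueError(f"TTL {ttl} exceeds the maximum of {MAX_SESSION_TTL}")
    return ttl


if __name__ == "__main__":
    # The checked value would then be passed as sessions.get_or_create(..., ttl=ttl)
    ttl = check_ttl(timedelta(hours=2))
    print(ttl)
```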

Installation

pip install "graphdatascience>=1.15"

Key Patterns

Step 1 — Authenticate

import os
from graphdatascience.session import AuraAPICredentials, GdsSessions

sessions = GdsSessions(api_credentials=AuraAPICredentials.from_env())
# Reads: AURA_CLIENT_ID, AURA_CLIENT_SECRET, AURA_PROJECT_ID (optional)
# Create API credentials in Aura Console → Account → API credentials

If you are a member of multiple Aura projects, set AURA_PROJECT_ID or pass project_id= explicitly when constructing AuraAPICredentials.
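A pre-flight check makes a missing credential show up with a readable message before AuraAPICredentials.from_env() is called. This sketch assumes the variable names listed in the Step 1 comment; `missing_aura_vars` is a hypothetical helper, not part of graphdatascience.

```python
import os
from typing import List, Mapping

# Names from the Step 1 comment; AURA_PROJECT_ID is optional and not checked here
REQUIRED_AURA_VARS = ("AURA_CLIENT_ID", "AURA_CLIENT_SECRET")


def missing_aura_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Hypothetical pre-flight check: required variables that are unset or empty."""
    return [name for name in REQUIRED_AURA_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aura_vars()
    if missing:
        print("Set before calling AuraAPICredentials.from_env():", ", ".join(missing))
    else:
        print("Aura API credentials found in the environment")
```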

Step 2 — Estimate Memory

from graphdatascience.session import AlgorithmCategory, SessionMemory

memory = sessions.estimate(
    node_count=1_000_000,
    relationship_count=5_000_000,
    algorithm_categories=[
        AlgorithmCategory.CENTRALITY,
        AlgorithmCategory.NODE_EMBEDDING,
        AlgorithmCategory.COMMUNITY_DETECTION,
    ],
)
# Returns a SessionMemory tier, e.g. SessionMemory.m_8GB
# Fixed tiers: m_2GB … m_256GB — see references/limitations.md
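sessions.estimate() takes node and relationship counts, which you may have to compute yourself when the data is still a raw edge list. A minimal sketch with the hypothetical helper `graph_counts`; nodes without any relationships are invisible in an edge list, so they are passed separately.

```python
from typing import Iterable, Tuple


def graph_counts(
    edges: Iterable[Tuple[int, int]],
    isolated_nodes: Iterable[int] = (),
) -> Tuple[int, int]:
    """Hypothetical helper: count distinct nodes and relationships for sessions.estimate()."""
    nodes = set(isolated_nodes)
    rel_count = 0
    for source, target in edges:
        nodes.add(source)
        nodes.add(target)
        rel_count += 1  # every edge is one relationship, duplicates included
    return len(nodes), rel_count


if __name__ == "__main__":
    node_count, relationship_count = graph_counts([(0, 1), (1, 2), (2, 0)], isolated_nodes=[3])
    # These values would feed sessions.estimate(node_count=..., relationship_count=..., ...)
    print(node_count, relationship_count)
```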

Step 3 — Create Session

Mode A — AuraDB connected:

from graphdatascience.session import DbmsConnectionInfo, SessionMemory, CloudLocation
from datetime import timedelta

db_connection = DbmsConnectionInfo(
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
    aura_instance_id=os.environ["AURA_INSTANCEID"],  # from Aura Console URL
)

gds = sessions.get_or_create(
    session_name="my-analysis",
    memory=memory,
    db_connection=db_connection,
    ttl=timedelta(hours=2),
)
gds.verify_connectivity()

Mode B — Self-managed Neo4j:

db_connection = DbmsConnectionInfo(
    uri=os.environ["NEO4J_URI"],          # e.g. "bolt://my-server:7687"
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
)
gds = sessions.get_or_create(
    session_name="my-analysis-sm",
    memory=SessionMemory.m_8GB,
    db_connection=db_connection,
    ttl=timedelta(hours=2),
    cloud_location=CloudLocation("gcp", "europe-west1"),
)
gds.verify_connectivity()

Mode C — Standalone (no Neo4j DB):

gds = sessions.get_or_create(
    session_name="my-standalone",
    memory=SessionMemory.m_4GB,
    ttl=timedelta(hours=1),
    cloud_location=CloudLocation("gcp", "europe-west1"),
)
gds.verify_connectivity()

get_or_create() is idempotent: if a session with the given name already exists, it reconnects to that session instead of creating a new one.
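Which of the three modes applies usually follows from the connection settings you have. A hypothetical `choose_session_mode` helper, assuming the environment variable names used in the Mode A and Mode B examples:

```python
import os
from typing import Mapping


def choose_session_mode(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: pick Mode A, B or C from which connection variables are set.

    Mode A (AuraDB connected) needs AURA_INSTANCEID, Mode B (self-managed)
    needs NEO4J_URI, and Mode C (standalone) needs neither.
    """
    if env.get("AURA_INSTANCEID"):
        return "A"
    if env.get("NEO4J_URI"):
        return "B"
    return "C"


if __name__ == "__main__":
    mode = choose_session_mode()
    # The examples above pass cloud_location=CloudLocation(...) for Modes B and C
    print(f"Using Mode {mode}")
```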

Step 4 — Project Graph

From connected Neo4j (remote projection):

G, result = gds.graph.project(
    "my-graph",
    """
    CALL () {
        MATCH (p:Person)
        OPTIONAL MATCH (p)-[r:KNOWS]->(p2:Person)
        RETURN p AS source, r AS rel, p2 AS target,
               p {.age, .score} AS sourceNodeProperties,
               p2 {.age, .score} AS targetNodeProperties
    }
    RETURN gds.graph.project.remote(source, target, {
        sourceNodeLabels:     labels(source),
        targetNodeLabels:     labels(target),
        sourceNodeProperties: sourceNodeProperties,
        targetNodeProperties: targetNodeProperties,
        relationshipType:     type(rel)
    })
    """,
)
print(f"Projected {G.node_count()} nodes, {G.relationship_count()} relationships")

The CALL () { ... } subquery wraps the row-producing MATCH clauses and is what allows several patterns to be combined: use UNION inside the CALL for multiple labels or relationship types.
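When several relationship types are involved, the UNION-inside-CALL query can be generated rather than written by hand. A sketch with the hypothetical helper `build_union_projection`; it interpolates identifiers into Cypher (backtick-quoted), so pass only trusted label, type and property names.

```python
from typing import Sequence


def build_union_projection(label: str, rel_types: Sequence[str], props: Sequence[str]) -> str:
    """Hypothetical helper: one OPTIONAL MATCH branch per relationship type, joined by UNION.

    Identifiers are interpolated with backticks, so only pass trusted names.
    """
    prop_map = ", ".join(f".{p}" for p in props)  # e.g. ".age, .score"
    branches = [
        f"""    MATCH (s:`{label}`)
    OPTIONAL MATCH (s)-[r:`{rel_type}`]->(t:`{label}`)
    RETURN s AS source, r AS rel, t AS target,
           s {{{prop_map}}} AS sourceNodeProperties,
           t {{{prop_map}}} AS targetNodeProperties"""
        for rel_type in rel_types
    ]
    union_body = "\n    UNION\n".join(branches)  # every branch returns the same columns
    return f"""CALL () {{
{union_body}
}}
RETURN gds.graph.project.remote(source, target, {{
    sourceNodeLabels:     labels(source),
    targetNodeLabels:     labels(target),
    sourceNodeProperties: sourceNodeProperties,
    targetNodeProperties: targetNodeProperties,
    relationshipType:     type(rel)
}})"""


if __name__ == "__main__":
    print(build_union_projection("Person", ["KNOWS", "WORKS_WITH"], ["age", "score"]))
```

The generated string would be passed as the query argument, e.g. `gds.graph.project("my-graph", build_union_projection("Person", ["KNOWS", "WORKS_WITH"], ["age", "score"]))`, where WORKS_WITH is an illustrative type name. UNION (rather than UNION ALL) also collapses the identical node-only rows that OPTIONAL MATCH emits in each branch for nodes with none of the listed relationships.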

From Pandas DataFrames (standalone mode):

import pandas as pd

nodes_df = pd.DataFrame([
    {"nodeId": 0, "labels": "Person", "age": 30},
    {"nodeId": 1, "labels": "Person", "age": 25},
])
rels_df = pd.DataFrame([
    {"sourceNodeId": 0, "targetNodeId": 1, "relationshipType": "KNOWS"},
])

G = gds.graph.construct("my-graph", nodes_df, rels_df)
# Multiple DataFrames: gds.graph.construct("g", [nodes1, nodes2], [rels1, rels2])

Required columns — nodes: nodeId (int), labels (str). Relationships: sourceNodeId, targetNodeId, relationshipType. String node properties not supported — drop before construct().
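Because string properties must be dropped, it helps to split them off first and keep them for the join-back in Step 7. A minimal sketch on plain record dicts (the shape pd.DataFrame(...) accepts); `split_string_properties` is hypothetical and assumes each column holds a single type across all rows.

```python
from typing import Any, Dict, List, Tuple

KEEP_COLUMNS = {"nodeId", "labels"}  # required by construct(); labels may be a string


def split_string_properties(
    records: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], Dict[int, Dict[str, str]]]:
    """Hypothetical helper: separate string-valued node properties from the ones construct() accepts.

    Returns (records without string properties, {nodeId: {column: string value}}).
    """
    clean: List[Dict[str, Any]] = []
    strings_by_id: Dict[int, Dict[str, str]] = {}
    for row in records:
        kept: Dict[str, Any] = {}
        for key, value in row.items():
            if key not in KEEP_COLUMNS and isinstance(value, str):
                strings_by_id.setdefault(row["nodeId"], {})[key] = value
            else:
                kept[key] = value
        clean.append(kept)
    return clean, strings_by_id


if __name__ == "__main__":
    rows = [
        {"nodeId": 0, "labels": "Person", "age": 30, "name": "Alice"},
        {"nodeId": 1, "labels": "Person", "age": 25, "name": "Bob"},
    ]
    clean_rows, strings = split_string_properties(rows)
    # pd.DataFrame(clean_rows) could go to gds.graph.construct(); keep `strings` for the join-back
    print(clean_rows, strings)
```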

Step 5 — Run Algorithms

# Mutate — chain results without writing to DB
gds.pageRank.mutate(G, mutateProperty="pagerank", dampingFactor=0.85)
gds.fastRP.mutate(G,
    mutateProperty="embedding",
    embeddingDimension=128,
    featureProperties=["pagerank"],
    propertyRatio=0.5,   # featureProperties only influence the embedding when propertyRatio > 0
    randomSeed=42,
)

# Stream — inspect results as DataFrame
df = gds.pageRank.stream(G)
print(df.sort_values("score", ascending=False).head(10))

# Write — persist to connected Neo4j DB (connected modes only)
gds.louvain.write(G, writeProperty="community")

All GDS algorithms work in AGA except topological link prediction. See neo4j-gds-skill for the full algorithm reference.
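To make dampingFactor=0.85 concrete, here is a toy PageRank in pure Python. It iterates the classic formula PR(v) = (1 - d) + d * sum(PR(u) / outdeg(u)) over the in-neighbours u of v for a fixed number of rounds. It is an illustration only, not the GDS implementation: convergence tolerance and other GDS defaults are not modelled, and nodes are taken from the edge list alone.

```python
from typing import Dict, List, Tuple


def toy_pagerank(
    edges: List[Tuple[int, int]],
    damping: float = 0.85,
    iterations: int = 20,
) -> Dict[int, float]:
    """Illustrative PageRank: PR(v) = (1 - d) + d * sum(PR(u) / outdeg(u) for u -> v)."""
    nodes = {n for edge in edges for n in edge}
    out_degree = {n: 0 for n in nodes}
    for source, _ in edges:
        out_degree[source] += 1
    scores = {n: 1.0 for n in nodes}  # start every node at 1.0
    for _ in range(iterations):
        incoming = {n: 0.0 for n in nodes}
        for source, target in edges:
            # only nodes with outgoing edges act as sources, so no division by zero
            incoming[target] += scores[source] / out_degree[source]
        scores = {n: (1 - damping) + damping * incoming[n] for n in nodes}
    return scores


if __name__ == "__main__":
    # Node 2 receives links from both 0 and 1, so it ends up with the highest score
    ranked = sorted(toy_pagerank([(0, 2), (1, 2), (2, 0)]).items(), key=lambda kv: kv[1], reverse=True)
    print(ranked)
```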

Step 6 — Async Job Polling

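Algorithm calls in the Python client normally block until the computation finishes and return a pandas Series or DataFrame. If a call returns a job handle for a long-running computation instead, poll until it is done: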

import time

job = gds.pageRank.mutate(G, mutateProperty="pagerank")

# If job object returned (async mode), poll explicitly:
if hasattr(job, "status"):
    while job.status() not in ("RUNNING_DONE", "FAILED", "CANCELLED"):
        time.sleep(5)
        print(f"Job status: {job.status()}")
    if job.status() != "RUNNING_DONE":
        raise RuntimeError(f"Algorithm job failed: {job.status()}")

Do NOT assume immediate completion on large graphs. Check .status() before reading results.
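The polling loop above has no upper bound, so a stuck job would block forever. A sketch of the same loop with a timeout, written against any zero-argument status callable; the status strings are the ones used above, and `wait_for_status` is a hypothetical helper (with a job handle it would be called as `wait_for_status(job.status)`).

```python
import time
from typing import Callable, Collection


def wait_for_status(
    get_status: Callable[[], str],
    done: str = "RUNNING_DONE",
    terminal: Collection[str] = ("RUNNING_DONE", "FAILED", "CANCELLED"),
    interval_s: float = 5.0,
    timeout_s: float = 3600.0,
) -> str:
    """Hypothetical helper: poll get_status() until a terminal state, failing on timeout."""
    deadline = time.monotonic() + timeout_s
    status = get_status()
    while status not in terminal:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still {status!r} after {timeout_s} s")
        time.sleep(interval_s)
        status = get_status()
    if status != done:
        raise RuntimeError(f"Algorithm job ended with status {status!r}")
    return status
```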

Step 7 — Retrieve Results

# Stream node properties — one column per property
result_df = gds.graph.nodeProperties.stream(
    G,
    node_properties=["pagerank", "embedding"],
    separate_property_columns=True,
    db_node_properties=["name"],   # pull from connected DB for context (connected modes only)
)
result_df.head(10)

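Standalone mode: db_node_properties is not available, so join back to the source data instead. String columns such as name were dropped before construct(), so keep a copy of the original node DataFrame that still has them:

result_df = gds.graph.nodeProperties.stream(G, ["pagerank"], separate_property_columns=True)
# source_df: the original node DataFrame (hypothetical name) that still includes the string column "name"
result_df = result_df.merge(source_df[["nodeId", "name"]], on="nodeId", how="left")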
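If results are handled as plain records (for example via result_df.to_dict("records")) rather than DataFrames, the same left join is a dictionary lookup. A minimal sketch; `left_join_names` and `names_by_id` are hypothetical names.

```python
from typing import Any, Dict, List, Optional


def left_join_names(
    result_rows: List[Dict[str, Any]],
    names_by_id: Dict[int, str],
) -> List[Dict[str, Any]]:
    """Hypothetical helper: add a "name" key to each row, None when the nodeId is unknown."""
    joined = []
    for row in result_rows:
        name: Optional[str] = names_by_id.get(row["nodeId"])
        joined.append({**row, "name": name})  # copy, so the input rows stay untouched
    return joined
```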

Step 8 — Write Back and Clean Up

# Write multiple node properties to connected Neo4j
gds.graph.nodeProperties.write(G, ["pagerank", "embedding"])

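# Write relationship properties: one relationship type per call; "score" must exist on each projected type
for rel_type in G.relationship_types():
    gds.graph.relationshipProperties.write(G, rel_type, ["score"])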

# Run Cypher against connected DB from within session
gds.run_cypher("MATCH (n:Person) RETURN count(n)")

# Drop projected graph (frees session memory)
G.drop()

# Delete session — stops billing
sessions.delete(session_name="my-analysis")
# or: gds.delete()

Write before deleting — results not written back are lost when session closes.
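To make sure billing stops even when an analysis step raises, session creation and deletion can be paired in a context manager. A sketch with a hypothetical `billed_session` wrapper that takes the create and delete calls as callables, so it runs without Aura:

```python
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def billed_session(create: Callable[[], T], delete: Callable[[T], None]) -> Iterator[T]:
    """Hypothetical wrapper: always call delete() so billing stops even if the body raises."""
    session = create()
    try:
        yield session
    finally:
        delete(session)
```

With a real session, `create` would be something like `lambda: sessions.get_or_create(session_name="my-analysis", memory=memory, db_connection=db_connection, ttl=timedelta(hours=2))` and `delete` would be `lambda gds: gds.delete()`; do the write-back inside the `with` block. The trade-off is that a failure also deletes the session and its projected graph, so for interactive debugging you may prefer to rely on the TTL instead.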

Session Management

# List active sessions
from pandas import DataFrame
DataFrame(sessions.list())

# Reconnect to existing session
gds = sessions.get_or_create(session_name="my-analysis", memory=..., db_connection=...)
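To check whether a session still exists before reconnecting (see SessionNotFoundError below), filter the listed sessions by name. This sketch assumes each entry returned by sessions.list() has a `name` attribute; `find_session` is hypothetical, and SimpleNamespace objects stand in for real entries.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def find_session(session_infos: Iterable[Any], name: str) -> Optional[Any]:
    """Hypothetical helper: first listed session whose .name equals `name`, else None."""
    return next((info for info in session_infos if getattr(info, "name", None) == name), None)


if __name__ == "__main__":
    # Stand-ins for sessions.list() entries; only the assumed .name attribute matters here
    listed = [SimpleNamespace(name="my-analysis"), SimpleNamespace(name="my-standalone")]
    print(find_session(listed, "my-analysis") is not None)
```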

Common Errors

Error	Cause	Fix
`AuthenticationError` / 401	Wrong `CLIENT_ID`/`CLIENT_SECRET`	Regenerate in Aura Console → Account → API credentials
`SessionNotFoundError`	Session expired (TTL exceeded) or name typo	`sessions.list()` to check; recreate session
`GraphNotFoundError`	Projection dropped or session reconnected without re-projecting	Re-run `gds.graph.project()` or `gds.graph.construct()`
Algorithm job `FAILED`	Memory limit exceeded or unsupported algorithm	Increase `SessionMemory`; make sure topological link prediction is not used
`MemoryEstimationExceeded`	Graph larger than estimated	Re-estimate with actual counts; pick next tier up
Results empty after session reconnect	Results not written before session was closed	Always write/stream before `gds.delete()`
`String node properties not supported`	String column in nodes DataFrame	Drop string columns before `gds.graph.construct()`
`AGA not enabled for project`	AGA feature not activated	Enable in Aura Console → project settings
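For the MemoryEstimationExceeded fix, picking the next tier up is a one-liner once the tier sizes are known. A sketch with the hypothetical `next_tier_gb`, working on sizes in GB; mapping the result back to a SessionMemory member (e.g. SessionMemory.m_8GB) is left to the caller, and the example lists only tiers named in this document.

```python
from typing import Optional, Sequence


def next_tier_gb(current_gb: int, available_gb: Sequence[int]) -> Optional[int]:
    """Hypothetical helper: smallest available tier strictly larger than current_gb, or None."""
    larger = [size for size in available_gb if size > current_gb]
    return min(larger) if larger else None


if __name__ == "__main__":
    # Only tiers named in this document; see references/limitations.md for the full set
    print(next_tier_gb(4, [2, 4, 8, 256]))
```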

References

Load on demand:

references/workflows.md — full AuraDB and standalone workflow examples, Spark integration
references/limitations.md — AGA vs embedded GDS feature table, SessionMemory tiers, cloud locations

WebFetch

Need	URL
AGA Python client docs	`https://neo4j.com/docs/graph-data-science-client/current/aura-graph-analytics/`
AuraDB tutorial notebook	`https://github.com/neo4j/graph-data-science-client/blob/main/examples/graph-analytics-serverless.ipynb`
GDS algorithm reference	`https://neo4j.com/docs/graph-data-science/current/algorithms/`

Checklist

Aura API credentials created and set in environment (AURA_CLIENT_ID, AURA_CLIENT_SECRET)
AGA feature enabled for Aura project (Aura Console → project settings)
Memory estimated before session creation (sessions.estimate(...))
Cloud location chosen near data source
gds.verify_connectivity() called after session creation
TTL set to avoid unexpected costs on idle sessions
Async algorithm jobs polled until RUNNING_DONE before reading results
Results written back (connected modes) or streamed and persisted (standalone) before deletion
Session deleted when done (sessions.delete(...) or gds.delete())