
Dataiku Recipe Patterns

Reference patterns for creating different recipe types via the Python API.

Before Writing Code

MANDATORY: Read the relevant reference file before writing any recipe code.

Do NOT rely on general knowledge for GREL functions or API methods. Dataiku GREL differs from OpenRefine GREL and other variants. Always verify function names against the reference.

Recipe Type Decision Table

| Recipe Type | Use When | Key Method |
| --- | --- | --- |
| Prepare | Column transforms, filtering, formula columns, renaming, data cleaning | project.new_recipe("prepare", ...) |
| Join | Combining datasets on key columns (LEFT, INNER, RIGHT, OUTER) | project.new_recipe("join", ...) |
| Group | Aggregations: sum, count, avg, min, max, stddev, etc. | project.new_recipe("grouping", ...) |
| Sync | Copying data between connections (e.g., to a data warehouse) | project.new_recipe("sync", ...) |
| Python | Custom transformations not possible with visual recipes | project.new_recipe("python", ...) |

Universal Builder Pattern

Every recipe follows the same create-configure-run lifecycle:

# 1. Create via builder
builder = project.new_recipe("<type>", "<recipe_name>")
builder.with_input("<input_dataset>")
builder.with_new_output("<output_dataset>", "<connection>")  # creates output dataset
recipe = builder.create()

# 2. Configure settings
settings = recipe.get_settings()
# ... recipe-specific configuration ...
settings.save()

# 3. Apply schema updates
schema_updates = recipe.compute_schema_updates()
if schema_updates.any_action_required():
    schema_updates.apply()

# 4. Run and check
job = recipe.run(no_fail=True)
state = job.get_status()["baseStatus"]["state"]  # "DONE" or "FAILED"
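Steps 3 and 4 are identical for every recipe type, so they can be wrapped in a small helper. This is a sketch built only from the calls shown above; `run_recipe_checked` is a hypothetical name, not part of the Dataiku API, and it assumes the recipe handle exposes the same methods as in the lifecycle snippet.

```python
def run_recipe_checked(recipe):
    """Apply pending schema updates, run the recipe, and raise on failure.

    Sketch assuming `recipe` is a dataikuapi recipe handle exposing
    compute_schema_updates() and run(), as in the lifecycle above.
    """
    schema_updates = recipe.compute_schema_updates()
    if schema_updates.any_action_required():
        schema_updates.apply()

    job = recipe.run(no_fail=True)  # run() waits for completion
    state = job.get_status()["baseStatus"]["state"]
    if state != "DONE":
        raise RuntimeError(f"Recipe run ended in state {state!r}")
    return job
```

Raising on any non-DONE state keeps a FAILED run from being silently treated as success.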

After Running Any Recipe

Always sample the output and verify the result before reporting success. Silent data issues (wrong values, all nulls, unexpected types) are common.

from helpers.export import sample
rows = sample(client, "PROJECT_KEY", "output_dataset", 5)
for r in rows:
    print(r)
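Eyeballing five printed rows can miss an all-null column. A quick pure-Python check over the sampled rows catches that failure mode directly; `find_suspect_columns` is a hypothetical helper, not part of the Dataiku API, and it assumes the sample is a list of dicts as returned by the sampling helper above.

```python
def find_suspect_columns(rows):
    """Return column names that are null or empty in every sampled row.

    `rows` is a list of dicts (column name -> value). Hypothetical
    helper for post-run verification, not a Dataiku API call.
    """
    if not rows:
        return []
    columns = rows[0].keys()
    return [
        col for col in columns
        if all(r.get(col) in (None, "") for r in rows)
    ]
```

If this returns a non-empty list, inspect the recipe configuration (for example, a join key mismatch) before reporting success.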

Always Remember

  1. Call settings.save() after configuration changes
  2. Call compute_schema_updates().apply() for visual recipes
  3. Call recipe.run(no_fail=True) to execute (already waits for completion)
  4. Check job.get_status()["baseStatus"]["state"] for "DONE" or "FAILED"
  5. Sample and verify the output data before reporting success
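The first four reminders compose into one end-to-end sketch. It uses only the builder and recipe calls shown earlier in this document; the `build_and_run` name and the `configure` callback are assumptions for illustration, and reminder 5 (sample and verify the output) still applies after it returns "DONE".

```python
def build_and_run(project, recipe_type, name, input_ds, output_ds,
                  connection, configure=None):
    """Create, configure, and run a recipe; return the final job state.

    Hypothetical wrapper over the universal builder pattern. `configure`,
    if given, is a callback that mutates the settings object before
    save() -- recipe-specific configuration goes there.
    """
    builder = project.new_recipe(recipe_type, name)
    builder.with_input(input_ds)
    builder.with_new_output(output_ds, connection)
    recipe = builder.create()

    settings = recipe.get_settings()
    if configure is not None:
        configure(settings)
    settings.save()                      # reminder 1

    updates = recipe.compute_schema_updates()
    if updates.any_action_required():    # reminder 2
        updates.apply()

    job = recipe.run(no_fail=True)       # reminder 3
    return job.get_status()["baseStatus"]["state"]  # reminder 4
```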

Tested Patterns

Copy-paste patterns that have been validated against a live Dataiku instance:

Detailed References

Recipe types:

Data preparation:

Troubleshooting:
