Dataiku Recipe Patterns
Reference patterns for creating different recipe types via the Python API.
Before Writing Code
MANDATORY: Read the relevant reference file before writing any recipe code.
- GREL formulas → read references/grel-functions.md first
- Prepare steps → read references/processors.md first
- Joins → read references/join-recipe.md first
- Grouping → read references/group-recipe.md first
- Python recipes → read references/python-recipe.md first
- Sync recipes → read references/sync-recipe.md first
- Date handling → read references/date-operations.md first
- Pitfalls index → references/pitfalls.md (recipe-type reference files also have a Pitfalls section at the top)
Do NOT rely on general knowledge for GREL functions or API methods. Dataiku GREL differs from OpenRefine GREL and other variants. Always verify function names against the reference.
Recipe Type Decision Table
| Recipe Type | Use When | Key Method |
|---|---|---|
| Prepare | Column transforms, filtering, formula columns, renaming, data cleaning | project.new_recipe("prepare", ...) |
| Join | Combining datasets on key columns (LEFT, INNER, RIGHT, OUTER) | project.new_recipe("join", ...) |
| Group | Aggregations: sum, count, avg, min, max, stddev, etc. | project.new_recipe("grouping", ...) |
| Sync | Copying data between connections (e.g., to a data warehouse) | project.new_recipe("sync", ...) |
| Python | Custom transformations not possible with visual recipes | project.new_recipe("python", ...) |
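The type string passed to project.new_recipe() does not always match the recipe's display name: per the table, a Group recipe is created with "grouping". A minimal sketch of a lookup that makes this explicit; the names RECIPE_TYPE_STRINGS and recipe_type_string are hypothetical helpers, not part of the Dataiku API, and the mapping covers only the five types listed above.

```python
# Display name from the decision table -> type string passed to project.new_recipe().
# Hypothetical helper; only the five recipe types from the table are covered.
RECIPE_TYPE_STRINGS = {
    "Prepare": "prepare",
    "Join": "join",
    "Group": "grouping",  # note: not "group"
    "Sync": "sync",
    "Python": "python",
}


def recipe_type_string(display_name):
    """Return the new_recipe() type string for a display name from the table."""
    try:
        return RECIPE_TYPE_STRINGS[display_name]
    except KeyError:
        known = ", ".join(sorted(RECIPE_TYPE_STRINGS))
        raise ValueError(
            f"Unknown recipe type {display_name!r}; known types: {known}"
        ) from None
```

Usage would look like project.new_recipe(recipe_type_string("Group"), "<recipe_name>"), which fails early on a name the table does not cover.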
Universal Builder Pattern
Every recipe follows the same lifecycle: create, configure, apply schema updates, then run and check the job state:
# 1. Create via builder
builder = project.new_recipe("<type>", "<recipe_name>")
builder.with_input("<input_dataset>")
builder.with_new_output("<output_dataset>", "<connection>") # creates output dataset
recipe = builder.create()
# 2. Configure settings
settings = recipe.get_settings()
# ... recipe-specific configuration ...
settings.save()
# 3. Apply schema updates
schema_updates = recipe.compute_schema_updates()
if schema_updates.any_action_required():
    schema_updates.apply()
# 4. Run and check
job = recipe.run(no_fail=True)
state = job.get_status()["baseStatus"]["state"] # "DONE" or "FAILED"
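Because no_fail=True returns the job even when it fails, the state check is easy to forget. Steps 3 and 4 above can be wrapped so that a failed job raises instead. This is a minimal sketch: run_and_check and RecipeRunError are hypothetical names, and the only recipe and job calls used are the ones already shown in the pattern.

```python
class RecipeRunError(RuntimeError):
    """Raised when a recipe job ends in a state other than DONE (hypothetical type)."""


def run_and_check(recipe):
    """Apply pending schema updates, run the recipe, and return the job on success.

    `recipe` is assumed to be the object returned by builder.create().
    """
    # Step 3: apply schema updates only when some are pending.
    schema_updates = recipe.compute_schema_updates()
    if schema_updates.any_action_required():
        schema_updates.apply()

    # Step 4: with no_fail=True the job is returned even on failure,
    # so the final state has to be checked explicitly.
    job = recipe.run(no_fail=True)
    state = job.get_status()["baseStatus"]["state"]
    if state != "DONE":
        raise RecipeRunError(f"Recipe job ended in state {state!r}")
    return job
```

Call it after settings.save(), for example job = run_and_check(recipe), and let the exception surface rather than reporting success on a failed run.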
After Running Any Recipe
Always sample the output and verify the result before reporting success. Silent data issues (wrong values, all nulls, unexpected types) are common.
from helpers.export import sample
rows = sample(client, "PROJECT_KEY", "output_dataset", 5)
for r in rows:
    print(r)
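Printing rows catches obvious problems, but an all-null column is easy to miss by eye. A small sketch of an automated check follows. It assumes the sample helper returns a list of dicts keyed by column name, which the source does not confirm; the function name all_null_columns is hypothetical.

```python
def all_null_columns(rows):
    """Return the sorted names of columns that are None or "" in every sampled row.

    Assumes `rows` is a list of dicts keyed by column name (an assumption about
    what sample() returns). An empty sample yields an empty list.
    """
    if not rows:
        return []

    # Collect every column name seen in any row, in case rows have differing keys.
    columns = set()
    for row in rows:
        columns.update(row.keys())

    return sorted(
        col for col in columns
        if all(row.get(col) in (None, "") for row in rows)
    )


# Example with hand-written rows standing in for a real sample:
rows = [{"id": 1, "total": None}, {"id": 2, "total": ""}]
print(all_null_columns(rows))  # ['total']
```

A non-empty result is a signal to inspect the recipe configuration (for instance a formula or join key) before reporting success. It does not prove the other columns are correct.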
Always Remember
- Call settings.save() after configuration changes
- Call compute_schema_updates().apply() for visual recipes
- Call recipe.run(no_fail=True) to execute (already waits for completion)
- Check job.get_status()["baseStatus"]["state"] for "DONE" or "FAILED"
- Sample and verify the output data before reporting success
Tested Patterns
Copy-paste patterns that have been validated against a live Dataiku instance:
- patterns/bin-numeric-column.py — Bin a string numeric column into ranges
- patterns/calculated-columns.py — Common GREL formula patterns
- patterns/filter-and-clean.py — Data cleaning pipeline
Detailed References
Recipe types:
- references/prepare-recipe.md — Prepare recipe builder, add_processor_step() API
- references/join-recipe.md — Join configuration, multi-table joins, column selection
- references/group-recipe.md — Aggregation flags, output naming, type compatibility
- references/sync-recipe.md — Sync recipe pattern
- references/python-recipe.md — Python recipe with set_code
Data preparation:
- references/processors.md — All processor types with parameters and complete example
- references/grel-functions.md — Full GREL function table and formula syntax
- references/date-operations.md — DateParser, DateFormatter, datePart examples
Troubleshooting:
- references/pitfalls.md — Index of all pitfalls (details are inline in each reference file)