Keboola Configuration Knowledge
Provide expertise on Keboola project structure, configuration formats, and best practices for managing data pipelines.
Project Structure
A Keboola project pulled locally has this structure:
project-root/
├── .keboola/
│   └── manifest.json             # Project metadata and branch info
├── .env.local                    # API token (never commit)
├── .env.dist                     # Template for .env.local
├── .gitignore
└── [branch-name]/                # One directory per branch
    └── [component-id]/           # e.g., keboola.snowflake-transformation
        └── [config-name]/        # Configuration directory
            ├── config.json       # Main configuration
            ├── meta.json         # Metadata (name, description)
            └── rows/             # Configuration rows (if applicable)
Configuration Files
manifest.json
Located at .keboola/manifest.json, this file contains:
- Project ID and API host
- Branch information
- Sorting and naming conventions
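An abridged, illustrative example (the project ID, branch ID, and API host are placeholders, and the exact field set varies by CLI version):

```json
{
  "version": 2,
  "project": {
    "id": 1234,
    "apiHost": "connection.keboola.com"
  },
  "sortBy": "id",
  "branches": [
    { "id": 5678, "path": "main" }
  ]
}
```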
config.json
The main configuration file for each component. Structure varies by component type but typically includes:
- parameters - Component-specific settings
- storage - Input/output table mappings
- processors - Pre/post processing steps
meta.json
Metadata about the configuration:
{
"name": "Configuration Name",
"description": "What this configuration does",
"isDisabled": false
}
Component Types
Transformations
SQL or Python/R transformations for data processing.
Snowflake Transformation (keboola.snowflake-transformation):
{
"parameters": {
"blocks": [
{
"name": "Block Name",
"codes": [
{
"name": "Script Name",
"script": ["SELECT * FROM table"]
}
]
}
]
},
"storage": {
"input": {
"tables": [
{
"source": "in.c-bucket.table",
"destination": "table"
}
]
},
"output": {
"tables": [
{
"source": "output_table",
"destination": "out.c-bucket.result"
}
]
}
}
}
Extractors
Components that pull data from external sources (databases, APIs, files).
Common extractors:
- keboola.ex-db-snowflake - Snowflake extractor
- keboola.ex-google-analytics-v4 - Google Analytics
- keboola.ex-generic-v2 - Generic HTTP API extractor
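As an illustration, a minimal keboola.ex-generic-v2 config.json sketch (the baseUrl and endpoint values are placeholders; consult the component documentation for the full schema):

```json
{
  "parameters": {
    "api": {
      "baseUrl": "https://example.com/api/"
    },
    "config": {
      "jobs": [
        { "endpoint": "users", "dataType": "users" }
      ]
    }
  }
}
```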
Writers
Components that push data to external destinations.
Common writers:
- keboola.wr-db-snowflake - Snowflake writer
- keboola.wr-google-sheets - Google Sheets writer
Orchestrations
Workflow definitions that run multiple configurations in sequence.
Located in keboola.orchestrator/ with:
- Task definitions
- Dependencies
- Scheduling
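A minimal sketch of an orchestrator config.json, assuming the phases/tasks layout used by keboola.orchestrator (IDs, names, and the referenced component/config are placeholders):

```json
{
  "phases": [
    { "id": 1, "name": "Extract", "dependsOn": [] },
    { "id": 2, "name": "Transform", "dependsOn": [1] }
  ],
  "tasks": [
    {
      "id": 10001,
      "name": "Load orders",
      "phase": 1,
      "enabled": true,
      "task": {
        "mode": "run",
        "componentId": "keboola.ex-db-snowflake",
        "configId": "12345"
      }
    }
  ]
}
```

Tasks in the same phase can run in parallel; dependsOn orders the phases themselves.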
Best Practices
When Editing Configurations
- Always run kbc diff before and after changes
- Validate JSON syntax before pushing
- Use kbc validate to check configuration validity
- Keep descriptions updated in meta.json
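The JSON-syntax check can be scripted before a push. A minimal sketch using only the standard library, assuming the local project layout shown above:

```python
import json
from pathlib import Path

def find_invalid_json(root):
    """Return (path, error) pairs for .json files under root that fail to parse."""
    invalid = []
    for path in sorted(Path(root).rglob("*.json")):
        try:
            json.loads(path.read_text())
        except json.JSONDecodeError as err:
            invalid.append((str(path), str(err)))
    return invalid
```

Run it against the branch directory and fix any reported files before pushing.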
Storage Mappings
- Input tables: Map source tables to working names
- Output tables: Map result tables to destination buckets
- Use consistent naming conventions
Transformations
- Break complex logic into multiple blocks
- Use meaningful names for blocks and scripts
- Document SQL with comments
- Test locally when possible
Common Tasks
Add a New Input Table to Transformation
In config.json, add to storage.input.tables:
{
"source": "in.c-bucket.new_table",
"destination": "new_table",
"columns": [] // Empty = all columns
}
Add Output Table
In config.json, add to storage.output.tables:
{
"source": "result_table",
"destination": "out.c-bucket.result",
"primary_key": ["id"]
}
Modify SQL Script
Edit the script array in the relevant block/code section. Each array element is a single SQL statement; statements execute in order.
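For example, a two-statement script array (table names are illustrative):

```json
"script": [
  "CREATE TABLE stage AS SELECT * FROM orders WHERE status = 'paid'",
  "DELETE FROM stage WHERE id IS NULL"
]
```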
Troubleshooting
Invalid Configuration
- Check JSON syntax (missing commas, brackets)
- Verify table names exist in storage
- Check column names in mappings
Push Conflicts
- Pull latest changes first
- Merge conflicts manually in config files
- Push again after resolution
Missing Tables
- Ensure input tables exist in Keboola Storage
- Check bucket permissions
- Verify table names match exactly (case-sensitive)