tuning-incremental-sync-config
Tuning incremental sync config
A sync's configuration lives on the ExternalDataSchema and can be changed at any time via
external-data-schemas-partial-update. Most changes are non-destructive and take effect on the next sync, but a few
(switching sync_type, changing primary keys) require careful handling to avoid corrupting the synced data.
When to use this skill
- The user wants to change how an already-connected table is synced
- A diagnosis flagged the incremental field or primary key as wrong
- The table is syncing too often / not often enough
- Switching an incremental table to CDC (or vice versa)
- The source table was changed on the other side (new columns, dropped columns) and the sync config needs to catch up
If the user is setting up a brand-new source, use setting-up-a-data-warehouse-source instead — configuration is
chosen at creation time there.
Available tools
| Tool | Purpose |
|---|---|
| external-data-schemas-retrieve | Current sync_type, incremental_field, PKs, sync_frequency |
| external-data-schemas-incremental-fields-create | Refresh candidate incremental fields from the live source |
| external-data-schemas-partial-update | Apply the config change |
| external-data-schemas-reload | Trigger a sync with the new config |
| external-data-schemas-resync | Wipe and re-import from scratch when the change invalidates existing data |
| external-data-schemas-delete-data | Drop the synced table while keeping the schema entry |
| external-data-sources-check-cdc-prerequisites-create | Pre-flight Postgres CDC (only when switching to/from CDC) |
| external-data-sources-webhook-info-retrieve | Current webhook state (when switching to/from sync_type=webhook) |
| external-data-sources-create-webhook-create | Register a webhook after switching a schema to sync_type=webhook |
| external-data-sources-update-webhook-inputs-create | Rotate a webhook signing secret |
| external-data-sources-delete-webhook-create | Unregister webhook when switching schemas off sync_type=webhook |
The fields you can tune
From the partial-update endpoint:
| Field | Values | Notes |
|---|---|---|
| sync_type | full_refresh, incremental, append, cdc, webhook | Source must support the target type; check via incremental-fields |
| incremental_field | Column name from the source | Must appear in incremental_fields list for the schema |
| incremental_field_type | datetime, date, timestamp, integer, numeric, objectid | Must match the column's real type |
| primary_key_columns | Array of column names | Required for CDC. Used for upsert dedup on incremental |
| cdc_table_mode | consolidated, cdc_only, both | Only meaningful when sync_type=cdc |
| sync_frequency | 1min, 5min, 15min, 30min, 1hour, 6hour, 12hour, 24hour, 7day, 30day, never | Applies to all non-CDC types |
| sync_time_of_day | HH:MM:SS | When sync_frequency is daily/weekly-scale |
| should_sync | true / false | Pause the schema without deleting it |
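
The allowed values in the table lend themselves to a quick pre-flight check before calling partial-update. The following is a minimal sketch, not part of the API: `validate_update` and `ALLOWED_VALUES` are hypothetical names, and the rules encoded are only the ones stated in the table.

```python
from typing import Any

# Allowed values copied from the table above. Tuples (not sets) so that an
# unhashable value such as a list is reported instead of raising TypeError.
ALLOWED_VALUES: dict[str, tuple[str, ...]] = {
    "sync_type": ("full_refresh", "incremental", "append", "cdc", "webhook"),
    "incremental_field_type": ("datetime", "date", "timestamp", "integer", "numeric", "objectid"),
    "cdc_table_mode": ("consolidated", "cdc_only", "both"),
    "sync_frequency": ("1min", "5min", "15min", "30min", "1hour", "6hour",
                       "12hour", "24hour", "7day", "30day", "never"),
}


def validate_update(current: dict[str, Any], changes: dict[str, Any]) -> list[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    for field, allowed in ALLOWED_VALUES.items():
        if field in changes and changes[field] not in allowed:
            problems.append(f"{field}: {changes[field]!r} is not an allowed value")

    # Evaluate cross-field rules against the config as it would look after the update.
    merged = {**current, **changes}
    if merged.get("sync_type") == "cdc" and not merged.get("primary_key_columns"):
        problems.append("primary_key_columns is required for CDC")
    if "cdc_table_mode" in changes and merged.get("sync_type") != "cdc":
        problems.append("cdc_table_mode is only meaningful when sync_type is cdc")
    return problems
```

An empty list means the payload is at least well-formed; the server remains the authority on whether the source supports the target type.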
Workflow
Step 1 — Read the current config
Always start with external-data-schemas-retrieve({id}). Understanding the current state prevents mistakes like
"fixing" an incremental_field that's actually correct.
Note:
- Current sync_type, incremental_field, incremental_field_type, primary_key_columns
- Current status (don't tune a schema that's currently Running; wait or cancel first)
- last_synced_at (so you can tell if the next sync worked)
- latest_error if present (the error often tells you exactly what to change)
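
The checks in this step can be sketched as a small gate over the retrieved schema. This assumes the retrieve response is a dict with `status` and `latest_error` keys named as in the notes above; `ready_to_tune` is a hypothetical helper, not an existing tool.

```python
def ready_to_tune(schema: dict) -> tuple[bool, str]:
    """Return (ok, note) for a schema dict as returned by the retrieve call."""
    # A Running schema should not be reconfigured mid-sync.
    if schema.get("status") == "Running":
        return False, "schema is Running; wait for it to finish or cancel first"
    # A previous error is not a blocker, but it usually points at the field to fix.
    error = schema.get("latest_error")
    if error:
        return True, f"latest_error present: {error}"
    return True, "ok"
```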
Step 2 — If changing sync_type or incremental_field, refresh candidates
Call external-data-schemas-incremental-fields-create({id}). Even though the operation name says "create", it
re-reads the source and returns the current candidate fields — use it to confirm the field you want to set actually
exists on the source and which sync types are now available for this table.
The response:
{
  "incremental_fields": [{"field": "updated_at", "type": "datetime", ...}, ...],
  "incremental_available": true,
  "append_available": true,
  "cdc_available": true,
  "full_refresh_available": true,
  "detected_primary_keys": ["id"],
  "available_columns": [...]
}
If your target incremental_field isn't in the list, tell the user — they need to either pick a different field or
change the source table to add one.
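
As a rough illustration of how that response can be checked, the sketch below confirms the target sync type is available and looks up the field's reported type. `check_target` is a hypothetical helper; it only relies on the response keys shown above and covers the four sync types that have an `*_available` flag there.

```python
from typing import Optional


def check_target(response: dict, sync_type: str, field: Optional[str] = None) -> tuple[bool, str]:
    """Return (ok, detail): detail is the field's type on success, a reason on failure."""
    # The response exposes one "<sync_type>_available" flag per bulk sync type.
    if not response.get(f"{sync_type}_available", False):
        return False, f"{sync_type} is not available for this table"
    if field is None:
        return True, ""
    for candidate in response.get("incremental_fields", []):
        if candidate.get("field") == field:
            return True, candidate.get("type", "")
    return False, f"{field} is not a candidate incremental field"
```

On success, the returned type can be used as incremental_field_type in the update payload, which keeps it aligned with what the source reports.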
Step 3 — Apply the change
Call external-data-schemas-partial-update({id}, {...changed fields}).
Only send the fields that are actually changing. Partial update means unspecified fields stay as they are.
Examples:
// Switch from full_refresh to incremental
{
"sync_type": "incremental",
"incremental_field": "updated_at",
"incremental_field_type": "datetime"
}
// Change sync frequency to hourly
{"sync_frequency": "1hour"}
// Fix wrong PK on a CDC table
{"primary_key_columns": ["tenant_id", "order_id"]}
// Pause a schema
{"should_sync": false}
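
One way to make sure only changed fields are sent is to diff the desired config against the retrieved one. This is a minimal sketch; `changed_fields` is a hypothetical helper and both arguments are assumed to be plain dicts keyed by the field names from the table above.

```python
_MISSING = object()  # sentinel so that an explicit None still counts as a value


def changed_fields(current: dict, desired: dict) -> dict:
    """Return only the entries of `desired` that differ from `current`."""
    return {
        key: value
        for key, value in desired.items()
        if current.get(key, _MISSING) != value
    }
```

For example, with a current config of `{"sync_type": "full_refresh", "sync_frequency": "24hour"}` and a desired config that sets sync_type to incremental while repeating the same sync_frequency, the resulting payload omits sync_frequency.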
Step 4 — Decide whether existing data is still valid
This is the step that's easy to get wrong. Some config changes invalidate the synced data; others don't.
Changes that DON'T invalidate existing data:
- sync_frequency, sync_time_of_day: scheduling only
- should_sync: on/off
- cdc_table_mode in most cases: next sync will start writing to the new shape, but historical consolidated rows stay valid
- Switching between incremental and full_refresh with the same incremental_field: next sync just re-runs fresh
- Switching to or from sync_type: "webhook". The synced data stays valid; only the ingestion path changes. Remember to register or unregister the webhook (see sections below) alongside the sync_type change.
Changes that MAY invalidate existing data and need a resync:
- Changing incremental_field to a different column: the high-water mark is from the old column and won't match. Without a resync you'll miss rows that were updated between the two fields' histories.
- Changing primary_key_columns: existing rows may be deduplicated incorrectly against new PK definitions.
- Switching from full_refresh to append: the existing rows don't have the version-history shape that append expects.
- Switching from append to full_refresh: opposite problem; you'll end up with duplicate historical versions.
- Switching to/from cdc: the table shape changes fundamentally.
When the change invalidates data, the clean flow is:
- external-data-schemas-partial-update with the new config
- Warn the user this is destructive
- external-data-schemas-resync to wipe and re-import under the new config
Or equivalently, external-data-schemas-delete-data → external-data-schemas-reload. delete-data + reload is
cleaner when the table is large and the user wants to start from zero.
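
The classification in this step can be sketched as a function that lists the reasons a change needs a resync. It encodes only the cases named above; transitions this document doesn't mention (for example incremental to append) are deliberately left unflagged, so treat an empty result as "no stated reason", not as a guarantee. `resync_reasons` is a hypothetical name.

```python
def resync_reasons(current: dict, changes: dict) -> list[str]:
    """List the reasons (per the rules above) that `changes` invalidates synced data."""
    reasons = []
    old_type = current.get("sync_type")
    new_type = changes.get("sync_type", old_type)
    if new_type != old_type:
        if "cdc" in (old_type, new_type):
            reasons.append("switching to/from cdc changes the table shape")
        elif {old_type, new_type} == {"full_refresh", "append"}:
            reasons.append("full_refresh and append tables have different row shapes")

    # Only a change from one existing column to another moves the high-water mark.
    old_field = current.get("incremental_field")
    new_field = changes.get("incremental_field", old_field)
    if old_field and new_field and new_field != old_field:
        reasons.append("the high-water mark belongs to the old incremental_field")

    if "primary_key_columns" in changes and changes["primary_key_columns"] != current.get("primary_key_columns"):
        reasons.append("existing rows were deduplicated against the old primary key")
    return reasons
```

A non-empty list is the cue to warn the user and follow the update with a resync; an empty list means a plain reload is enough.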
Step 5 — Trigger and confirm
For non-destructive changes, call external-data-schemas-reload({id}) to pick up the new config immediately rather
than waiting for the schedule.
Wait a moment, then call external-data-schemas-retrieve({id}) to confirm that status moves to Running and then
Completed. Report last_synced_at and any new latest_error.
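
The confirm step can be sketched as a bounded polling loop. Everything here is an assumption for illustration: `retrieve` stands in for however you call external-data-schemas-retrieve, `wait_for_sync` is a hypothetical helper, and "finished" is taken to mean status Completed with a last_synced_at different from the value read in Step 1.

```python
import time
from typing import Callable


def wait_for_sync(
    retrieve: Callable[[], dict],
    previous_last_synced_at: object,
    attempts: int = 10,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[bool, dict]:
    """Poll until a new sync has completed; return (finished, last schema seen)."""
    schema: dict = {}
    for attempt in range(attempts):
        schema = retrieve()
        finished = (
            schema.get("status") == "Completed"
            and schema.get("last_synced_at") != previous_last_synced_at
        )
        if finished:
            return True, schema
        if attempt < attempts - 1:  # no need to sleep after the final attempt
            sleep(delay_seconds)
    return False, schema
```

If the loop gives up, the returned schema still carries the latest status and latest_error to report back to the user.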
Specific common changes
Switching full_refresh → incremental
- incremental-fields-create to confirm the desired field exists and incremental_available: true.
- partial-update: {sync_type: "incremental", incremental_field, incremental_field_type}.
- No data wipe needed; the next sync just switches strategy. If the source is growing fast, the next incremental sync is the cheap one.
Switching incremental → cdc (Postgres only)
- Run external-data-sources-check-cdc-prerequisites-create on the parent source. Only proceed if valid: true.
- incremental-fields-create to confirm cdc_available: true and see detected_primary_keys.
- partial-update: {sync_type: "cdc", primary_key_columns: [...], cdc_table_mode: "consolidated"}.
- Resync required: CDC tables have a different shape. Trigger external-data-schemas-resync after the update. Warn the user this wipes existing data.
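
The CDC sequence above can be sketched as a planner that refuses to proceed unless both pre-flight checks pass. This is illustrative only: `plan_cdc_switch` is a hypothetical helper, the two inputs are assumed to be the parsed responses of the prerequisite check and incremental-fields-create, and using detected_primary_keys as the primary key is an assumption the user should confirm.

```python
def plan_cdc_switch(prerequisites: dict, candidates: dict) -> list[tuple[str, dict]]:
    """Return the ordered (tool, payload) calls for switching a schema to CDC."""
    if not prerequisites.get("valid"):
        raise ValueError("CDC prerequisites are not met on the parent source")
    if not candidates.get("cdc_available"):
        raise ValueError("cdc is not available for this table")
    primary_keys = candidates.get("detected_primary_keys") or []
    if not primary_keys:
        raise ValueError("no primary key detected; CDC requires primary_key_columns")
    return [
        ("external-data-schemas-partial-update", {
            "sync_type": "cdc",
            "primary_key_columns": list(primary_keys),
            "cdc_table_mode": "consolidated",
        }),
        # Destructive: warn the user before running this step.
        ("external-data-schemas-resync", {}),
    ]
```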
Fixing a stale incremental field after schema drift
Source dropped the updated_at column. Sync has been failing with "column does not exist".
- incremental-fields-create to see what fields remain.
- Pick a replacement (or switch to full_refresh if none are suitable).
- partial-update with the new field + type (or new sync_type).
- reload to retry.
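
Choosing the replacement can be sketched as follows. The preference order is an assumption supplied by the caller (ideally agreed with the user), and `drift_fix_payload` is a hypothetical helper that reads only the incremental_fields key of the incremental-fields-create response.

```python
def drift_fix_payload(candidates: dict, preferred: list[str]) -> dict:
    """Build a partial-update payload: first preferred field that still exists, else full_refresh."""
    types_by_field = {
        entry["field"]: entry.get("type")
        for entry in candidates.get("incremental_fields", [])
    }
    for name in preferred:
        if name in types_by_field:
            return {"incremental_field": name, "incremental_field_type": types_by_field[name]}
    # Nothing suitable remains on the source, so fall back to a full refresh.
    return {"sync_type": "full_refresh"}
```

Remember that moving to a different incremental column is one of the changes that may call for a resync rather than a plain reload.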
Changing primary keys on a CDC table
- partial-update: {primary_key_columns: [...]}.
- Resync required: existing CDC tombstones and upsert keys won't match the new PK definition, leading to row duplication or missed updates.
- resync, warn the user.
Changing sync_frequency
- partial-update: {sync_frequency: "1hour"}.
- No reload needed; the next scheduled sync picks up the new cadence. Or reload manually if the user wants to confirm nothing broke.
Switching a schema to sync_type: "webhook"
Only works for sources that implement WebhookSource (today: Stripe) and tables where supports_webhooks: true
from incremental-fields-create.
- incremental-fields-create to confirm supports_webhooks: true for the table.
- partial-update: {sync_type: "webhook"}.
- If the source doesn't already have a webhook registered (check with webhook-info-retrieve), call external-data-sources-create-webhook-create({source_id}) to register it.
- No resync required: the schema's existing bulk-synced data stays, and the webhook becomes the primary ingestion path once the next reconciliation finishes.
- Keep sync_frequency set (e.g. 24hour); it acts as a safety-net reconciliation in case any webhook delivery is missed.
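
A sketch of the webhook switch as a planner, under stated assumptions: `plan_webhook_switch` is hypothetical, `webhook_registered` is a boolean you derive yourself from the webhook-info-retrieve response (its shape is not described here), and defaulting a missing or "never" sync_frequency to 24hour is just one way to keep the safety-net reconciliation.

```python
from typing import Optional


def plan_webhook_switch(
    candidates: dict,
    webhook_registered: bool,
    current_sync_frequency: Optional[str],
) -> list[tuple[str, dict]]:
    """Return the ordered (tool, payload) calls for moving a schema to sync_type webhook."""
    if not candidates.get("supports_webhooks"):
        raise ValueError("this table does not support webhooks")
    payload = {"sync_type": "webhook"}
    # Keep a scheduled reconciliation in place in case a delivery is missed.
    if current_sync_frequency in (None, "never"):
        payload["sync_frequency"] = "24hour"
    steps = [("external-data-schemas-partial-update", payload)]
    if not webhook_registered:
        steps.append(("external-data-sources-create-webhook-create", {}))
    return steps
```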
Switching off sync_type: "webhook"
- partial-update: {sync_type: "incremental"} (or whatever bulk type is appropriate) with the required incremental_field + incremental_field_type.
- If no other schemas on the source are still using sync_type: "webhook", call external-data-sources-delete-webhook-create({source_id}) to unregister. Leaving an orphaned webhook registered on the source side just means events will be received and dropped: not harmful, but messy.
- If other schemas on the source are still on webhook, leave the webhook registered; it's shared across all webhook-type schemas on the source.
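
The unregister decision comes down to whether any other schema on the source still uses the webhook. A minimal sketch, assuming you have the source's schemas as a list of dicts with `id` and `sync_type` keys; `should_unregister_webhook` is a hypothetical helper.

```python
def should_unregister_webhook(source_schemas: list[dict], switching_schema_id: str) -> bool:
    """True only if no schema other than the one being switched is still on webhook."""
    return not any(
        schema.get("sync_type") == "webhook" and schema.get("id") != switching_schema_id
        for schema in source_schemas
    )
```

The check excludes the schema being switched, so it gives the same answer whether it runs before or after that schema's partial-update.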
Rotating a webhook signing secret
The source's signing secret (e.g. Stripe's whsec_...) was rotated, and payloads are now failing signature
verification.
- Grab the new secret from the source's dashboard.
- external-data-sources-update-webhook-inputs-create({source_id}, {inputs: {signing_secret: "whsec_..."}}).
- No reload needed; the next inbound webhook payload will verify against the new secret.
Pausing a schema
- partial-update: {should_sync: false}. Schema stops syncing but stays configured.
- To resume later: partial-update: {should_sync: true}, then reload for an immediate run.
Important notes
- Read before you write. Always retrieve the current config first. partial-update doesn't complain if you set a field to the value it already had, but you might be about to change something you didn't realize was already set.
- Not every sync_type is available on every schema. The incremental-fields-create response tells you what's available right now, which can be different from what was available at creation (e.g. CDC may have been enabled for the team since).
- Wipe when the shape changes. Switching sync strategy often changes the physical table. If you don't resync, you'll be mixing row shapes and queries will return garbage.
- CDC needs prerequisites. Never switch to sync_type: "cdc" without running check-cdc-prerequisites-create first. If the prerequisites aren't met, the sync will just fail immediately.
- Don't touch a Running schema. If the schema is currently running, either wait for it to finish or call external-data-schemas-cancel before applying the change. Updating config mid-sync can leave the incremental high-water mark inconsistent.
- Sync frequency is cheap to change. Encourage experimentation there. Sync_type and incremental_field are expensive to change; encourage care.
- Webhooks are registered at the source level, not the schema level. Multiple webhook-type schemas on the same source share one webhook registration. Only delete the webhook when the last webhook-type schema on that source is being switched away, otherwise other schemas stop receiving pushes.