MasterData v2 Integration

When this skill applies

Use this skill when your VTEX IO app needs to store custom data (reviews, wishlists, form submissions, configuration records), query or filter that data, or set up automated workflows triggered by data changes—and when you must justify Master Data versus other VTEX or external stores.

Defining data entities and JSON Schemas using the masterdata builder
Performing CRUD operations through MasterDataClient (ctx.clients.masterdata)
Configuring search, scroll, and indexing for efficient data retrieval
Setting up Master Data triggers for automated workflows
Managing schema lifecycle to avoid the 60-schema limit
Deciding whether data belongs in Catalog (fields, specifications, unstructured SKU/product specs), Master Data, native OMS/checkout surfaces, or an external SQL/NoSQL/database
Avoiding synchronous Master Data on the purchase critical path (cart, checkout, payment, placement) unless there is a hard performance and reliability case
Preferring one source of truth—avoid duplicating order headers or OMS lists in Master Data for convenience; prefer OMS or a BFF with caching (see Related skills)

Do not use this skill for:

General backend service patterns (use vtex-io-service-apps instead)
GraphQL schema definitions (use vtex-io-graphql-api instead)
Manifest and builder configuration (use vtex-io-app-structure instead)

Decision rules

Before you choose Master Data

Architects, developers, and anyone designing or implementing a solution should think deeply and treat this section as a checklist to critique the default: double-check that Master Data is the right persistence layer—not an automatic pick. The skill is written to question convenience-driven choices.

Purpose — Master Data is a document-oriented store (similar in spirit to document DBs / DynamoDB-style access patterns). It is one option among many; choosing it because it is “there” or “cheap” without a workload fit review is a design smell.
Product-bound data — If the information is fundamentally about products or SKUs, evaluate Catalog first: specifications, unstructured product/SKU specifications, and native catalog fields before creating a parallel MD entity that mirrors catalog truth.
Purchase path — Do not place synchronous Master Data reads/writes in the hot path of checkout (cart mutation, payment, order placement) unless you have evidence (latency budget, failure modes). Prefer native commerce stores and async or after-order enrichment.
Orders and lists — Duplicating OMS or order data into Master Data to power My Orders or similar usually fights single source of truth. Prefer OMS APIs (or marketplace protocols) behind a BFF or IO layer with application caching and correct HTTP/path semantics—not a second order database in MD “because it is easier.”
Exposing MD — Master Data is storage only. Any storefront or partner access should go through a service that enforces authentication, authorization, and rate limits—typically VTEX IO or an external BFF following headless-bff-architecture patterns.
When MD fits — After storage fit review, if MD remains appropriate, implement CRUD and schema discipline as below; combine with vtex-io-application-performance and vtex-io-service-paths-and-cdn when exposing HTTP or GraphQL from IO.

Entity governance and hygiene

Before creating a new entity or extending an existing one, understand the landscape:

Native entities — The platform manages entities like CL (clients), AD (addresses), OD (orders), BK (bookmarks), AU (auth), and others. Never create custom entities that duplicate native entity purposes. Know which entities exist before adding new ones.
Entity usage audit — In accounts with dozens of custom entities, classify each by purpose: logs/monitoring, cache/temporary, order extension, customer extension, marketing, CMS/content, integration/sync, auth/identity, logistics/geo, or custom business logic. Entities in the logs or cache categories often indicate misuse—IO app logs belong in the logger, not MD; caches belong in VBase, not MD.
Critical path flag — Identify whether an entity is used in checkout, cart, payment, or login flows. Entities on the critical path must meet strict latency and availability requirements. If an MD entity is on the critical path, question whether it should be there at all.
Document count awareness — Use REST-Content-Range headers from GET /search?_fields=id with REST-Range: resources=0-0 to efficiently count documents without fetching them. Large entities (100k+ docs) need scrollDocuments, pagination strategy, and potentially a BFF caching layer.

Bulk operations and data migration

When importing, exporting, or migrating large datasets:

Validate before import — Cross-reference import data against the authoritative source (e.g. catalog export for SKU validation, CL entity or user management API for email allowlists). Produce exception reports for invalid rows before touching MD.
JSONL payloads — Generate one JSON object per MD document in a .jsonl file for bulk imports. This enables resumable, line-by-line processing.
Rate limiting — MD APIs enforce rate limits. Use configurable delays between calls (e.g. 400ms) with exponential backoff on HTTP 429 responses.
Checkpoints — For large imports (10k+ documents), persist progress to a checkpoint file (last successful document ID or line index). On failure or timeout, resume from the checkpoint instead of restarting.
Parallel with bounded concurrency — Use a concurrency pool (e.g. p-queue with concurrency 5-10) for parallel POST or PATCH operations. Too much parallelism triggers rate limits; too little is slow.
Bulk delete before re-import — When replacing all documents in an entity, use scroll + delete before import, or implement a separate delete pass with the same checkpoint and backoff patterns.
Schema alignment — Ensure import payloads match the entity's JSON Schema exactly. Missing required fields or type mismatches cause silent validation failures.

Implementation rules

A data entity is a named collection of documents (analogous to a database table). A JSON Schema defines structure, validation, and indexing.
When using the masterdata builder, entities are defined by folder structure: masterdata/{entityName}/schema.json. The builder creates entities named {vendor}_{appName}_{entityName}.
Use ctx.clients.masterdata or masterDataFor from @vtex/clients for all CRUD operations — never direct REST calls.
All fields used in where clauses MUST be declared in the schema's v-indexed array for efficient querying.
Use searchDocuments for bounded result sets (known small size, max page size 100). Use scrollDocuments for large/unbounded result sets.
The masterdata builder creates a new schema per app version. Clean up unused schemas to avoid the 60-schema-per-entity hard limit.

MasterDataClient methods:

Method	Description
`getDocument`	Retrieve a single document by ID
`createDocument`	Create a new document, returns generated ID
`createOrUpdateEntireDocument`	Upsert a complete document
`createOrUpdatePartialDocument`	Upsert partial fields (patch)
`updateEntireDocument`	Replace all fields of an existing document
`updatePartialDocument`	Update specific fields only
`deleteDocument`	Delete a document by ID
`searchDocuments`	Search with filters, pagination, and field selection
`searchDocumentsWithPaginationInfo`	Search with total count metadata
`scrollDocuments`	Iterate over large result sets

Search where clause syntax:

where: "productId=12345 AND approved=true"
where: "rating>3"
where: "createdAt between 2025-01-01 AND 2025-12-31"

Architecture:

VTEX IO App (node builder)
  │
  ├── ctx.clients.masterdata.createDocument()
  │       │
  │       ▼
  │   Master Data v2 API
  │       │
  │       ├── Validates against JSON Schema
  │       ├── Indexes declared fields
  │       └── Fires triggers (if conditions match)
  │             │
  │             ▼
  │         HTTP webhook / Email / Action
  │
  └── ctx.clients.masterdata.searchDocuments()
          │
          ▼
      Master Data v2 (reads indexed fields for efficient queries)

Hard constraints

Constraint: Use MasterDataClient — Never Direct REST Calls

All Master Data operations in VTEX IO apps MUST go through the MasterDataClient (ctx.clients.masterdata) or the masterDataFor factory from @vtex/clients. You MUST NOT make direct REST calls to /api/dataentities/ endpoints.

Why this matters

The MasterDataClient handles authentication token injection, request routing, retry logic, caching, and proper error handling. Direct REST calls bypass all of these, requiring manual auth headers, pagination, and retry logic. When the VTEX auth token format changes, direct calls break while the client handles it transparently.

Detection

If you see direct HTTP calls to URLs matching /api/dataentities/, api.vtex.com/api/dataentities, or raw fetch/axios calls targeting Master Data endpoints, warn the developer to use ctx.clients.masterdata instead.

Correct

// Using MasterDataClient through ctx.clients
export async function getReview(ctx: Context, next: () => Promise<void>) {
  const { id } = ctx.query;

  const review = await ctx.clients.masterdata.getDocument<Review>({
    dataEntity: "reviews",
    id: id as string,
    fields: [
      "id",
      "productId",
      "author",
      "rating",
      "title",
      "text",
      "approved",
    ],
  });

  ctx.status = 200;
  ctx.body = review;
  await next();
}

Wrong

// Direct REST call to Master Data — bypasses client infrastructure
import axios from "axios";

export async function getReview(ctx: Context, next: () => Promise<void>) {
  const { id } = ctx.query;

  // No caching, no retry, no proper auth, no metrics
  const response = await axios.get(
    `https://api.vtex.com/api/dataentities/reviews/documents/${id}`,
    {
      headers: {
        "X-VTEX-API-AppKey": process.env.VTEX_APP_KEY,
        "X-VTEX-API-AppToken": process.env.VTEX_APP_TOKEN,
      },
    },
  );

  ctx.status = 200;
  ctx.body = response.data;
  await next();
}

Constraint: Define JSON Schemas for All Data Entities

Every data entity your app uses MUST have a corresponding JSON Schema, either via the masterdata builder (recommended) or created via the Master Data API before the app is deployed.

Why this matters

Without a schema, Master Data stores documents as unstructured JSON. This means no field validation, no indexing (making search extremely slow on large datasets), no type safety, and no trigger support. Queries on unindexed fields perform full scans, which can time out or hit rate limits.

Detection

If the app creates or searches documents in a data entity but no JSON Schema exists for that entity (either in the masterdata/ builder directory or via API), warn the developer to define a schema.

Correct

{
  "$schema": "http://json-schema.org/schema#",
  "title": "review-schema-v1",
  "type": "object",
  "properties": {
    "productId": {
      "type": "string"
    },
    "author": {
      "type": "string"
    },
    "rating": {
      "type": "integer",
      "minimum": 1,
      "maximum": 5
    },
    "title": {
      "type": "string",
      "maxLength": 200
    },
    "text": {
      "type": "string",
      "maxLength": 5000
    },
    "approved": {
      "type": "boolean"
    },
    "createdAt": {
      "type": "string",
      "format": "date-time"
    }
  },
  "required": ["productId", "rating", "title", "text"],
  "v-default-fields": [
    "productId",
    "author",
    "rating",
    "title",
    "approved",
    "createdAt"
  ],
  "v-indexed": ["productId", "author", "approved", "rating", "createdAt"]
}

Wrong

// Saving documents without any schema — no validation, no indexing
await ctx.clients.masterdata.createDocument({
  dataEntity: "reviews",
  fields: {
    productId: "12345",
    rating: "five", // String instead of number — no validation!
    title: 123, // Number instead of string — no validation!
  },
});

// Searching on unindexed fields — full table scan, will time out on large datasets
await ctx.clients.masterdata.searchDocuments({
  dataEntity: "reviews",
  where: "productId=12345", // productId is not indexed — very slow
  fields: ["id", "rating"],
  pagination: { page: 1, pageSize: 10 },
});

Constraint: Manage Schema Versions to Avoid the 60-Schema Limit

Master Data v2 data entities have a limit of 60 schemas per entity. When using the masterdata builder, each app version linked or installed creates a new schema. You MUST delete unused schemas regularly.

Why this matters

Once the 60-schema limit is reached, the masterdata builder cannot create new schemas, and linking or installing new app versions will fail. This is a hard platform limit that cannot be increased.

Detection

If the app has been through many link/install cycles, warn the developer to check and clean up old schemas using the Delete Schema API.

Correct

# Periodically clean up unused schemas
# List schemas for the entity
curl -X GET "https://{account}.vtexcommercestable.com.br/api/dataentities/reviews/schemas" \
  -H "X-VTEX-API-AppKey: {appKey}" \
  -H "X-VTEX-API-AppToken: {appToken}"

# Delete old schemas that are no longer in use
curl -X DELETE "https://{account}.vtexcommercestable.com.br/api/dataentities/reviews/schemas/old-schema-name" \
  -H "X-VTEX-API-AppKey: {appKey}" \
  -H "X-VTEX-API-AppToken: {appToken}"

Wrong

Never cleaning up schemas during development.
After 60 link cycles, the builder fails:
"Error: Maximum number of schemas reached for entity 'reviews'"
The app cannot be linked or installed until old schemas are deleted.

Constraint: Do not use Master Data as a log, cache, or temporary store

Entities used for application logging, caching (IO app state, query results), or temporary staging data do not belong in Master Data. Use ctx.vtex.logger for logs, VBase for app-specific caches and temp state, and external log aggregation for audit trails.

Why this matters — Log and cache entities accumulate millions of documents, hit rate limits, make the entity unusable for legitimate queries, and waste storage. MD is not designed for high-write, high-volume, disposable data.

Detection — Entities with names like LOG, cache, temp, staging, debug, or entities whose document count grows unboundedly with traffic volume rather than business events.

Correct

// Logs: use structured logger
ctx.vtex.logger.info({ action: 'priceUpdate', skuId, newPrice })

// Cache: use VBase
await ctx.clients.vbase.saveJSON('my-cache', cacheKey, data)

Wrong

// Using MD as a log store — creates millions of documents
await ctx.clients.masterdata.createDocument({
  dataEntity: 'appLogs',
  fields: { level: 'info', message: `Price updated for ${skuId}`, timestamp: new Date() },
})

Constraint: Do not create a parallel source of truth in Master Data without justification

Using Master Data to mirror data that already has a system of record in OMS, Catalog, or an external ERP—for example order headers for a custom list view, or SKU attributes that belong in catalog specifications—creates drift, reconciliation cost, and incident risk.

Why this matters

Two sources of truth disagree after partial failures, retries, or manual edits. Teams spend capacity syncing and debugging instead of customer outcomes.

Detection

New MD entities whose fields duplicate OMS order fields “for performance” without a BFF cache plan; product attributes stored in MD when Catalog specs would suffice; scheduled jobs to “fix” MD from OMS because they diverged.

Correct

1. Identify the authoritative system (OMS, Catalog, partner API).
2. Read from that source via BFF or IO, with caching (application + HTTP semantics) as needed.
3. Use MD only for data without a native home or after explicit architecture sign-off.

Wrong

"We store order snapshots in MD so the storefront is faster" while OMS remains canonical
and no reconciliation strategy exists — eventual inconsistency is guaranteed.

Preferred pattern

Add the masterdata builder and policies:

{
  "builders": {
    "node": "7.x",
    "graphql": "1.x",
    "masterdata": "1.x"
  },
  "policies": [
    {
      "name": "outbound-access",
      "attrs": {
        "host": "api.vtex.com",
        "path": "/api/*"
      }
    },
    {
      "name": "ADMIN_DS"
    }
  ]
}

Define data entity schemas:

{
  "$schema": "http://json-schema.org/schema#",
  "title": "review-schema-v1",
  "type": "object",
  "properties": {
    "productId": {
      "type": "string"
    },
    "author": {
      "type": "string"
    },
    "email": {
      "type": "string",
      "format": "email"
    },
    "rating": {
      "type": "integer",
      "minimum": 1,
      "maximum": 5
    },
    "title": {
      "type": "string",
      "maxLength": 200
    },
    "text": {
      "type": "string",
      "maxLength": 5000
    },
    "approved": {
      "type": "boolean"
    },
    "createdAt": {
      "type": "string",
      "format": "date-time"
    }
  },
  "required": ["productId", "rating", "title", "text"],
  "v-default-fields": [
    "productId",
    "author",
    "rating",
    "title",
    "approved",
    "createdAt"
  ],
  "v-indexed": ["productId", "author", "approved", "rating", "createdAt"],
  "v-cache": false
}

Set up the client with masterDataFor:

// node/clients/index.ts
import { IOClients } from "@vtex/api";
import { masterDataFor } from "@vtex/clients";

interface Review {
  id: string;
  productId: string;
  author: string;
  email: string;
  rating: number;
  title: string;
  text: string;
  approved: boolean;
  createdAt: string;
}

export class Clients extends IOClients {
  public get reviews() {
    return this.getOrSet("reviews", masterDataFor<Review>("reviews"));
  }
}

Implement CRUD operations:

// node/resolvers/reviews.ts
import type { ServiceContext } from "@vtex/api";
import type { Clients } from "../clients";

type Context = ServiceContext<Clients>;

export const queries = {
  reviews: async (
    _root: unknown,
    args: { productId: string; page?: number; pageSize?: number },
    ctx: Context,
  ) => {
    const { productId, page = 1, pageSize = 10 } = args;

    const results = await ctx.clients.reviews.search(
      { page, pageSize },
      [
        "id",
        "productId",
        "author",
        "rating",
        "title",
        "text",
        "createdAt",
        "approved",
      ],
      "", // sort
      `productId=${productId} AND approved=true`,
    );

    return results;
  },
};

export const mutations = {
  createReview: async (
    _root: unknown,
    args: {
      input: { productId: string; rating: number; title: string; text: string };
    },
    ctx: Context,
  ) => {
    const { input } = args;
    const email = ctx.vtex.storeUserEmail ?? "anonymous@store.com";

    const response = await ctx.clients.reviews.save({
      ...input,
      author: email.split("@")[0],
      email,
      approved: false,
      createdAt: new Date().toISOString(),
    });

    return ctx.clients.reviews.get(response.DocumentId, [
      "id",
      "productId",
      "author",
      "rating",
      "title",
      "text",
      "createdAt",
      "approved",
    ]);
  },

  deleteReview: async (_root: unknown, args: { id: string }, ctx: Context) => {
    await ctx.clients.reviews.delete(args.id);
    return true;
  },
};

Configure triggers (optional):

{
  "name": "notify-moderator-on-new-review",
  "active": true,
  "condition": "approved=false",
  "action": {
    "type": "email",
    "provider": "default",
    "subject": "New review pending moderation",
    "to": ["moderator@mystore.com"],
    "body": "A new review has been submitted for product {{productId}} by {{author}}."
  },
  "retry": {
    "times": 3,
    "delay": { "addMinutes": 5 }
  }
}

Wire into Service:

// node/index.ts
import type { ParamsContext, RecorderState } from "@vtex/api";
import { Service } from "@vtex/api";

import { Clients } from "./clients";
import { queries, mutations } from "./resolvers/reviews";

export default new Service<Clients, RecorderState, ParamsContext>({
  clients: {
    implementation: Clients,
    options: {
      default: {
        retries: 2,
        timeout: 5000,
      },
    },
  },
  graphql: {
    resolvers: {
      Query: queries,
      Mutation: mutations,
    },
  },
});

Common failure modes

Direct REST calls to /api/dataentities/: Using axios or fetch to call Master Data endpoints bypasses the client infrastructure — no auth, no caching, no retries. Use ctx.clients.masterdata or masterDataFor instead.
Searching without indexed fields: Queries on non-indexed fields trigger full document scans. For large datasets, this causes timeouts and rate limit errors. Ensure all where clause fields are in the schema's v-indexed array.
Not paginating search results: Master Data v2 has a maximum page size of 100 documents. Requesting more silently returns only up to the limit. Use proper pagination or scrollDocuments for large result sets.
Ignoring the 60-schema limit: Each app version linked/installed creates a new schema. After 60 link cycles, the builder fails. Periodically clean up unused schemas via the Delete Schema API.
Using MD for logs or caches: Entities that grow with traffic volume instead of business events. Millions of log or cache documents degrade the account's MD performance.
Bulk import without rate limiting: Flooding MD with parallel writes triggers 429 errors and account-wide throttling. Always use bounded concurrency with backoff.
Import without validation: Importing data without cross-referencing the catalog or user store leads to orphaned documents, broken references, and data that fails schema validation silently.
No checkpoint in bulk operations: A 50k-document import that fails at document 30k must restart from zero without a checkpoint file.

Review checklist

Related skills

vtex-io-application-performance — IO performance patterns (cache layers, BFF-facing behavior)
vtex-io-service-paths-and-cdn — Public vs private routes for MD-backed APIs
vtex-io-session-apps — Session transforms that may read from or complement MD-stored state
architecture-well-architected-commerce — Cross-cutting storage and pillar alignment
headless-bff-architecture — BFF boundaries when MD is not accessed from IO

Reference

Creating a Master Data v2 CRUD App — Complete guide for building Master Data apps with the masterdata builder
Working with JSON Schemas in Master Data v2 — Schema structure, validation, and indexing configuration
Schema Lifecycle — How schemas evolve and impact data entities over time
Setting Up Triggers on Master Data v2 — Trigger configuration for automated workflows
Master Data v2 API Reference — Complete API reference for all Master Data v2 endpoints
Clients — MasterDataClient methods and usage in VTEX IO

vtex-io-masterdata

MasterData v2 Integration

When this skill applies

Decision rules

Before you choose Master Data

Entity governance and hygiene

Bulk operations and data migration

Implementation rules

Hard constraints

Constraint: Use MasterDataClient — Never Direct REST Calls

Constraint: Define JSON Schemas for All Data Entities

Constraint: Manage Schema Versions to Avoid the 60-Schema Limit

Constraint: Do not use Master Data as a log, cache, or temporary store

Constraint: Do not create a parallel source of truth in Master Data without justification

Preferred pattern

Common failure modes

Review checklist

Related skills

Reference