What problem does sfcc-sandbox-reducer solve?

It reduces very large SFCC export files into sandbox-sized datasets while preserving catalog relationships, prices, and inventory consistency.

What is the most important step before importing reduced files?

Clean the target sandbox first. If you skip cleanup, SFCC merges old and new data and leaves stale records in place.

Taming giant SFCC catalogs with sfcc-sandbox-reducer

If you've ever worked on a Salesforce Commerce Cloud project, you've probably hit this wall: you export the production catalog to set up a dev sandbox, and you're staring at a 500 MB (sometimes 1 GB+) XML file. Loading that into a sandbox takes forever, often breaks things, and makes working locally a misery.

I built sfcc-sandbox-reducer to fix that.

The problem

SFCC stores its product catalog, pricebooks, and inventory lists as XML. Production catalogs for mid-to-large retailers routinely hit hundreds of megabytes. Sandboxes, especially developer ones, aren't meant to ingest that volume. The result is slow imports, instability, and thousands of discontinued or unpriceable products cluttering your dev data when you didn't need them in the first place.

What you actually want is a representative slice of real data: a few thousand products with prices and inventory, actually visible on the storefront.

Recommended sandbox refresh workflow

Use this order every time you refresh a sandbox:

Start with a full export from a PIG instance such as development, staging, or production.
Run sfcc-sandbox-reducer on that export to generate reduced catalogs, pricebooks, and inventory files.
Before importing reduced files, clean the target sandbox and remove existing product catalogs, pricebooks, and inventory lists.
Import reduced files into the clean sandbox.

If you skip the cleanup step, SFCC merges data. Old records stay in place, and your sandbox keeps stale or oversized catalog data.

What the tool does

sfcc-sandbox-reducer is a Node.js CLI with two main commands.

sfcc-reduce

This is the core command. It takes your full SFCC XML exports and produces trimmed-down versions.

It parses catalog files as a stream and never loads a full file into memory, so memory use stays roughly flat regardless of file size, even for a 500 MB catalog.
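To make the streaming idea concrete, here is a minimal sketch in Python. The tool itself is Node.js, so this illustrates the technique and is not its actual code. The function name and the sample document are made up for the example, and the namespace is the one SFCC catalog exports normally declare (treat it as an assumption and check your own files).

```python
import io
import xml.etree.ElementTree as ET

# Namespace assumed for SFCC catalog exports; verify against your own files.
NS = "{http://www.demandware.com/xml/impex/catalog/2006-10-31}"


def iter_product_ids(source):
    """Yield each product-id in document order without building the full tree."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == NS + "product":
            yield elem.get("product-id")
            # Drop the children parsed so far so memory stays roughly flat.
            root.clear()


# Tiny stand-in for a catalog export (hypothetical data).
SAMPLE = b"""<catalog xmlns="http://www.demandware.com/xml/impex/catalog/2006-10-31" catalog-id="demo">
  <product product-id="P1"/>
  <product product-id="P2"/>
</catalog>"""

product_ids = list(iter_product_ids(io.BytesIO(SAMPLE)))
print(product_ids)  # ['P1', 'P2']
```

The same loop works on a file path or an open file object, since iterparse reads incrementally from either.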

The tricky part is SFCC's product hierarchy: master products contain variation groups, which contain variants. You can't drop arbitrary records without orphaning others. The reducer builds the full dependency graph first, then keeps or removes whole product groups rather than picking individual products.

A product group is kept only if all three conditions hold: at least one variation group appears in a navigation catalog, at least one variant has both a price record and an inventory record, and the products are marked online for the sites you've specified.
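The keep rule can be sketched as a small predicate over a group. This is a Python illustration under stated assumptions, not the tool's code: ProductGroup, keep_group, and the lookup sets are hypothetical names, and the online check is simplified to the master ID only.

```python
from dataclasses import dataclass, field


@dataclass
class ProductGroup:
    """One master product with its variation groups and variants (hypothetical shape)."""
    master_id: str
    variation_group_ids: set = field(default_factory=set)
    variant_ids: set = field(default_factory=set)


def keep_group(group, in_navigation, priced, stocked, online):
    """Return True only when all three conditions from the text hold."""
    has_nav = any(vg in in_navigation for vg in group.variation_group_ids)
    has_sellable = any(v in priced and v in stocked for v in group.variant_ids)
    is_online = group.master_id in online  # simplification: checks the master only
    return has_nav and has_sellable and is_online


groups = [
    ProductGroup("M1", {"VG1"}, {"V1", "V2"}),
    ProductGroup("M2", {"VG2"}, {"V3"}),
]
in_navigation = {"VG1", "VG2"}
priced = {"V1", "V3"}
stocked = {"V1"}          # V3 has a price but no inventory record
online = {"M1", "M2"}

kept = [g.master_id for g in groups
        if keep_group(g, in_navigation, priced, stocked, online)]
print(kept)  # ['M1']
```

Deciding per group rather than per product is what keeps masters, variation groups, and variants from being orphaned.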

If you set max_products, the tool doesn't just take the first N. It reduces proportionally across categories so the sandbox still has a realistic spread.
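One way to get that kind of spread is a largest-remainder allocation: give each category a share of the cap proportional to its size, then hand out the leftover slots to the largest fractional remainders. The sketch below is an assumption about how such a cap could work, not the tool's actual algorithm, and proportional_quota is a hypothetical name.

```python
def proportional_quota(category_counts, max_products):
    """Split max_products across categories in proportion to their sizes."""
    total = sum(category_counts.values())
    if total <= max_products:
        return dict(category_counts)  # nothing to cut
    exact = {c: n * max_products / total for c, n in category_counts.items()}
    quota = {c: int(x) for c, x in exact.items()}  # round down first
    leftover = max_products - sum(quota.values())
    # Give the remaining slots to the categories with the largest fractional part.
    by_remainder = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for c in by_remainder[:leftover]:
        quota[c] += 1
    return quota


print(proportional_quota({"shoes": 600, "shirts": 300, "bags": 100}, 100))
# {'shoes': 60, 'shirts': 30, 'bags': 10}
```

Small categories shrink along with large ones, so none of them disappears simply because it came late in the file.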

The analysis pass is cached, so subsequent runs skip re-parsing entirely. That matters a lot when you're tweaking filter config and running repeatedly.
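A cache like this can be sketched by keying the stored analysis on the input files only, so changing filter settings does not invalidate it. Everything below is an assumption for illustration (the key scheme, the JSON storage, and the function names); the tool's real cache format may differ.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


def cache_key(paths):
    """Fingerprint the input files by path, size, and modification time."""
    h = hashlib.sha256()
    for p in sorted(paths):
        st = os.stat(p)
        h.update(f"{p}:{st.st_size}:{st.st_mtime_ns}".encode())
    return h.hexdigest()


def cached_analysis(paths, cache_dir, analyse):
    """Return the stored analysis if the inputs are unchanged, else compute and store it."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = cache_dir / (cache_key(paths) + ".json")
    if entry.exists():
        return json.loads(entry.read_text())
    result = analyse(paths)
    entry.write_text(json.dumps(result))
    return result


# Demo: the expensive analysis runs once, the second call is served from disk.
calls = []

def fake_analyse(paths):
    calls.append(paths)
    return {"online_ids": ["P1", "P2"]}

with tempfile.TemporaryDirectory() as workdir:
    catalog = Path(workdir) / "catalog.xml"
    catalog.write_text("<catalog/>")
    first = cached_analysis([str(catalog)], Path(workdir) / "cache", fake_analyse)
    second = cached_analysis([str(catalog)], Path(workdir) / "cache", fake_analyse)

print(len(calls))  # 1
```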

A typical run on a 1.25 GB catalog set:

[PHASE 1] Collecting product IDs...
  -> 50000 online products (masters + variants)
  -> 2000 product groups kept, 500 removed

[PHASE 2] Reducing files...
  Master catalogs: 5000 kept, 45000 removed

Size Reduction:
  Original size:  1.25 GB
  Reduced size:   625.00 MB
  Saved:          625.00 MB (50.0%)

[DONE] Total processing time: 58.01s

sfcc-download-images

Once you have reduced catalogs, you usually also need matching product images. This command connects to your sandbox WebDAV and lets you browse the navigation catalog interactively and pick a category. It then downloads all images for products in that category hierarchy.

Downloads run in parallel with configurable concurrency and skip already-downloaded files, so interrupted runs can be resumed.
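The resume behaviour comes down to two things: a bounded worker pool and a check for files that already exist. Here is a Python sketch of that pattern, assuming a caller-supplied fetch function in place of a real WebDAV request; download_all and fake_fetch are hypothetical names, not part of the CLI.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def download_all(image_paths, dest_dir, fetch, concurrency=4):
    """Fetch each path with at most `concurrency` workers, skipping existing files."""
    dest_dir = Path(dest_dir)

    def one(rel_path):
        target = dest_dir / rel_path
        if target.exists():
            return "skipped"  # already downloaded on a previous run
        target.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first so an interrupted download
        # never leaves a half-written file that looks complete.
        tmp = target.with_name(target.name + ".part")
        tmp.write_bytes(fetch(rel_path))
        tmp.replace(target)
        return "downloaded"

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return dict(zip(image_paths, pool.map(one, image_paths)))


def fake_fetch(rel_path):
    """Stand-in for a WebDAV GET; returns placeholder bytes."""
    return b"fake image bytes for " + rel_path.encode()


paths = ["large/shoe-1.jpg", "large/shoe-2.jpg", "small/shoe-1.jpg"]
with tempfile.TemporaryDirectory() as dest:
    first_run = download_all(paths, dest, fake_fetch, concurrency=2)
    second_run = download_all(paths, dest, fake_fetch, concurrency=2)

print(first_run)
print(second_run)
```

On the second run every path is reported as skipped, which is what makes an interrupted download cheap to restart.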

Setting up in your project

Quick setup flow:

Install the CLI globally:

npm install -g sfcc-sandbox-reducer

If you want a project-pinned version instead, install locally and add scripts:

npm install --save-dev sfcc-sandbox-reducer

{
  "scripts": {
    "reduce": "sfcc-sandbox-reducer reduce",
    "download-images": "sfcc-sandbox-reducer download-images"
  }
}

Run the init wizard in your project root:

sfcc-sandbox-reducer init

This creates reducer-config.json and dw.json. Add both to .gitignore.

Fill dw.json with your WebDAV credentials:

{
  "username": "your-webdav-username",
  "password": "your-webdav-password"
}

Update reducer-config.json.

input.* accepts glob patterns. For example, ./input/catalogs/BRAND_*_navigation/catalog.xml matches all navigation catalogs.
Set sites_to_check to the site IDs you test in sandbox.
Set max_products to cap output size. For most teams, 5,000 to 10,000 products is enough for development.
Use always_keep and always_remove for product IDs tied to tests, homepage slots, or other fixed scenarios.
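If you are unsure what a glob pattern will pick up, you can check it against a few paths before running the reducer. The sketch below uses Python's pathlib matching as a stand-in for the CLI's own glob handling, which may differ in edge cases; the leading ./ is dropped from the pattern and the candidate paths are hypothetical.

```python
from pathlib import PurePosixPath

# Same pattern as in the text, without the leading "./".
PATTERN = "input/catalogs/BRAND_*_navigation/catalog.xml"

candidates = [
    "input/catalogs/BRAND_master/catalog.xml",
    "input/catalogs/BRAND_FR_navigation/catalog.xml",
    "input/catalogs/BRAND_DE_navigation/catalog.xml",
]

# "*" matches within one path segment, so BRAND_master is not picked up.
matched = [p for p in candidates if PurePosixPath(p).match(PATTERN)]
print(matched)
```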

Put SFCC exports from your PIG export under input/:

input/
  catalogs/
    BRAND_master/
      catalog.xml
    BRAND_FR_navigation/
      catalog.xml
  pricebooks/
    pricebook-fr.xml
  inventory-lists/
    inventory-fr.xml

Run reducer:

sfcc-sandbox-reducer reduce

The first pass can take a few minutes on large datasets. Analysis is cached in .reducer-cache/, so later runs are faster. Output files go to output/.

In Business Manager, clean the target sandbox before import: remove existing product catalogs, pricebooks, and inventory lists, then import the reduced files from output/. If you import without cleanup, SFCC merges the new files with the old data and the sandbox still contains stale or oversized catalogs.

Use --dry-run if you want to inspect what would be kept before writing files.

For full command usage, all options, and complete documentation, see the npm package page: sfcc-sandbox-reducer.

Why I built this

On a project with a 1.2 GB master catalog, importing the full export into a sandbox took over an hour and frequently timed out. We needed a repeatable way to generate a slim but realistic dataset.

Three things made it non-trivial. Files this size break DOM parsers outright, so the tool uses SAX-style streaming to process records one at a time. You also can't filter products in isolation: building the master/variant/variation-group graph upfront is what prevents corrupt output. And a product kept in the master catalog must also be kept in the pricebooks and inventory lists (and a removed one must be removed from them), so all files are reduced in one coordinated pass.
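The coordination point is easiest to see in code: one set of kept product IDs drives the filtering of every file. The Python sketch below applies a single kept set to a pricebook and an inventory list. It is an illustration only: it parses tiny in-memory samples instead of streaming, the samples omit the XML namespaces real exports carry, and drop_unkept is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix such as '{ns}record' down to 'record'."""
    return tag.rsplit("}", 1)[-1]


def drop_unkept(root, record_name, kept_ids):
    """Remove every <record_name> element whose product-id is not in kept_ids."""
    removed = 0
    for parent in list(root.iter()):
        for child in list(parent):
            if local(child.tag) == record_name and child.get("product-id") not in kept_ids:
                parent.remove(child)
                removed += 1
    return removed


PRICEBOOK = """<pricebooks><pricebook><price-tables>
  <price-table product-id="P1"/><price-table product-id="P9"/>
</price-tables></pricebook></pricebooks>"""

INVENTORY = """<inventory><inventory-list><records>
  <record product-id="P1"/><record product-id="P9"/>
</records></inventory-list></inventory>"""

kept_ids = {"P1"}  # decided once, from the catalog analysis
pricebook = ET.fromstring(PRICEBOOK)
inventory = ET.fromstring(INVENTORY)
drop_unkept(pricebook, "price-table", kept_ids)
drop_unkept(inventory, "record", kept_ids)

price_ids = [e.get("product-id") for e in pricebook.iter("price-table")]
stock_ids = [e.get("product-id") for e in inventory.iter("record")]
print(price_ids, stock_ids)  # ['P1'] ['P1']
```

Because both files are filtered against the same set, a product can never end up priced but missing from inventory, or the reverse.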

Source and license

Code is on GitHub under AGPL-3.0.

Full usage docs and latest README live on npm: sfcc-sandbox-reducer.

npx sfcc-sandbox-reducer --help
