html2pptx Unicode Path Fix

Problem

When the html2pptx pipeline processes HTML files located in directories whose names contain non-ASCII characters (e.g., 纯日语PPT/, 日本語資料/), PptxGenJS fails with ENOENT errors. The browser percent-encodes the file:// paths, and PptxGenJS then tries to open the encoded path literally on disk.

Context / Trigger Conditions

Error message: ENOENT: no such file or directory, with a URL-encoded path such as %E7%BA%AF%E6%97%A5%E8%AF%ADPPT
Using html2pptx.js with background-image URLs or <img> tags
HTML files or assets are in directories with CJK, accented, or other non-ASCII characters
The build succeeds for all text/shape extraction but fails at pptx.writeFile() when PptxGenJS tries to read referenced image files
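
The trigger conditions above can be checked programmatically. The following is a minimal sketch; hasNonAscii and looksPercentEncoded are hypothetical helper names introduced for illustration, not part of html2pptx or PptxGenJS.

```javascript
// Hypothetical helper: true if the path contains any character outside printable ASCII.
function hasNonAscii(p) {
  return /[^\x20-\x7E]/.test(p);
}

// Hypothetical helper: true if the text contains percent-encoded bytes in the
// range %80..%FF, which is how non-ASCII characters appear after URL encoding.
function looksPercentEncoded(text) {
  return /%[89A-Fa-f][0-9A-Fa-f]/.test(text);
}

// A source directory that would trigger the problem.
console.log(hasNonAscii('/Users/joe/project/纯日语PPT/workspace'));

// An error message that shows the symptom.
console.log(looksPercentEncoded(
  "ENOENT: no such file or directory, open '/Users/joe/project/%E7%BA%AF%E6%97%A5%E8%AF%ADPPT/a.png'"
));
```

Running a check like hasNonAscii on the input directory before the build gives an early warning instead of a late failure at pptx.writeFile().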

Solution

Copy all build inputs (HTML slides + assets) to a temporary directory with ASCII-only paths before running the build:

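const fs = require('fs');
const path = require('path');

// Assumed to be defined elsewhere in your build script: html2pptx, pptx,
// slideFiles, originalAssetPath, finalUnicodePath.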

const TMP_BUILD_DIR = '/tmp/my-build';
const TMP_SLIDES_DIR = path.join(TMP_BUILD_DIR, 'slides');
const TMP_ASSETS_DIR = path.join(TMP_BUILD_DIR, 'assets');

// 1. Create temp directories
fs.mkdirSync(TMP_SLIDES_DIR, { recursive: true });
fs.mkdirSync(TMP_ASSETS_DIR, { recursive: true });

// 2. Copy assets to ASCII-path location
fs.copyFileSync(originalAssetPath, path.join(TMP_ASSETS_DIR, 'image.png'));

// 3. Copy HTML files, rewriting relative paths to absolute ASCII paths
for (const slideFile of slideFiles) {
  let html = fs.readFileSync(slideFile, 'utf8');
  html = html.replace(/\.\.\/assets\//g, `${TMP_ASSETS_DIR}/`);
  html = html.replace(/\.\.\/\.\.\/Logo\.png/g, `${TMP_ASSETS_DIR}/Logo.png`);
  fs.writeFileSync(path.join(TMP_SLIDES_DIR, path.basename(slideFile)), html, 'utf8');
}

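// `await` is not allowed at the top level of a CommonJS script,
// so steps 4 and 5 run inside an async function.
async function build() {
  // 4. Run html2pptx on the temp copies
  const result = await html2pptx(path.join(TMP_SLIDES_DIR, 'slide01.html'), pptx);

  // 5. Save output to temp, then copy to final Unicode destination
  const tmpOutput = path.join(TMP_BUILD_DIR, 'output.pptx');
  await pptx.writeFile({ fileName: tmpOutput });
  fs.copyFileSync(tmpOutput, finalUnicodePath);
}

build().catch((err) => { console.error(err); process.exit(1); });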
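
The snippet above copies a single asset into a fixed /tmp path. As a rough sketch, a whole workspace could instead be staged into a fresh temp directory. stageToAsciiDir is a hypothetical helper name, not part of html2pptx or PptxGenJS. It assumes Node 16.7 or later (for fs.cpSync) and that os.tmpdir() itself resolves to an ASCII-only path.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: copy a source directory tree into a fresh temp directory
// with an ASCII-only name and return the new location.
// Note: file names inside the tree are copied unchanged, so this only fixes
// non-ASCII characters in the parent directories, not in the files themselves.
function stageToAsciiDir(sourceDir, prefix = 'html2pptx-') {
  // mkdtempSync appends random ASCII characters to the prefix.
  const stagedDir = fs.mkdtempSync(path.join(os.tmpdir(), prefix));
  fs.cpSync(sourceDir, stagedDir, { recursive: true });
  return stagedDir;
}

// Example call (path taken from the example in this note):
// const staged = stageToAsciiDir('/Users/joe/project/纯日语PPT/workspace');
```

A unique directory per build avoids stale files from earlier runs. Remove it afterwards with fs.rmSync(staged, { recursive: true, force: true }).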

Key points:

The output path must also be ASCII-only when PptxGenJS writes to it directly, which is why step 5 writes to the temp directory first and then copies the file
Only image/background paths are affected (text extraction works fine with Unicode paths)
The browser percent-encodes non-ASCII characters in file:// URLs (as UTF-8 bytes), so any path read back from the page arrives encoded

Verification

Build completes without ENOENT errors
Background images render correctly in the output PPTX
<img> tags (logos, icons) appear in the correct slides
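
Before inspecting the output by eye, a pre-build check on the staged HTML can catch leftover bad references. This is a minimal sketch: findBadImageRefs is a hypothetical helper, and it assumes image references have already been rewritten to absolute filesystem paths, so http:, data:, and relative references would be reported as missing.

```javascript
const fs = require('fs');

// Hypothetical helper: return the image references in an HTML string that would
// still fail on disk (non-ASCII, percent-encoded, or pointing at a missing file).
function findBadImageRefs(html) {
  const patterns = [
    /<img[^>]*\ssrc=["']([^"']+)["']/gi,      // <img src="...">
    /url\(\s*["']?([^"')]+)["']?\s*\)/gi,     // background-image: url(...)
  ];
  const refs = [];
  for (const re of patterns) {
    for (const match of html.matchAll(re)) refs.push(match[1]);
  }
  return refs.filter((ref) =>
    /[^\x20-\x7E]/.test(ref) ||               // raw non-ASCII characters
    /%[89A-Fa-f][0-9A-Fa-f]/.test(ref) ||     // percent-encoded non-ASCII bytes
    !fs.existsSync(ref));                     // file not present on disk
}
```

An empty result for every staged slide suggests the image paths are safe to hand to the build. It does not replace opening the PPTX and checking the slides.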

Example

Original failing path:

/Users/joe/project/纯日语PPT/workspace/assets/gradient-dark.png

Browser converts to:

file:///Users/joe/project/%E7%BA%AF%E6%97%A5%E8%AF%ADPPT/workspace/assets/gradient-dark.png

PptxGenJS tries to open (fails):

/Users/joe/project/%E7%BA%AF%E6%97%A5%E8%AF%ADPPT/workspace/assets/gradient-dark.png

Fix: copy to /tmp/my-build/assets/gradient-dark.png and reference that path instead.
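
The encoding step in this example can be reproduced in Node.js without a browser, because Node's global URL class follows the same WHATWG URL rules that browsers use. This sketch only illustrates the encoding. It does not show how html2pptx or PptxGenJS handle the URL internally.

```javascript
// Parsing a file:// URL percent-encodes the non-ASCII characters in its path.
const assetUrl = new URL(
  'file:///Users/joe/project/纯日语PPT/workspace/assets/gradient-dark.png'
);

console.log(assetUrl.href);
// file:///Users/joe/project/%E7%BA%AF%E6%97%A5%E8%AF%ADPPT/workspace/assets/gradient-dark.png

// Treating the URL's pathname as a filesystem path yields the failing path.
console.log(assetUrl.pathname);
// /Users/joe/project/%E7%BA%AF%E6%97%A5%E8%AF%ADPPT/workspace/assets/gradient-dark.png
```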

Notes

This can affect any Playwright-based HTML-to-PPTX pipeline that passes browser-reported file:// paths to Node.js, not just html2pptx
The issue is in the handoff between browser (which URL-encodes) and Node.js fs (which expects raw paths)
An alternative fix would be to decodeURIComponent() paths in the html2pptx library itself, but the temp-copy approach is safer and doesn't require modifying library code
Also set NODE_PATH to point to your workspace's node_modules if html2pptx.js is located outside the project directory
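
For reference, the decoding alternative mentioned in the notes could look roughly like the sketch below. decodeFileUrl is a hypothetical helper, and where such a call would go inside html2pptx is not covered here. Node's built-in url.fileURLToPath does the same conversion and is shown alongside it. The example URL has no drive letter, so the fileURLToPath call assumes a POSIX system.

```javascript
const { fileURLToPath } = require('url');

const encodedUrl =
  'file:///Users/joe/project/%E7%BA%AF%E6%97%A5%E8%AF%ADPPT/workspace/assets/gradient-dark.png';

// Hypothetical helper: strip the file:// scheme and decode percent-escapes.
// decodeURIComponent throws a URIError on malformed escapes, for example a
// file name that contains a literal "%" that was never encoded.
function decodeFileUrl(fileUrl) {
  return decodeURIComponent(fileUrl.replace(/^file:\/\//, ''));
}

console.log(decodeFileUrl(encodedUrl));
// /Users/joe/project/纯日语PPT/workspace/assets/gradient-dark.png

// Built-in equivalent (POSIX only for this URL; Windows file URLs need a drive letter).
if (process.platform !== 'win32') {
  console.log(fileURLToPath(encodedUrl));
}
```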