html2pptx-unicode-path-fix
html2pptx Unicode Path Fix
Problem
When the html2pptx pipeline processes HTML files located in directories with non-ASCII
characters (e.g., 纯日语PPT/, 日本語資料/), PptxGenJS fails with ENOENT errors
because the browser URL-encodes the file:// paths and PptxGenJS tries to open the
encoded path literally on disk.
Context / Trigger Conditions
- Error message: ENOENT: no such file or directory, with a URL-encoded path such as %E7%BA%AF%E6%97%A5%E8%AF%ADPPT
- Using html2pptx.js with background-image URLs or <img> tags
- HTML files or assets are in directories with CJK, accented, or other non-ASCII characters
- The build succeeds for all text/shape extraction but fails at pptx.writeFile() when PptxGenJS tries to read referenced image files
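The trigger conditions above can be checked in code before (or after) a failed build. This is a hypothetical sketch: the helper names `hasNonAscii`, `needsAsciiStaging`, and `isEncodedPathEnoent` are illustrative and not part of html2pptx or PptxGenJS. The error check inspects the message text as well as `code`/`path`, on the assumption that a library may wrap the original fs error.

```javascript
// Hypothetical pre-flight helpers for the trigger conditions above.

// True if the string contains any character outside 7-bit ASCII
// (CJK, accented letters, emoji, ...).
function hasNonAscii(p) {
  return /[^\x00-\x7F]/.test(p);
}

// True if any of the given paths would need staging in an ASCII-only directory.
function needsAsciiStaging(paths) {
  return paths.some(hasNonAscii);
}

// Heuristic: an ENOENT whose code, path, or message contains a percent-escape
// (e.g. %E7%BA%AF) matches the failure described here. The message is checked
// too because a library may wrap the original fs error in a new Error.
function isEncodedPathEnoent(err) {
  if (!err) return false;
  const text = `${err.code || ''} ${err.path || ''} ${err.message || ''}`;
  return text.includes('ENOENT') && /%[0-9A-Fa-f]{2}/.test(text);
}
```

The percent-escape test is only a heuristic: a file whose real name contains a literal `%E7` would also match.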
Solution
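Copy all build inputs (HTML slides and assets) to a temporary directory with an ASCII-only path, rewrite the slides' relative asset references to point at the copies, run the build there, and copy the finished .pptx to its final location: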
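const fs = require('fs');
const path = require('path');
const pptxgen = require('pptxgenjs');
const html2pptx = require('./html2pptx'); // adjust to wherever html2pptx.js lives
// Example source locations (placeholders; these may contain non-ASCII characters)
const SRC_SLIDES_DIR = '/Users/joe/project/纯日语PPT/workspace/slides';
const SRC_ASSETS_DIR = '/Users/joe/project/纯日语PPT/workspace/assets';
const SRC_LOGO = '/Users/joe/project/纯日语PPT/Logo.png';
const finalUnicodePath = '/Users/joe/project/纯日语PPT/output.pptx';
// The build directory itself must be ASCII-only
const TMP_BUILD_DIR = '/tmp/my-build';
const TMP_SLIDES_DIR = path.join(TMP_BUILD_DIR, 'slides');
const TMP_ASSETS_DIR = path.join(TMP_BUILD_DIR, 'assets');
async function build() {
  // 1. Create temp directories
  fs.mkdirSync(TMP_SLIDES_DIR, { recursive: true });
  fs.mkdirSync(TMP_ASSETS_DIR, { recursive: true });
  // 2. Copy assets (assumes a flat assets folder) plus the shared logo to the ASCII-path location
  for (const name of fs.readdirSync(SRC_ASSETS_DIR)) {
    fs.copyFileSync(path.join(SRC_ASSETS_DIR, name), path.join(TMP_ASSETS_DIR, name));
  }
  fs.copyFileSync(SRC_LOGO, path.join(TMP_ASSETS_DIR, 'Logo.png'));
  // 3. Copy HTML files, rewriting relative paths to absolute ASCII paths
  const slideFiles = fs.readdirSync(SRC_SLIDES_DIR).filter((f) => f.endsWith('.html')).sort();
  for (const slideFile of slideFiles) {
    let html = fs.readFileSync(path.join(SRC_SLIDES_DIR, slideFile), 'utf8');
    html = html.replace(/\.\.\/assets\//g, `${TMP_ASSETS_DIR}/`);
    html = html.replace(/\.\.\/\.\.\/Logo\.png/g, `${TMP_ASSETS_DIR}/Logo.png`);
    fs.writeFileSync(path.join(TMP_SLIDES_DIR, slideFile), html, 'utf8');
  }
  // 4. Run html2pptx on the temp copies, one call per slide
  const pptx = new pptxgen();
  for (const slideFile of slideFiles) {
    await html2pptx(path.join(TMP_SLIDES_DIR, slideFile), pptx);
  }
  // 5. Save output to temp, then copy to the final Unicode destination
  const tmpOutput = path.join(TMP_BUILD_DIR, 'output.pptx');
  await pptx.writeFile({ fileName: tmpOutput });
  fs.copyFileSync(tmpOutput, finalUnicodePath);
}
build().catch((err) => {
  console.error(err);
  process.exit(1);
});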
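The two `replace()` calls in step 3 are hard-wired to `../assets/` and `../../Logo.png`. A more general variant could resolve every relative `src="..."` and `url(...)` reference against the original slide's directory and map it into the temp tree. This is a hypothetical sketch (`rewriteRelativeRefs` is an illustrative name), assuming POSIX paths and that the whole source root has been mirrored with the same layout into the temp directory, for example with `fs.cpSync(srcRoot, tmpRoot, { recursive: true })` on Node.js 16.7 or later. It uses regular expressions rather than an HTML parser, so unusual markup may slip through.

```javascript
const path = require('path');

// Hypothetical helper: rewrite relative src="..." and url(...) references so they
// point at the mirrored copy under tmpRoot. slideDir is the ORIGINAL slide's
// directory; srcRoot is the original tree that was copied (same layout) to tmpRoot.
function rewriteRelativeRefs(html, slideDir, srcRoot, tmpRoot) {
  const mapRef = (ref) => {
    // Leave absolute paths and anything with a scheme (http:, data:, file:) alone.
    if (ref.startsWith('/') || /^[a-z][a-z0-9+.-]*:/i.test(ref)) return ref;
    const abs = path.resolve(slideDir, ref);
    const rel = path.relative(srcRoot, abs);
    // Only remap files that live inside the mirrored source tree.
    if (rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel)) return ref;
    return path.join(tmpRoot, rel);
  };
  return html
    .replace(/(src\s*=\s*["'])([^"']+)(["'])/g, (_, pre, ref, post) => pre + mapRef(ref) + post)
    .replace(/(url\(\s*["']?)([^"')]+)(["']?\s*\))/g, (_, pre, ref, post) => pre + mapRef(ref) + post);
}
```

Compared with the fixed patterns, this keeps working when slides sit at different depths, at the cost of copying the whole source tree instead of a flat `assets` folder.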
Key points:
- The output path must also be ASCII-only if PptxGenJS writes to it directly; write to the temp directory and copy the result afterwards (step 5)
- Only image and background paths are affected (text extraction works fine with Unicode paths)
- The browser always percent-encodes non-ASCII characters in file:// URLs
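Because every path the build touches must be ASCII-only, the hard-coded `/tmp/my-build` could be replaced by a fresh directory per run, which also keeps concurrent builds from overwriting each other. A hypothetical sketch (`makeAsciiBuildDir` is an illustrative name), assuming the fallback base directory already exists:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// True if every character of p is 7-bit ASCII.
const isAscii = (p) => /^[\x00-\x7F]*$/.test(p);

// Hypothetical helper: create a fresh, uniquely named build directory whose
// absolute path is ASCII-only. Uses os.tmpdir() when that is ASCII, otherwise
// the caller-supplied fallback base (which must already exist).
function makeAsciiBuildDir(fallbackBase = '/tmp') {
  const base = isAscii(os.tmpdir()) ? os.tmpdir() : fallbackBase;
  if (!isAscii(base)) {
    throw new Error(`No ASCII-only base directory available (tried ${base})`);
  }
  // mkdtempSync appends six random characters to the prefix.
  return fs.mkdtempSync(path.join(base, 'html2pptx-'));
}
```

Usage: `const TMP_BUILD_DIR = makeAsciiBuildDir();`, then `fs.rmSync(TMP_BUILD_DIR, { recursive: true })` once the final copy has been made.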
Verification
- Build completes without ENOENT errors
- Background images render correctly in the output PPTX
- <img> tags (logos, icons) appear in the correct slides
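The checklist above is evaluated after the build. A pre-build complement could scan each staged slide for local image references that are still non-ASCII or missing on disk. This is a hypothetical sketch (`findBadRefs` is an illustrative name) that uses the same regex-based matching as above, skips anything with a URL scheme (http:, data:, file:), and resolves relative references against the staged file's directory.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical pre-build check: list every local image reference in a staged
// slide that is non-ASCII or missing on disk. An empty array means none of the
// referenced local files should hit the ENOENT described above.
function findBadRefs(stagedHtmlFile) {
  const html = fs.readFileSync(stagedHtmlFile, 'utf8');
  const refs = [
    ...[...html.matchAll(/src\s*=\s*["']([^"']+)["']/g)].map((m) => m[1]),
    ...[...html.matchAll(/url\(\s*["']?([^"')]+?)["']?\s*\)/g)].map((m) => m[1]),
  ];
  const problems = [];
  for (const ref of refs) {
    // Skip remote URLs, data: URIs, and other scheme-prefixed references.
    if (/^[a-z][a-z0-9+.-]*:/i.test(ref)) continue;
    const file = path.resolve(path.dirname(stagedHtmlFile), ref);
    if (/[^\x00-\x7F]/.test(file)) problems.push(`non-ASCII: ${file}`);
    else if (!fs.existsSync(file)) problems.push(`missing: ${file}`);
  }
  return problems;
}
```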
Example
Original failing path:
/Users/joe/project/纯日语PPT/workspace/assets/gradient-dark.png
Browser converts to:
file:///Users/joe/project/%E7%BA%AF%E6%97%A5%E8%AF%ADPPT/workspace/assets/gradient-dark.png
PptxGenJS tries to open (fails):
/Users/joe/project/%E7%BA%AF%E6%97%A5%E8%AF%ADPPT/workspace/assets/gradient-dark.png
Fix: copy to /tmp/my-build/assets/gradient-dark.png and reference that path instead.
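The same conversion can be reproduced with Node.js's built-in `url` module. This sketch assumes POSIX paths (Windows file URLs also carry a drive letter) and shows why naively stripping `file://` yields the failing path while proper URL decoding recovers the real one.

```javascript
const { pathToFileURL, fileURLToPath } = require('url');

// The original path from the example above (POSIX-style).
const original = '/Users/joe/project/纯日语PPT/workspace/assets/gradient-dark.png';

// A file:// URL for it: non-ASCII bytes are percent-encoded, as in the browser.
const href = pathToFileURL(original).href;

// Naively stripping the scheme keeps the escapes: this is the path that fails.
const naivePath = href.slice('file://'.length);

// Decoding the URL properly recovers the real on-disk path.
const decodedPath = fileURLToPath(href);

console.log(href);                     // file:///Users/joe/project/%E7%BA%AF%E6%97%A5%E8%AF%ADPPT/workspace/assets/gradient-dark.png
console.log(naivePath === original);   // false
console.log(decodedPath === original); // true
```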
Notes
- This can affect any Playwright-based HTML-to-PPTX pipeline that passes browser-reported file URLs to Node.js fs, not just html2pptx specifically
- The issue is in the handoff between browser (which URL-encodes) and Node.js fs (which expects raw paths)
- An alternative fix would be to
decodeURIComponent() paths in the html2pptx library itself, but the temp-copy approach is safer and doesn't require modifying library code
- Also set NODE_PATH to point to your workspace's node_modules if html2pptx.js is located outside the project directory
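For the library-level alternative, a normalization step would go wherever html2pptx turns a browser-reported image reference into a path for PptxGenJS. That location is not shown here, so this is only a hypothetical helper (`toFsPath` is an illustrative name). It prefers `fileURLToPath()` for `file://` URLs, since that also handles Windows drive letters, and guards `decodeURIComponent()` against a literal `%` that is not an escape.

```javascript
const { fileURLToPath } = require('url');

// Hypothetical helper for the alternative fix: turn an image reference as the
// browser reports it into a path Node's fs can open.
function toFsPath(ref) {
  if (ref.startsWith('file://')) {
    // fileURLToPath strips the scheme and percent-decodes the path.
    return fileURLToPath(ref);
  }
  if (/^(https?|data):/i.test(ref)) {
    // Remote URLs and data URIs are not filesystem paths; pass them through.
    return ref;
  }
  try {
    // A bare path that still carries percent-escapes: decode it.
    return decodeURIComponent(ref);
  } catch {
    // A literal '%' that is not a valid escape (e.g. "100%.png"): leave as-is.
    return ref;
  }
}
```

A real filename that contains a valid-looking escape such as `%E7` would be decoded incorrectly by this approach, which is one reason the temp-copy approach avoids touching library code.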