html2pptx Unicode Path Fix

Problem

When the html2pptx pipeline processes HTML files in directories whose names contain non-ASCII characters (e.g., 纯日语PPT/, 日本語資料/), PptxGenJS fails with ENOENT errors. The browser reports image locations as percent-encoded file:// URLs, and PptxGenJS then tries to open the encoded path literally on disk, where no such file exists.

Context / Trigger Conditions

  • The error message is ENOENT: no such file or directory, and the reported path contains a percent-encoded segment such as %E7%BA%AF%E6%97%A5%E8%AF%ADPPT
  • Using html2pptx.js with background-image URLs or <img> tags
  • HTML files or assets are in directories with CJK, accented, or other non-ASCII characters
  • The build succeeds for all text/shape extraction but fails at pptx.writeFile() when PptxGenJS tries to read referenced image files
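
The trigger conditions above can be checked before a build starts. The following is a minimal sketch, not part of html2pptx; both helper names (hasNonAsciiPath, looksLikeEncodedPathError) are hypothetical and exist only for illustration.

```javascript
const path = require('node:path');

// Hypothetical helper: true if any character in the resolved absolute path
// falls outside 7-bit ASCII (CJK, accented letters, and so on).
function hasNonAsciiPath(p) {
  return /[^\x00-\x7F]/.test(path.resolve(p));
}

// Hypothetical helper: true if an error message looks like an ENOENT raised
// for a percent-encoded path (contains at least one %XX sequence).
function looksLikeEncodedPathError(message) {
  return /ENOENT/.test(message) && /%[0-9A-Fa-f]{2}/.test(message);
}
```

If hasNonAsciiPath() returns true for the slide or asset directory, the temp-copy workaround described under Solution is likely needed.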

Solution

Copy all build inputs (HTML slides + assets) to a temporary directory with ASCII-only paths before running the build:

const fs = require('fs');
const path = require('path');
const pptxgen = require('pptxgenjs');
// Assumption: html2pptx.js sits next to this script; adjust the path for your setup.
const html2pptx = require('./html2pptx');

const TMP_BUILD_DIR = '/tmp/my-build';
const TMP_SLIDES_DIR = path.join(TMP_BUILD_DIR, 'slides');
const TMP_ASSETS_DIR = path.join(TMP_BUILD_DIR, 'assets');

// originalAssetPath, slideFiles, and finalUnicodePath are supplied by the caller.
async function build(originalAssetPath, slideFiles, finalUnicodePath) {
  const pptx = new pptxgen();

  // 1. Create temp directories
  fs.mkdirSync(TMP_SLIDES_DIR, { recursive: true });
  fs.mkdirSync(TMP_ASSETS_DIR, { recursive: true });

  // 2. Copy assets to ASCII-path location
  fs.copyFileSync(originalAssetPath, path.join(TMP_ASSETS_DIR, 'image.png'));

  // 3. Copy HTML files, rewriting relative paths to absolute ASCII paths
  //    (replacer functions keep any "$" in the path from being treated as a pattern)
  for (const slideFile of slideFiles) {
    let html = fs.readFileSync(slideFile, 'utf8');
    html = html.replace(/\.\.\/assets\//g, () => `${TMP_ASSETS_DIR}/`);
    html = html.replace(/\.\.\/\.\.\/Logo\.png/g, () => `${TMP_ASSETS_DIR}/Logo.png`);
    fs.writeFileSync(path.join(TMP_SLIDES_DIR, path.basename(slideFile)), html, 'utf8');
  }

  // 4. Run html2pptx on the temp copies (repeat for each slide)
  const result = await html2pptx(path.join(TMP_SLIDES_DIR, 'slide01.html'), pptx);

  // 5. Save output to temp, then copy to final Unicode destination
  const tmpOutput = path.join(TMP_BUILD_DIR, 'output.pptx');
  await pptx.writeFile({ fileName: tmpOutput });
  fs.copyFileSync(tmpOutput, finalUnicodePath);
  return result;
}

Key points:

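  • The output path should also be ASCII-only when PptxGenJS writes the file; copy the result to the Unicode destination afterwards
  • Only image/background paths are affected (text extraction works fine with Unicode paths)
  • The browser percent-encodes non-ASCII characters in file:// URLs, so any image path read back from the page arrives encoded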
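
A hard-coded /tmp/my-build works for a single build on macOS or Linux. As a hedged alternative, the sketch below creates a unique temp directory under os.tmpdir() and refuses to continue if that location is itself non-ASCII (which can happen, for example, when a Windows user name contains non-ASCII characters). The helper names makeAsciiBuildDir and removeBuildDir are hypothetical.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: create a unique build directory under the system temp
// dir and verify that its full path is ASCII-only.
function makeAsciiBuildDir(prefix = 'pptx-build-') {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), prefix));
  if (/[^\x00-\x7F]/.test(dir)) {
    fs.rmSync(dir, { recursive: true, force: true });
    throw new Error(`Temp directory is not ASCII-only: ${dir}`);
  }
  return dir;
}

// Hypothetical helper: delete the build directory once the PPTX has been
// copied to its final destination.
function removeBuildDir(dir) {
  fs.rmSync(dir, { recursive: true, force: true });
}
```

The returned directory can stand in for TMP_BUILD_DIR in the build script; calling removeBuildDir() afterwards keeps stale copies from accumulating.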

Verification

  • Build completes without ENOENT errors
  • Background images render correctly in the output PPTX
  • <img> tags (logos, icons) appear in the correct slides
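
Part of this verification can be automated before the build runs. The sketch below scans a rewritten slide for <img> sources and CSS url() references and reports any that do not exist on disk. It is an illustrative, regex-based check under simplifying assumptions (no HTML parser, no srcset handling), and the helper names findImageRefs and findMissingRefs are hypothetical.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: collect image references from an HTML string.
// Matches <img src="..."> attributes first, then CSS url(...) values.
function findImageRefs(html) {
  const refs = [];
  const patterns = [
    /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi,
    /url\(\s*["']?([^"')]+?)["']?\s*\)/gi,
  ];
  for (const re of patterns) {
    for (const m of html.matchAll(re)) refs.push(m[1]);
  }
  return refs;
}

// Hypothetical helper: return the resolved paths of local image references
// that are missing on disk. A file:// prefix is stripped without decoding,
// so a percent-encoded reference is reported as missing, mirroring the
// ENOENT failure described above.
function findMissingRefs(htmlFile) {
  const html = fs.readFileSync(htmlFile, 'utf8');
  const baseDir = path.dirname(htmlFile);
  return findImageRefs(html)
    .filter((ref) => !/^(https?:|data:)/i.test(ref))
    .map((ref) => path.resolve(baseDir, ref.replace(/^file:\/\//, '')))
    .filter((p) => !fs.existsSync(p));
}
```

An empty array from findMissingRefs() for every copied slide suggests the rewritten paths point at real files; the rendered PPTX should still be checked visually.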

Example

Original failing path:

/Users/joe/project/纯日语PPT/workspace/assets/gradient-dark.png

Browser converts to:

file:///Users/joe/project/%E7%BA%AF%E6%97%A5%E8%AF%ADPPT/workspace/assets/gradient-dark.png

PptxGenJS tries to open (fails):

/Users/joe/project/%E7%BA%AF%E6%97%A5%E8%AF%ADPPT/workspace/assets/gradient-dark.png

Fix: copy to /tmp/my-build/assets/gradient-dark.png and reference that path instead.
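
The mismatch in this example can be reproduced in plain Node.js, without a browser. The sketch below uses url.pathToFileURL() to stand in for the browser's encoding step and assumes a POSIX system (on Windows the URL would also carry a drive letter); the variable names are illustrative.

```javascript
const { pathToFileURL } = require('node:url');

const original = '/Users/joe/project/纯日语PPT/workspace/assets/gradient-dark.png';

// Converting the path to a file:// URL percent-encodes the non-ASCII bytes.
const fileUrl = pathToFileURL(original).href;

// Stripping only the "file://" prefix leaves the encoded path, which is
// not the name of any file on disk.
const naivePath = fileUrl.replace(/^file:\/\//, '');

console.log(fileUrl);
console.log(naivePath);
console.log(naivePath === original); // false: the encoded and raw paths differ
```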

Notes

  • This can affect any Playwright-based HTML-to-PPTX pipeline that passes browser-resolved file:// URLs to Node.js, not just html2pptx
  • The issue is in the handoff between browser (which URL-encodes) and Node.js fs (which expects raw paths)
  • An alternative fix would be to decodeURIComponent() paths in the html2pptx library itself, but the temp-copy approach is safer and doesn't require modifying library code
  • Also set NODE_PATH to point to your workspace's node_modules if html2pptx.js is located outside the project directory
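
For reference, the decoding alternative mentioned in the notes could look roughly like the sketch below. It is not code from html2pptx; toFsPath is a hypothetical helper, and it assumes the library would call it on every image source before handing the path to PptxGenJS. For file:// URLs, Node's url.fileURLToPath() handles decoding and platform differences; decodeURIComponent() is used only as a fallback for bare encoded paths.

```javascript
const { fileURLToPath } = require('node:url');

// Hypothetical helper: turn a browser-reported image source into a path
// that Node.js fs can open.
function toFsPath(src) {
  if (src.startsWith('file://')) {
    return fileURLToPath(src);
  }
  try {
    return decodeURIComponent(src);
  } catch (err) {
    // Malformed escape (e.g. a literal "%" in a file name): keep the path as-is.
    return src;
  }
}
```

One caveat with this approach: a file name that legitimately contains a sequence like %41 would be altered by the fallback branch, which is one reason the temp-copy approach is the safer default.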