binary-size-analysis
hermesvm Binary Size Analysis
Analyze per-commit binary size changes of the hermesvm shared library across a
git commit range. Produces a markdown report with per-commit sizes and summary
tables of significant increases and decreases.
When This Skill Applies
- Analyzing which commits caused hermesvm binary size growth
- Investigating size regressions in the hermesvm library
- Producing a size report for a range of commits
Pre-Run Prompt
Before starting, inform the user how size is measured and ask about verification:
- Explain the measurement method: Tell the user that sizes are measured by summing ALLOC section sizes (via readelf -SW on Linux or otool -l on macOS), not by measuring the file size with stat. This excludes alignment padding and non-shipping sections (symbol tables, string tables, code signatures), matching what actually ships on-device after strip.
- Ask about parent-commit verification: This verification is enabled by default. It builds each BUILD commit's parent to confirm no skipped commit changed the binary size. It catches miscategorized SKIP commits but roughly doubles the build time. If the user declines, skip all verification substeps in Step 2.
Required User Input
- Commit range: A base commit and an end commit (or HEAD). For example: b48b6cc..HEAD or v0.290.0..v0.300.0
- CMake flags (optional): Additional CMake configuration flags. Defaults are listed below.
- Parent-commit verification (optional): Enabled by default. Disable to halve build time at the risk of missing miscategorized SKIP commits.
Default Build Configuration
cmake -B cmake-build-minsizerel -GNinja \
-DHERMESVM_GCKIND=HADES \
-DHERMESVM_HEAP_HV_MODE=HEAP_HV_PREFER32 \
-DCMAKE_BUILD_TYPE=MinSizeRel \
-DHERMESVM_ALLOW_JIT=2 \
-DHERMESVM_ALLOW_CONTIGUOUS_HEAP=ON \
-DHERMES_IS_MOBILE_BUILD=ON
If the user provides additional flags, append them to the cmake command. If the user supplies a different value for a flag in the default build configuration above, replace the default with the user-provided value (e.g., -DHERMESVM_ALLOW_JIT=0).
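The append-or-replace rule above can be sketched as a small helper. This is only an illustration: `merge_cmake_flags` and `-DEXAMPLE_EXTRA_FLAG` are hypothetical names, not part of Hermes or this skill, and the sketch assumes flags are written as single `-DNAME=VALUE` tokens.

```python
# Default flags copied from the build configuration above.
DEFAULT_FLAGS = [
    "-DHERMESVM_GCKIND=HADES",
    "-DHERMESVM_HEAP_HV_MODE=HEAP_HV_PREFER32",
    "-DCMAKE_BUILD_TYPE=MinSizeRel",
    "-DHERMESVM_ALLOW_JIT=2",
    "-DHERMESVM_ALLOW_CONTIGUOUS_HEAP=ON",
    "-DHERMES_IS_MOBILE_BUILD=ON",
]


def merge_cmake_flags(defaults, user_flags):
    """Hypothetical helper: user values replace defaults, new flags are appended."""
    merged = {}  # dicts keep insertion order, so defaults stay in place
    for flag in list(defaults) + list(user_flags):
        # Key on the variable name, ignoring any value and optional :TYPE suffix.
        key = flag.split("=", 1)[0].split(":", 1)[0]
        merged[key] = flag
    return list(merged.values())


if __name__ == "__main__":
    # -DEXAMPLE_EXTRA_FLAG is a made-up flag used only for illustration.
    flags = merge_cmake_flags(
        DEFAULT_FLAGS, ["-DHERMESVM_ALLOW_JIT=0", "-DEXAMPLE_EXTRA_FLAG=ON"]
    )
    print(" ".join(["cmake", "-B", "cmake-build-minsizerel", "-GNinja"] + flags))
```

Keying on the variable name keeps an overridden flag in its original position, so the printed command stays easy to compare with the default one.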
Procedure
Step 0: Verify Git Repository and Tools
This skill requires a GitHub checkout of the static_h repo (commit hashes are git hashes). Before proceeding, verify the working directory is a valid git repo:
git rev-parse --is-inside-work-tree
If this fails (e.g., the repo is a Sapling/Mercurial checkout or not a repo at all), stop and inform the user that this skill only works in a git checkout of the Hermes repository.
Bloaty setup: This skill uses bloaty for detailed per-section diffs on large size increases. Download and build it:
git clone https://github.com/google/bloaty.git ~/bloaty
cd ~/bloaty && cmake -B build -GNinja && cmake --build build
If the download or build fails (e.g., no network access, missing dependencies),
ask the user to provide the path to a pre-built bloaty binary.
Step 1: Categorize Commits
Generate the list of commits in the range and categorize each as BUILD or SKIP.
A commit is SKIP if it only touches files outside the directories that contribute to the hermesvm binary. The relevant source directories are:
- lib/ (VM, Parser, Sema, IR, BCGen, Support, Regex, etc.)
- API/hermes/
- public/hermes/
- include/hermes/
- external/
- CMakeLists.txt (exact match, not filtered by .txt suffix), cmake/, lib/config/
Files that never affect the binary: test/, unittests/, .github/, docs
(.md, .txt, .rst), tools/, npm packages (hermes-parser,
prettier-plugin, etc.), utils/, benchmarks/, website/, blog/.
Use this Python script to categorize:
#!/usr/bin/env python3
import subprocess, sys

RELEVANT_PREFIXES = [
    "lib/", "API/hermes/", "API/jsi/", "public/hermes/", "include/hermes/",
    "external/", "cmake/", "lib/config/",
]
# Files that match exactly (not as prefixes) and should always be BUILD.
RELEVANT_EXACT = ["CMakeLists.txt"]
ALWAYS_SKIP_SUFFIXES = [".md", ".txt", ".rst"]

base_commit = sys.argv[1]
end_commit = sys.argv[2] if len(sys.argv) > 2 else "HEAD"

# Oldest-first "<hash> <subject>" lines; check=True fails loudly on a bad range.
result = subprocess.run(
    ["git", "log", "--reverse", "--format=%H %s", f"{base_commit}..{end_commit}"],
    capture_output=True, text=True, check=True
)

for line in result.stdout.strip().split("\n"):
    if not line:
        continue
    # partition() tolerates an empty subject, unlike split(" ", 1).
    commit, _, subject = line.partition(" ")
    files_result = subprocess.run(
        ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", commit],
        capture_output=True, text=True
    )
    files = files_result.stdout.strip().split("\n") if files_result.stdout.strip() else []
    has_relevant = False
    for f in files:
        # The exact-match check runs first so the root CMakeLists.txt is not
        # discarded by the .txt suffix filter below.
        if f in RELEVANT_EXACT:
            has_relevant = True
            break
        if any(f.endswith(s) for s in ALWAYS_SKIP_SUFFIXES):
            continue
        if any(f.startswith(p) for p in RELEVANT_PREFIXES):
            has_relevant = True
            break
    category = "BUILD" if has_relevant else "SKIP"
    # The two zero columns are placeholders in the tab-separated output.
    print(f"{category}\t{commit[:11]}\t0\t0\t{subject}")
Save the BUILD-only lines to a file for the build loop.
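One way to extract the BUILD-only lines is to filter on the first tab-separated field of the script's output. The sketch below assumes the output was captured as text; `build_only_lines` and the file names in the usage comment are hypothetical.

```python
def build_only_lines(categorized_text):
    """Keep rows whose first tab-separated field is exactly BUILD.

    Matching on the first field (rather than searching for "BUILD" anywhere)
    avoids keeping SKIP commits whose subject happens to mention BUILD.
    """
    return [
        line
        for line in categorized_text.splitlines()
        if line.split("\t", 1)[0] == "BUILD"
    ]


if __name__ == "__main__":
    sample = (
        "BUILD\t0123456789a\t0\t0\tFix GC write barrier\n"
        "SKIP\tabcdef01234\t0\t0\tBUILD docs cleanup\n"
    )
    # Hypothetical usage: write the result to a file such as build_commits.tsv.
    print("\n".join(build_only_lines(sample)))
```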
Step 2: Build and Measure
Critical: Always run cmake -B ... before each ninja hermesvm to ensure
new/removed source files are detected. Without reconfiguration, incremental
builds miss new files (especially JS polyfills in lib/InternalJavaScript/),
causing deltas to be misattributed to later commits.
Size measurement: Use the sum of ALLOC section sizes, not stat on the
library file. This is important for two reasons:
- Excludes alignment padding: The linker aligns loadable segments to page boundaries (commonly 64KB on Linux and 16KB on macOS, depending on the target architecture and linker). A 1-byte code change can shift a segment boundary and change the file size by up to one page of padding, creating false deltas.
- Matches production: ALLOC sections are exactly the sections that remain after strip and ship on-device. Non-ALLOC sections (.symtab, .strtab, .shstrtab, .comment, .gnu.build.attributes) are stripped in production.
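The padding effect can be illustrated with a toy model. It assumes a file with one code segment at offset 0 followed by a data segment that starts at the next page boundary; real linker layouts are more involved, and `align_up` and `file_span` are hypothetical names used only for this sketch.

```python
PAGE = 64 * 1024  # assumed page alignment for this illustration


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def file_span(text_size, data_size, page=PAGE):
    """Toy layout: text at offset 0, data starting at the next page boundary."""
    return align_up(text_size, page) + data_size


if __name__ == "__main__":
    before = file_span(65536, 100)  # text exactly fills one page
    after = file_span(65537, 100)   # one extra byte of code
    print("file size delta:", after - before)            # one whole page
    print("section sum delta:", (65537 + 100) - (65536 + 100))  # one byte
```

In this model a single extra byte of code moves the data segment to the next page, so the file grows by a full page while the sum of section sizes grows by one byte.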
Use the following Python helper to measure size on both platforms:
import platform, re, subprocess


def get_section_size(library_path):
    """Sum section sizes that ship in production, excluding symbol/string
    tables and inter-segment alignment padding.

    - Linux (ELF): sums all sections with the ALLOC flag via readelf.
      This excludes .symtab, .strtab, .shstrtab, .comment, and
      .gnu.build.attributes which are stripped in production.
    - macOS (Mach-O): sums all individual section sizes via otool.
      Sections are listed under __TEXT, __DATA, __DATA_CONST, etc.
      The __LINKEDIT segment (which contains symbol tables, string
      tables, code signatures) has no sub-sections so it is naturally
      excluded.
    """
    if platform.system() == "Darwin":
        return _get_section_size_macho(library_path)
    else:
        return _get_section_size_elf(library_path)


def _get_section_size_elf(library_path):
    output = subprocess.check_output(
        ["readelf", "-SW", library_path], text=True
    )
    total = 0
    for line in output.split('\n'):
        # Match section lines: [Nr] Name Type Addr Off Size ES Flg ...
        m = re.match(
            r'\s*\[\s*\d+\]\s+\S+\s+\S+\s+[0-9a-f]+\s+[0-9a-f]+\s+'
            r'([0-9a-f]+)\s+[0-9a-f]+\s+(.*)', line
        )
        # group(1) is the hex Size; group(2) starts with the flags column.
        if m and 'A' in m.group(2):
            total += int(m.group(1), 16)
    return total


def _get_section_size_macho(library_path):
    output = subprocess.check_output(
        ["otool", "-l", library_path], text=True
    )
    total = 0
    in_section = False
    for line in output.split('\n'):
        # Each section entry in otool -l contains both "sectname" and
        # "segname" lines, followed by "size 0x...". Only reset on
        # "cmd " (a new load command), NOT on "segname"; otherwise
        # the segname line inside each section entry clears the flag
        # before the size line is reached.
        if 'sectname' in line:
            in_section = True
        elif 'cmd ' in line:
            in_section = False
        elif in_section:
            m = re.match(r'\s+size\s+(0x[0-9a-f]+)', line)
            if m:
                total += int(m.group(1), 16)
    return total
Initialize the markdown results file immediately, then iterate through BUILD commits only:
- Check out the base commit, configure, build, record baseline size.
- For each BUILD commit in chronological order:
  a. git checkout <hash> --force --quiet
  b. Parent-commit verification (if enabled): Before building this commit, build its parent (<hash>~1) to verify the categorization of intervening SKIP commits:
    - git checkout <hash>~1 --force --quiet
    - cmake -B <build_dir> <CMAKE_ARGS>
    - cd <build_dir> && ninja hermesvm
    - Record the parent's ALLOC size
    - Compare to the previous BUILD commit's ALLOC size. If they differ, a SKIP commit between the previous BUILD and this one changed the binary. Log a VERIFICATION FAILURE with the two sizes and the commit range that needs re-examination. Stop the analysis and report the failure so Step 1's categorization can be fixed.
    - git checkout <hash> --force --quiet (return to the BUILD commit)
  c. cmake -B <build_dir> <CMAKE_ARGS>
  d. cd <build_dir> && ninja hermesvm
  e. Record ALLOC section size sum using get_section_size() on <build_dir>/lib/libhermesvm.so (Linux) or libhermesvm.dylib (macOS)
  f. Compute delta from previous BUILD commit's size
  g. Write the row to the markdown file immediately before proceeding
  h. Bloaty diff (if delta >= 3000 bytes): Run bloaty to get a per-section breakdown of the size increase. This helps identify whether the growth is in code (.text), data (.rodata), or elsewhere.
    - The previous BUILD commit's binary must be saved before building the current commit. Copy it at the start of each iteration (before step a): cp <build_dir>/lib/libhermesvm.so /tmp/libhermesvm_prev.so
      Note: when verification is enabled, the parent binary from step (b) can be used instead, but using the previous BUILD binary is simpler and works in both modes.
    - Run: bloaty <current_binary> -- /tmp/libhermesvm_prev.so
    - Filter the output to keep only ALLOC sections (section names are the last token on each line): .text, .rodata, .data, .data.rel.ro, .bss, .tbss, .gcc_except_table, .eh_frame, .eh_frame_hdr, .init_array, .fini_array, .plt, .got, .got.plt, .rela.dyn, .rela.plt, .dynsym, .dynstr, .gnu.hash, .gnu.version, .init, .fini, and the TOTAL row.
    - Append the filtered bloaty output to the end of the markdown file under a ## Bloaty Diff section, with a heading for each commit.
  i. If build fails, record BUILD_FAIL and continue
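The bloaty filtering step can be sketched as a line filter on the last whitespace-separated token. The helper name `filter_bloaty_output` is hypothetical, and the sample text only approximates the shape of a bloaty diff; it is not captured from a real run.

```python
# ALLOC section names listed in the procedure, plus the TOTAL row.
ALLOC_SECTIONS = {
    ".text", ".rodata", ".data", ".data.rel.ro", ".bss", ".tbss",
    ".gcc_except_table", ".eh_frame", ".eh_frame_hdr", ".init_array",
    ".fini_array", ".plt", ".got", ".got.plt", ".rela.dyn", ".rela.plt",
    ".dynsym", ".dynstr", ".gnu.hash", ".gnu.version", ".init", ".fini",
    "TOTAL",
}


def filter_bloaty_output(text):
    """Keep only lines whose last token names an ALLOC section or TOTAL."""
    kept = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens and tokens[-1] in ALLOC_SECTIONS:
            kept.append(line)
    return "\n".join(kept)


# Made-up sample shaped like a bloaty diff, for illustration only.
SAMPLE = """\
    FILE SIZE        VM SIZE
 --------------  --------------
  +0.4% +2.50Ki  +0.4% +2.50Ki    .text
  +0.1%    +512  +0.1%    +512    .rodata
  +1.2% +1.00Ki  [ = ]       0    .symtab
  +0.3% +4.00Ki  +0.3% +3.00Ki    TOTAL
"""

if __name__ == "__main__":
    print(filter_bloaty_output(SAMPLE))
```

Note that this drops the header lines as well; if the column headers are wanted in the report, they would need to be kept separately.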
Do not include SKIP commits in the output table.
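The per-commit bookkeeping in the loop (parent verification, delta, and table row) might look like the following sketch. `check_parent` and `table_row` are hypothetical helpers, and the signed formatting of the Delta columns is an assumption, not something the procedure prescribes.

```python
def check_parent(prev_build_size, parent_size, prev_commit, commit):
    """Return None when the parent matches the previous BUILD size,
    otherwise a VERIFICATION FAILURE message naming the range to re-examine."""
    if parent_size == prev_build_size:
        return None
    return (
        f"VERIFICATION FAILURE: parent of {commit} is {parent_size} bytes, "
        f"previous BUILD commit {prev_commit} is {prev_build_size} bytes; "
        f"re-examine SKIP commits in {prev_commit}..{commit}~1"
    )


def table_row(index, commit, subject, size, prev_size, baseline_size):
    """Format one markdown row with delta and total delta in bytes."""
    delta = size - prev_size            # change versus previous BUILD commit
    total_delta = size - baseline_size  # change versus the base commit
    return (
        f"| {index} | {commit} | {subject} | {size} "
        f"| {delta:+d} | {total_delta:+d} |"
    )


if __name__ == "__main__":
    print(table_row(1, "abc1234def0", "Add feature", 1003000, 1000000, 1000000))
    print(check_parent(1000000, 1000064, "abc1234def0", "0fedcba4321"))
```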
Step 3: Generate Summary
After all commits are processed, append to the markdown file:
- Overall summary (baseline, final, net change, percentage)
- Significant Size Increases table (delta >= +1000 bytes), sorted by delta descending
- Significant Size Decreases table (delta <= -1000 bytes), sorted by delta ascending (largest decrease first)
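A minimal sketch of the summary computation, assuming the per-commit results are available as (commit, subject, delta) tuples. The function names and the tuple layout are assumptions made for this illustration.

```python
def overall_summary(baseline, final):
    """Baseline, final size, net change, and percentage change."""
    net = final - baseline
    return {
        "baseline": baseline,
        "final": final,
        "net": net,
        "percent": net / baseline * 100,
    }


def significant_changes(rows, threshold=1000):
    """Split rows into significant increases and decreases.

    rows: iterable of (commit, subject, delta_bytes) tuples.
    Increases are sorted largest first; decreases are sorted most
    negative first, so the largest decrease leads its table.
    """
    rows = list(rows)
    increases = sorted(
        (r for r in rows if r[2] >= threshold), key=lambda r: r[2], reverse=True
    )
    decreases = sorted((r for r in rows if r[2] <= -threshold), key=lambda r: r[2])
    return increases, decreases


if __name__ == "__main__":
    demo = [("a", "x", 1500), ("b", "y", -2000), ("c", "z", 999), ("d", "w", 4000)]
    print(significant_changes(demo))
    print(overall_summary(1000000, 1010000))
```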
Important: When parsing the markdown table to extract deltas, handle pipe
characters (|) in commit subjects carefully. Use regex anchored to the table
structure rather than naive pipe splitting.
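As an illustration of anchoring to the table structure, the sketch below pins the first two and last three columns with a regex and lets the subject absorb any pipes in between. It assumes rows follow the column order of the per-commit table and that sizes and deltas are plain (optionally signed) integers; `parse_row` is a hypothetical helper.

```python
import re

# Columns: # | Commit | Subject | Size (bytes) | Delta | Total Delta
ROW_RE = re.compile(
    r"^\|\s*(\d+)\s*"          # row number
    r"\|\s*([0-9a-f]+)\s*"     # abbreviated commit hash
    r"\|\s*(.*?)\s*"           # subject, may itself contain '|'
    r"\|\s*(\d+)\s*"           # size in bytes
    r"\|\s*([+-]?\d+)\s*"      # delta
    r"\|\s*([+-]?\d+)\s*\|\s*$"  # total delta, anchored to end of line
)


def parse_row(line):
    """Return (index, commit, subject, size, delta, total_delta) or None."""
    m = ROW_RE.match(line)
    if not m:
        return None  # header, separator, or BUILD_FAIL rows do not match
    index, commit, subject, size, delta, total = m.groups()
    return int(index), commit, subject, int(size), int(delta), int(total)


if __name__ == "__main__":
    row = "| 3 | abc1234def0 | Fix a | b parsing | 3456789 | +1200 | +4500 |"
    print(parse_row(row))
```

Because the three numeric columns are anchored to the end of the line, a subject such as "Fix a | b parsing" is captured whole, where a naive split on | would shift every later column.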
Step 4: Restore Branch
After analysis, restore the original branch:
git checkout <original_branch> --force --quiet
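Restoring the branch requires knowing what was checked out before the first git checkout in Step 2. A minimal sketch of recording it follows; `current_ref` is a hypothetical helper, and the injectable `run` parameter exists only so the logic can be exercised without a real repository.

```python
import subprocess


def current_ref(run=subprocess.check_output):
    """Return the current branch name, or the commit hash if HEAD is detached."""
    name = run(["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True).strip()
    if name != "HEAD":
        return name
    # Detached HEAD: --abbrev-ref prints the literal "HEAD", so fall back
    # to the full commit hash instead.
    return run(["git", "rev-parse", "HEAD"], text=True).strip()


if __name__ == "__main__":
    # Record this before the analysis starts; pass it to
    # git checkout <original_branch> --force --quiet afterwards.
    print(current_ref())
```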
Output Format
The markdown file (hermesvm_size_analysis.md in the repo root) contains:
- Build configuration header
- Per-commit table:
  | # | Commit | Subject | Size (bytes) | Delta | Total Delta |
- Summary section with significant changes tables
- Bloaty diff section with per-section breakdowns for commits with delta >= 3000 bytes
Notes
- The build directory is cmake-build-minsizerel inside the repo.
- Each cmake reconfiguration + incremental build takes ~20-30 seconds per commit. For 500+ BUILD commits, expect ~3-5 hours total, or roughly double that with parent-commit verification enabled.
- Run the build loop as a background task and monitor with periodic tail checks.
- Build failures near HEAD are common when commits depend on later fixes; record them and continue.
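The time estimate in the notes can be reproduced with simple arithmetic. The sketch below assumes the 20-30 second per-build range quoted above and treats parent-commit verification as exactly doubling the number of builds; `estimate_hours` is a hypothetical helper.

```python
def estimate_hours(build_commits, seconds_per_build=(20, 30), verification=True):
    """Return a (low, high) estimate of total wall-clock hours."""
    factor = 2 if verification else 1  # verification builds each parent too
    low, high = seconds_per_build
    return (
        build_commits * low * factor / 3600,
        build_commits * high * factor / 3600,
    )


if __name__ == "__main__":
    for verify in (False, True):
        low, high = estimate_hours(500, verification=verify)
        print(f"verification={verify}: {low:.1f} to {high:.1f} hours")
```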