Parallel Computing
Use this skill to convert parallel performance work into reproducible scaling decisions.
Workflow
- Define the scaling objective and constraints.
  - Capture workload shape, data size, and latency/throughput targets.
  - Define hardware assumptions (core count, SMT policy, NUMA context).
- Choose the parallel model and partitioning.
  - Select task, data, or pipeline parallelism intentionally.
  - Set chunk size and scheduling strategy to minimize overhead and imbalance.
  - Define shared-state boundaries before coding.
- Diagnose bottlenecks.
  - Check lock contention, false sharing, synchronization frequency, and memory bandwidth pressure.
  - Separate algorithmic limits from runtime/scheduler overhead.
- Validate scaling behavior.
  - Compare baseline vs. current throughput by thread count.
  - Evaluate parallel efficiency and regressions at each thread level.
  - Treat regressions above the configured thresholds as blockers.
- Deliver the implementation handoff.
  - Include tuning deltas, tradeoffs, and reproducible benchmark commands.
  - Provide a clear patch plan for runtime/algorithm changes.
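
The validation step above compares throughput by thread count and evaluates parallel efficiency. A minimal sketch of that arithmetic follows, assuming speedup is measured against the 1-thread run and efficiency is speedup divided by thread count. The function name `scaling_table` and the sample numbers are hypothetical and only illustrate the calculation.

```python
from typing import Dict, List, Tuple


def scaling_table(throughput_by_threads: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (threads, speedup, efficiency) rows sorted by thread count.

    Speedup is throughput relative to the 1-thread run; efficiency is
    speedup divided by thread count (1.0 means perfect linear scaling).
    """
    if 1 not in throughput_by_threads:
        raise ValueError("a 1-thread measurement is needed as the speedup reference")
    base = throughput_by_threads[1]
    if base <= 0:
        raise ValueError("1-thread throughput must be positive")

    rows = []
    for threads in sorted(throughput_by_threads):
        speedup = throughput_by_threads[threads] / base
        rows.append((threads, speedup, speedup / threads))
    return rows


# Made-up throughput numbers (operations per second), for illustration only.
example = {1: 100.0, 2: 190.0, 4: 340.0, 8: 480.0}
for threads, speedup, efficiency in scaling_table(example):
    print(f"{threads:>2} threads: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these sample numbers, throughput keeps rising up to 8 threads while efficiency falls from 95% at 2 threads to 60% at 8, which is why throughput alone can hide a scaling problem.
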
Commands
python3 scripts/compare_parallel_scaling.py \
  --baseline <baseline.json> \
  --current <current.json> \
  --regression-threshold-pct 5 \
  --efficiency-drop-threshold-pct 10
Treat non-zero exits as blocker regressions.
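
The internals of `scripts/compare_parallel_scaling.py` are not shown here, so the following is only a hypothetical sketch of what a gate with these two thresholds might check. The function name `find_blockers`, the input shape (a dict from thread count to throughput), and the reading of both thresholds as relative percentage drops are assumptions, not the script's actual behavior.

```python
from typing import Dict, List


def find_blockers(
    baseline: Dict[int, float],
    current: Dict[int, float],
    regression_threshold_pct: float = 5.0,
    efficiency_drop_threshold_pct: float = 10.0,
) -> List[str]:
    """List thread counts whose throughput or efficiency dropped past a threshold.

    Assumption: both thresholds are relative drops in percent, and efficiency
    is computed against each run's own 1-thread throughput.
    """
    if 1 not in baseline or 1 not in current:
        raise ValueError("both runs need a 1-thread measurement")
    if any(v <= 0 for v in baseline.values()) or any(v <= 0 for v in current.values()):
        raise ValueError("throughput values must be positive")

    blockers = []
    # Only compare thread counts measured in both runs (like-for-like).
    for threads in sorted(set(baseline) & set(current)):
        drop_pct = (baseline[threads] - current[threads]) / baseline[threads] * 100.0
        if drop_pct > regression_threshold_pct:
            blockers.append(f"{threads} threads: throughput down {drop_pct:.1f}%")

        base_eff = baseline[threads] / (baseline[1] * threads)
        cur_eff = current[threads] / (current[1] * threads)
        eff_drop_pct = (base_eff - cur_eff) / base_eff * 100.0
        if eff_drop_pct > efficiency_drop_threshold_pct:
            blockers.append(f"{threads} threads: efficiency down {eff_drop_pct:.1f}%")
    return blockers


# Made-up measurements, for illustration only.
baseline = {1: 100.0, 4: 340.0, 8: 480.0}
current = {1: 102.0, 4: 335.0, 8: 410.0}
blockers = find_blockers(baseline, current)
for line in blockers:
    print("BLOCKER:", line)

# A command-line wrapper would pass this value to sys.exit().
exit_code = 1 if blockers else 0
```

In this sample the 4-thread result stays within both thresholds, while the 8-thread result trips both, so the gate would report a non-zero exit code.
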
Output Contract
Return:
- Scaling Context: workload and hardware assumptions.
- Findings: thread-level throughput/speedup/efficiency deltas.
- Optimization Plan: concrete runtime/algorithm changes.
- Verification: benchmark commands and thresholds.
- Residual Risks: unresolved contention or scaling ceilings.
References
- references/workflow.md: detailed parallel optimization sequence.
- references/scaling-playbook.md: common bottlenecks and remedies.
- references/signoff-template.md: concise scaling sign-off format.
Execution Rules
- Compare like-for-like workloads and environments only.
- Report both speedup and efficiency, not throughput alone.
- Flag thread-level regressions above thresholds as blockers.
- Avoid overfitting to one thread count; evaluate full scaling curve.
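
One way to read a full scaling curve rather than a single point is the Karp-Flatt metric, which estimates the serial fraction implied by a measured speedup at each thread count. This skill does not prescribe the metric; it is offered here as one possible aid. The sketch below uses a hypothetical helper name `karp_flatt` and made-up speedups.

```python
def karp_flatt(speedup: float, threads: int) -> float:
    """Experimentally determined serial fraction: (1/S - 1/p) / (1 - 1/p)."""
    if threads < 2:
        raise ValueError("the metric is only defined for 2 or more threads")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 / speedup - 1.0 / threads) / (1.0 - 1.0 / threads)


# Made-up speedups relative to the 1-thread run, for illustration only.
speedups = {2: 1.9, 4: 3.4, 8: 4.8}
for threads in sorted(speedups):
    fraction = karp_flatt(speedups[threads], threads)
    print(f"{threads} threads: estimated serial fraction {fraction:.3f}")
```

A roughly constant value across thread counts points to an inherently serial portion of the algorithm. A value that grows with thread count, as in this sample, suggests overhead that increases with parallelism, such as contention or synchronization cost.
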