skills/egorfedorov/slot-casino-game-developer-skills-for-stake-engine/parallel-computing

parallel-computing

SKILL.md

Parallel Computing

Use this skill to convert parallel performance work into reproducible scaling decisions.

Workflow

Define scaling objective and constraints.

Capture workload shape, data size, and latency/throughput targets.
Define hardware assumptions (core count, SMT policy, NUMA context).

Choose parallel model and partitioning.

Select task/data/pipeline parallelism intentionally.
Set chunk size and scheduling strategy to minimize overhead and imbalance.
Define shared-state boundaries before coding.

Diagnose bottlenecks.

Check lock contention, false sharing, synchronization frequency, and memory bandwidth pressure.
Separate algorithmic limits from runtime/scheduler overhead.

Validate scaling behavior.

Compare baseline vs current throughput by thread count.
Evaluate parallel efficiency and regressions at each thread level.
Treat regressions above threshold as blockers.

Deliver implementation handoff.

Include tuning deltas, tradeoffs, and reproducible benchmark commands.
Provide clear patch plan for runtime/algorithm changes.

Commands

python3 scripts/compare_parallel_scaling.py \
  --baseline <baseline.json> \
  --current <current.json> \
  --regression-threshold-pct 5 \
  --efficiency-drop-threshold-pct 10

Treat non-zero exits as blocker regressions.

Output Contract

Return:

Scaling Context: workload and hardware assumptions.
Findings: thread-level throughput/speedup/efficiency deltas.
Optimization Plan: concrete runtime/algorithm changes.
Verification: benchmark commands and thresholds.
Residual Risks: unresolved contention or scaling ceilings.

References

references/workflow.md: detailed parallel optimization sequence.
references/scaling-playbook.md: common bottlenecks and remedies.
references/signoff-template.md: concise scaling sign-off format.

Execution Rules

Compare like-for-like workloads and environments only.
Report both speedup and efficiency, not throughput alone.
Flag thread-level regressions above thresholds as blockers.
Avoid overfitting to one thread count; evaluate full scaling curve.

Weekly Installs

1

Repository

egorfedorov/slo…e-engine

GitHub Stars

2

First Seen

7 days ago

Security Audits

Gen Agent Trust HubPass

Installed on

zencoder1

amp1

cline1

openclaw1

opencode1

cursor1