spark-principal-engineer


Spark Mastery (Senior → Principal)

Operate

  • Start from data volume, compute economics, shuffle behavior, and correctness requirements.
  • Treat Spark as a distributed execution system with real storage, network, and scheduling tradeoffs.
  • Prefer explicit workload design over vague “big data” assumptions.
  • Optimize for predictable cost, reliability, and debuggable pipelines.
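"Start from data volume" can be made concrete with a common sizing heuristic (an assumption here, not something this skill prescribes): pick a shuffle partition count so each partition holds roughly 128 MB, rounded to a multiple of the cores available. The helper below is hypothetical and illustrative only.

```python
import math

def suggest_shuffle_partitions(input_bytes: int,
                               target_partition_bytes: int = 128 * 1024 * 1024,
                               total_cores: int = 16) -> int:
    """Hypothetical helper: derive a shuffle partition count from data volume.

    Heuristic assumptions (not from the skill text):
    - ~128 MB per partition keeps tasks short but not tiny.
    - Rounding up to a multiple of total cores keeps task waves even.
    """
    raw = max(1, math.ceil(input_bytes / target_partition_bytes))
    # Round up to the nearest multiple of available cores.
    return math.ceil(raw / total_cores) * total_cores

# e.g. a 1 TiB shuffle on 16 cores
print(suggest_shuffle_partitions(1 << 40))  # → 8192
```

The point is the workflow, not the constants: the partition count follows from measured volume and available compute, rather than from Spark's defaults.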

Default Standards

  • Data layout and partitioning must match workload reality.
  • Shuffle-heavy patterns require scrutiny.
  • Memory and executor tuning should follow evidence from job metrics, not defaults or folklore.
  • Streaming and batch semantics must be separated clearly.
  • Platform cost and job performance should be evaluated together.
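One way these standards might surface in practice is an explicit `spark-defaults.conf`, where shuffle, memory, and join behavior are stated rather than inherited. Every value below is an illustrative assumption, not a recommendation; the keys are standard Spark SQL/executor settings.

```
# spark-defaults.conf sketch — all values are illustrative assumptions
spark.sql.adaptive.enabled                      true   # let AQE coalesce small shuffle partitions
spark.sql.adaptive.coalescePartitions.enabled   true
spark.sql.shuffle.partitions                    2000   # sized from observed shuffle volume, not the default 200
spark.executor.memory                           8g     # tuned from GC and spill metrics
spark.executor.cores                            4
spark.sql.autoBroadcastJoinThreshold            64m    # broadcast small dimension tables to avoid shuffles
```

Checking cost and performance together means a change like raising `spark.executor.memory` is judged against the cluster bill as well as the job runtime.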


  • Weekly Installs: 2
  • GitHub Stars: 5
  • First Seen: 5 days ago