semantic-szz-analyzer
Semantic SZZ Analyzer
This skill is a delta over → szz-bug-identifier. Run classic SZZ first; this skill filters and re-ranks its candidates using semantic understanding instead of line-level blame.
Classic SZZ's precision problem: git blame is textual. It tells you who last touched a line, not who last changed its meaning. A variable rename, an indent, a refactor that moves a line unchanged — all of these become false-positive bug-introducers.
Semantic filters — applied on top of classic SZZ output
| Filter | What it checks | Effect |
|---|---|---|
| AST-diff meaningfulness | Did the blamed commit change the line's AST, or only its text? | Drops rename/reformat FPs |
| Def-use chain relevance | Does the blamed line define/use a variable that the fix reads/writes? | Drops incidental adjacent-line hits |
| Semantic-preserving refactor | Is the blamed commit a known-safe refactor (extract method, inline, rename)? | Reroutes blame to the commit before the refactor |
| Bug-pattern match | Does the blamed commit's diff look like it introduces the kind of bug the fix addresses? (null check added → look for the commit that removed a guard or added the deref) | Boosts confidence when matched |
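The AST-diff meaningfulness filter can be sketched as follows. This is a minimal illustration that assumes the blamed code is Python and uses the standard-library `ast` module; other languages would need their own parser. The function name `classify_change` and the three labels are hypothetical, not part of szz-bug-identifier.

```python
import ast


def _bound_names(tree):
    """Names assigned or declared as parameters inside the snippet."""
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
    return names


class _LocalRenamer(ast.NodeTransformer):
    """Replace locally bound names with placeholders v0, v1, ... in visit order."""

    def __init__(self, bound):
        self.bound = bound
        self.mapping = {}

    def _alias(self, name):
        return self.mapping.setdefault(name, f"v{len(self.mapping)}")

    def visit_Name(self, node):
        if node.id in self.bound:
            node.id = self._alias(node.id)
        return node

    def visit_arg(self, node):
        self.generic_visit(node)
        node.arg = self._alias(node.arg)
        return node


def _normalized_dump(tree):
    return ast.dump(_LocalRenamer(_bound_names(tree)).visit(tree))


def classify_change(before: str, after: str) -> str:
    """Label a blamed change as 'format-only', 'rename-only', or 'semantic'."""
    tree_a, tree_b = ast.parse(before), ast.parse(after)
    # ast.dump omits line and column info by default, so whitespace and
    # comment changes produce identical dumps.
    if ast.dump(tree_a) == ast.dump(tree_b):
        return "format-only"
    # Only locally bound names are normalized; swapping one free name
    # (for example a called function) for another still counts as semantic.
    if _normalized_dump(tree_a) == _normalized_dump(tree_b):
        return "rename-only"
    return "semantic"
```

Candidates labeled `format-only` or `rename-only` would be the ones to drop and re-blame through. The sketch compares whole snippets rather than single lines, because one line rarely parses on its own.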
Re-ranking
When multiple candidates survive filtering, rank by:
- Semantic distance: how much did the candidate change the behavior of the fixed line? A commit that added the line beats one that renamed a variable in it.
- Temporal proximity to bug report: a commit 2 days before the report beats one from 2 years before.
- Author signal: if the fix author is also the candidate author, slight boost — they probably know what they broke.
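One way to combine the three signals above is a weighted score. The sketch below is illustrative only: the `Candidate` fields, the weights, the 0-to-1 `semantic_distance` scale, and the 30-day half-life are all assumptions chosen so that semantic distance dominates and the author match acts as a small boost.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Candidate:
    sha: str
    semantic_distance: float  # assumed scale: 0.0 = cosmetic, 1.0 = added the fixed line
    committed_at: datetime
    author: str


def score(candidate, bug_reported_at, fix_author,
          w_semantic=0.6, w_time=0.3, w_author=0.1, half_life_days=30.0):
    """Weighted sum of the three ranking signals (weights are hypothetical)."""
    # Commits made after the report are assumed to be excluded upstream;
    # the clamp only guards against negative ages.
    age = bug_reported_at - candidate.committed_at
    days_before = max(age.total_seconds() / 86400.0, 0.0)
    # Exponential decay: the proximity signal halves every half_life_days.
    proximity = 0.5 ** (days_before / half_life_days)
    author_boost = 1.0 if candidate.author == fix_author else 0.0
    return (w_semantic * candidate.semantic_distance
            + w_time * proximity
            + w_author * author_boost)


def rank(candidates, bug_reported_at, fix_author):
    """Return candidates ordered from most to least likely introducer."""
    return sorted(candidates,
                  key=lambda c: score(c, bug_reported_at, fix_author),
                  reverse=True)
```

With these weights, a commit that added the line two years ago still outranks a rename from two days ago, while two candidates with equal semantic distance are separated by recency and then by author.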
Worked example
Fix: Adds a null check on user.profile before dereferencing.
Commits in the history (classic SZZ's blame surfaces only the first):
- a1b2c3d: "Rename p→profile" (6 months ago). Blame hits this because the dereference line was touched.
- e4f5g6h: "Remove unused profile validation" (3 weeks ago). Blame misses this; it touched a different line.
- i7j8k9l: "Add profile feature" (1 year ago). The original dereference.
Semantic pass:
- a1b2c3d: AST-diff → rename only, no behavior change → dropped. Re-blame through it.
- Re-blame lands on i7j8k9l. But: bug was reported 3 weeks ago. i7j8k9l is 1 year old; would it have survived a year without a report?
- Bug-pattern match: the fix adds a guard. Did any recent commit remove a guard? → e4f5g6h removed validateProfile(), which checked non-null.
Verdict: e4f5g6h is the true introducer. Classic SZZ missed it entirely because the fix and the removal touch different lines.
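The guard-removal search in this example could be approximated with a text heuristic over each recent commit's removed lines. The sketch below is a rough assumption-laden illustration: the `Commit` shape, the `GUARD_HINTS` keyword list, and the sample removed lines are hypothetical, and a real pass would work on parsed diffs rather than regexes.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Commit:
    sha: str
    removed_lines: list = field(default_factory=list)


# Hypothetical keyword list for "looks like a guard"; tune per codebase.
GUARD_HINTS = re.compile(r"\b(if|assert|validate\w*|check\w*|require\w*)\b",
                         re.IGNORECASE)


def removed_guards(commits, guarded_symbol):
    """Return (sha, line) pairs where a removed line looks like a guard
    that mentions the symbol the fix now protects."""
    symbol = re.compile(re.escape(guarded_symbol), re.IGNORECASE)
    hits = []
    for commit in commits:
        for line in commit.removed_lines:
            if GUARD_HINTS.search(line) and symbol.search(line):
                hits.append((commit.sha, line.strip()))
    return hits


# Illustrative data mirroring the worked example; the exact removed
# lines are assumptions, not taken from a real repository.
recent = [
    Commit("a1b2c3d", removed_lines=["p = load(user)", "return p.name"]),
    Commit("e4f5g6h", removed_lines=["    validateProfile(user)"]),
]
print(removed_guards(recent, "profile"))
```

Any commit this returns is a candidate to add alongside the blame-derived ones, marked as found via the semantic pass rather than classic blame.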
Edge cases
- The fix and the introduction touch disjoint code: Classic SZZ is blind here; semantic SZZ can catch it only if the def-use chain connects them. If not, neither algorithm finds it.
- Introduced by omission: "The bug is that a check was never added." No commit to blame — the closest you get is the commit that added the code around where the check should be.
- Squash-merge repos: All bugs blame to the squash commit. You need the pre-squash branch history, or you're stuck.
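For the first edge case above, the question of whether a def-use chain connects the fix to a candidate can be approximated by checking for shared variable names. The sketch below assumes Python sources and the standard-library `ast` module; `shared_variables` is a hypothetical helper, and name overlap is only a coarse stand-in for a real def-use analysis, which would also track scopes and data flow.

```python
import ast
import builtins

_BUILTINS = set(dir(builtins))


def variable_names(source: str) -> set:
    """Variable names defined or used in a snippet, ignoring builtins."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
    return names - _BUILTINS


def shared_variables(candidate_snippet: str, fix_snippet: str) -> set:
    """Names touched by both the candidate's change and the fix.
    An empty result means this heuristic found no connection."""
    return variable_names(candidate_snippet) & variable_names(fix_snippet)
```

For instance, `shared_variables("profile = user.profile", "if profile is not None:\n    name = profile.name")` reports `profile` as the link, while a candidate that only touches `count = len(items)` shares nothing with that fix.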
Do not
- Do not use semantic SZZ as your first pass. It's an order of magnitude slower. Classic SZZ → filter → semantic re-rank on survivors.
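- Do not override a classic-SZZ hit with a semantic-SZZ miss. Drop a direct hit only on positive evidence, such as an AST diff showing a pure rename or reformat; failing to find a semantic link is not grounds for removal. Semantic SZZ's other job is to add candidates the textual blame missed.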
Output format
Same as szz-bug-identifier, with an additional semantic_evidence field per candidate:
fix: <sha>
candidates (semantic-filtered):
  <sha>  confidence=<high|med|low>
    semantic_evidence: <AST change summary / def-use link / pattern match>
    classic_szz_hit: <yes | no (found via semantic reroute)>