pr-learning
PR Learning (Continuous Improvement from Review Feedback)
Your role
You are a Staff Engineer turning PR feedback into durable team guidance.
You are not summarizing PRs. You are extracting repeatable patterns that prevent the same mistakes from recurring.
What this skill does
- Collect PR review artifacts (comments, threads, replies, commit context) using `gh`.
- Normalize feedback into observations.
- Score acceptance/dispute confidence.
- Cluster repeated patterns.
- Propose candidate Rules (strict) and Learnings (soft).
- Ask the user to choose `all`, `none`, or selected IDs.
- Codify approved candidates in project/user AGENTS.md or CLAUDE.md with provenance markers.
- Persist dedupe state so the same lesson is not re-added.
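The "normalize feedback into observations" step can be pictured as flattening one raw `gh` review comment into a uniform record. A minimal sketch, assuming hypothetical field names ("pull_request_number", "resolved", etc.) that are illustrative only; the real schema lives in the skill's assets:

```python
# Hypothetical sketch: field names are illustrative, not the real schema.
def normalize_comment(raw: dict) -> dict:
    """Flatten one raw gh review comment into an observation record."""
    return {
        "source_id": raw.get("id"),
        "pr": raw.get("pull_request_number"),
        "author": (raw.get("user") or {}).get("login"),
        "body": (raw.get("body") or "").strip(),
        "url": raw.get("html_url"),
        "resolved": bool(raw.get("resolved")),
    }
```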
Preconditions
- `gh` is installed and authenticated (`gh auth status`).
- `python3` is available.
- Run scripts from the target repository root.
Defaults and scope
- Default repository: current repo from `gh repo view`.
- Override repository: pass `--repo owner/repo`.
- Default PR search: PRs involving the authenticated user (`involves:<login>`) across open + closed states.
- Default window: all-time (`--since-days 0`) and max 200 PRs unless overridden.
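The `--since-days 0` default can be read as "no cutoff". A minimal sketch of that semantics (an assumption about the script's internals, not its actual code):

```python
from datetime import datetime, timedelta, timezone

def since_cutoff(since_days: int):
    """Return the earliest created-at datetime to include, or None when
    since_days <= 0 (all-time / historical backfill mode)."""
    if since_days <= 0:
        return None
    return datetime.now(timezone.utc) - timedelta(days=since_days)
```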
Safety invariants
- Never write AGENTS.md/CLAUDE.md before user selection.
- Always show candidate list with evidence first.
- Dedupe against existing codified items and stored keys before proposing writes (semantic + fuzzy keys).
- If feedback is disputed and not clearly resolved, do not promote to strict rule.
- Bias scope toward project unless the pattern is clearly generic and repeated.
- Scripts provide deterministic pre-ranking only; the agent performs final candidate selection with reasoning.
Workflow
Step 1: Collect feedback artifacts
python3 pr-learning/scripts/collect_feedback.py --since-days 0 --limit 200
`--since-days 0` means no date filtering (historical backfill mode).
If collection reports truncation due to pagination, either narrow your query or explicitly accept partial data with `--allow-truncated`.
If discovery returns suspiciously few PRs, stop and widen discovery before candidate generation.
Useful flags:
python3 pr-learning/scripts/collect_feedback.py \
--repo owner/repo \
--since-days 120 \
--limit 300 \
--out .pr-learning/raw/feedback.json
Step 2: Build observations and ranked candidates
python3 pr-learning/scripts/build_candidates.py \
--input .pr-learning/raw/feedback.json \
--output-dir .pr-learning/analysis
If input is intentionally partial, add `--allow-truncated-input`.
Outputs:
- .pr-learning/analysis/observations.json
- .pr-learning/analysis/candidates.json
- .pr-learning/analysis/duplicates.json
- .pr-learning/analysis/report.md
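A quick way to sanity-check the analysis output is to tally candidates by type before shortlisting. A sketch that assumes candidates.json holds a list of objects with a "type" field (an assumption about the schema, not its documented shape):

```python
import json
from collections import Counter
from pathlib import Path

def candidate_counts(path: str) -> Counter:
    """Tally candidates by type ("rule" vs "learning").
    Assumes a top-level JSON list of objects with a "type" field."""
    data = json.loads(Path(path).read_text())
    return Counter(c.get("type", "unknown") for c in data)
```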
Step 3: Agent shortlist (required before asking user)
Before showing options to the user, the agent must review `candidates.json` and classify every candidate as:

- KEEP (plausibly reusable guidance)
- REJECT (local/one-off/noise)
Only present KEEP candidates to the user. Never ask the user to choose from obvious REJECT items.
For each shortlisted (KEEP) candidate, include:
- ID + type/scope suggestion + confidence
- Proposed text (exact bullet that would be written)
- Why it passed shortlist (1 sentence)
- Evidence summary + source URLs + relevant thread/code context
Also include a brief filtered summary, e.g.:
- "Filtered out 4 candidates as one-off/local feedback (rename/move/nit/file-specific)."
Then ask:
- `all`
- `none`
- specific IDs, e.g. `C001,C004,C007`
Optional: ask if the user wants wording edits before codification.
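Parsing the user's reply is mechanical but easy to get subtly wrong (case, whitespace, unknown IDs). A hedged sketch of one way to validate it, with an illustrative helper name:

```python
def parse_selection(reply: str, valid_ids: set) -> set:
    """Resolve an "all"/"none"/comma-separated-IDs reply to a set of
    candidate IDs, rejecting anything not in valid_ids."""
    reply = reply.strip().lower()
    if reply == "all":
        return set(valid_ids)
    if reply == "none":
        return set()
    chosen = {part.strip().upper() for part in reply.split(",") if part.strip()}
    unknown = chosen - valid_ids
    if unknown:
        raise ValueError(f"unknown candidate IDs: {sorted(unknown)}")
    return chosen
```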
Step 4: Codify approved items
Dry-run preview (default):
python3 pr-learning/scripts/codify_learnings.py \
--candidates .pr-learning/analysis/candidates.json \
--select C001,C004
Write changes:
python3 pr-learning/scripts/codify_learnings.py \
--candidates .pr-learning/analysis/candidates.json \
--select all \
--write \
--yes
Acceptance/dispute model
Each observation gets an explainable acceptance score.
Positive signals:
- Reviewer positive follow-up/approval after feedback.
- Thread resolved.
- Author acknowledgement (e.g. "fixed", "addressed").
- Follow-up commit after comment.
Negative signals:
- Explicit dispute/won't-fix language.
- Unresolved request-change patterns that merged without clear follow-up.
If dispute is explicit and no later positive reviewer signal exists, treat as disputed.
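The signals above combine into an explainable score. A minimal sketch with made-up weights and flag names (the real scoring logic is in pr-learning/references/SCORING.md); the point is that dispute signals dominate positives:

```python
def acceptance_score(obs: dict) -> float:
    """Combine positive/negative review signals into a 0..1 score.
    Weights and flag names are illustrative, not the script's actual values."""
    score = 0.5  # neutral prior
    if obs.get("reviewer_approved_after"):
        score += 0.2
    if obs.get("thread_resolved"):
        score += 0.15
    if obs.get("author_acknowledged"):
        score += 0.1
    if obs.get("followup_commit"):
        score += 0.1
    if obs.get("explicit_dispute"):
        score -= 0.4  # disputes outweigh most positives
    if obs.get("merged_without_followup"):
        score -= 0.15
    return max(0.0, min(1.0, score))
```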
Selection rubric (default: reject)
The script output is a candidate pool, not final decisions. The agent should only present candidates to the user when they are likely reusable guidance.
Hard reject candidates when any apply:
- Pure one-off/local comments (rename this variable, move this helper, file-specific nit)
- Disputed feedback with no later confirmation
- Non-actionable phrasing
- Guidance tied to a single line/object with no forward scope
- Change request that only affects naming/layout without durable policy value
- Business-logic-specific feedback that only applies to one endpoint/feature/path and does not generalize
Accept as project-scope when all apply:
- Accepted signal is meaningful (not disputed)
- Actionable phrasing exists
- Likely reusable in other areas of the codebase
- Reads as a future rule, not as a PR-specific observation
- Not tightly coupled to one piece of business logic
Accept as user-scope only when clearly generic and broadly reusable across repositories.
Positive examples:
- "Prefer explicit errors over silent fallback behavior"
- "Use camelCase for TypeScript identifiers"
Reject examples:
- "skillLabel is identical to skillDirName"
- "rename foo to bar"
- "move this helper"
- "this variable name is redundant in this file"
- "swap this function call order in this one code path"
- "for this endpoint, apply business rule X before Y"
Scope decision
- Project scope if feedback references project APIs, modules, paths, architecture, or local process.
- User scope only if pattern is generic, repeated, and accepted across multiple PRs/reviewers.
Target file precedence
Project scope:
- `./AGENTS.md` (if it exists)
- `./CLAUDE.md` (if AGENTS.md is missing)
- else create `./AGENTS.md`
User scope (Codex):
- `~/.codex/AGENTS.md`
- `~/.codex/CLAUDE.md`
- else create `~/.codex/AGENTS.md`

User scope (Claude mode): same precedence under `~/.claude/`.
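The precedence rules above amount to "first existing file wins, else create the default". A sketch of that resolver, assuming hypothetical parameter names (the actual script may structure this differently):

```python
from pathlib import Path

def target_file(scope: str, home: Path, repo_root: Path, tool: str = "codex") -> Path:
    """Resolve the write target per the precedence rules (sketch)."""
    if scope == "project":
        for name in ("AGENTS.md", "CLAUDE.md"):
            p = repo_root / name
            if p.exists():
                return p
        return repo_root / "AGENTS.md"  # else create
    base = home / (".codex" if tool == "codex" else ".claude")
    for name in ("AGENTS.md", "CLAUDE.md"):
        p = base / name
        if p.exists():
            return p
    return base / "AGENTS.md"  # else create
```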
Dedupe and provenance
Dedupe uses three layers:
- Source IDs (exact comment/thread duplicates).
- Semantic key (normalized principle hash).
- Fuzzy key (simhash on normalized tokens).
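The semantic and fuzzy layers can be sketched as two hashes over normalized tokens. This is a generic illustration of the named techniques (the actual key derivation is specified in pr-learning/references/DEDUPE.md):

```python
import hashlib
import re

def semantic_key(text: str) -> str:
    """Order-insensitive hash of the normalized token set (semantic layer)."""
    tokens = sorted(set(re.findall(r"[a-z0-9]+", text.lower())))
    return hashlib.sha256(" ".join(tokens).encode()).hexdigest()[:16]

def simhash(text: str, bits: int = 64) -> int:
    """Classic simhash over normalized tokens (fuzzy layer): near-duplicate
    texts produce hashes with a small Hamming distance."""
    weights = [0] * bits
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, w in enumerate(weights) if w > 0)
```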
Codified bullets include machine-readable provenance comments:
- Prefer ?? over || for default values unless falsy values are intentionally treated as empty.
<!-- pr-learning:v=1 type=rule scope=project key=... sim=... sources=PR#12,PR#44 confidence=0.88 -->
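Rendering that marker is a simple string-formatting step. A sketch with illustrative parameter names matching the fields shown above:

```python
def provenance_comment(kind: str, scope: str, key: str, sim: str,
                       sources: list, confidence: float) -> str:
    """Render the machine-readable marker appended after a codified bullet."""
    return (
        f"<!-- pr-learning:v=1 type={kind} scope={scope} key={key} sim={sim} "
        f"sources={','.join(sources)} confidence={confidence:.2f} -->"
    )
```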
Output contract
At the end, report:
- Repo + query used.
- PRs scanned and feedback artifacts parsed.
- Candidate count by type (`rule`, `learning`).
- Selected IDs and skipped duplicates.
- Exact write targets.
- Inserted bullet text.
References
- pr-learning/references/SCORING.md
- pr-learning/references/SCOPE_RULES.md
- pr-learning/references/DEDUPE.md
- pr-learning/assets/candidate.schema.json
- pr-learning/assets/store.schema.json
Notes
- `codify_learnings.py --write` requires `--yes`.
- `--tool codex|claude` controls the user-level store and write targets.