GitHub Code Review Report (Chinese Format with AI Analysis)

Overview

Generate structured Chinese-format Excel reports from GitHub pull requests with AI-powered review analysis. Uses GitHub API via gh CLI to fetch PR data, filter by date range, branch, and specific reviewers, then leverages AI models to intelligently analyze review content and generate structured problem descriptions and impact analysis.

Key innovation: Two-phase workflow separates data collection from AI analysis, enabling true AI-powered review summarization instead of simple text truncation.

When to Use

Use when you need to:

Generate code review reports for GitHub repositories in Chinese format
Track review activity over time periods (this month, last week, last 15 days)
Export review data to Chinese-format Excel with 13 specific columns
Leverage AI for intelligent review analysis (problem description, impact analysis) instead of simple truncation
Use Excel template file for consistent formatting

DO NOT use for:

Local git repository analysis (use git log instead)
General PR statistics without review summaries
Issues or other GitHub artifacts
Simple keyword-based classification without AI understanding

Core Pattern

Before (without skill):

# No AI integration - simple truncation to 50 chars
# No template usage - create new Excel every time
# Inconsistent problem classification based on keyword matching
# No two-phase workflow (data + AI analysis mixed together)

After (with skill):

# Two-phase workflow: Data collection → AI analysis
# AI-powered intelligent summarization
# Template-based Excel generation (consistent formatting)
# Structured problem classification (15 types, severity levels)

Decision Flow

digraph github_review_flow {
    rankdir=LR;
    node [shape=box, style="rounded,filled", fillcolor=lightgray];

    start [label="Start: Generate review report", fillcolor=lightgreen];

    parse_time [label="Parse time range\n(this month, last week, etc.)"];

    fetch_prs [label="Fetch PRs via\ngh pr list API"];

    any_prs [label="Any PRs found?", shape=diamond, fillcolor=lightyellow];

    no_prs [label="Report empty\nsuggest alternatives", fillcolor=lightcoral];

    filter_reviewer [label="Filter by reviewer patterns\n(include/exclude)"];

    fetch_details [label="Fetch PR details via\ngh pr view"];

    extract_reviews [label="Extract original review\ncontent (not summarization)"];

    ai_analyze [label="AI Analysis:\nAnalyze reviews and generate\nproblem description & impact", shape=diamond, fillcolor=lightblue];

    use_template [label="Use Excel template\n(读取模板文件)", shape=diamond, fillcolor=lightblue];

    generate_excel [label="Generate Excel with AI-processed data\n基于模板填充数据", fillcolor=lightblue];

    done [label="✅ Deliver report", fillcolor=lightgreen];

    start -> parse_time -> fetch_prs -> any_prs;

    any_prs -> no_prs [label="No"];
    any_prs -> filter_reviewer [label="Yes"];

    filter_reviewer -> fetch_details -> extract_reviews;

    extract_reviews -> ai_analyze;

    ai_analyze -> use_template [label="Yes"];
    ai_analyze -> generate_excel [label="No"];

    use_template -> generate_excel -> done;

}

Key decisions:

AI Integration:

When to use AI: Only for review analysis (problem description, impact analysis), NOT for data collection
Data collection phase: Only use GitHub API, preserve original review content
AI analysis phase: Each review → AI prompt → generate structured fields
Template usage: Read template first, then append data (preserve formatting/styling)

Template file:

Location: resources/模块名-代码走查报告-template.xlsx
Contains: Empty rows with proper columns and formatting
Purpose: Preserve formatting (width, colors, fonts, borders, header styles)
Read before generating, write only new data rows

Font specification:

Required font: "CESI宋体-GB18030" (CESI Songti GB18030)
Template must use this font for Chinese characters to ensure proper rendering
When creating template, apply this font to all cells containing Chinese text
If template doesn't exist, programmatically set font when creating Excel

Two-phase workflow:

Collect data (GitHub API) → Store original reviews
AI analyze each review → Generate problem description, impact analysis
Load template → Append AI-processed data → Save Excel

Template fallback: If template file doesn't exist, create basic template structure programmatically (columns only, no formatting)

Implementation

Step 1: Parse Time Range

Convert relative time expressions to absolute date ranges. Use parse_time_range() in generator.py.

Input	Conversion
"这个月"	First day of current month to today
"上周"	Last Monday to last Sunday
"这半个月"	15 days ago to today
"这个星期"	Last Monday to today
"YYYY-MM-DD to YYYY-MM-DD"	Use exact dates

Step 2: Fetch PRs (Data Collection Phase)

Use gh pr list with search query:

gh pr list \
  --repo linuxdeepin/dde-cooperation \
  --state merged \
  --search "merged:${start_date}..${end_date} base:master" \
  --json number,title,author,createdAt,mergedAt,mergedBy,baseRefName,url,reviews,comments

Filter by branch: append base:master to search query.

Important: This is DATA COLLECTION only. Do NOT summarize or process reviews yet. Store original content.

Filter Version Update Commits: Skip PRs that are version updates only. Use is_version_update_pr() to filter out:

PR titles matching: Update version to X.Y.Z, Bump version to X.Y.Z, Version X.Y.Z
PRs with only version changes in commits
Commits by deepin-ci-robot (considered invalid, see Step 5)

def is_version_update_pr(pr):
    """Check if PR is only about version update"""
    title = pr.get('title', '').lower()
    patterns = [
        r'^update version to \d+\.\d+\.\d+',
        r'^bump version to \d+\.\d+\.\d+',
        r'^version \d+\.\d+\.\d+$',
    ]
    for pattern in patterns:
        if re.search(pattern, title):
            return True
    return False

# Filter out version update PRs
if is_version_update_pr(pr):
    continue

Step 3: Filter Reviewers

Use should_include_reviewer() in generator.py with fnmatch patterns:

Pattern	Matches	Does NOT Match
`sourcery-*`	sourcery-ai, sourcery-bot	liuzheng
`*-bot`	any name ending with "-bot"	sourcery-ai, liuzheng

Rule: Exclude checked first, then include. Include requires matching an include pattern AND not matching any exclude pattern.

Important: Only validate reviewer identity, NOT review content quality. Keep all valid reviews for AI analysis.

Reviewer Definitions:

提出人 (Proposed by): The person who submitted the review comments on the PR
解决人 (Resolved by): The person who submitted/merged the PR (mergedBy field)

These fields are distinct - one provides review feedback, the other implements the changes.

Priority: Human over AI: When both human and AI reviews exist on the same PR:

If a human reviewer (e.g., zhangs, liuzheng) added review comments: ONLY include human reviews, ignore AI reviews
If only AI reviews exist (e.g., sourcery-ai): Include those
AI reviewers to exclude: sourcery-ai, sourcery-bot, deepin-ci-robot
Example: If PR has reviews from zhangs (human) and sourcery-ai (AI), only include zhangs's review

def should_include_reviewer(author):
    """Check if reviewer should be included in report"""

    # Always exclude invalid automated accounts
    EXCLUDE_REVIEWERS = [
        'deepin-ci-robot',
        'sourcery-ai',
        'sourcery-bot',
    ]

    if author in EXCLUDE_REVIEWERS:
        return False

    # Include by default for other users
    return True

def prioritize_human_reviews(reviews):
    """Filter reviews to prioritize human over AI"""

    # Check if any human review exists
    has_human = any(
        not should_exclude_as_ai(review['author']['login'])
        for review in reviews
    )

    # If human reviews exist, exclude all AI reviews
    if has_human:
        return [
            review for review in reviews
            if not should_exclude_as_ai(review['author']['login'])
        ]
    else:
        # If only AI reviews, include all
        return reviews

Reviewer Definitions:

提出人 (Proposed by): The person who submitted the review comments on the PR
解决人 (Resolved by): The person who submitted/merged the PR (mergedBy field)

These fields are distinct - one provides review feedback, the other implements the changes.

Step 4: Fetch PR Details (Data Collection Phase)

For each PR, use gh pr view to get full data:

gh pr view ${pr_number} --repo linuxdeepin/dde-cooperation \
  --json reviews,comments

Important: Collect ALL reviews (valid and invalid), but only send VALID reviews to AI. This allows AI to judge validity.

Ignore Invalid Commits:

deepin-ci-robot: All commits by this user are considered invalid and should be excluded
Version updates: PRs with titles matching version update patterns (see Step 2)
These should be filtered out before review extraction to avoid cluttering the report

Step 5: Extract Original Reviews

Extract review data WITHOUT summarization. Store original content for AI analysis.

# In extract_review_suggestions():
for review in pr_data.get('reviews', []):
    author = review.get('author', {}).get('login', '')
    body = review.get('body', '')  # Original content, NOT summarized
    state = review.get('state', '')

    # Apply reviewer filter
    if not should_include_reviewer(author):
        continue

    # Store for AI analysis
    valid_reviews.append({
        'original_content': body,  # Original text
        'reviewer': author,
        'review_time': review.get('submittedAt', ''),
        'pr_number': pr.get('number', ''),
    })

Step 6: AI Analysis (AI Integration Phase)

For each valid review, generate AI-powered analysis:

AI Prompt Template:

你是代码review专家。请分析以下GitHub review内容，并提供结构化分析：

**原始Review内容:**
${original_review_content}

**Review上下文:**
- PR标题: ${pr_title}
- Review者: ${reviewer}
- Review时间: ${review_time}

**任务:**
1. 判断这是否为有效的代码review（不是简单的批准或自动评论）
2. 提取核心问题点
3. 分类问题类型（从15种类型中选择）
4. 评估严重程度（严重或一般）
5. 判断问题来源（commit log=1, 代码=2, 注释=3）
   - 默认为"代码"（2）
   - 只有当review明确提到注释问题才选"注释"（3）
   - 只有当review明确提到commit信息/格式问题才选"commit log"（1）
6. 生成中文问题描述（≤50字）
7. 生成影响分析（≤20字）
8. 如果无明显影响，填"无"

**输出格式（JSON）：**
```json
{
  "is_valid_review": true/false,
  "problem_type": 1-15,
  "problem_source": 1/2/3,
  "problem_description": "中文问题描述≤50字",
  "impact_analysis": "中文影响分析≤20字（或"无"）",
  "confidence": high/medium/low,
  "reasoning": "简要说明判断依据"
}

Implementation:

# In generate_review_report():
for review in valid_reviews:
    ai_prompt = build_ai_prompt(review, pr_info)
    
    # Call AI model via skill_mcp or subprocess
    ai_result = call_ai_analysis(ai_prompt)
    
    # Parse JSON result
    result = json.loads(ai_result)
    
    # Add to processed list
    processed_reviews.append({
        '序号': serial_number,
        '包名': module_name,
        '仓库地址': f"https://github.com/{repo}",
        '代码提交地址': pr.get('url', ''),
        '问题来源': PROBLEM_SOURCES.get(result.get('problem_source', 2), '代码'),
        '问题描述': result['problem_description'],
        '严重程度': result.get('severity', '一般'),
        '影响分析': result['impact_analysis'],
        '问题类型': PROBLEM_TYPES.get(result['problem_type'], '其他'),
        '提出人': review['reviewer'],  # Reviewer who submitted the review comment
        '提出时间': review_date_only,
        '解决人': merged_by,  # Person who submitted/merged the PR
        '计划解决时间': merged_date_only,
        '实际解决时间': merged_date_only,
        '提出人确认是否验收通过': "是",
        '问题状态': problem_status,  # "Close" if merged, "Open" otherwise
    })

AI Models Available:

Claude: Use direct reasoning (for small datasets)
OpenAI: Use openai MCP or API
DeepSeek: Use deepseek API or MCP
Local LLM: Use local model APIs

Fallback: If AI unavailable, use original content as-is (no analysis)

Step 7: Load Template

# Read template file
template_path = 'resources/模块名-代码走查报告-template.xlsx'
df_template = pd.read_excel(template_path, engine='openpyxl')

# Verify template structure
required_columns = ['序号', '包名', '仓库地址', '代码提交地址', '问题来源', '问题描述', '严重程度', '影响分析', '问题类型', '提出人', '提出时间', '解决人', '计划解决时间', '实际解决时间', '提出人确认是否验收通过', '问题状态']

if set(required_columns) != set(df_template.columns):
    print("⚠️ Template column mismatch. Creating basic template.")
    df_template = pd.DataFrame(columns=required_columns)

    # If creating template, set required font
    from openpyxl.styles import Font

    # Load workbook to set styles
    wb = load_workbook(template_path) if os.path.exists(template_path) else Workbook()
    ws = wb.active

    # Set font for header and data rows
    chinese_font = Font(name='CESI宋体-GB18030', size=11)

    for row in ws.iter_rows():
        for cell in row:
            cell.font = chinese_font

    wb.save(template_path)

Template preservation rules:

Read template with formatting
Append data using pd.concat() or direct indexing
Save with to_excel(index=False, engine='openpyxl')
Template styling (widths, borders, colors, header styles) will be preserved
Font requirement: Use "CESI宋体-GB18030" for all Chinese text

Step 8: Generate Excel Report

# Combine template data with AI-processed reviews
df_final = pd.concat([df_template, df_processed], ignore_index=True)

# Save to Excel
output_file = f"{module_name}-代码走查报告-{yyyymmdd}.xlsx"
df_final.to_excel(output_file, index=False, engine='openpyxl')

Quick Reference

Two-Phase Workflow

Phase	Purpose	Tools
数据收集	Fetch PRs via `gh pr list/view`, extract original reviews	GitHub CLI, generator.py
AI分析	Analyze reviews, generate problem description & impact analysis	AI models (Claude, OpenAI, DeepSeek)
报表生成	Load template, append data, preserve formatting	pandas, openpyxl

AI Prompt Template Structure

AI_PROMPT_TEMPLATE = """
你是代码review专家。请分析以下GitHub review内容，并提供结构化分析：

**原始Review内容:**
{original_content}

**Review上下文:**
- PR标题: {pr_title}
- Review者: {reviewer}
- Review时间: {review_time}

**任务:**
1. 判断这是否为有效的代码review（不是简单的批准或自动评论）
2. 提取核心问题点
3. 分类问题类型（从15种类型中选择）
4. 评估严重程度（严重或一般）
5. 判断问题来源（commit log=1, 代码=2, 注释=3）
   - 默认为"代码"（2）
   - 只有当review明确提到注释问题才选"注释"（3）
   - 只有当review明确提到commit信息/格式问题才选"commit log"（1）
6. 生成中文问题描述（≤50字）
7. 生成影响分析（≤20字）
8. 如果无明显影响，填"无"

**输出格式（JSON）：**
{{
  "is_valid_review": true/false,
  "problem_type": 1-15,
  "problem_source": 1/2/3,
  "problem_description": "中文问题描述≤50字",
  "impact_analysis": "中文影响分析≤20字（或"无"）",
  "confidence": high/medium/low,
  "reasoning": "简要说明判断依据"
}
"""

Excel Report Columns (13 total)

列名	说明	来源
序号	自增序号（1,2,3...）	Auto-increment
包名	--module-name 参数	用户指定
仓库地址	`https://github.com/{repo}`	GitHub URL
代码提交地址	`pr['url']`	PR链接
问题来源	映射表（1-3，默认为代码2）	commit log(1) / 代码(2) / 注释(3)
问题描述	AI生成的中文描述（≤50字）	AI分析结果
严重程度	问题类型8(安全)、12(内存)="严重"，其他="一般"	AI判断
影响分析	AI生成的影响分析（≤20字或"无"）	AI分析结果
问题类型	映射表（1-15）	AI分类结果
提出人	reviewer登录名（提出review备注的人）	GitHub API
提出时间	YYYY-MM-DD格式	review['submittedAt']
解决人	mergedBy登录名（提交PR的人）	GitHub API
计划解决时间	mergedAt YYYY-MM-DD	GitHub API
实际解决时间	mergedAt YYYY-MM-DD	GitHub API
提出人确认是否验收通过	固定值"是"	硬编码
问题状态	merged="Close", not merged="Open"	判断逻辑

Time Range Examples

Expression	Meaning
`this month`	1st of current month to today
`last month`	1st of last month to last day of last month
`last week`	Last Monday to last Sunday
`this week`	Last Monday to today
`last 15 days`	15 days ago to today

gh PR Commands

# Basic query
gh pr list --repo linuxdeepin/dde-cooperation --state merged

# With date and branch filters
gh pr list --repo linuxdeepin/dde-cooperation --state merged \
  --search "merged:2026-01-01..2026-01-31 base:master"

# With JSON output
gh pr list --repo linuxdeepin/dde-cooperation --state merged \
  --json number,title,author,createdAt,mergedAt,mergedBy,baseRefName,url,reviews,comments

# Fetch single PR details
gh pr view 123 --repo linuxdeepin/dde-cooperation \
  --json reviews,comments

Common Mistakes

Mistake	Why It's Wrong	Fix
Not using AI for review analysis	AI analysis is the core feature, not optional	Use AI to analyze each review and generate structured fields
Creating Excel without template	Loses formatting consistency	Read template first, then append data
Summarizing reviews manually	Misses AI's intelligent analysis	Let AI do the analysis
Not preserving template formatting	Template has styling for a reason	Use pd.concat() to append, preserve formatting
Mixing data collection with AI analysis	Two separate phases	Separate concerns: first collect all data, then analyze with AI
Not validating reviews before AI	AI can judge validity itself	Send all to AI, let it determine validity
Processing reviews individually	Slows down AI	Batch process: collect all, then send to AI as array
Missing error handling	AI may fail	Try-catch and provide meaningful error messages
Wrong default problem source value	Default should be "代码"(2), not "注释"(3)	Use result.get('problem_source', 2) and map correctly
Using Chinese for "问题状态"	Must be English "Close" or "Open"	Change merged="关闭" to merged="Close"
Not filtering version update PRs	Clutters report with non-code changes	Filter out "Update version to X.Y.Z" PRs
Including deepin-ci-robot commits	Automated commits are not real reviews	Exclude deepin-ci-robot entirely
Not prioritizing human reviews	AI reviews are lower quality than humans	Filter out AI reviews when human reviews exist
Not using correct Chinese font	Text rendering may break	Use "CESI宋体-GB18030" for all Chinese text

Rationalization Table

Excuse	Reality
"AI integration is complex"	Two-phase design is cleaner and more powerful
"Template file is unused"	It preserves formatting and ensures consistency
"AI might be slow"	AI analysis is worth the quality improvement
"Can just use keyword matching"	Simple matching misses nuance
"Need to implement AI model"	Use available AI tools or services
"JSON parsing is error-prone"	AI returns structured JSON
"Chinese text works with default font"	Chinese characters may not render correctly
"deepin-ci-robot commits should be included"	They are automated, not human reviews
"AI reviews are useful too"	Human reviews have higher quality and priority
"Version updates need review too"	They are routine changes, not code review

Red Flags - STOP and Start Over

Creating Excel without reading template
Processing reviews before AI analysis (should be two-phase)
Generating "问题描述" and "影响分析" without AI
Not preserving template formatting when using it
Validating reviews in data collection phase (AI should do it)
Not batching AI requests (slow and inefficient)
Using simple truncation instead of AI analysis
Changing reviewer/column specifications without skill update
Hardcoding problem source default value to anything other than "代码"(2)
Not letting AI determine problem source (use AI's judgment, don't assume)
Using Chinese for "问题状态" (must be "Close" or "Open")
Not ignoring deepin-ci-robot commits
Including AI reviews when human reviews exist
Not using "CESI宋体-GB18030" font for Chinese text

All of these mean: STOP. Re-read skill. Start over with correct two-phase approach.

github-review-report