Change Management
Implement structured change management processes covering change classification, CAB workflows, emergency change procedures, and automation for compliance with SOC 2, ITIL, and regulatory frameworks.
When to Use
- Establishing change management processes for production environments
- Implementing change advisory board (CAB) workflows
- Defining change classification and approval requirements
- Configuring automated change tracking in CI/CD pipelines
- Handling emergency changes with proper controls and documentation
Change Classification
change_types:
  standard:
    risk: Low
    approval: Pre-approved (no per-change approval needed)
    lead_time: None (within maintenance window)
    examples:
      - Routine patching within tested patch sets
      - Certificate rotation with established procedure
      - Scaling operations (adding/removing instances within limits)
      - Pre-approved configuration changes
      - Log rotation and archival
    requirements:
      - Change must match an approved Standard Change template
      - Automated testing must pass
      - Documented rollback procedure exists
      - Within defined maintenance window

  normal_low:
    risk: Low
    approval: Peer review (1 approver)
    lead_time: 2 business days
    examples:
      - Non-critical configuration changes
      - Feature flag toggles
      - Documentation updates to production systems
      - Adding monitoring dashboards or alerts

  normal_medium:
    risk: Medium
    approval: Team lead + peer review (2 approvers)
    lead_time: 5 business days
    examples:
      - Application deployments with new features
      - Database schema changes (non-breaking)
      - Network rule modifications
      - Integration endpoint changes
      - Dependency version upgrades

  normal_high:
    risk: High
    approval: CAB review required
    lead_time: 10 business days
    examples:
      - Infrastructure migrations
      - Breaking database schema changes
      - Major version upgrades (OS, runtime, database engine)
      - Changes to authentication or authorization systems
      - Multi-service coordinated deployments
      - Changes affecting data processing or compliance controls

  emergency:
    risk: Variable
    approval: Emergency CAB (minimum 2 approvers from on-call)
    lead_time: None (immediate implementation)
    examples:
      - Security vulnerability remediation (active exploitation)
      - Production outage resolution
      - Data integrity emergency fixes
      - Regulatory compliance deadline fixes
    requirements:
      - Retroactive full documentation within 48 hours
      - Post-implementation review required
      - CAB retroactive review at next meeting
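
The lead times above can be turned into an earliest implementation date. The following is a minimal sketch, not part of the original skill: the `POLICY` table transcribes the lead times and approver counts listed above, while `add_business_days` and `earliest_implementation` are hypothetical helper names. It assumes business days are Monday through Friday and ignores public holidays.

```python
from datetime import date, timedelta

# Lead times (business days) and approver counts transcribed from the
# classification above. CAB review is modeled as approvers=None because it is
# a meeting decision rather than a head count (an assumption of this sketch).
POLICY = {
    "standard":      {"lead_days": 0,  "approvers": 0},
    "normal_low":    {"lead_days": 2,  "approvers": 1},
    "normal_medium": {"lead_days": 5,  "approvers": 2},
    "normal_high":   {"lead_days": 10, "approvers": None},
    "emergency":     {"lead_days": 0,  "approvers": 2},
}


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start` (Mon-Fri, no holidays)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday-Friday
            remaining -= 1
    return current


def earliest_implementation(change_type: str, submitted: date) -> date:
    """Earliest date a change of this type may be implemented."""
    return add_business_days(submitted, POLICY[change_type]["lead_days"])


# A normal_medium change submitted on Monday 2024-01-08 can go out the
# following Monday at the earliest.
print(earliest_implementation("normal_medium", date(2024, 1, 8)))  # 2024-01-15
```

A real calendar would also need to account for holidays, maintenance windows, and freeze periods.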
Change Request Template
change_request:
  metadata:
    id: "CR-YYYY-NNNN"
    title: ""
    requestor: ""
    date_submitted: ""
    target_date: ""
    change_type: ""  # standard | normal_low | normal_medium | normal_high | emergency

  description:
    summary: "Brief description of the change"
    detailed_description: "Full technical details of what will change"
    business_justification: "Why this change is needed"
    affected_systems: []
    affected_services: []
    affected_users: "Description of user impact"

  risk_assessment:
    risk_level: ""  # low | medium | high
    impact_if_failed: "What happens if the change fails"
    likelihood_of_failure: ""  # low | medium | high
    risk_mitigation: "Steps to reduce risk"
    dependencies: "Other systems or changes this depends on"

  implementation:
    change_window:
      start: ""
      end: ""
      maintenance_window: true
    implementation_steps:
      - step: "Step 1 description"
        responsible: "Person/team"
        estimated_duration: "X minutes"
      - step: "Step 2 description"
        responsible: "Person/team"
        estimated_duration: "X minutes"

  testing:
    pre_change_testing:
      - "Unit tests pass"
      - "Integration tests pass"
      - "Staging deployment verified"
    post_change_verification:
      - "Health check endpoints responding"
      - "Key transactions processing successfully"
      - "No error rate increase in monitoring"
      - "Performance metrics within baseline"

  rollback:
    rollback_plan: "Detailed steps to revert the change"
    rollback_trigger: "Conditions that trigger rollback"
    rollback_estimated_time: "X minutes"
    rollback_steps:
      - "Step 1: Revert deployment to previous version"
      - "Step 2: Verify rollback successful"
      - "Step 3: Notify stakeholders"
    data_rollback: "Describe any data migration rollback needed"

  communication:
    stakeholders_notified: []
    notification_sent_date: ""
    status_page_update: true
    customer_notification_required: false

  approvals:
    technical_reviewer: ""
    technical_approval_date: ""
    security_reviewer: ""
    security_approval_date: ""
    cab_approval_date: ""
    cab_notes: ""

  closure:
    implementation_date: ""
    implementation_result: ""  # success | partial | failed | rolled_back
    post_implementation_review: ""
    lessons_learned: ""
    follow_up_actions: []
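
A filled-in request can be checked mechanically before it reaches a reviewer. The sketch below is illustrative only: it assumes the request has been loaded into a dict holding the contents of `change_request`, and the choice of which fields count as mandatory is an assumption of this example, not something the template states. `validate_change_request` is a hypothetical name.

```python
import re

CHANGE_TYPES = {"standard", "normal_low", "normal_medium", "normal_high", "emergency"}
ID_PATTERN = re.compile(r"^CR-\d{4}-\d{4}$")  # matches the CR-YYYY-NNNN format


def validate_change_request(cr: dict) -> list:
    """Return a list of problems; an empty list means these checks passed.

    The set of mandatory fields is an assumption made for this sketch.
    """
    problems = []
    meta = cr.get("metadata", {})
    if not ID_PATTERN.match(meta.get("id", "")):
        problems.append("metadata.id must look like CR-YYYY-NNNN")
    if meta.get("change_type") not in CHANGE_TYPES:
        problems.append("metadata.change_type is not a known type")
    if not cr.get("description", {}).get("affected_systems"):
        problems.append("description.affected_systems is empty")
    rollback = cr.get("rollback", {})
    if not rollback.get("rollback_steps"):
        problems.append("rollback.rollback_steps is empty")
    if not rollback.get("rollback_trigger"):
        problems.append("rollback.rollback_trigger is empty")
    return problems


request = {
    "metadata": {"id": "CR-2024-0001", "change_type": "normal_low"},
    "description": {"affected_systems": ["billing-api"]},
    "rollback": {"rollback_steps": [], "rollback_trigger": "Error rate above baseline"},
}
print(validate_change_request(request))  # ['rollback.rollback_steps is empty']
```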
CAB Workflow
cab_workflow:
  meeting_schedule:
    regular_cab: "Weekly, Thursday 2:00 PM"
    emergency_cab: "On-demand, minimum 2 members required"

  cab_members:
    permanent:
      - Engineering Manager (Chair)
      - Security Team Representative
      - Infrastructure/SRE Lead
      - Release Manager
    advisory:
      - Business stakeholder (invited per change)
      - Database administrator (for DB changes)
      - Network engineer (for network changes)

  agenda:
    1: "Review emergency changes from prior week"
    2: "Review high-risk change requests for upcoming window"
    3: "Review failed changes and lessons learned"
    4: "Discuss upcoming change freeze periods"
    5: "Review change metrics and trends"

  decision_criteria:
    approve_when:
      - Risk assessment is complete and accurate
      - Testing evidence is provided
      - Rollback plan is documented and feasible
      - Change window is appropriate
      - Required approvals obtained
      - No conflicts with other scheduled changes
    request_changes_when:
      - Rollback plan is missing or incomplete
      - Testing is insufficient for the risk level
      - Impact assessment needs clarification
      - Change conflicts with another scheduled change
    deny_when:
      - Risk is unacceptable without mitigation
      - Change window conflicts with freeze period
      - Dependencies are not resolved
      - Compliance concerns are unaddressed
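
The decision criteria above can be expressed as a small decision function. This is a sketch under assumptions: the `Review` fields are hypothetical names for the listed criteria, and the precedence (deny first, then request changes, then approve) is a design choice of this example that the source does not specify.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """Reviewer findings for one change request (field names are hypothetical)."""
    risk_acceptable: bool
    in_freeze_period: bool
    dependencies_resolved: bool
    compliance_addressed: bool
    rollback_plan_complete: bool
    testing_sufficient: bool
    impact_clear: bool
    schedule_conflict: bool


def cab_decision(r: Review) -> str:
    """Map findings to 'deny', 'request_changes', or 'approve'."""
    # Deny criteria are checked first (assumed precedence).
    if (not r.risk_acceptable or r.in_freeze_period
            or not r.dependencies_resolved or not r.compliance_addressed):
        return "deny"
    # Fixable gaps send the request back to the requestor.
    if (not r.rollback_plan_complete or not r.testing_sufficient
            or not r.impact_clear or r.schedule_conflict):
        return "request_changes"
    return "approve"


clean = Review(True, False, True, True, True, True, True, False)
print(cab_decision(clean))  # approve
```

Encoding the criteria this way does not replace CAB judgment; it only makes the outcome for a given set of findings repeatable.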
Emergency Change Procedure
emergency_change_process:
  definition: "A change required to restore service or prevent imminent security compromise"

  step_1_declare:
    actions:
      - On-call engineer identifies need for emergency change
      - Incident commander approves emergency classification
      - Minimum 2 approvers from emergency CAB roster contacted
      - Document initial justification in incident channel

  step_2_approve:
    approval_method:
      - Slack/Teams approval with screenshots preserved
      - Verbal approval over bridge call (documented in notes)
    eligible_approvers:  # any 2 of the following roles
      - Engineering Manager
      - SRE/Infrastructure Lead
      - Security Team Lead
      - VP of Engineering
    timeout: "If no response in 15 minutes, escalate to next tier"

  step_3_implement:
    actions:
      - Implement the minimum change needed to resolve the issue
      - Record all actions taken with timestamps
      - Monitor for successful resolution
      - Document any deviations from planned change

  step_4_verify:
    actions:
      - Confirm service restoration
      - Verify no unintended side effects
      - Run post-change verification checks
      - Update status page and stakeholders

  step_5_document:
    deadline: "Within 48 hours of implementation"
    required_documentation:
      - Complete change request form (retroactive)
      - Timeline of events and actions
      - Justification for emergency classification
      - Approval records (messages, emails)
      - Post-implementation verification results
      - Root cause analysis (what made it an emergency)
      - Preventive actions to avoid future emergency

  step_6_review:
    actions:
      - CAB review at next regular meeting
      - Assess if emergency classification was appropriate
      - Identify process improvements
      - Track emergency change trends
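
The two time limits in this procedure (15 minutes to escalate, 48 hours to document) are easy to track in code. The sketch below is illustrative: the function names are hypothetical, and it reads "no response" as "fewer than 2 approvals recorded", which is an interpretation made for this example rather than something the procedure states.

```python
from datetime import datetime, timedelta, timezone

APPROVAL_TIMEOUT = timedelta(minutes=15)  # escalate to next tier after this
DOC_DEADLINE = timedelta(hours=48)        # retroactive documentation window
REQUIRED_APPROVALS = 2


def should_escalate(requested_at: datetime, now: datetime, approvals: int) -> bool:
    """True when the approval request has waited 15 minutes without 2 approvals."""
    return approvals < REQUIRED_APPROVALS and now - requested_at >= APPROVAL_TIMEOUT


def documentation_due(implemented_at: datetime) -> datetime:
    """Latest time by which retroactive documentation must be complete."""
    return implemented_at + DOC_DEADLINE


def documentation_overdue(implemented_at: datetime, now: datetime) -> bool:
    """True when the 48-hour documentation window has passed."""
    return now > documentation_due(implemented_at)


implemented = datetime(2024, 3, 1, 22, 30, tzinfo=timezone.utc)
print(documentation_due(implemented).isoformat())  # 2024-03-03T22:30:00+00:00
```

Using timezone-aware UTC timestamps keeps the deadlines unambiguous when the on-call rotation spans time zones.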
Pull Request Template for Changes
## Change Request
### Type
- [ ] Standard (pre-approved, low risk)
- [ ] Normal - Low Risk
- [ ] Normal - Medium Risk
- [ ] Normal - High Risk (CAB required)
- [ ] Emergency (retroactive documentation required)
### Description
<!-- What is being changed and why? -->
### Risk Assessment
**Impact if failed:** <!-- What breaks? -->
**Likelihood of failure:** Low / Medium / High
**Affected services:** <!-- List services -->
**User impact:** <!-- Will users notice? -->
### Testing Evidence
- [ ] Unit tests pass
- [ ] Integration tests pass
- [ ] Staging deployment verified
- [ ] Performance test completed (if applicable)
- [ ] Security scan clean (if applicable)
### Rollback Plan
<!-- How to revert if something goes wrong -->
**Estimated rollback time:** <!-- X minutes -->
**Data rollback needed:** Yes / No
### Deployment Plan
**Target window:** <!-- Date and time -->
**Estimated duration:** <!-- X minutes -->
### Post-Deployment Verification
- [ ] Health checks passing
- [ ] Error rates within baseline
- [ ] Key transactions working
- [ ] Monitoring dashboards reviewed
### Communication
- [ ] Team notified
- [ ] Stakeholders notified (if user-facing)
- [ ] Status page updated (if applicable)
### Approvals Required
- [ ] Peer review
- [ ] Team lead (medium+ risk)
- [ ] Security review (security-impacting changes)
- [ ] CAB approval (high risk)
CI/CD Change Tracking Automation
# GitHub Actions - Automated change tracking
# Notes:
# - The GITHUB_TOKEN needs write access to pull requests for the labeling step.
# - The labels risk:low, risk:medium, and risk:high must already exist in the repository.
name: Change Management

on:
  pull_request:
    types: [opened, synchronize, labeled]
  # Re-run the approval check whenever a review is submitted
  pull_request_review:
    types: [submitted]
  push:
    branches: [main]

jobs:
  classify-change:
    if: github.event_name != 'push'
    runs-on: ubuntu-latest
    env:
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      PR_NUMBER: ${{ github.event.pull_request.number }}
    steps:
      - uses: actions/checkout@v4

      - name: Classify change risk
        id: classify
        run: |
          FILES_CHANGED=$(gh pr diff "$PR_NUMBER" --name-only)
          # High-risk indicators
          if echo "$FILES_CHANGED" | grep -qE 'terraform/|infrastructure/|migrations/|auth/|security/'; then
            echo "risk=high" >> "$GITHUB_OUTPUT"
            echo "::warning::High-risk change detected - CAB review may be required"
          elif echo "$FILES_CHANGED" | grep -qE 'config/|database/|api/'; then
            echo "risk=medium" >> "$GITHUB_OUTPUT"
          else
            echo "risk=low" >> "$GITHUB_OUTPUT"
          fi

      - name: Add risk label
        run: |
          gh pr edit "$PR_NUMBER" \
            --add-label "risk:${{ steps.classify.outputs.risk }}"

      - name: Enforce approvals by risk
        if: steps.classify.outputs.risk == 'high'
        run: |
          # Count distinct approvers so repeat approvals by one reviewer count once
          APPROVALS=$(gh pr view "$PR_NUMBER" --json reviews \
            --jq '[.reviews[] | select(.state=="APPROVED") | .author.login] | unique | length')
          if [ "$APPROVALS" -lt 2 ]; then
            echo "::error::High-risk changes require at least 2 approvals"
            exit 1
          fi

  record-deployment:
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - name: Record deployment
        run: |
          CHANGE_ID="CR-$(date -u +%Y)-$(printf '%04d' ${{ github.run_number }})"
          # Capture the timestamp once so the log and the record agree
          TIMESTAMP="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
          echo "Change ID: $CHANGE_ID"
          echo "Deployed at: $TIMESTAMP"
          echo "Commit: ${{ github.sha }}"
          echo "Author: ${{ github.actor }}"
          cat > /tmp/deployment-record.json <<EOF
          {
            "change_id": "$CHANGE_ID",
            "timestamp": "$TIMESTAMP",
            "commit": "${{ github.sha }}",
            "author": "${{ github.actor }}",
            "environment": "production",
            "status": "deployed"
          }
          EOF

      # Runner disks are ephemeral, so keep the record as a workflow artifact
      - name: Upload deployment record
        uses: actions/upload-artifact@v4
        with:
          name: deployment-record
          path: /tmp/deployment-record.json
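
The path-based classification in the workflow can be exercised locally before it gates real pull requests. The sketch below mirrors the two `grep -qE` patterns from the workflow in Python; `classify` is a hypothetical helper, not part of the workflow itself.

```python
import re

# Same patterns as the workflow's grep -qE calls (unanchored substring matches).
HIGH_RISK = re.compile(r"terraform/|infrastructure/|migrations/|auth/|security/")
MEDIUM_RISK = re.compile(r"config/|database/|api/")


def classify(paths):
    """Return 'high', 'medium', or 'low' for a list of changed file paths."""
    if any(HIGH_RISK.search(p) for p in paths):
        return "high"
    if any(MEDIUM_RISK.search(p) for p in paths):
        return "medium"
    return "low"


print(classify(["src/auth/login.py", "README.md"]))  # high
print(classify(["config/app.yaml"]))                 # medium
print(classify(["README.md"]))                       # low
print(classify(["docs/oauth/notes.md"]))             # high
```

The last call shows a side effect of unanchored patterns: `oauth/` contains `auth/`, so a documentation change is flagged as high risk. Anchoring each directory name to a path boundary, for example `(^|/)auth/`, is one possible way to tighten the match if false positives become a burden.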
Change Freeze Policy
change_freeze:
  definition: "Period during which non-emergency changes are prohibited"

  scheduled_freezes:
    year_end:
      start: "December 15"
      end: "January 3"
      scope: "All production changes"
    major_events:
      - "Black Friday through Cyber Monday (e-commerce)"
      - "Tax filing deadline periods (financial services)"
      - "Open enrollment periods (healthcare)"

  exceptions_during_freeze:
    allowed:
      - Security patches for actively exploited vulnerabilities
      - Changes required by regulatory deadline
      - Fixes for P1/SEV1 production incidents
    approval: "VP of Engineering + Security Lead"

  communication:
    announcement: "2 weeks before freeze"
    reminder: "1 week before and again 1 day before freeze"
    daily_status: "During freeze period"
    lift_notification: "When freeze ends"
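
Because the year-end freeze spans a year boundary, a naive `start <= date <= end` comparison does not work. The following sketch shows one way to check the December 15 to January 3 window. It assumes both endpoints are inclusive, and the exception codes are hypothetical labels for the three allowed exception categories above.

```python
from datetime import date
from typing import Optional

# Hypothetical codes for the three allowed exception categories.
ALLOWED_EXCEPTIONS = {
    "exploited_vulnerability_patch",
    "regulatory_deadline",
    "sev1_incident_fix",
}


def in_year_end_freeze(d: date) -> bool:
    """True when d falls between December 15 and January 3 (inclusive, assumed)."""
    month_day = (d.month, d.day)
    # The window wraps around the new year, so it is the union of two ranges.
    return month_day >= (12, 15) or month_day <= (1, 3)


def change_allowed(d: date, exception: Optional[str] = None) -> bool:
    """Outside the freeze any change may proceed; inside, only listed exceptions."""
    if not in_year_end_freeze(d):
        return True
    return exception in ALLOWED_EXCEPTIONS


print(change_allowed(date(2024, 12, 20)))                         # False
print(change_allowed(date(2024, 12, 20), "sev1_incident_fix"))    # True
print(change_allowed(date(2025, 1, 4)))                           # True
```

Exceptions still require the approval named in the policy; this check only covers the calendar and category.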
Change Management Metrics
metrics:
  change_success_rate:
    description: "Percentage of changes implemented without rollback or incident"
    target: ">95%"
    formula: "(successful changes / total changes) * 100"

  emergency_change_rate:
    description: "Percentage of changes classified as emergency"
    target: "<5%"
    formula: "(emergency changes / total changes) * 100"

  rollback_rate:
    description: "Percentage of changes that required rollback"
    target: "<3%"
    formula: "(rolled-back changes / total changes) * 100"

  mean_time_to_implement:
    description: "Average time from approval to implementation"
    target: "Varies by type"

  cab_approval_time:
    description: "Average time from submission to CAB decision"
    target: "<5 business days for normal changes"
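
The rate metrics above can be computed directly from closed change records. This is a minimal sketch: each record is assumed to be a dict with a `type` (one of the change types) and a `result` (one of `success`, `partial`, `failed`, `rolled_back`, as in the closure section of the template). Treating only `result == "success"` as successful is an assumption of this example.

```python
def change_metrics(changes):
    """Compute the three rate metrics as percentages from closed change records."""
    total = len(changes)
    if total == 0:
        return {}
    successful = sum(1 for c in changes if c["result"] == "success")
    emergency = sum(1 for c in changes if c["type"] == "emergency")
    rolled_back = sum(1 for c in changes if c["result"] == "rolled_back")
    return {
        "change_success_rate": 100 * successful / total,
        "emergency_change_rate": 100 * emergency / total,
        "rollback_rate": 100 * rolled_back / total,
    }


def meets_targets(m):
    """Check the targets listed above: >95% success, <5% emergency, <3% rollback."""
    return (m["change_success_rate"] > 95
            and m["emergency_change_rate"] < 5
            and m["rollback_rate"] < 3)


# 100 sample records: 97 succeeded (3 of them emergencies), 2 rolled back, 1 failed.
sample = ([{"type": "normal_low", "result": "success"}] * 94
          + [{"type": "emergency", "result": "success"}] * 3
          + [{"type": "normal_medium", "result": "rolled_back"}] * 2
          + [{"type": "normal_high", "result": "failed"}])
print(change_metrics(sample))
# {'change_success_rate': 97.0, 'emergency_change_rate': 3.0, 'rollback_rate': 2.0}
```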
Change Management Checklist
change_management_checklist:
  process_setup:
    - "[ ] Change types defined with classification criteria"
    - "[ ] Approval matrix documented (who approves what)"
    - "[ ] CAB established with regular meeting schedule"
    - "[ ] Emergency change procedure documented"
    - "[ ] Change request template created"
    - "[ ] Change freeze policy defined"

  tooling:
    - "[ ] PR template includes change management fields"
    - "[ ] Automated risk classification in CI/CD"
    - "[ ] Branch protection enforces required approvals"
    - "[ ] Deployment records captured automatically"
    - "[ ] Change audit trail preserved (PR history, approvals)"

  compliance:
    - "[ ] All production changes have documented approval"
    - "[ ] Rollback plans exist for every change"
    - "[ ] Post-implementation reviews conducted for failures"
    - "[ ] Emergency changes documented retroactively within 48 hours"
    - "[ ] Change metrics reported monthly"
    - "[ ] Audit trail retained for compliance period (1-3 years)"
Best Practices
- Classify changes by risk level to apply proportionate controls without slowing low-risk work
- Automate risk classification based on files changed, services affected, and deployment scope
- Use PR approvals as the native change approval mechanism for code-driven changes
- Require rollback plans for every change and test rollback procedures periodically
- Track emergency changes as a key metric: a high rate indicates systemic process issues
- Implement change freezes during critical business periods to protect stability
- Conduct post-implementation reviews for all failed changes to drive improvement
- Separate duty of implementation from duty of approval (no self-approving changes)
- Capture deployment records automatically in CI/CD rather than relying on manual entry
- Keep the CAB focused on high-risk decisions; do not bottleneck low-risk changes through CAB
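
The separation-of-duties practice above can be checked when approvals are counted. A minimal sketch, with `independent_approvals` and `approval_satisfied` as hypothetical helper names: approvals from the change author are discarded, and duplicates from the same approver count once.

```python
def independent_approvals(author, approvers):
    """Return the set of distinct approvers, excluding the change author."""
    return {a for a in approvers if a != author}


def approval_satisfied(author, approvers, required):
    """True when enough people other than the author have approved."""
    return len(independent_approvals(author, approvers)) >= required


# The author's own approval and a duplicate approval do not count.
print(approval_satisfied("alice", ["alice", "bob", "bob"], required=2))  # False
print(approval_satisfied("alice", ["bob", "carol"], required=2))         # True
```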