Ethics, Safety & Impact Assessment

Example

Scenario: Launching a credit scoring algorithm for loan approvals

  1. Stakeholders: Loan applicants (diverse demographics), lenders, society (economic mobility)
  2. Harms: Disparate impact from historical bias, opacity preventing appeals, feedback loops perpetuating denials
  3. Vulnerable groups: Racial minorities, immigrants with thin credit files, young adults, people in poverty
  4. Mitigations: Fairness audit across protected classes, reason codes + appeals, alternative data (rent/utilities), human review for edge cases
  5. Monitoring: Approval rate parity within 10% across groups; if disparate impact >20%, escalate to ethics committee (threshold check sketched below)
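
A minimal sketch of that monitoring rule, assuming per-group application and approval counts are available. The group names, counts, and the ratio reading of "disparate impact" are illustrative assumptions:

```python
# Hypothetical per-group counts; real data would come from the lending system.
approvals = {"group_a": 420, "group_b": 310}
applications = {"group_a": 1000, "group_b": 1000}

rates = {g: approvals[g] / applications[g] for g in approvals}
gap = max(rates.values()) - min(rates.values())
ratio = min(rates.values()) / max(rates.values())

# Parity target: approval rates within 10 percentage points of each other.
if gap > 0.10:
    print(f"Parity gap {gap:.0%} exceeds the 10% target")

# Escalation rule: disparate impact > 20%, read here as the lower group's
# rate falling more than 20% below the higher group's (cf. the 4/5 rule).
if ratio < 0.80:
    print("Escalate to ethics committee: disparate impact exceeds 20%")
```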

Workflow

Copy this checklist and track your progress:

Ethics & Safety Assessment Progress:
- [ ] Step 1: Map stakeholders and identify vulnerable groups
- [ ] Step 2: Analyze potential harms and benefits
- [ ] Step 3: Assess fairness and differential impacts
- [ ] Step 4: Evaluate severity and likelihood
- [ ] Step 5: Design mitigations and safeguards
- [ ] Step 6: Define monitoring and escalation protocols

Step 1: Map stakeholders and identify vulnerable groups

Identify all affected parties (direct users, indirectly affected parties, society at large). Prioritize the vulnerable populations most at risk. See resources/template.md for the stakeholder analysis framework.

Step 2: Analyze potential harms and benefits

Brainstorm what could go wrong (harms) and what value is created (benefits) for each stakeholder group. See resources/template.md for structured analysis.

Step 3: Assess fairness and differential impacts

Evaluate whether outcomes, treatment, or access differ across groups. Check for disparate impact. See resources/methodology.md for fairness criteria and measurement.

Step 4: Evaluate severity and likelihood

Score each harm on severity (1-5) and likelihood (1-5), then prioritize high-risk combinations, as sketched below. See resources/template.md for the prioritization framework.
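
A sketch of this scoring step; the harms and the high-priority cutoff are illustrative:

```python
# Score each harm 1-5 on severity and likelihood; rank by the product.
harms = [
    {"harm": "disparate impact on protected groups", "severity": 5, "likelihood": 4},
    {"harm": "opaque decisions blocking appeals",    "severity": 4, "likelihood": 3},
    {"harm": "model drift degrading accuracy",       "severity": 3, "likelihood": 2},
]

for h in harms:
    h["risk"] = h["severity"] * h["likelihood"]

# Highest-risk combinations first; the >= 15 cutoff is an assumed convention.
for h in sorted(harms, key=lambda h: h["risk"], reverse=True):
    priority = "HIGH" if h["risk"] >= 15 else "normal"
    print(f"[{priority:>6}] risk={h['risk']:2}  {h['harm']}")
```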

Step 5: Design mitigations and safeguards

For high-priority harms, propose design changes, policy safeguards, and oversight mechanisms. See resources/methodology.md for intervention types.

Step 6: Define monitoring and escalation protocols

Set metrics, thresholds, review cadence, and escalation triggers. Validate using resources/evaluators/rubric_ethics_safety_impact.json (sketch below). Minimum standard: average score ≥ 3.5.
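
A sketch of that validation gate. The rubric file's actual schema is not shown in this document, so the field names below are assumptions to adapt:

```python
import json

# Assumed shape: {"criteria": [{"name": ..., "score": 1-5}, ...]} -- check
# the real file before relying on these field names.
with open("resources/evaluators/rubric_ethics_safety_impact.json") as f:
    rubric = json.load(f)

scores = [criterion["score"] for criterion in rubric["criteria"]]
average = sum(scores) / len(scores)

# Minimum standard from step 6: average score >= 3.5.
if average < 3.5:
    raise SystemExit(f"Average rubric score {average:.2f} is below the 3.5 minimum")
print(f"Passed: average rubric score {average:.2f}")
```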

Common Patterns

Pattern 1: Algorithm Fairness Audit

  • Stakeholders: Users receiving algorithmic decisions (hiring, lending, content ranking), protected groups
  • Harms: Disparate impact (bias against protected classes), feedback loops amplifying inequality, opacity preventing accountability
  • Assessment: Test for demographic parity, equalized odds, calibration across groups (see the sketch below); analyze training data for historical bias
  • Mitigations: Debiasing techniques, fairness constraints, explainability, human review for edge cases, regular audits
  • Monitoring: Disparate impact ratio, false positive/negative rates by group, user appeals and overturn rates
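
A sketch of the equalized-odds part of that assessment, on hand-rolled illustrative records:

```python
# (group, true_label, predicted_label) -- illustrative audit records.
records = [
    ("group_a", 1, 1), ("group_a", 0, 1), ("group_a", 0, 0), ("group_a", 1, 0),
    ("group_b", 1, 1), ("group_b", 0, 0), ("group_b", 1, 0), ("group_b", 1, 0),
]

def error_rates(rows):
    """Return (false positive rate, false negative rate) for the rows."""
    fp = sum(1 for _, y, p in rows if y == 0 and p == 1)
    fn = sum(1 for _, y, p in rows if y == 1 and p == 0)
    negatives = sum(1 for _, y, _ in rows if y == 0)
    positives = sum(1 for _, y, _ in rows if y == 1)
    return fp / max(negatives, 1), fn / max(positives, 1)

# Equalized odds holds when FPR and FNR are (approximately) equal across groups.
for group in sorted({g for g, _, _ in records}):
    fpr, fnr = error_rates([r for r in records if r[0] == group])
    print(f"{group}: FPR={fpr:.2f}  FNR={fnr:.2f}")
```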

Pattern 2: Data Privacy & Consent

  • Stakeholders: Data subjects (users whose data is collected), vulnerable groups (children, marginalized communities)
  • Harms: Privacy violations, surveillance, data breaches, lack of informed consent, secondary use without permission, re-identification risk
  • Assessment: Map data flows (collection → storage → use → sharing), identify sensitive attributes (PII, health, location), consent adequacy
  • Mitigations: Data minimization (collect only necessary), anonymization/differential privacy (sketched below), granular consent, user data controls (export, delete), encryption
  • Monitoring: Breach incidents, data access logs, consent withdrawal rates, user data requests (GDPR, CCPA)
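
One of those mitigations, differential privacy, in miniature: a Laplace mechanism over a simple count query. The epsilon value and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng()

def dp_count(values, epsilon=1.0):
    """Count with Laplace noise; a count query has sensitivity 1."""
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return len(values) + noise

opted_in_users = ["u1", "u2", "u3", "u4", "u5"]
# Smaller epsilon -> stronger privacy guarantee, noisier answer.
print(f"Noisy opt-in count: {dp_count(opted_in_users, epsilon=0.5):.1f}")
```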

Pattern 3: Content Moderation & Free Expression

  • Stakeholders: Content creators, viewers, vulnerable groups (targets of harassment), society (information integrity)
  • Harms: Over-moderation (silencing legitimate speech, especially marginalized voices), under-moderation (allowing harm, harassment, misinformation), inconsistent enforcement
  • Assessment: Analyze moderation error rates (false positives/negatives), differential enforcement across groups, cultural context sensitivity
  • Mitigations: Clear policies with examples, appeals process, human review, diverse moderators, cultural context training, transparency reports
  • Monitoring: Moderation volume and error rates by category, appeal overturn rates, disparate enforcement across languages/regions

Pattern 4: Accessibility & Inclusive Design

  • Stakeholders: Users with disabilities (visual, auditory, motor, cognitive), elderly, low-literacy, low-bandwidth users
  • Harms: Exclusion (cannot use product), degraded experience, safety risks (cannot access critical features), digital divide
  • Assessment: WCAG compliance audit (contrast check sketched below), assistive technology testing, user research with diverse abilities, cross-cultural usability
  • Mitigations: Accessible design (WCAG AA/AAA), alt text, keyboard navigation, screen reader support, low-bandwidth mode, multi-language, plain language
  • Monitoring: Accessibility test coverage, user feedback from disability communities, task completion rates across abilities
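
A sketch of one narrow slice of that WCAG audit: the text contrast-ratio check, using the WCAG 2.x relative-luminance formula. The colors are illustrative:

```python
def relative_luminance(rgb):
    """Relative luminance of an 8-bit sRGB color, per WCAG 2.x."""
    def linearize(c):
        c /= 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# WCAG AA requires >= 4.5:1 for normal text (>= 3:1 for large text).
ratio = contrast_ratio((119, 119, 119), (255, 255, 255))  # grey text on white
print(f"{ratio:.2f}:1 -> {'pass' if ratio >= 4.5 else 'fail'} for normal text at AA")
```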

Pattern 5: Safety-Critical Systems

  • Stakeholders: End users (patients, drivers, operators), vulnerable groups (children, elderly, compromised health), public safety
  • Harms: Physical harm (injury, death), psychological harm (trauma), property damage, cascade failures affecting many
  • Assessment: Failure mode analysis (FMEA, sketched below), fault tree analysis, worst-case scenarios, edge cases that break assumptions
  • Mitigations: Redundancy, fail-safes, human oversight, rigorous testing (stress, chaos, adversarial), incident response plans, staged rollouts
  • Monitoring: Error rates, near-miss incidents, safety metrics (accidents, adverse events), user-reported issues, compliance audits
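
A sketch of an FMEA-style pass: each failure mode gets severity, occurrence, and detection scores (1-10 by common convention) and is ranked by risk priority number. The failure modes and cutoff are illustrative:

```python
failure_modes = [
    {"mode": "sensor dropout at speed", "sev": 9, "occ": 3, "det": 4},
    {"mode": "stale calibration data",  "sev": 6, "occ": 5, "det": 2},
    {"mode": "watchdog false trip",     "sev": 3, "occ": 4, "det": 8},
]

# Risk priority number: severity x occurrence x detection, where "detection"
# scores how hard the failure is to catch before it causes harm.
for fm in failure_modes:
    fm["rpn"] = fm["sev"] * fm["occ"] * fm["det"]

for fm in sorted(failure_modes, key=lambda f: f["rpn"], reverse=True):
    flag = "  <- mitigate first" if fm["rpn"] >= 100 else ""
    print(f"RPN {fm['rpn']:3}  {fm['mode']}{flag}")
```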

Guardrails

  1. Identify vulnerable groups explicitly: Prioritize children, elderly, people with disabilities, marginalized/discriminated groups, low-income, low-literacy, geographically isolated, and politically targeted populations. If none are identified, look harder.

  2. Consider second-order and long-term effects: Look for feedback loops (harm leads to disadvantage leads to more harm), normalization, precedent-setting, and accumulation of small harms over time. Ask "what happens next?"

  3. Assess differential impact, not just average: A feature may help the average user but harm specific groups. Check for disparate impact (outcome differences across groups >20% is a red flag), intersectionality, and distributive justice.

  4. Design mitigations before launch: Build safeguards into design, test with diverse users, use staged rollouts with monitoring, and pre-commit to audits. Reactive fixes come too late for those already harmed.

  5. Provide transparency and recourse: At minimum, explain decisions, provide appeal mechanisms with human review, offer redress for harm, and maintain audit trails.

  6. Monitor outcomes, not just intentions: Measure outcome disparities by group, user-reported harms, error rate distribution, and unintended consequences. Set thresholds that trigger review or shutdown.

  7. Establish clear accountability and escalation: Define who reviews ethics risks before launch, who monitors post-launch, what triggers escalation, and who can halt harmful features.

  8. Respect autonomy and consent: Provide informed choice in plain language, meaningful alternatives (not coerced consent), user control (opt out, delete data), and purpose limitation. Children and vulnerable groups need extra protections.

Common pitfalls:

  • Assuming "we treat everyone the same" = fairness: Equal treatment of unequal groups perpetuates inequality. Fairness often requires differential treatment.
  • Optimization without constraints: Maximizing engagement or revenue unchecked amplifies outrage, addiction, and polarization. Set ethical boundaries.
  • Moving fast and apologizing later: For safety/ethics, prevention > apology. Harms to vulnerable groups are not acceptable experiments.
  • Privacy theater: Requiring consent without explaining risks, or making consent mandatory for service, is not meaningful consent.
  • Sampling bias in testing: Testing only on employees (young, educated, English-speaking) misses how diverse users experience harm.
  • Ethics washing: Performative statements without material changes. Impact assessments must change decisions, not just document them.

Quick Reference

Key resources:

  • resources/template.md: stakeholder analysis, harm/benefit analysis, and prioritization frameworks
  • resources/methodology.md: fairness criteria, measurement, and intervention types
  • resources/evaluators/rubric_ethics_safety_impact.json: validation rubric (minimum average score 3.5)

Stakeholder Priorities:

High-risk groups to always consider:

  • Children (<18, especially <13)
  • People with disabilities (visual, auditory, motor, cognitive)
  • Racial/ethnic minorities, especially historically discriminated groups
  • Low-income, unhoused, financially precarious
  • LGBTQ+, especially in hostile jurisdictions
  • Elderly (>65), especially those with limited digital skills
  • Non-English speakers, low-literacy
  • Political dissidents, activists, journalists in repressive contexts
  • Refugees, immigrants, undocumented
  • People with mental illness or cognitive impairment

Harm Categories:

  • Physical: Injury, death, health deterioration
  • Psychological: Trauma, stress, anxiety, depression, addiction
  • Economic: Lost income, debt, poverty, exclusion from opportunity
  • Social: Discrimination, harassment, ostracism, loss of relationships
  • Autonomy: Coercion, manipulation, loss of control, dignity violation
  • Privacy: Surveillance, exposure, data breach, re-identification
  • Reputational: Stigma, defamation, loss of standing
  • Epistemic: Misinformation, loss of knowledge access, filter bubbles
  • Political: Disenfranchisement, censorship, targeted repression

Fairness Definitions (choose the appropriate one for the context; two are compared in the sketch after this list):

  • Demographic parity: Outcome rates equal across groups (e.g., 40% approval rate for all)
  • Equalized odds: False positive and false negative rates equal across groups
  • Equal opportunity: True positive rate equal across groups (equal access to benefit)
  • Calibration: Predicted probabilities match observed frequencies for all groups
  • Individual fairness: Similar individuals treated similarly (Lipschitz condition)
  • Counterfactual fairness: Outcome same if sensitive attribute (race, gender) were different
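
To make the trade-offs concrete, a small sketch computing two of the definitions above, demographic parity and calibration, on the same illustrative scores:

```python
# (group, predicted probability, actual outcome) -- illustrative data.
rows = [
    ("a", 0.8, 1), ("a", 0.7, 1), ("a", 0.3, 0), ("a", 0.6, 0),
    ("b", 0.6, 1), ("b", 0.4, 0), ("b", 0.5, 1), ("b", 0.4, 0),
]

for group in sorted({g for g, _, _ in rows}):
    subset = [(p, y) for g, p, y in rows if g == group]
    positive_rate = sum(p >= 0.5 for p, _ in subset) / len(subset)  # parity
    mean_pred = sum(p for p, _ in subset) / len(subset)
    observed = sum(y for _, y in subset) / len(subset)              # calibration
    print(f"group {group}: positive rate {positive_rate:.2f}, "
          f"mean predicted {mean_pred:.2f} vs observed {observed:.2f}")
# Several of these criteria are mutually incompatible in general, so the
# choice of definition is itself an ethical decision.
```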

Mitigation Strategies:

  • Prevent: Design change eliminates harm (e.g., don't collect sensitive data)
  • Reduce: Decrease likelihood or severity (e.g., rate limiting, friction for risky actions)
  • Detect: Monitor and alert when harm occurs (e.g., bias dashboard, anomaly detection)
  • Respond: Process to address harm when found (e.g., appeals, human review, compensation)
  • Safeguard: Redundancy, fail-safes, circuit breakers for critical failures
  • Transparency: Explain, educate, build understanding and trust
  • Empower: Give users control, choice, ability to opt out or customize

Monitoring Metrics:

  • Outcome disparities: Measure by protected class (approval rates, error rates, treatment quality)
  • Error distribution: False positives/negatives; who bears the burden?
  • User complaints: Volume, categories, resolution rates, disparities
  • Engagement/retention: Differences across groups (are some excluded?)
  • Safety incidents: Volume, severity, affected populations
  • Consent/opt-outs: How many decline? Demographics of decliners?

Escalation Triggers:

  • Disparate impact >20% without justification
  • Safety incidents causing serious harm (injury, death)
  • Vulnerable group disproportionately affected (>2× harm rate)
  • User complaints spike (>2× baseline)
  • Press/regulator attention
  • Internal ethics concerns raised

When to escalate beyond this skill:

  • Legal compliance required (GDPR, ADA, Civil Rights Act, industry regulations)
  • Life-or-death safety-critical system (medical, transportation)
  • Children or vulnerable populations primary users
  • High controversy or political salience
  • Novel ethical terrain (new technology, no precedent)

Consult: legal counsel, ethics board, domain experts, affected communities, regulators.

Inputs required:

  • Feature or decision (what is being proposed? what changes?)
  • Affected groups (who is impacted? direct and indirect?)
  • Context (what problem does this solve? why now?)

Outputs produced:

  • ethics-safety-impact.md: Stakeholder analysis, harm/benefit assessment, fairness evaluation, risk prioritization, mitigation plan, monitoring framework, escalation protocol