Traceability Matrix Generator
Build comprehensive traceability matrices linking requirements → design → implementation across the software development lifecycle.
What is a Traceability Matrix?
A traceability matrix documents relationships between:
- Requirements: What the system must do (user stories, specs, features)
- Design: How the system will be structured (architecture, APIs, components)
- Implementation: Where requirements are coded (functions, classes, modules)
Benefits:
- Ensure all requirements are implemented
- Identify missing implementations or tests
- Support compliance and auditing
- Track impact of requirement changes
- Find orphaned code without requirements
Workflow
Step 1: Identify and Collect Artifacts
Gather all traceability sources from the project.
Requirements Sources:
- requirements.md, REQUIREMENTS.txt
- User story documents
- Issue tracker exports (Jira, GitHub Issues)
- Product requirement documents (PRDs)
- Feature specifications
Design Sources:
- DESIGN.md, architecture documents
- API specifications (OpenAPI, Swagger)
- Database schemas
- UML diagrams, architecture diagrams
- Design decision records (ADRs)
Implementation Sources:
- Source code files (*.py, *.java, *.js, etc.)
- Module docstrings
- Function/class comments with requirement IDs
- Configuration files
Checklist:
- Locate requirements documents
- Find design documentation
- Identify source code directories
- Check for existing ID/tagging conventions
- Verify file access and permissions
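The checklist above can be partly automated. The following is a minimal sketch, not a definitive implementation: the function name `collect_artifacts` and the glob patterns in `ARTIFACT_PATTERNS` are illustrative assumptions and should be adapted to the project's actual layout.

```python
from pathlib import Path

# Hypothetical glob patterns per artifact category; adjust to your project.
ARTIFACT_PATTERNS = {
    "requirements": ["requirements*.md", "REQUIREMENTS*.txt"],
    "design": ["design*.md", "DESIGN*.md", "openapi*.yaml"],
    "implementation": ["*.py", "*.java", "*.js"],
}

def collect_artifacts(root, patterns=ARTIFACT_PATTERNS):
    """Return {category: sorted list of matching files} found under root."""
    root = Path(root)
    found = {}
    for category, globs in patterns.items():
        files = set()  # a set avoids duplicates when two patterns overlap
        for pattern in globs:
            files.update(p for p in root.rglob(pattern) if p.is_file())
        found[category] = sorted(files)
    return found
```

An empty list for a category is a quick signal that a checklist item (for example, "Find design documentation") still needs manual attention.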
Step 2: Extract Requirements
Parse requirements and assign unique identifiers.
Common Requirement Formats:
Markdown with IDs:
## REQ-001: User Authentication
The system shall allow users to log in with email and password.
## REQ-002: Password Reset
Users shall be able to reset forgotten passwords via email.
User Stories:
### US-123: As a user, I want to search products
So that I can find items quickly
**Acceptance Criteria:**
- Search box on homepage
- Results display in < 1 second
- Filter by category
Numbered Lists:
1. **REQ-AUTH-001**: System must support OAuth 2.0
2. **REQ-AUTH-002**: Sessions expire after 24 hours
3. **REQ-DATA-001**: Data must be encrypted at rest
Extraction Script (Python):
import re
from pathlib import Path

def extract_requirements(file_path):
    """Extract requirements with IDs from a Markdown file."""
    file_path = Path(file_path)  # accept str or Path; .name is used below
    requirements = []
    # Explicit encoding so non-ASCII titles read the same on every platform
    content = file_path.read_text(encoding='utf-8')
    # Heading pattern: "## REQ-001: Title", "### US-123: Title", or similar
    pattern = r'^#+\s*([A-Z]+-[A-Z0-9-]+):\s*(.+?)$'
    for match in re.finditer(pattern, content, re.MULTILINE):
        req_id = match.group(1)
        req_title = match.group(2).strip()
        requirements.append({
            'id': req_id,
            'title': req_title,
            'source': file_path.name,
            'type': 'requirement'
        })
    return requirements

# Usage
reqs = extract_requirements(Path('requirements.md'))
for req in reqs:
    print(f"{req['id']}: {req['title']}")
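The heading pattern in the script above does not match the numbered-list format shown earlier (bold IDs such as `**REQ-AUTH-001**:`). A possible companion extractor is sketched below; `extract_list_requirements` and `LIST_ITEM` are hypothetical names, and the sketch assumes each requirement fits on one line.

```python
import re

# Matches list items such as "1. **REQ-AUTH-001**: System must support OAuth 2.0"
# or "- **REQ-DATA-001**: Data must be encrypted at rest".
LIST_ITEM = re.compile(
    r'^\s*(?:\d+\.|[-*])\s*\*\*([A-Z]+(?:-[A-Z0-9]+)+)\*\*:\s*(.+?)\s*$',
    re.MULTILINE,
)

def extract_list_requirements(text, source="requirements.md"):
    """Extract requirements written as bold IDs inside list items."""
    return [
        {
            "id": m.group(1),
            "title": m.group(2),
            "source": source,
            "type": "requirement",
        }
        for m in LIST_ITEM.finditer(text)
    ]
```

The returned dictionaries use the same keys as `extract_requirements`, so both result lists can be concatenated before building the matrix.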
Manual Extraction:
If documents lack IDs, assign them:
Original: "Users can filter search results"
→ Assign: REQ-SEARCH-001: Users can filter search results
For detailed requirement extraction patterns, see references/extraction_patterns.md.
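Manual ID assignment can be made repeatable with a small helper. This is only a sketch: `assign_ids` is a hypothetical function, and the zero-padded three-digit suffix is an assumption chosen to match the `REQ-SEARCH-001` example above.

```python
def assign_ids(statements, prefix, start=1, width=3):
    """Pair each requirement statement with a sequential ID such as REQ-SEARCH-001."""
    return [
        {"id": f"{prefix}-{n:0{width}d}", "title": text.strip()}
        for n, text in enumerate(statements, start=start)
    ]

# Example
for req in assign_ids(["Users can filter search results"], "REQ-SEARCH"):
    print(f"{req['id']}: {req['title']}")
```

Once assigned, IDs should be written back into the source document so they stay stable across regenerations.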
Step 3: Extract Design Artifacts
Identify design elements and link to requirements.
Design Linking Patterns:
Explicit References in Design Docs:
## Authentication Service (REQ-001, REQ-002)
**Architecture:**
- OAuth 2.0 provider integration (REQ-AUTH-001)
- Session management module (REQ-AUTH-002)
- Password reset workflow (REQ-002)
**API Endpoints:**
- `POST /auth/login` - Implements REQ-001
- `POST /auth/reset` - Implements REQ-002
API Specifications:
# openapi.yaml
paths:
/auth/login:
post:
summary: User login endpoint
x-requirements: [REQ-001, REQ-AUTH-001]
description: Implements user authentication
Architecture Diagrams:
[Component Diagram]
- AuthService → Implements REQ-001, REQ-002
- UserDatabase → Supports REQ-DATA-001
- EmailService → Enables REQ-002
Extraction Example:
def extract_design_links(design_file):
    """Extract design artifacts and their linked requirements."""
    design_file = Path(design_file)  # accept str or Path; .name is used below
    design_artifacts = []
    content = design_file.read_text(encoding='utf-8')
    # Headings that end with a parenthesized reference list,
    # e.g. "## Authentication Service (REQ-001, REQ-002)"
    pattern = r'^#+\s*(.+?)\s*\((.+?)\)$'
    for match in re.finditer(pattern, content, re.MULTILINE):
        artifact_name = match.group(1)
        req_refs = match.group(2)
        # Parse requirement IDs out of the parenthesized text
        req_ids = re.findall(r'[A-Z]+-[A-Z0-9-]+', req_refs)
        design_artifacts.append({
            'name': artifact_name,
            'requirements': req_ids,
            'source': design_file.name,
            'type': 'design'
        })
    return design_artifacts
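The function above only inspects headings, so bullet-level references such as `- AuthService → Implements REQ-001, REQ-002` or `- OAuth 2.0 provider integration (REQ-AUTH-001)` go unlinked. One way to pick them up is sketched below. `extract_bullet_links` is a hypothetical helper, and the list of linking verbs it strips is an assumption based on the examples in this step. Tokens shaped like IDs (for instance `UTF-8`) would be false positives.

```python
import re

REQ_ID = re.compile(r'\b[A-Z]+(?:-[A-Z0-9]+)+\b')
LINK_WORDS = re.compile(r'(?i)\b(implements|supports|enables)\b')

def extract_bullet_links(text, source="design.md"):
    """Link each bullet line to the requirement IDs mentioned on it."""
    artifacts = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped.startswith(("- ", "* ")):
            continue  # not a bullet item
        req_ids = REQ_ID.findall(stripped)
        if not req_ids:
            continue  # bullet without any requirement reference
        # The artifact name is the text before the first requirement ID,
        # minus list markers, backticks, arrows, and linking verbs.
        name = stripped[: REQ_ID.search(stripped).start()]
        name = LINK_WORDS.sub("", name).strip(" -*`→(:")
        artifacts.append({
            "name": name,
            "requirements": req_ids,
            "source": source,
            "type": "design",
        })
    return artifacts
```

The output has the same shape as `extract_design_links`, so the two lists can be merged before the matrix is built.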
Step 4: Scan Implementation
Search source code for requirement references.
Code Annotation Patterns:
Docstrings (Python):
def authenticate_user(email, password):
    """Authenticate user credentials.

    Implements: REQ-001, REQ-AUTH-001

    Args:
        email: User email address
        password: User password

    Returns:
        Authentication token if successful
    """
    # Implementation...
Comments (Java):
/**
 * User authentication service
 * @implements REQ-001 User login
 * @implements REQ-AUTH-001 OAuth support
 */
public class AuthenticationService {
    // Implementation...
}
Comments (JavaScript):
/**
 * Password reset functionality
 * Implements: REQ-002
 */
function resetPassword(email) {
    // Implementation...
}
Scanning Script:
def scan_code_for_requirements(code_dir):
    """Scan Python source files for requirement references."""
    implementations = []
    # One ID, optionally followed by more comma-separated IDs
    id_list = r'[A-Z]+-[A-Z0-9-]+(?:,\s*[A-Z]+-[A-Z0-9-]+)*'
    # Only the keyword is case-insensitive; IDs must stay upper-case,
    # otherwise ordinary hyphenated words would be picked up as IDs
    ref_pattern = re.compile(r'(?i:Implements?|Satisfies|Covers):\s*(' + id_list + r')')
    # A line that opens a function or class definition
    def_pattern = re.compile(r'^\s*(?:async\s+)?(?:def|class)\s+\w+')

    for file_path in Path(code_dir).rglob('*.py'):
        content = file_path.read_text(encoding='utf-8', errors='replace')
        # Find requirement references in comments/docstrings
        for match in ref_pattern.finditer(content):
            req_ids = [r.strip() for r in match.group(1).split(',')]
            # Walk backwards to the nearest preceding def/class line
            lines_before = content[:match.start()].split('\n')
            for line in reversed(lines_before):
                if def_pattern.match(line):
                    code_element = line.strip()
                    break
            else:
                # No def/class found above the reference (module-level comment)
                code_element = "Unknown"
            implementations.append({
                'file': str(file_path),
                'element': code_element,
                'requirements': req_ids,
                'type': 'implementation'
            })
    return implementations
For comprehensive code scanning patterns, see references/code_scanning.md.
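The scanning script above only visits `*.py` files and expects a colon after the keyword, so the Java example's `@implements REQ-001` tags would not be found. A possible extension for Java and JavaScript block comments is sketched here. `scan_block_comments` is a hypothetical function, and treating the first non-blank line after the comment as the annotated declaration is a simplifying assumption.

```python
import re
from pathlib import Path

ID = r'[A-Z]+(?:-[A-Z0-9]+)+'
# Tag style: "@implements REQ-001 User login" (no colon)
TAG = re.compile(r'@implements\s+(' + ID + r')')
# Colon style: "Implements: REQ-002, REQ-003"
COLON = re.compile(r'Implements?:\s*(' + ID + r'(?:\s*,\s*' + ID + r')*)')
# A /** ... */ documentation comment
BLOCK = re.compile(r'/\*\*.*?\*/', re.DOTALL)

def scan_block_comments(code_dir, extensions=(".java", ".js")):
    """Collect requirement IDs from /** ... */ comments in Java/JS files."""
    results = []
    for path in sorted(Path(code_dir).rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for block in BLOCK.finditer(text):
            body = block.group(0)
            req_ids = TAG.findall(body)
            for m in COLON.finditer(body):
                req_ids += [r.strip() for r in m.group(1).split(",")]
            if not req_ids:
                continue
            # Assumption: the declaration follows the comment directly.
            following = text[block.end():].lstrip().splitlines()
            element = following[0].strip() if following else "Unknown"
            results.append({
                "file": str(path),
                "element": element,
                "requirements": req_ids,
                "type": "implementation",
            })
    return results
```

The result entries use the same keys as `scan_code_for_requirements`, so they can be appended to the same `implementations` list.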
Step 5: Build the Traceability Matrix
Combine all extracted data into a structured matrix.
Data Structure:
traceability_matrix = {
    'REQ-001': {
        'requirement': {
            'id': 'REQ-001',
            'title': 'User Authentication',
            'source': 'requirements.md'
        },
        'design': [
            {
                'name': 'Authentication Service',
                'source': 'design.md'
            }
        ],
        'implementation': [
            {
                'file': 'auth/service.py',
                'element': 'def authenticate_user()'
            }
        ]
    },
    # ... more requirements
}
Building Script:
def build_traceability_matrix(requirements, design_artifacts, implementations):
    """Build complete traceability matrix."""
    matrix = {}
    # Initialize with requirements
    for req in requirements:
        matrix[req['id']] = {
            'requirement': req,
            'design': [],
            'implementation': []
        }
    # Link design artifacts (references to unknown IDs are skipped)
    for design in design_artifacts:
        for req_id in design.get('requirements', []):
            if req_id in matrix:
                matrix[req_id]['design'].append(design)
    # Link implementations (references to unknown IDs are skipped)
    for impl in implementations:
        for req_id in impl.get('requirements', []):
            if req_id in matrix:
                matrix[req_id]['implementation'].append(impl)
    return matrix
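Because the building script skips any reference whose ID is not in the requirements list, typos such as `REQ-01` or references to deleted requirements disappear silently. A small check can surface them. This is a sketch only; `find_unknown_references` is a hypothetical helper that takes the same three inputs as `build_traceability_matrix`.

```python
def find_unknown_references(requirements, design_artifacts, implementations):
    """Return {unknown_req_id: [labels of artifacts that reference it]}."""
    known = {req["id"] for req in requirements}
    unknown = {}
    for artifact in list(design_artifacts) + list(implementations):
        # Design artifacts carry a 'name'; implementations carry 'file'/'element'.
        label = artifact.get("name") or f"{artifact.get('file')}::{artifact.get('element')}"
        for req_id in artifact.get("requirements", []):
            if req_id not in known:
                unknown.setdefault(req_id, []).append(label)
    return unknown
```

A non-empty result usually points to a mistyped ID, or to design or code that refers to a requirement no longer in the source documents.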
Step 6: Generate Output Formats
Export matrix in multiple formats for different audiences.
Markdown Table:
# Traceability Matrix
| Requirement | Title | Design | Implementation | Status |
|-------------|-------|--------|----------------|--------|
| REQ-001 | User Authentication | Authentication Service | auth/service.py::authenticate_user() | ✓ Complete |
| REQ-002 | Password Reset | Auth Service | auth/service.py::reset_password() | ✓ Complete |
| REQ-003 | Data Encryption | - | - | ❌ Missing |
Generation Script:
def generate_markdown_table(matrix):
    """Generate markdown traceability table."""
    lines = [
        "# Traceability Matrix\n",
        "| Requirement | Title | Design | Implementation | Status |",
        "|-------------|-------|--------|----------------|--------|"
    ]
    for req_id, data in sorted(matrix.items()):
        req = data['requirement']
        design = ', '.join([d['name'] for d in data['design']]) or '-'
        impl = ', '.join([f"{i['file']}::{i['element']}" for i in data['implementation']]) or '-'
        # Determine status
        if data['design'] and data['implementation']:
            status = '✓ Complete'
        elif data['design'] or data['implementation']:
            status = '⚠ Partial'
        else:
            status = '❌ Missing'
        lines.append(f"| {req_id} | {req['title']} | {design} | {impl} | {status} |")
    return '\n'.join(lines)
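Titles or element names that contain a pipe character or a line break would split or end a table row in the generated Markdown. A small cell-sanitizing helper is one way to guard against that. `md_cell` is a hypothetical name, and the backslash escape assumes a renderer that follows the GitHub Flavored Markdown table rules.

```python
def md_cell(value):
    """Make a value safe to place inside a Markdown table cell."""
    # Escape pipes so they are not read as column separators.
    text = str(value).replace("|", "\\|")
    # Collapse line breaks and repeated whitespace, which would end the row.
    return " ".join(text.split())

# Example: wrap each interpolated value when building a row
row = f"| {md_cell('REQ-010')} | {md_cell('Import CSV | TSV files')} |"
print(row)
```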
CSV Export:
import csv

def generate_csv(matrix, output_file):
    """Generate CSV traceability matrix."""
    # newline='' is required by the csv module; utf-8 keeps non-ASCII titles intact
    with open(output_file, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        # Header
        writer.writerow([
            'Requirement ID',
            'Title',
            'Source',
            'Design Artifacts',
            'Implementation Files',
            'Status'
        ])
        # Data rows
        for req_id, data in sorted(matrix.items()):
            req = data['requirement']
            design_str = '; '.join([d['name'] for d in data['design']])
            impl_str = '; '.join([f"{i['file']}" for i in data['implementation']])
            if data['design'] and data['implementation']:
                status = 'Complete'
            elif data['design'] or data['implementation']:
                status = 'Partial'
            else:
                status = 'Missing'
            writer.writerow([
                req_id,
                req['title'],
                req['source'],
                design_str,
                impl_str,
                status
            ])
HTML Interactive Visualization:
from html import escape

def generate_html_visualization(matrix, output_file):
    """Generate interactive HTML traceability matrix."""
    html = """
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Traceability Matrix</title>
<style>
table { border-collapse: collapse; width: 100%; }
th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
th { background-color: #4CAF50; color: white; }
tr:nth-child(even) { background-color: #f2f2f2; }
.complete { color: green; }
.partial { color: orange; }
.missing { color: red; }
.filter { margin: 20px 0; }
</style>
</head>
<body>
<h1>Traceability Matrix</h1>
<div class="filter">
<label>Filter by status:</label>
<select id="statusFilter" onchange="filterTable()">
<option value="all">All</option>
<option value="complete">Complete</option>
<option value="partial">Partial</option>
<option value="missing">Missing</option>
</select>
</div>
<table id="matrixTable">
<thead>
<tr>
<th>Requirement</th>
<th>Title</th>
<th>Design</th>
<th>Implementation</th>
<th>Status</th>
</tr>
</thead>
<tbody>
"""
    for req_id, data in sorted(matrix.items()):
        req = data['requirement']
        # Escape extracted text so characters like < or & cannot break the markup
        design = '<br>'.join(escape(d['name']) for d in data['design']) or '-'
        impl = '<br>'.join(escape(i['file']) for i in data['implementation']) or '-'
        if data['design'] and data['implementation']:
            status_class = 'complete'
            status_text = '✓ Complete'
        elif data['design'] or data['implementation']:
            status_class = 'partial'
            status_text = '⚠ Partial'
        else:
            status_class = 'missing'
            status_text = '❌ Missing'
        html += f"""
<tr class="{status_class}">
<td>{escape(req_id)}</td>
<td>{escape(req['title'])}</td>
<td>{design}</td>
<td>{impl}</td>
<td class="{status_class}">{status_text}</td>
</tr>
"""
    html += """
</tbody>
</table>
<script>
function filterTable() {
var filter = document.getElementById('statusFilter').value;
var rows = document.querySelectorAll('#matrixTable tbody tr');
rows.forEach(function(row) {
if (filter === 'all' || row.classList.contains(filter)) {
row.style.display = '';
} else {
row.style.display = 'none';
}
});
}
</script>
</body>
</html>
"""
    # utf-8 matches the <meta charset> above and is needed for the status symbols
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(html)
Step 7: Analyze and Report Gaps
Identify incomplete traceability and generate recommendations.
Gap Analysis:
def analyze_gaps(matrix):
    """Identify gaps in traceability."""
    gaps = {
        'missing_design': [],          # Requirements without design
        'missing_implementation': [],  # Requirements without code
        'complete': [],                # Fully traced requirements
        'orphaned_code': []            # Code without requirements (not populated here)
    }
    for req_id, data in matrix.items():
        if not data['design'] and not data['implementation']:
            # Completely untraced: counts toward both gap lists
            gaps['missing_design'].append(req_id)
            gaps['missing_implementation'].append(req_id)
        elif not data['design']:
            gaps['missing_design'].append(req_id)
        elif not data['implementation']:
            gaps['missing_implementation'].append(req_id)
        else:
            gaps['complete'].append(req_id)
    return gaps
def generate_gap_report(gaps):
    """Generate gap analysis report."""
    report = ["# Traceability Gap Analysis\n"]
    report.append("## Summary")
    report.append(f"- ✓ Complete: {len(gaps['complete'])} requirements")
    report.append(f"- ⚠ Missing Design: {len(gaps['missing_design'])} requirements")
    report.append(f"- ⚠ Missing Implementation: {len(gaps['missing_implementation'])} requirements\n")
    if gaps['missing_design']:
        report.append("## Requirements Without Design")
        for req_id in gaps['missing_design']:
            report.append(f"- {req_id}")
        report.append("")
    if gaps['missing_implementation']:
        report.append("## Requirements Without Implementation")
        for req_id in gaps['missing_implementation']:
            report.append(f"- {req_id}")
        report.append("")
    report.append("## Recommendations")
    if gaps['missing_design']:
        report.append("- Create design documents for undesigned requirements")
    if gaps['missing_implementation']:
        report.append("- Implement missing requirements or update requirement status")
    return '\n'.join(report)
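The `orphaned_code` list in `analyze_gaps` is declared but never filled. One possible way to populate it for Python sources is to parse each file and flag functions and classes whose docstring cites no requirement ID. The sketch below uses the standard `ast` module. `find_untagged_definitions` is a hypothetical name, and the check is deliberately strict: methods of a tagged class are still reported unless they carry their own tag.

```python
import ast
import re

REQ_ID = re.compile(r'\b[A-Z]+(?:-[A-Z0-9]+)+\b')

def find_untagged_definitions(source, filename="<string>"):
    """List functions/classes whose docstring contains no requirement ID."""
    tree = ast.parse(source, filename=filename)
    untagged = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node) or ""
            if not REQ_ID.search(doc):
                untagged.append(f"{filename}:{node.lineno} {node.name}")
    return untagged
```

Calling this on each file's text and extending `gaps['orphaned_code']` with the results would give the gap report a starting list of candidates for review. Not every helper function needs a requirement tag, so the output is a prompt for judgment, not a defect list.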
Coverage Metrics:
def calculate_coverage(matrix):
    """Calculate traceability coverage metrics."""
    total = len(matrix)
    with_design = sum(1 for data in matrix.values() if data['design'])
    with_impl = sum(1 for data in matrix.values() if data['implementation'])
    complete = sum(1 for data in matrix.values()
                   if data['design'] and data['implementation'])
    return {
        'total_requirements': total,
        'design_coverage': (with_design / total * 100) if total > 0 else 0,
        'implementation_coverage': (with_impl / total * 100) if total > 0 else 0,
        'complete_coverage': (complete / total * 100) if total > 0 else 0
    }
Complete Example
from pathlib import Path

# Step 1: Collect artifacts
req_file = Path('requirements.md')
design_file = Path('design.md')
code_dir = Path('src/')

# Steps 2-4: Extract data
requirements = extract_requirements(req_file)
design_artifacts = extract_design_links(design_file)
implementations = scan_code_for_requirements(code_dir)

# Step 5: Build matrix
matrix = build_traceability_matrix(requirements, design_artifacts, implementations)

# Step 6: Generate outputs
# utf-8 is needed because the table contains the ✓, ⚠ and ❌ status symbols
with open('traceability.md', 'w', encoding='utf-8') as f:
    f.write(generate_markdown_table(matrix))
generate_csv(matrix, 'traceability.csv')
generate_html_visualization(matrix, 'traceability.html')

# Step 7: Analyze gaps
gaps = analyze_gaps(matrix)
coverage = calculate_coverage(matrix)
print(f"Coverage: {coverage['complete_coverage']:.1f}% complete")
print(generate_gap_report(gaps))
Tips
- Establish ID conventions early: Use consistent prefixes (REQ-, US-, FEAT-)
- Automate where possible: Use scripts for large codebases
- Keep matrix updated: Regenerate whenever requirements or code change
- Tag code consistently: Enforce requirement references in code reviews
- Start simple: Begin with requirements→implementation, add design later
- Use version control: Track matrix changes alongside code changes
- Integrate with CI/CD: Auto-generate matrix on commits
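For the CI/CD tip, a coverage gate is one common pattern: the pipeline fails when traceability coverage drops below an agreed threshold. The sketch below assumes the dictionary returned by `calculate_coverage`. `check_coverage` and the 80% default are illustrative assumptions, not part of any CI system's API.

```python
import sys

def check_coverage(coverage, minimum=80.0, key="complete_coverage"):
    """Return 0 if coverage[key] meets the minimum, else 1 (usable as an exit code)."""
    actual = coverage.get(key, 0.0)
    if actual < minimum:
        print(f"FAIL: {key} is {actual:.1f}%, below required {minimum:.1f}%", file=sys.stderr)
        return 1
    print(f"OK: {key} is {actual:.1f}%")
    return 0
```

In a pipeline script this would typically end with `sys.exit(check_coverage(calculate_coverage(matrix)))`, so that a non-zero exit code marks the job as failed.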
Common Use Cases
Compliance Auditing:
- Demonstrate all requirements are implemented
- Provide evidence for regulatory reviews
- Track safety-critical requirement coverage
Impact Analysis:
- Identify affected code when requirements change
- Find which tests need updating
- Assess scope of feature modifications
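Impact analysis amounts to a lookup in the matrix built in Step 5. A minimal sketch, assuming the matrix structure shown there; `impacted_artifacts` is a hypothetical helper name.

```python
def impacted_artifacts(matrix, req_id):
    """Return the design artifacts and source files linked to one requirement."""
    entry = matrix.get(req_id)
    if entry is None:
        return {"design": [], "files": []}
    return {
        # Sets remove duplicates when several elements live in the same file.
        "design": sorted({d["name"] for d in entry["design"]}),
        "files": sorted({i["file"] for i in entry["implementation"]}),
    }
```

When a requirement changes, the returned file list is the starting point for review. Anything not tagged in code will not appear, so the result is a lower bound on the affected scope.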
Quality Assurance:
- Verify implementation completeness
- Ensure no orphaned code
- Track requirement fulfillment
References
For detailed information:
- references/extraction_patterns.md - Requirement parsing patterns for various formats
- references/code_scanning.md - Code annotation conventions and scanning techniques