Web Archive Analysis Skill
Purpose
Query the Wayback Machine to discover historical technology usage and detect technology migrations over time.
Operations
1. query_cdx_api
Get historical snapshots from the Wayback Machine CDX API.
Endpoint:
GET https://web.archive.org/cdx/search/cdx
Parameters:
url: {domain}
output: json
filter: statuscode:200
collapse: timestamp:6 # Group by month (YYYYMM)
limit: 100
from: {start_year}
to: {end_year}
Example Request:
curl "https://web.archive.org/cdx/search/cdx?url=example.com&output=json&filter=statuscode:200&collapse=timestamp:6&limit=100"
Response Format:
[
["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
["com,example)/", "20240115120000", "https://example.com/", "text/html", "200", "ABC123...", "45678"]
]
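The request and response above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the function names `build_cdx_url` and `parse_cdx_rows` are hypothetical, and the HTTPS form of the endpoint is assumed. No network call is made here.

```python
from urllib.parse import urlencode

# HTTPS form of the CDX endpoint (assumption; the path is the one listed above).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain, start_year=None, end_year=None, limit=100):
    """Build a CDX query URL from the parameters listed above."""
    params = {
        "url": domain,
        "output": "json",
        "filter": "statuscode:200",
        "collapse": "timestamp:6",  # keep one capture per month (YYYYMM)
        "limit": str(limit),
    }
    if start_year is not None:
        params["from"] = str(start_year)
    if end_year is not None:
        params["to"] = str(end_year)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(rows):
    """Convert the JSON array-of-arrays into dicts keyed by the header row."""
    if not rows:
        return []  # nothing to parse when the response array is empty
    header, *records = rows
    return [dict(zip(header, record)) for record in records]
```

Each parsed record is then a dict such as `{"timestamp": "20240115120000", "original": "https://example.com/", ...}`, which is easier to pass to later steps than positional arrays.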
2. select_snapshots
Choose representative snapshots for analysis.
Selection Strategy:
def select_snapshots(all_snapshots):
    # Get snapshots at regular intervals
    intervals = [
        "6 months ago",
        "1 year ago",
        "2 years ago",
        "3 years ago",
        "5 years ago"
    ]
    selected = []
    for interval in intervals:
        target_date = calculate_date(interval)
        closest = find_closest_snapshot(all_snapshots, target_date)
        if closest:
            selected.append(closest)
    return selected
Snapshot Priority:
- Recent (baseline for comparison)
- 1 year ago (detect recent changes)
- 2-3 years ago (medium-term evolution)
- 5+ years ago (historical context)
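The selection strategy relies on two helpers, `calculate_date` and `find_closest_snapshot`, that are not defined in this skill. One possible sketch is shown below. It assumes snapshots are dicts carrying a 14-digit `timestamp` string (as in the CDX response), that intervals map to approximate day counts, and that a snapshot further than `tolerance_days` from the target should be rejected. The day counts, the tolerance, and `parse_timestamp` are all illustrative choices, not part of the original.

```python
from datetime import datetime, timedelta, timezone

# Approximate day offsets for the interval labels (assumed values).
INTERVAL_DAYS = {
    "6 months ago": 182,
    "1 year ago": 365,
    "2 years ago": 730,
    "3 years ago": 1095,
    "5 years ago": 1825,
}


def calculate_date(interval, now=None):
    """Return the UTC datetime that lies `interval` before `now`."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=INTERVAL_DAYS[interval])


def parse_timestamp(ts):
    """Parse a 14-digit Wayback timestamp (YYYYMMDDhhmmss) as UTC."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def find_closest_snapshot(all_snapshots, target_date, tolerance_days=120):
    """Return the snapshot nearest to target_date, or None if none is close enough."""
    best = min(
        all_snapshots,
        key=lambda s: abs(parse_timestamp(s["timestamp"]) - target_date),
        default=None,
    )
    if best is None:
        return None
    gap = abs(parse_timestamp(best["timestamp"]) - target_date)
    return best if gap <= timedelta(days=tolerance_days) else None
```

The tolerance keeps a sparse archive from returning the same distant capture for several intervals; without it, a site with a single snapshot would appear five times in the selection.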
3. fetch_archived_content
Retrieve archived pages for analysis.
Wayback URL Format:
https://web.archive.org/web/{timestamp}/{original_url}
Example:
https://web.archive.org/web/20230115120000/https://example.com/
Headers to Request:
Accept: text/html
User-Agent: TechStackAgent/1.0 (OSINT research)
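As a rough sketch, the URL format and headers above can be combined into a request object with the standard library. The function name `build_archive_request` and the `raw` flag are hypothetical. The flag appends the Wayback `id_` modifier to the timestamp, which requests the capture without the replay toolbar and link rewriting; whether the skill uses it is an assumption, but it may help keep Wayback's own injected scripts out of the technology analysis.

```python
from urllib.request import Request

WAYBACK_PREFIX = "https://web.archive.org/web"
HEADERS = {
    "Accept": "text/html",
    "User-Agent": "TechStackAgent/1.0 (OSINT research)",
}


def build_archive_request(timestamp, original_url, raw=False):
    """Build (but do not send) a request for one archived capture.

    raw=True adds the "id_" modifier so the original bytes are returned
    without Wayback's replay banner (optional; an assumption of this sketch).
    """
    modifier = "id_" if raw else ""
    url = f"{WAYBACK_PREFIX}/{timestamp}{modifier}/{original_url}"
    return Request(url, headers=HEADERS)
```

The request can then be sent with `urllib.request.urlopen(req, timeout=30)`; a generous timeout is advisable because archived pages can be slow to load.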
4. compare_snapshots
Detect technology changes between snapshots.
Comparison Points:
{
  "headers_to_compare": [
    "Server",
    "X-Powered-By",
    "Set-Cookie"
  ],
  "html_elements": [
    "meta[name=generator]",
    "script[src]",
    "link[href]"
  ],
  "patterns_to_track": [
    "/wp-content/",
    "/_next/",
    "/_nuxt/",
    "/static/js/"
  ]
}
Change Detection:
def detect_changes(old_snapshot, new_snapshot):
    changes = []
    # Compare technologies
    old_tech = extract_technologies(old_snapshot)
    new_tech = extract_technologies(new_snapshot)
    added = new_tech - old_tech
    removed = old_tech - new_tech
    for tech in added:
        changes.append({
            "type": "technology_added",
            "technology": tech,
            "first_seen": new_snapshot.timestamp
        })
    for tech in removed:
        changes.append({
            "type": "technology_removed",
            "technology": tech,
            "last_seen": old_snapshot.timestamp
        })
    return changes
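`detect_changes` depends on `extract_technologies` returning a set, since it uses set difference. A minimal sketch of such a function is given below, assuming each snapshot object exposes `timestamp` and `html` attributes and that only the tracked path patterns are checked. The technology labels attached to each pattern are illustrative guesses (for example, `/static/js/` is only a weak hint of Create React App), and real detection would also inspect headers and HTML elements.

```python
from types import SimpleNamespace

# Path fragment -> technology label (labels are assumptions for illustration).
PATH_SIGNATURES = {
    "/wp-content/": "WordPress",
    "/_next/": "Next.js",
    "/_nuxt/": "Nuxt",
    "/static/js/": "Create React App",
}


def extract_technologies(snapshot):
    """Return the set of labels whose path fragment occurs in the snapshot HTML."""
    return {
        tech
        for fragment, tech in PATH_SIGNATURES.items()
        if fragment in snapshot.html
    }


# Two hypothetical snapshots to show the set arithmetic.
old = SimpleNamespace(
    timestamp="20200120",
    html='<link href="/wp-content/themes/x/style.css">',
)
new = SimpleNamespace(
    timestamp="20240110",
    html='<script src="/_next/static/chunks/main.js"></script>',
)

added = extract_technologies(new) - extract_technologies(old)
removed = extract_technologies(old) - extract_technologies(new)
```

Here `added` contains only `"Next.js"` and `removed` contains only `"WordPress"`, which is the pair of signals the migration step looks for.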
5. detect_migrations
Identify framework/platform migrations.
Common Migration Patterns:
{
  "WordPress → Custom/React": {
    "indicators": [
      "/wp-content/ disappears",
      "React globals appear",
      "/_next/ or /static/js/ paths"
    ],
    "typical_timeline": "6-18 months"
  },
  "AngularJS → Angular": {
    "indicators": [
      "ng-app disappears",
      "ng-version appears",
      "Angular 2+ patterns"
    ],
    "typical_timeline": "12-24 months"
  },
  "jQuery → React/Vue": {
    "indicators": [
      "jQuery CDN removed",
      "Modern framework globals",
      "SPA patterns"
    ],
    "typical_timeline": "6-12 months"
  },
  "On-prem → Cloud": {
    "indicators": [
      "CloudFront/Cloudflare headers appear",
      "AWS/GCP/Azure signatures",
      "CDN usage"
    ],
    "typical_timeline": "3-12 months"
  }
}
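One way to apply these patterns is to match them against the change records produced by the comparison step. The sketch below assumes change records shaped like `{"type": "technology_removed", "technology": "WordPress", ...}` and reduces each pattern to a "something removed, something added" rule. `MIGRATION_RULES` and `detect_migrations` are hypothetical names, and the technology labels in each rule are a simplification of the indicator lists above.

```python
# Each rule fires when at least one "removed" and one "added" label is present.
MIGRATION_RULES = [
    {
        "name": "WordPress → Custom/React",
        "removed": {"WordPress"},
        "added": {"React", "Next.js"},
    },
    {
        "name": "AngularJS → Angular",
        "removed": {"AngularJS"},
        "added": {"Angular"},
    },
    {
        "name": "jQuery → React/Vue",
        "removed": {"jQuery"},
        "added": {"React", "Vue"},
    },
]


def detect_migrations(changes):
    """Return the names of rules matched by a list of change records."""
    removed = {c["technology"] for c in changes if c["type"] == "technology_removed"}
    added = {c["technology"] for c in changes if c["type"] == "technology_added"}
    return [
        rule["name"]
        for rule in MIGRATION_RULES
        if rule["removed"] & removed and rule["added"] & added
    ]
```

Requiring both a removal and an addition avoids reporting a migration when a site merely adds a framework alongside its existing stack.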
6. extract_historical_tech
Parse archived HTML for technology signals.
Process:
- Fetch archived page
- Apply the same analysis as the html_content_analysis skill
- Record technologies with timestamp
- Build timeline of technology usage
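The final step of this process, building the timeline, can be sketched as follows. It assumes each analyzed snapshot is a dict with an ISO-style `timestamp` and a `technologies_detected` list, and that a technology counts as current only if it appears in the newest analyzed snapshot. `build_timeline` is a hypothetical name, and first/last dates are limited to the snapshots actually analyzed.

```python
def build_timeline(snapshots):
    """Summarize when each technology was first and last observed."""
    ordered = sorted(snapshots, key=lambda s: s["timestamp"])
    if not ordered:
        return []
    newest = ordered[-1]["timestamp"]

    # technology -> (first_seen, last_seen), filled in chronological order
    spans = {}
    for snap in ordered:
        for tech in snap["technologies_detected"]:
            first, _ = spans.get(tech, (snap["timestamp"], None))
            spans[tech] = (first, snap["timestamp"])

    return [
        {
            "technology": tech,
            "first_seen": first,
            "last_seen": "present" if last == newest else last,
            "status": "current" if last == newest else "removed",
        }
        for tech, (first, last) in spans.items()
    ]
```

Because only a handful of snapshots are analyzed, the resulting dates are bounds rather than exact transition dates: a technology marked as removed disappeared at some point between its `last_seen` snapshot and the next analyzed one.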
Output
{
  "skill": "web_archive_analysis",
  "domain": "string",
  "results": {
    "archive_coverage": {
      "oldest_snapshot": "2015-03-15",
      "newest_snapshot": "2024-01-10",
      "total_snapshots": 450,
      "snapshots_analyzed": 5
    },
    "snapshots_analyzed": [
      {
        "timestamp": "2024-01-10",
        "url": "https://web.archive.org/web/20240110/...",
        "technologies_detected": ["Next.js", "React", "Vercel"]
      },
      {
        "timestamp": "2022-06-15",
        "url": "https://web.archive.org/web/20220615/...",
        "technologies_detected": ["React", "Create React App", "Heroku"]
      },
      {
        "timestamp": "2020-01-20",
        "url": "https://web.archive.org/web/20200120/...",
        "technologies_detected": ["WordPress", "PHP"]
      }
    ],
    "technology_timeline": [
      {
        "technology": "WordPress",
        "first_seen": "2015-03-15",
        "last_seen": "2020-06-01",
        "status": "removed"
      },
      {
        "technology": "React",
        "first_seen": "2020-03-01",
        "last_seen": "present",
        "status": "current"
      },
      {
        "technology": "Next.js",
        "first_seen": "2023-01-15",
        "last_seen": "present",
        "status": "current"
      }
    ],
    "migrations_detected": [
      {
        "type": "CMS → Modern Framework",
        "from": "WordPress",
        "to": "React/Next.js",
        "approximate_date": "2020-Q1 to 2020-Q2",
        "confidence": 85
      },
      {
        "type": "Hosting Migration",
        "from": "Heroku",
        "to": "Vercel",
        "approximate_date": "2023-Q1",
        "confidence": 80
      }
    ],
    "current_vs_historical": {
      "current_stack": ["Next.js", "React", "Vercel"],
      "historical_stack": ["WordPress", "PHP", "Heroku"],
      "major_changes": 2
    }
  },
  "evidence": [
    {
      "type": "archived_snapshot",
      "timestamp": "string",
      "archive_url": "string",
      "technologies": ["array"],
      "analysis_timestamp": "ISO-8601"
    }
  ]
}
Rate Limiting
- Wayback CDX API: 15 requests/minute
- Archived page fetches: 10/minute
- Cache CDX results to avoid repeated queries
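A simple way to honor these limits is to space requests evenly and to memoize CDX responses by URL. The sketch below is one possible approach, not the skill's own mechanism: `MinIntervalLimiter`, `cached_cdx_query`, and the injected `fetch` callable are hypothetical, and the clock and sleep functions are parameters only so the limiter can be exercised without real waiting.

```python
import time


class MinIntervalLimiter:
    """Space calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed; return the seconds waited."""
        now = self._clock()
        delay = self._next_allowed - now
        if delay > 0:
            self._sleep(delay)
            now += delay
        self._next_allowed = now + self.interval
        return max(delay, 0.0)


cdx_limiter = MinIntervalLimiter(15)   # CDX API: 15 requests/minute
page_limiter = MinIntervalLimiter(10)  # archived pages: 10 requests/minute
_cdx_cache = {}


def cached_cdx_query(url, fetch):
    """Return cached rows for url, calling fetch(url) only on a cache miss."""
    if url not in _cdx_cache:
        cdx_limiter.wait()
        _cdx_cache[url] = fetch(url)
    return _cdx_cache[url]
```

Cache hits bypass the limiter entirely, so repeated analyses of the same domain cost no additional CDX requests.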
Error Handling
- 404 (or an empty CDX result): Domain or URL not archived
- 503: Wayback Machine overloaded - retry with backoff
- Timeout: Increase timeout for archived pages (can be slow)
- Continue with available snapshots on partial failures
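These rules could be wrapped around any fetch function roughly as shown below. This is a sketch under assumptions: `fetch_with_backoff` and its `fetch` argument are hypothetical, the retry count and base delay are arbitrary, and the fetch function is assumed to raise `urllib.error.HTTPError` for HTTP failures. Catching `TimeoutError` for socket timeouts assumes Python 3.10 or later.

```python
import time
from urllib.error import HTTPError, URLError


def fetch_with_backoff(fetch, url, retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url); return None on 404, retry 503s and timeouts with backoff."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except HTTPError as err:
            if err.code == 404:
                return None  # not archived: let the caller skip this snapshot
            if err.code != 503 or attempt == retries:
                raise  # unexpected status, or out of retries
        except (URLError, TimeoutError):
            if attempt == retries:
                raise
        # Exponential backoff: base_delay, 2x, 4x, ...
        sleep(base_delay * 2 ** attempt)
```

Returning `None` for a missing capture lets the caller drop that snapshot and continue with the rest, which matches the partial-failure rule above.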
Security Considerations
- Only access public archives
- Respect Wayback Machine rate limits
- Do not store archived content beyond analysis
- Note that archived content may contain outdated security vulnerabilities
- Log all queries for audit
Confidence Notes
Historical data provides contextual signals:
- Confirms technology transitions
- Validates current technology choices
- Lower weight than current direct evidence
- Base confidence: 60-75%