/vm-snapshot-restore Skill

Restore virtual machines from snapshots in OpenShift Virtualization. CRITICAL: This operation replaces current VM state with snapshot data. ALL changes since the snapshot will be LOST.

Implementation Note: This skill uses generic Kubernetes resource tools (resources_create_or_update) to create VirtualMachineRestore resources. Dedicated restore tools do not currently exist in the openshift-virtualization MCP server.

Prerequisites

Required MCP Server: openshift-virtualization (OpenShift MCP Server)

Required MCP Tools:

resources_create_or_update (from openshift-virtualization) - Create VirtualMachineRestore
resources_get (from openshift-virtualization) - Verify VM/snapshot exists, monitor restore
vm_lifecycle (from openshift-virtualization) - Stop VM if running

Required Environment Variables:

KUBECONFIG - Path to Kubernetes configuration file with cluster access

Required Cluster Setup:

OpenShift cluster (>= 4.19)
OpenShift Virtualization operator installed
ServiceAccount with RBAC permissions to create VirtualMachineRestore resources

When to Use This Skill

Trigger this skill when:

User wants to restore a VM to a previous state
User wants to recover from failed changes/upgrades
User explicitly requests snapshot restore

User phrases that trigger this skill:

"Restore VM api-server from snapshot snapshot-20240115"
"Roll back database-01 to pre-upgrade snapshot"
"Recover VM web-server from backup"

Do NOT use this skill when:

User wants to create snapshots → Use vm-snapshot-create skill
User wants to list snapshots → Use vm-snapshot-list skill
User wants to clone a VM → Use vm-clone skill

Workflow

Step 1: Gather Restore Information

Required Information from User:

VM Name - VM to restore
Namespace - Namespace where VM exists
Snapshot Name - Snapshot to restore from

If any information is missing, ask the user for it before continuing.
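The check above can be sketched as a small helper. This is a minimal sketch, not part of the skill: the field names `vm_name`, `namespace`, and `snapshot_name` and the function `missing_restore_fields` are hypothetical names chosen for illustration.

```python
# Hypothetical field names for the three required inputs from Step 1.
REQUIRED_FIELDS = ("vm_name", "namespace", "snapshot_name")


def missing_restore_fields(request: dict) -> list[str]:
    """Return the required fields that are absent or blank, in prompt order."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(request.get(field) or "").strip()
    ]
```

An empty result means all three values are present and the workflow can move on to Step 2; otherwise the agent asks for each listed field.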

Step 2: Verify VM Exists

MCP Tool: resources_get (from openshift-virtualization)

Parameters:

{
  "apiVersion": "kubevirt.io/v1",
  "kind": "VirtualMachine",
  "namespace": "<namespace>",
  "name": "<vm-name>"
}

Error Handling:

If VM not found → Report error
If permission denied → Report RBAC error

Step 3: Check VM Running State

From the VM resource in Step 2, check status.printableStatus.

If VM is Running:

⚠️ VM Must Be Stopped Before Restore

**VM**: `<vm-name>` (namespace: `<namespace>`)
**Status**: Running

**Safety Requirement**: VMs must be stopped before restore to prevent data corruption.

**Options:**
1. "stop-and-restore" - Stop the VM first, then restore from snapshot
2. "cancel" - Cancel restore operation

How would you like to proceed?

Wait for user response.

If "stop-and-restore" → Stop VM using vm_lifecycle, then continue
If "cancel" → Stop workflow
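As an illustration of the running-state check, the sketch below reads status.printableStatus from the VM object returned in Step 2. It assumes the resource has already been parsed into a Python dict; the function name `requires_stop_prompt` is hypothetical. It handles only the "Running" case named in this step; how other transitional states should be treated is not specified here.

```python
def requires_stop_prompt(vm: dict) -> bool:
    """True when the VM reports Running and the stop-and-restore prompt is needed."""
    status = (vm.get("status") or {}).get("printableStatus")
    return status == "Running"
```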

Step 4: Verify Snapshot Exists

MCP Tool: resources_get (from openshift-virtualization)

Parameters:

{
  "apiVersion": "snapshot.kubevirt.io/v1beta1",
  "kind": "VirtualMachineSnapshot",
  "namespace": "<namespace>",
  "name": "<snapshot-name>"
}

If snapshot not found:

❌ Snapshot Not Found

**Snapshot**: `<snapshot-name>` does not exist in namespace `<namespace>`.

**To list available snapshots:**
"List snapshots for VM <vm-name>"

Restore operation cancelled.

STOP workflow.

Extract snapshot details:

metadata.creationTimestamp - Creation time
status.phase - Must be "Succeeded"
status.readyToUse - Must be true
spec.source.name - Verify it matches the VM name

If the snapshot is not ready (status.phase is not "Succeeded" or status.readyToUse is not true):

❌ Snapshot Not Ready

**Snapshot**: `<snapshot-name>`
**Status**: <status.phase>
**Ready to Use**: <status.readyToUse>

Snapshot is not ready for restore. Only snapshots with "Succeeded" phase and readyToUse=true can be used.

Restore operation cancelled.

STOP workflow.
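The snapshot checks in this step can be sketched as one validation function. This is a minimal sketch assuming the VirtualMachineSnapshot has been parsed into a dict; `snapshot_problems` is a hypothetical helper name, and the message wording is illustrative.

```python
def snapshot_problems(snapshot: dict, vm_name: str) -> list[str]:
    """Return a list of reasons the snapshot cannot be used; empty means usable."""
    status = snapshot.get("status") or {}
    source = (snapshot.get("spec") or {}).get("source") or {}
    problems = []
    # status.phase must be exactly "Succeeded".
    if status.get("phase") != "Succeeded":
        problems.append(f"phase is {status.get('phase')!r}, expected 'Succeeded'")
    # status.readyToUse must be the boolean true, not merely truthy.
    if status.get("readyToUse") is not True:
        problems.append("readyToUse is not true")
    # spec.source.name must match the VM being restored.
    if source.get("name") != vm_name:
        problems.append(f"snapshot source is {source.get('name')!r}, not {vm_name!r}")
    return problems
```

Any non-empty result maps to the "Snapshot Not Ready" message above (or to a source mismatch report) and stops the workflow.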

Step 5: Present Restore Preview and Get Typed Confirmation

CRITICAL: User must type the snapshot name to confirm.

## 🔴 VM RESTORE - Data Loss Warning

**⚠️ THIS WILL REPLACE CURRENT VM STATE WITH SNAPSHOT DATA ⚠️**

### What Will Happen

**VM to Restore**: `<vm-name>` (namespace: `<namespace>`)
**Snapshot to Restore From**: `<snapshot-name>`

**Current VM State** (WILL BE LOST):
- **Last Modified**: <current-timestamp>
- **Changes Since Snapshot**: ALL changes made after <snapshot-creation-timestamp> WILL BE PERMANENTLY LOST

**Snapshot State** (WILL BE RESTORED):
- **Created**: <snapshot-creation-timestamp>
- **Age**: <snapshot-age>

**Time Range of Data Loss**:
- **⚠️ ALL CHANGES in the last <time-diff> WILL BE LOST ⚠️**

### What Will Be Restored
- ✓ VM configuration (from snapshot time)
- ✓ Disk data (from snapshot time)

### What Will Be Lost
- ✗ **ALL disk changes** made after <snapshot-creation-timestamp>
- ✗ **ALL configuration changes** made after <snapshot-creation-timestamp>

---

**⚠️ CRITICAL: This restore is permanent. Current VM state cannot be recovered unless you create a snapshot now.**

**To proceed with restore, type the snapshot name exactly as shown:**

Type `<snapshot-name>` to confirm: _____

Wait for user to type the snapshot name.
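The `<snapshot-age>` and `<time-diff>` placeholders in the preview can be derived from metadata.creationTimestamp. The sketch below assumes the timestamp uses the usual Kubernetes UTC form (for example `2026-02-18T10:00:00Z`); the function name `snapshot_age` and the `6h 30m` output style are illustrative choices.

```python
from datetime import datetime, timezone


def snapshot_age(creation_timestamp: str, now: datetime) -> str:
    """Format the time elapsed since the snapshot was created, e.g. '6h 30m'."""
    created = datetime.strptime(
        creation_timestamp, "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    total_minutes = int((now - created).total_seconds() // 60)
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    parts = []
    if days:
        parts.append(f"{days}d")
    if hours or days:
        parts.append(f"{hours}h")
    parts.append(f"{minutes}m")
    return " ".join(parts)
```

`now` must be timezone-aware (for example `datetime.now(timezone.utc)`), otherwise the subtraction raises a `TypeError`.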

Validation:

Compare user input with snapshot name (case-sensitive, exact match)
If match: Proceed to Step 6
If mismatch: Cancel operation

On mismatch:

❌ Confirmation Failed

**You typed**: `<user-input>`
**Expected**: `<snapshot-name>`

Names do not match. Restore cancelled for safety.

Operation cancelled. Current VM state preserved.

STOP workflow.
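The validation rule (case-sensitive, exact match) can be sketched as follows. The helper name `check_typed_confirmation` and the hint strings are hypothetical; the hints only explain a failed match and never relax it.

```python
def check_typed_confirmation(user_input: str, snapshot_name: str) -> tuple[bool, str]:
    """Return (confirmed, hint). Only an exact, case-sensitive match confirms."""
    if user_input == snapshot_name:
        return True, ""
    # Everything below is a mismatch; the hint just says why.
    if user_input.strip() == snapshot_name:
        return False, "leading or trailing whitespace"
    if user_input.strip().lower() == snapshot_name.lower():
        return False, "capitalization differs"
    return False, "names do not match"
```

Input with stray whitespace or different capitalization is deliberately rejected rather than normalized, which keeps the safety check strict.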

Step 6: Final Confirmation Before Restore

After typed verification succeeds, ask for final explicit confirmation.

## ✓ Typed Verification Passed

**Confirmation received for snapshot**: `<snapshot-name>`

### Ready to Restore

**VM**: `<vm-name>` (namespace: `<namespace>`)
**From Snapshot**: `<snapshot-name>`

**Impact**:
- Current VM state will be replaced with snapshot state
- All changes in the last <time-diff> will be permanently lost

---

**Proceed with VM restore? This action cannot be undone.**
- Type "yes" to execute restore
- Type "cancel" to abort

Your choice: _____

Wait for user response.

Handle response:

If "yes" → Proceed to Step 7 (execute restore)
If "cancel", "no", "wait", or anything else → Cancel operation

On cancellation:

Restore operation cancelled by user. Current VM state preserved.

STOP workflow.

Step 7: Execute Restore

ONLY PROCEED AFTER:

✓ VM verified (exists, stopped)
✓ Snapshot verified (exists, ready)
✓ User typed snapshot name correctly
✓ User confirmed "yes"
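One way to make the checklist above hard to bypass is to track the four preconditions explicitly and refuse to execute while any is unmet. This is a sketch only; the class `RestoreGate` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestoreGate:
    """Tracks the four preconditions that must hold before Step 7 runs."""

    vm_verified_and_stopped: bool = False
    snapshot_verified_and_ready: bool = False
    typed_name_matched: bool = False
    final_yes_received: bool = False

    def unmet(self) -> list[str]:
        """Names of preconditions that are still false, in checklist order."""
        return [name for name, ok in vars(self).items() if not ok]

    def may_execute(self) -> bool:
        """True only when every precondition has been satisfied."""
        return not self.unmet()
```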

MCP Tool: resources_create_or_update (from openshift-virtualization)

Construct VirtualMachineRestore YAML:

apiVersion: snapshot.kubevirt.io/v1beta1
kind: VirtualMachineRestore
metadata:
  name: <restore-name>
  namespace: <namespace>
spec:
  target:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: <vm-name>
  virtualMachineSnapshotName: <snapshot-name>

Generate restore name:

Format: restore-<vm-name>-<timestamp>
Example: restore-database-01-20260218-143500

Parameters:

{
  "resource": "apiVersion: snapshot.kubevirt.io/v1beta1\nkind: VirtualMachineRestore\nmetadata:\n  name: <restore-name>\n  namespace: <namespace>\nspec:\n  target:\n    apiGroup: kubevirt.io\n    kind: VirtualMachine\n    name: <vm-name>\n  virtualMachineSnapshotName: <snapshot-name>"
}
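The restore name and the `resource` parameter can be generated together. The sketch below reuses the YAML shown above as a template and follows the `restore-<vm-name>-<timestamp>` format; the helper names `restore_name` and `restore_parameters` are hypothetical, and the timestamp layout is inferred from the example name.

```python
from datetime import datetime

# Same manifest as above, with placeholders for the four variable values.
RESTORE_TEMPLATE = """\
apiVersion: snapshot.kubevirt.io/v1beta1
kind: VirtualMachineRestore
metadata:
  name: {restore_name}
  namespace: {namespace}
spec:
  target:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: {vm_name}
  virtualMachineSnapshotName: {snapshot_name}"""


def restore_name(vm_name: str, now: datetime) -> str:
    """Build restore-<vm-name>-<timestamp>, e.g. restore-database-01-20260218-143500."""
    return f"restore-{vm_name}-{now.strftime('%Y%m%d-%H%M%S')}"


def restore_parameters(vm_name: str, namespace: str, snapshot_name: str, now: datetime) -> dict:
    """Return the parameter object for resources_create_or_update."""
    manifest = RESTORE_TEMPLATE.format(
        restore_name=restore_name(vm_name, now),
        namespace=namespace,
        vm_name=vm_name,
        snapshot_name=snapshot_name,
    )
    return {"resource": manifest}
```

Plain string formatting is used here to stay within the standard library; it assumes the inputs are already validated Kubernetes names and contain no YAML-special characters.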

Report progress:

🔄 Restoring VM from snapshot...
⏳ This may take several minutes...

Step 8: Monitor Restore Progress

Use resources_get to monitor VirtualMachineRestore status.

Check status.complete:

true → Restore completed
false → Restore in progress

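Wait up to 10 minutes for the restore to complete. If status.complete is still false after that, report the restore as not finished and see Issue 2 under Common Issues.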
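The monitoring loop can be sketched as a bounded poll. In this sketch `get_restore` is a hypothetical callable standing in for a resources_get call that returns the VirtualMachineRestore as a dict; the 10-second polling interval is an assumption, while the 600-second limit mirrors the 10-minute wait above.

```python
import time


def wait_for_restore(get_restore, timeout_s=600, interval_s=10,
                     sleep=time.sleep, clock=time.monotonic) -> str:
    """Poll until status.complete is true or the timeout elapses.

    Returns "complete" or "timeout". `sleep` and `clock` are injectable
    so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = (get_restore() or {}).get("status") or {}
        if status.get("complete") is True:
            return "complete"
        if clock() >= deadline:
            return "timeout"
        sleep(interval_s)
```

A missing status block is treated the same as `complete: false`, since the status may not be populated immediately after the resource is created.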

Step 9: Report Restore Results

On success:

## ✓ VM Restored Successfully

**VM**: `<vm-name>` (namespace: `<namespace>`)
**Restored From**: Snapshot `<snapshot-name>`

### Restore Details
- **Snapshot Created**: <snapshot-creation-timestamp>
- **Restore Completed**: <current-timestamp>
- **VM Status**: Stopped (ready to start)

### Data Loss Confirmation
- ⚠️ All changes made after <snapshot-creation-timestamp> have been lost

### Next Steps

**To start the restored VM:**
"Start VM <vm-name> in namespace <namespace>"

On failure:

## ❌ VM Restore Failed

**Error**: <error-message>

**VM**: `<vm-name>`
**Snapshot**: `<snapshot-name>`

**Current VM State**: UNKNOWN - may be partially restored or unchanged

**CRITICAL**: Do not start VM until restore issue is resolved

**Recovery Options:**
1. Try restore again after resolving the error
2. Restore from a different snapshot
3. Contact cluster admin for investigation

Dependencies

Required MCP Servers

openshift-virtualization - OpenShift MCP server with kubevirt toolset

Required MCP Tools

resources_create_or_update (from openshift-virtualization) - Create VirtualMachineRestore
resources_get (from openshift-virtualization) - Verify and monitor
vm_lifecycle (from openshift-virtualization) - Stop VM if running

Related Skills

vm-snapshot-list - List snapshots before restore
vm-snapshot-create - Create snapshots before risky operations
vm-snapshot-delete - Delete old snapshots
vm-lifecycle-manager - Start VM after restore

Reference Documentation

Official Red Hat Documentation:

OpenShift Virtualization Snapshots - OpenShift 4.20

Upstream Documentation:

KubeVirt VM Snapshots

Critical: Human-in-the-Loop Requirements

IMPORTANT: This skill performs DESTRUCTIVE operations. You MUST:

Before Restoring Snapshots (CRITICAL - Data Loss Risk)
- REQUIRE VM to be stopped first if currently running
- Display what will be lost (current VM state since snapshot)
- Show snapshot details (creation time, age)
- Require typed confirmation - user must type snapshot name exactly
- Ask: "Proceed with restore? This will replace current VM state. (yes/cancel)"
- Wait for explicit "yes"

Never Auto-Execute
- NEVER restore without user confirmation
- NEVER restore a running VM without stopping it first
- NEVER skip typed verification for restore operations

Why This Matters:

Data Loss on Restore: Restoring replaces current VM state - all changes since snapshot are PERMANENTLY LOST
No Undo: Restore cannot be reversed - current data cannot be recovered
Typed Confirmation: Prevents accidental restores from the wrong snapshot

Common Issues

Issue 1: Restore Fails - Insufficient Storage Capacity

Error: "Failed to restore: insufficient storage capacity" or "PVC provisioning failed"

Cause: The namespace doesn't have enough storage quota or the storage backend is full.

Solution:

Check namespace storage quota: resources_list with kind="ResourceQuota"
Check PVC status: resources_list for PersistentVolumeClaims
Delete unnecessary snapshots: Use vm-snapshot-delete skill
Request quota increase: Contact cluster admin
Retry restore once storage is available

Issue 2: Restore Stuck in Progress

Error: VirtualMachineRestore status shows complete: false for extended period

Cause: The storage backend is slow, the snapshot is corrupted, or there's a CSI driver issue.

Solution:

Check VirtualMachineRestore status.conditions for detailed error messages
Verify snapshot is "Succeeded": Use vm-snapshot-list skill
Wait longer: Large VMs may take 10+ minutes to restore
Cancel and retry: Delete VirtualMachineRestore resource and try again
Try alternative snapshot if restore continues to fail
Check CSI driver logs (requires cluster admin access)
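For the first item above, status.conditions can be flattened into short lines for the user. This sketch assumes each condition follows the common Kubernetes shape with `type`, `status`, `reason`, and `message` keys; the helper name `describe_conditions` is hypothetical, and the condition types a restore reports are not listed in this skill.

```python
def describe_conditions(restore: dict) -> list[str]:
    """Summarize each status condition as 'Type=Status (detail)'."""
    conditions = (restore.get("status") or {}).get("conditions") or []
    lines = []
    for cond in conditions:
        line = f"{cond.get('type', '?')}={cond.get('status', '?')}"
        # Prefer the human-readable message, fall back to the reason code.
        detail = cond.get("message") or cond.get("reason")
        if detail:
            line += f" ({detail})"
        lines.append(line)
    return lines
```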

Issue 3: User Typed Wrong Snapshot Name

Error: "Confirmation Failed - Names do not match"

Cause: User made a typo when typing the snapshot name for confirmation.

Solution:

Retry with exact name: Copy-paste the snapshot name from the previous message
Check capitalization: Snapshot names are case-sensitive
Check for spaces: Ensure no leading/trailing spaces in the typed name
Use vm-snapshot-list skill to see all available snapshots and verify the correct name

This safety mechanism is intentional: it prevents accidental data loss from restoring the wrong snapshot.

Security Considerations

Data Protection: Restore requires TWO confirmations (typed + yes) to prevent accidental data loss
VM Stop Requirement: VMs must be stopped before restore
Typed Verification: User must type exact snapshot name
RBAC Enforcement: Requires permissions for VirtualMachineRestore resources
Namespace Isolation: Restore scoped to namespace boundaries
Audit Trail: All restore operations logged in Kubernetes API audit logs
No Rollback: Restore is permanent - current state cannot be recovered after restore

Example Usage

User: "The database upgrade failed. Restore database-01 from snapshot pre-upgrade-snapshot"

Agent: "Which namespace is database-01 in?"

User: "production"

Agent: [Verifies VM - Running status detected] "⚠️ VM Must Be Stopped Before Restore. Options: 1) stop-and-restore 2) cancel"

User: "stop-and-restore"

Agent: [Stops VM, verifies snapshot exists] "🔴 VM RESTORE - Data Loss Warning

VM: database-01 (production)
Snapshot: pre-upgrade-snapshot (created 2026-02-18 10:00, age 6h 30m)
⚠️ ALL CHANGES in last 6h 30m WILL BE LOST
Type pre-upgrade-snapshot to confirm: _____"

User: "pre-upgrade-snapshot"

Agent: "✓ Typed Verification Passed. Proceed with restore? (yes/cancel)"

User: "yes"

Agent: "🔄 Restoring VM from snapshot... ⏳ This may take several minutes..." [Monitors progress] "✓ VM Restored Successfully

VM: database-01 (production)
Restored From: pre-upgrade-snapshot (created 2026-02-18 10:00)
Status: Stopped (ready to start)
⚠️ All changes after 2026-02-18 10:00 have been lost

To start: 'Start VM database-01 in namespace production'"
