vm-snapshot-restore

SKILL.md

/vm-snapshot-restore Skill

Restore virtual machines from snapshots in OpenShift Virtualization. CRITICAL: This operation replaces current VM state with snapshot data. ALL changes since the snapshot will be LOST.

Implementation Note: This skill uses generic Kubernetes resource tools (resources_create_or_update) to create VirtualMachineRestore resources. Dedicated restore tools do not currently exist in the openshift-virtualization MCP server.

Prerequisites

Required MCP Server: openshift-virtualization (OpenShift MCP Server)

Required MCP Tools:

  • resources_create_or_update (from openshift-virtualization) - Create VirtualMachineRestore
  • resources_get (from openshift-virtualization) - Verify VM/snapshot exists, monitor restore
  • vm_lifecycle (from openshift-virtualization) - Stop VM if running

Required Environment Variables:

  • KUBECONFIG - Path to Kubernetes configuration file with cluster access

Required Cluster Setup:

  • OpenShift cluster (>= 4.19)
  • OpenShift Virtualization operator installed
  • ServiceAccount with RBAC permissions to create VirtualMachineRestore resources

When to Use This Skill

Trigger this skill when:

  • User wants to restore a VM to a previous state
  • User wants to recover from failed changes/upgrades
  • User explicitly requests snapshot restore

User phrases that trigger this skill:

  • "Restore VM api-server from snapshot snapshot-20240115"
  • "Roll back database-01 to pre-upgrade snapshot"
  • "Recover VM web-server from backup"

Do NOT use this skill when:

  • User wants to create snapshots → Use vm-snapshot-create skill
  • User wants to list snapshots → Use vm-snapshot-list skill
  • User wants to clone a VM → Use vm-clone skill

Workflow

Step 1: Gather Restore Information

Required Information from User:

  1. VM Name - VM to restore
  2. Namespace - Namespace where VM exists
  3. Snapshot Name - Snapshot to restore from

If any information missing, ask for it.

Step 2: Verify VM Exists

MCP Tool: resources_get (from openshift-virtualization)

Parameters:

{
  "apiVersion": "kubevirt.io/v1",
  "kind": "VirtualMachine",
  "namespace": "<namespace>",
  "name": "<vm-name>"
}

Error Handling:

  • If VM not found → Report error
  • If permission denied → Report RBAC error

Step 3: Check VM Running State

From the VM resource in Step 2, check status.printableStatus.

If VM is Running:

⚠️ VM Must Be Stopped Before Restore

**VM**: `<vm-name>` (namespace: `<namespace>`)
**Status**: Running

**Safety Requirement**: VMs must be stopped before restore to prevent data corruption.

**Options:**
1. "stop-and-restore" - Stop the VM first, then restore from snapshot
2. "cancel" - Cancel restore operation

How would you like to proceed?

Wait for user response.

  • If "stop-and-restore" → Stop VM using vm_lifecycle, then continue
  • If "cancel" → Stop workflow

Step 4: Verify Snapshot Exists

MCP Tool: resources_get (from openshift-virtualization)

Parameters:

{
  "apiVersion": "snapshot.kubevirt.io/v1beta1",
  "kind": "VirtualMachineSnapshot",
  "namespace": "<namespace>",
  "name": "<snapshot-name>"
}

If snapshot not found:

❌ Snapshot Not Found

**Snapshot**: `<snapshot-name>` does not exist in namespace `<namespace>`.

**To list available snapshots:**
"List snapshots for VM <vm-name>"

Restore operation cancelled.

STOP workflow.

Extract snapshot details:

  • metadata.creationTimestamp - Creation time
  • status.phase - Must be "Succeeded"
  • status.readyToUse - Must be true
  • spec.source.name - Verify it matches the VM name

If snapshot status is not Ready:

❌ Snapshot Not Ready

**Snapshot**: `<snapshot-name>`
**Status**: <status.phase>
**Ready to Use**: <status.readyToUse>

Snapshot is not ready for restore. Only snapshots with "Succeeded" phase and readyToUse=true can be used.

Restore operation cancelled.

STOP workflow.

Step 5: Present Restore Preview and Get Typed Confirmation

CRITICAL: User must type the snapshot name to confirm.

## 🔴 VM RESTORE - Data Loss Warning

**⚠️ THIS WILL REPLACE CURRENT VM STATE WITH SNAPSHOT DATA ⚠️**

### What Will Happen

**VM to Restore**: `<vm-name>` (namespace: `<namespace>`)
**Snapshot to Restore From**: `<snapshot-name>`

**Current VM State** (WILL BE LOST):
- **Last Modified**: <current-timestamp>
- **Changes Since Snapshot**: ALL changes made after <snapshot-creation-timestamp> WILL BE PERMANENTLY LOST

**Snapshot State** (WILL BE RESTORED):
- **Created**: <snapshot-creation-timestamp>
- **Age**: <snapshot-age>

**Time Range of Data Loss**:
- **⚠️ ALL CHANGES in the last <time-diff> WILL BE LOST ⚠️**

### What Will Be Restored
- ✓ VM configuration (from snapshot time)
- ✓ Disk data (from snapshot time)

### What Will Be Lost
-**ALL disk changes** made after <snapshot-creation-timestamp>
-**ALL configuration changes** made after <snapshot-creation-timestamp>

---

**⚠️ CRITICAL: This restore is permanent. Current VM state cannot be recovered unless you create a snapshot now.**

**To proceed with restore, type the snapshot name exactly as shown:**

Type `<snapshot-name>` to confirm: _____

Wait for user to type the snapshot name.

Validation:

  • Compare user input with snapshot name (case-sensitive, exact match)
  • If match: Proceed to Step 6
  • If mismatch: Cancel operation

On mismatch:

❌ Confirmation Failed

**You typed**: `<user-input>`
**Expected**: `<snapshot-name>`

Names do not match. Restore cancelled for safety.

Operation cancelled. Current VM state preserved.

STOP workflow.

Step 6: Final Confirmation Before Restore

After typed verification succeeds, ask for final explicit confirmation.

## ✓ Typed Verification Passed

**Confirmation received for snapshot**: `<snapshot-name>`

### Ready to Restore

**VM**: `<vm-name>` (namespace: `<namespace>`)
**From Snapshot**: `<snapshot-name>`

**Impact**:
- Current VM state will be replaced with snapshot state
- All changes in the last <time-diff> will be permanently lost

---

**Proceed with VM restore? This action cannot be undone.**
- Type "yes" to execute restore
- Type "cancel" to abort

Your choice: _____

Wait for user response.

Handle response:

  • If "yes" → Proceed to Step 7 (execute restore)
  • If "cancel", "no", "wait", or anything else → Cancel operation

On cancellation:

Restore operation cancelled by user. Current VM state preserved.

STOP workflow.

Step 7: Execute Restore

ONLY PROCEED AFTER:

  • ✓ VM verified (exists, stopped)
  • ✓ Snapshot verified (exists, ready)
  • ✓ User typed snapshot name correctly
  • ✓ User confirmed "yes"

MCP Tool: resources_create_or_update (from openshift-virtualization)

Construct VirtualMachineRestore YAML:

apiVersion: snapshot.kubevirt.io/v1beta1
kind: VirtualMachineRestore
metadata:
  name: <restore-name>
  namespace: <namespace>
spec:
  target:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: <vm-name>
  virtualMachineSnapshotName: <snapshot-name>

Generate restore name:

  • Format: restore-<vm-name>-<timestamp>
  • Example: restore-database-01-20260218-143500

Parameters:

{
  "resource": "apiVersion: snapshot.kubevirt.io/v1beta1\nkind: VirtualMachineRestore\nmetadata:\n  name: <restore-name>\n  namespace: <namespace>\nspec:\n  target:\n    apiGroup: kubevirt.io\n    kind: VirtualMachine\n    name: <vm-name>\n  virtualMachineSnapshotName: <snapshot-name>"
}

Report progress:

🔄 Restoring VM from snapshot...
⏳ This may take several minutes...

Step 8: Monitor Restore Progress

Use resources_get to monitor VirtualMachineRestore status.

Check status.complete:

  • true → Restore completed
  • false → Restore in progress

Wait up to 10 minutes for restore to complete.

Step 9: Report Restore Results

On success:

## ✓ VM Restored Successfully

**VM**: `<vm-name>` (namespace: `<namespace>`)
**Restored From**: Snapshot `<snapshot-name>`

### Restore Details
- **Snapshot Created**: <snapshot-creation-timestamp>
- **Restore Completed**: <current-timestamp>
- **VM Status**: Stopped (ready to start)

### Data Loss Confirmation
- ⚠️ All changes made after <snapshot-creation-timestamp> have been lost

### Next Steps

**To start the restored VM:**
"Start VM <vm-name> in namespace <namespace>"

On failure:

## ❌ VM Restore Failed

**Error**: <error-message>

**VM**: `<vm-name>`
**Snapshot**: `<snapshot-name>`

**Current VM State**: UNKNOWN - may be partially restored or unchanged

**CRITICAL**: Do not start VM until restore issue is resolved

**Recovery Options:**
1. Try restore again after resolving the error
2. Restore from a different snapshot
3. Contact cluster admin for investigation

Dependencies

Required MCP Servers

  • openshift-virtualization - OpenShift MCP server with kubevirt toolset

Required MCP Tools

  • resources_create_or_update (from openshift-virtualization) - Create VirtualMachineRestore
  • resources_get (from openshift-virtualization) - Verify and monitor
  • vm_lifecycle (from openshift-virtualization) - Stop VM if running

Related Skills

  • vm-snapshot-list - List snapshots before restore
  • vm-snapshot-create - Create snapshots before risky operations
  • vm-snapshot-delete - Delete old snapshots
  • vm-lifecycle-manager - Start VM after restore

Reference Documentation

Official Red Hat Documentation:

Upstream Documentation:

Critical: Human-in-the-Loop Requirements

IMPORTANT: This skill performs DESTRUCTIVE operations. You MUST:

  1. Before Restoring Snapshots (CRITICAL - Data Loss Risk)

    • REQUIRE VM to be stopped first if currently running
    • Display what will be lost (current VM state since snapshot)
    • Show snapshot details (creation time, age)
    • Require typed confirmation - user must type snapshot name exactly
    • Ask: "Proceed with restore? This will replace current VM state. (yes/cancel)"
    • Wait for explicit "yes"
  2. Never Auto-Execute

    • NEVER restore without user confirmation
    • NEVER restore to running VMs without stopping first
    • NEVER skip typed verification for restore operations

Why This Matters:

  • Data Loss on Restore: Restoring replaces current VM state - all changes since snapshot are PERMANENTLY LOST
  • No Undo: Restore cannot be reversed - current data cannot be recovered
  • Typed Confirmation: Prevents accidental restores to wrong snapshots

Common Issues

Issue 1: Restore Fails - Insufficient Storage Capacity

Error: "Failed to restore: insufficient storage capacity" or "PVC provisioning failed"

Cause: The namespace doesn't have enough storage quota or the storage backend is full.

Solution:

  1. Check namespace storage quota: resources_list with kind="ResourceQuota"
  2. Check PVC status: resources_list for PersistentVolumeClaims
  3. Delete unnecessary snapshots: Use vm-snapshot-delete skill
  4. Request quota increase: Contact cluster admin
  5. Retry restore once storage is available

Issue 2: Restore Stuck in Progress

Error: VirtualMachineRestore status shows complete: false for extended period

Cause: The storage backend is slow, the snapshot is corrupted, or there's a CSI driver issue.

Solution:

  1. Check VirtualMachineRestore status.conditions for detailed error messages
  2. Verify snapshot is "Succeeded": Use vm-snapshot-list skill
  3. Wait longer: Large VMs may take 10+ minutes to restore
  4. Cancel and retry: Delete VirtualMachineRestore resource and try again
  5. Try alternative snapshot if restore continues to fail
  6. Check CSI driver logs (requires cluster admin access)

Issue 3: User Typed Wrong Snapshot Name

Error: "Confirmation Failed - Names do not match"

Cause: User made a typo when typing the snapshot name for confirmation.

Solution:

  1. Retry with exact name: Copy-paste the snapshot name from the previous message
  2. Check capitalization: Snapshot names are case-sensitive
  3. Check for spaces: Ensure no leading/trailing spaces in the typed name
  4. Use vm-snapshot-list skill to see all available snapshots and verify the correct name

This safety mechanism is intentional - it prevents accidental data loss from restoring to the wrong snapshot.

Security Considerations

  • Data Protection: Restore requires TWO confirmations (typed + yes) to prevent accidental data loss
  • VM Stop Requirement: VMs must be stopped before restore
  • Typed Verification: User must type exact snapshot name
  • RBAC Enforcement: Requires permissions for VirtualMachineRestore resources
  • Namespace Isolation: Restore scoped to namespace boundaries
  • Audit Trail: All restore operations logged in Kubernetes API audit logs
  • No Rollback: Restore is permanent - current state cannot be recovered after restore

Example Usage

User: "The database upgrade failed. Restore database-01 from snapshot pre-upgrade-snapshot"

Agent: "Which namespace is database-01 in?"

User: "production"

Agent: [Verifies VM - Running status detected] "⚠️ VM Must Be Stopped Before Restore. Options: 1) stop-and-restore 2) cancel"

User: "stop-and-restore"

Agent: [Stops VM, verifies snapshot exists] "🔴 VM RESTORE - Data Loss Warning

  • VM: database-01 (production)
  • Snapshot: pre-upgrade-snapshot (created 2026-02-18 10:00, age 6h 30m)
  • ⚠️ ALL CHANGES in last 6h 30m WILL BE LOST Type pre-upgrade-snapshot to confirm: _____"

User: "pre-upgrade-snapshot"

Agent: "✓ Typed Verification Passed. Proceed with restore? (yes/cancel)"

User: "yes"

Agent: "🔄 Restoring VM from snapshot... ⏳ This may take several minutes..." [Monitors progress] "✓ VM Restored Successfully

  • VM: database-01 (production)
  • Restored From: pre-upgrade-snapshot (created 2026-02-18 10:00)
  • Status: Stopped (ready to start)
  • ⚠️ All changes after 2026-02-18 10:00 have been lost

To start: 'Start VM database-01 in namespace production'"

Weekly Installs
1
GitHub Stars
4
First Seen
4 days ago
Installed on
claude-code1