/vm-lifecycle-manager Skill

Control virtual machine power state in OpenShift Virtualization using the vm_lifecycle tool.

Prerequisites

Required MCP Server: openshift-virtualization (OpenShift MCP Server)

Required MCP Tools:

vm_lifecycle (from openshift-virtualization) - Manage VM power state

Required Environment Variables:

KUBECONFIG - Path to Kubernetes configuration file with cluster access

Required Cluster Setup:

OpenShift cluster (>= 4.19)
OpenShift Virtualization operator installed
ServiceAccount with RBAC permissions to update VirtualMachine resources

Prerequisite Verification

Before executing:

Check openshift-virtualization exists in .mcp.json → If missing, report setup
Verify KUBECONFIG is set (presence only, never expose value) → If missing, report

Human Notification Protocol: ❌ Cannot execute vm-lifecycle-manager: MCP server not available. Setup: Add to .mcp.json, set KUBECONFIG, restart Claude Code. Docs: https://github.com/openshift/openshift-mcp-server

⚠️ SECURITY: Never display KUBECONFIG path or credential values.

When to Use This Skill

Trigger when:

User explicitly invokes /vm-lifecycle-manager command
User requests starting/stopping/restarting a VM
User wants to change VM power state

User phrases:

"Start VM web-server in namespace vms"
"Stop the database VM"
"Restart test-vm"
"Power on the VM called api-server"
"/vm-lifecycle-manager" (explicit command)

Do NOT use when:

Create VM → /vm-create
List VMs → /vm-inventory
Delete VM → /vm-delete

Workflow

Step 1: Gather Parameters and Confirm

Required from user: VM Name, Namespace, Action (start|stop|restart)

Present for confirmation:

## VM Lifecycle Operation

| Parameter | Value | Impact |
|-----------|-------|--------|
| VM Name | `<vm>` | from user |
| Namespace | `<ns>` | from user |
| Action | `<action>` | start: consumes resources / stop: graceful shutdown / restart: brief interruption (~1-2min) |

Confirm: yes/no

WAIT for explicit "yes" before proceeding to Step 2.

Step 2: Execute Lifecycle Operation

ONLY AFTER user confirmation in Step 1.

For start or stop actions:

MCP Tool: vm_lifecycle (namespace=<ns>, name=<vm>, action=<start|stop>)

For restart action (composite operation):

CRITICAL: Implement restart as two separate operations to avoid resourceVersion conflicts:

Stop VM: vm_lifecycle (namespace=<ns>, name=<vm>, action="stop")
Verify stopped: resources_get (apiVersion="kubevirt.io/v1", kind="VirtualMachine", namespace=<ns>, name=<vm>) → Check status.printableStatus == "Stopped"
Wait: 5 seconds for VM to fully stop
Start VM: vm_lifecycle (namespace=<ns>, name=<vm>, action="start")
Verify started: resources_get → Check status.printableStatus == "Running"

Errors:

VM not found → Report, suggest vm-inventory
Permission denied → Report RBAC error
Already in desired state → Inform user
Stop fails during restart → Report, do not proceed to start
Start fails during restart → Report, VM is stopped
Transition fails → Report details

Step 3: Report Operation Status

On Success:

## ✓ VM <Action> Successful

**VM**: `<vm>` | **Namespace**: `<ns>` | **Action**: <action> | **RunStrategy**: <Always|Halted>

**Impact**:
- **start**: Running, consuming resources (CPU/memory). Access: virtctl console or SSH. RunStrategy: Always (auto-restart on crash)
- **stop**: Stopped, resources freed. State preserved. Start: "Start VM <vm>". RunStrategy: Halted (stays off)
- **restart**: Running after stop+start. Brief interruption (~1-2min). Monitor app logs. RunStrategy: Always

**Next**: "Show status of VM <vm>" or "List VMs in namespace <ns>"

On Failure:

OPTIONAL: Read lifecycle-errors.md for start/stop failures or scheduling-errors.md for ErrorUnschedulable. Output: "Consulted lifecycle-errors.md for failure."

When to consult: Start/stop failures, stuck transitions, unexpected errors. NOT: Already in state, not found, RBAC errors.

## ❌ Lifecycle Operation Failed

**Error**: <error>

**Causes**: VM not found | RBAC denied | Already in desired state | VM in transition (wait 30-60s) | Resource constraints (start)

**Troubleshoot**:
1. vm-inventory to verify VM exists
2. Check RBAC: `oc auth can-i update virtualmachines -n <ns>`
3. View VM status and events
4. Check node capacity (for start operations)

Common Issues

Issue 1: VM Not Found

Error: "VirtualMachine 'xyz' not found in namespace 'abc'" Solution: Verify spelling, check namespace, use vm-inventory, VM may be deleted

Issue 2: VM Already in Desired State

Warning: "VM is already running" (when attempting start) Solution: Not an error - VM already in desired state. Use restart if intended to restart

Issue 3: Permission Denied

Error: "Forbidden: User cannot update VirtualMachines" Solution: Verify RBAC permissions (update VirtualMachine resources), contact admin

Issue 4: VM Stuck in Transitioning State

Error: "VM stuck in 'Terminating' or 'Starting'" Solution: Wait 30-60s, check events (oc describe vm), use vm-troubleshooter, check virt-launcher pod

Issue 5: Insufficient Resources (Start)

Error: "Insufficient CPU/memory to start VM" Solution: Check cluster availability, stop other VMs, scale nodes, resize VM to smaller instance type

Issue 6: Restart Implementation

Note: Restart is implemented as two separate operations (stop → verify → start → verify) Reason: Avoids Kubernetes resourceVersion conflicts when using single restart action Behavior: If stop succeeds but start fails, VM remains stopped. Check VM status with vm-inventory

Understanding RunStrategy

Action	RunStrategy	Behavior
start	Always	Runs, auto-restarts on crash
stop	Halted	Stops, stays off
restart	Always	Stops, starts, auto-restarts

Dependencies

Required MCP Servers

openshift-virtualization - OpenShift MCP server with KubeVirt toolset

Required MCP Tools

vm_lifecycle - Manage VM power state (start/stop/restart)

Related Skills

vm-create - Create VMs
vm-inventory - Check VM status
vm-troubleshooter (planned) - Diagnose startup/shutdown issues

Reference Documentation

lifecycle-errors.md - Start/stop failures, stuck transitions (consulted on failures)
scheduling-errors.md - ErrorUnschedulable, resource constraints (consulted when VM won't start)
Troubleshooting INDEX - Navigation hub for error categories
OpenShift Virt Docs
KubeVirt Lifecycle
RunStrategy Docs
OpenShift MCP

Critical: Human-in-the-Loop Requirements

IMPORTANT: This skill requires explicit user confirmation before executing. You MUST:

Wait for user confirmation on specific action (start/stop/restart) before executing vm_lifecycle
Do NOT proceed until user explicitly approves
Present action clearly with VM name, namespace, intended state change
Never auto-execute - these operations impact running services
Warn about consequences - stopping interrupts services, starting consumes resources, restarting causes downtime

If user says "no" or wants to reconsider, do not proceed.

Why: start (consumes resources), stop (interrupts services), restart (brief downtime). User should verify correct VM and understand impact.

Security Considerations

Requires RBAC permissions (update VirtualMachine resources)
Stop operations attempt graceful shutdown; guest OS/app data consistency depends on workload
Start operations consume resources (quota limits apply)
All operations audited in Kubernetes API logs
KUBECONFIG credentials never exposed
ServiceAccount permissions limit controllable VMs

Example Usage

Example 1: Start Stopped VM

User: "Start VM web-server in namespace vms"
Agent: [vm-lifecycle-manager skill]
## VM Lifecycle Operation
| VM Name | `web-server` | Namespace | `vms` | Action | `start` | consumes resources |
Confirm: yes/no
User: "yes"
Agent: [vm_lifecycle(namespace="vms", name="web-server", action="start")]
## ✓ VM Start Successful
VM: `web-server` | Namespace: `vms` | RunStrategy: Always
Impact: Running, consuming resources. Access: virtctl console or SSH

Example 2: Stop with User Reconsidering

User: "Stop the database VM in production namespace"
Agent: [vm-lifecycle-manager skill]
| VM | `database` | Namespace | `production` | Action | `stop` | graceful shutdown |
Confirm: yes/no
User: "wait, no, don't stop it"
Agent: Operation cancelled. VM 'database' in namespace 'production' remains running.

Example 3: Restart VM

User: "Restart test-vm in namespace dev"
Agent: [vm-lifecycle-manager skill]
| VM | `test-vm` | Namespace | `dev` | Action | `restart` | brief interruption ~1-2min |
User: "yes"
Agent: [Composite restart operation]
       1. vm_lifecycle(namespace="dev", name="test-vm", action="stop")
       2. Verify stopped: resources_get → status.printableStatus="Stopped"
       3. Wait 5s
       4. vm_lifecycle(namespace="dev", name="test-vm", action="start")
       5. Verify started: resources_get → status.printableStatus="Running"
## ✓ VM Restart Successful
VM: `test-vm` | Namespace: `dev` | RunStrategy: Always
Impact: Running after stop+start. Brief interruption. Monitor app logs.

Example 4: VM Already in Desired State

User: "Start web-server in namespace vms"
Agent: [vm-lifecycle-manager skill]
       [vm_lifecycle(action="start")]
## ℹ️ VM Already Running
VM: `web-server` | Namespace: `vms` | Status: Running
Result: No action taken - VM already in desired state.
To restart: "Restart VM web-server in namespace vms"

Example 5: Batch Operations

User: "Stop VMs web-01, web-02, web-03 in namespace production"
Agent: [vm-lifecycle-manager skill - batch mode]
## Batch Lifecycle Operation
Stopping 3 VMs in 'production': web-01, web-02, web-03
Impact: All 3 VMs will shut down, services interrupted.
Confirm: yes/no
User: "yes"
Agent: [Executes vm_lifecycle for each VM sequentially]
## ✓ Batch Stop Successful
- web-01: Stopped
- web-02: Stopped
- web-03: Stopped
All VMs stopped. Resources freed.

vm-lifecycle-manager