Terragrunt Infrastructure Skill
Manage bare-metal Kubernetes infrastructure from PXE boot to running clusters.
For architecture overview (units vs modules, config centralization), see infrastructure/CLAUDE.md. For detailed unit patterns, see infrastructure/units/CLAUDE.md.
Task Commands (Always Use These)
# Validation (run in order)
task tg:fmt # Format HCL files
task tg:test-<module> # Test specific module (e.g., task tg:test-config)
task tg:validate-<stack> # Validate stack (e.g., task tg:validate-integration)
# Operations
task tg:list # List available stacks
task tg:plan-<stack> # Plan (e.g., task tg:plan-integration)
task tg:apply-<stack> # Apply (REQUIRES HUMAN APPROVAL)
task tg:gen-<stack> # Generate stack files
task tg:clean-<stack> # Clean generated files
NEVER run terragrunt or tofu directly—always use task commands.
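The validation order above can be wrapped in a small helper. The following is a minimal sketch, not part of the repository: `run_step`, `validate_in_order`, and the `DRY_RUN` variable are hypothetical names introduced here, and only the `task tg:*` names come from the list above. With `DRY_RUN=1` the helper prints each command instead of running it.

```shell
# Hypothetical helper: run one task, or only print it when DRY_RUN=1.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "task $*"
  else
    task "$@"
  fi
}

# Run format, module test, and stack validation in the documented order.
# The && chain stops at the first step that fails.
validate_in_order() {
  module="$1"
  stack="$2"
  run_step tg:fmt &&
    run_step "tg:test-${module}" &&
    run_step "tg:validate-${stack}"
}
```

For example, `DRY_RUN=1 validate_in_order config integration` prints the three commands for the config module and the integration stack without executing anything.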
How to Add a Machine
- Edit inventory.hcl:
node50 = {
  cluster = "live"
  type    = "worker"
  install = {
    selector     = "disk.model == 'Samsung'"
    architecture = "amd64"
  }
  interfaces = [{
    id           = "eth0"
    hardwareAddr = "aa:bb:cc:dd:ee:ff" # VERIFY correct
    addresses    = [{ ip = "192.168.10.50" }] # VERIFY available
  }]
}
- Run task tg:plan-live
- Review the plan; the config module auto-includes machines where cluster == "live"
- Request human approval before apply
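The two VERIFY comments in the entry can be partly checked mechanically. As a rough sketch (the helper name `count_quoted` is hypothetical, and it assumes each MAC and IP appears in inventory.hcl as a complete double-quoted string, as in the entry above), the following counts how often a value occurs. A count above 1 after adding the new machine suggests a duplicate; it does not confirm that the address is free on the network.

```shell
# count_quoted FILE VALUE
# Print how many lines of FILE contain VALUE as a complete double-quoted string.
# Matching on the surrounding quotes keeps 192.168.10.5 from matching 192.168.10.50.
count_quoted() {
  # grep -c exits non-zero when the count is 0, so mask that status.
  grep -cF -- "\"$2\"" "$1" || true
}
```

Typical use would be `count_quoted inventory.hcl 192.168.10.50` and `count_quoted inventory.hcl aa:bb:cc:dd:ee:ff`. The match is case-sensitive, so MAC addresses written in a different case are not detected.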
How to Add a Feature Flag
- Add version to versions.hcl if needed
- Add feature detection in modules/config/main.tf:
locals {
  new_feature_enabled = contains(var.features, "new-feature")
}
- Enable in stack's features list:
features = ["gateway-api", "longhorn", "new-feature"]
How to Create a New Unit
- Create units/new-unit/terragrunt.hcl:
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "../../../.././/modules/new-unit"
}

dependency "config" {
  config_path  = "../config"
  mock_outputs = { new_unit = {} }
}

inputs = dependency.config.outputs.new_unit
- Create corresponding modules/new-unit/ with variables.tf, main.tf, outputs.tf, versions.tf
- Add output from config module
- Add unit block to stacks that need it
How to Write Module Tests
Tests use OpenTofu native testing in modules/<name>/tests/*.tftest.hcl:
# Top-level variables set defaults for ALL run blocks
variables {
  name     = "test-cluster"
  features = ["gateway-api"]
  machines = {
    node1 = {
      cluster = "test-cluster"
      type    = "controlplane"
      # ... complete machine definition
    }
  }
}

run "feature_enabled" {
  command = plan

  variables {
    features = ["prometheus"] # Only override what differs
  }

  assert {
    condition     = output.prometheus_enabled == true
    error_message = "Prometheus should be enabled"
  }
}
Run with task tg:test-config or task tg:test for all modules.
Safety Rules
- NEVER run apply without explicit human approval
- NEVER use --auto-approve flags
- NEVER guess MAC addresses or IPs; verify against inventory.hcl
- NEVER commit .terragrunt-cache/ or .terragrunt-stack/
- NEVER manually edit Terraform state
State Operations
When removing state entries for indexed resources (e.g., this["rpi4"]), xargs interprets and strips the double quotes, so the resource address reaches terragrunt malformed and the command fails. Use a while loop instead:
# WRONG - xargs mangles quotes in resource names
terragrunt state list | xargs -n 1 terragrunt state rm
# CORRECT - while loop preserves quotes
terragrunt state list | while read -r resource; do terragrunt state rm "$resource"; done
This applies to any state operation on resources with map keys like data.talos_machine_configuration.this["rpi4"].
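The difference is easy to reproduce without touching any state. This sketch substitutes `echo` for `terragrunt state rm` and feeds both pipelines the resource address from the example above, so it only shows what each approach would pass along as the argument.

```shell
resource='data.talos_machine_configuration.this["rpi4"]'

# xargs applies its own quote processing, so the double quotes are removed.
via_xargs="$(printf '%s\n' "$resource" | xargs -n 1 echo)"

# read -r hands each line over unchanged.
via_while="$(printf '%s\n' "$resource" | while read -r r; do echo "$r"; done)"

echo "xargs: $via_xargs"   # xargs: data.talos_machine_configuration.this[rpi4]
echo "while: $via_while"   # while: data.talos_machine_configuration.this["rpi4"]
```

GNU xargs can also be told to split on newlines only with `-d '\n'`, which disables quote processing, but that option is not available in every xargs implementation. The while loop works in any POSIX shell.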
Validation Checklist
Before requesting apply approval:
- task tg:fmt passes
- task tg:test passes (if module tests exist)
- task tg:validate passes for ALL stacks
- task tg:plan-<stack> reviewed
- No unexpected destroys in plan
- Network changes won't break connectivity
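The "no unexpected destroys" item can be backed by a quick count. The sketch below assumes the plan output contains OpenTofu's usual uncolored summary line (`Plan: N to add, N to change, N to destroy.`) once per unit; `destroy_total` is a hypothetical helper name, and colored output would need its escape codes stripped first. It complements reading the plan and does not replace it.

```shell
# destroy_total: read plan output on stdin and print the total number of
# resources marked for destruction across all "Plan:" summary lines.
destroy_total() {
  sed -n 's/.*Plan: .* \([0-9][0-9]*\) to destroy.*/\1/p' |
    awk '{ total += $1 } END { print total + 0 }'
}
```

Assuming the task passes the plan output through to stdout, `task tg:plan-integration | destroy_total` would print the total. A result of 0 is also printed when no summary line is found, so an empty or failed plan still needs a manual look.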
Repository: ionfury/homelab