Deployment Troubleshooting

Deployment Troubleshooting Guide

When a deployment fails, the error message is your best starting point, but Terraform and cloud provider error messages are often cryptic or misleading. This guide maps common error patterns to their root causes and fixes, organized by the deployment phase where they typically occur.

When helping a user debug a deployment issue, start by identifying which phase failed, then match the error text against the patterns below.
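The matching step can be sketched as a small shell function. This is only an illustration: `classify_phase` is a hypothetical helper, not part of RedC or Terraform, and its patterns are copied from the tables in this guide, so it recognizes nothing beyond them.

```shell
# Hypothetical helper: map raw error text to the phase headings used in this guide.
# Patterns come from the tables below; anything else is reported as "unknown".
classify_phase() {
  case "$1" in
    *"Failed to install provider"*|*"Could not load plugin"*)
      echo "Phase 1: Terraform Init Errors" ;;
    *"Backend initialization required"*|*"Failed to query available provider packages"*)
      echo "Phase 1: Terraform Init Errors" ;;
    *NoCredentialProviders*|*AuthorizationFailed*|*"googleapi: Error 403"*)
      echo "Phase 2: Authentication Errors" ;;
    *InvalidAccessKeyId*|*AuthFailure*)
      echo "Phase 2: Authentication Errors" ;;
    # AWS spells the VPC quota code "VpcLimitExceeded"; both spellings are accepted here.
    *InstanceLimitExceeded*|*VPCLimitExceeded*|*VpcLimitExceeded*)
      echo "Phase 3: Resource Creation Failures" ;;
    *InvalidParameterValue*|*InsufficientInstanceCapacity*|*"Insufficient balance"*)
      echo "Phase 3: Resource Creation Failures" ;;
    *"timeout awaiting response"*)
      echo "Phase 4: Network and Connectivity Issues" ;;
    *"Error acquiring the state lock"*|*"Resource already exists"*|*"Unsupported attribute"*)
      echo "Phase 5: State Issues" ;;
    *)
      echo "unknown" ;;
  esac
}
```

Pass the error text as a single quoted argument, for example `classify_phase "$(tail -n 20 apply.log)"`, where `apply.log` is an assumed file name for captured Terraform output.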

Phase 1: Terraform Init Errors

Init failures happen before any infrastructure is created. They're usually about provider plugins or backend configuration.

Error Pattern	Root Cause	Fix
`Failed to install provider`	No internet, proxy blocking registry.terraform.io, or provider name typo	Check connectivity: `curl -I https://registry.terraform.io`. If behind proxy, set `HTTPS_PROXY`. Verify provider source string
`Could not load plugin`	Plugin cache corrupted or provider version mismatch	Run `terraform init -upgrade` to re-download. Delete `.terraform/` and retry if persistent
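`Backend initialization required`	Backend has not been initialized or its configuration changed; init can then fail if the remote state bucket doesn't exist or the credentials are wrong	Run `terraform init` (add `-reconfigure` if the backend block changed). Create the bucket first, verify the credentials have access to it, and check that the region matches
`Failed to query available provider packages`	DNS resolution failure or firewall blocking	Try `nslookup registry.terraform.io`. Consider using `terraform init -plugin-dir=<path>` with pre-downloaded providers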
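The connectivity checks from the table can be bundled into one function. This is a minimal sketch: `check_registry` is a hypothetical name, and it assumes `nslookup` and `curl` are installed on the machine running Terraform.

```shell
# Hypothetical helper: run the registry checks from the table above and
# print one line per check instead of stopping at the first failure.
check_registry() {
  host="registry.terraform.io"

  # DNS resolution (relevant to "Failed to query available provider packages")
  if nslookup "$host" >/dev/null 2>&1; then
    echo "dns: ok"
  else
    echo "dns: FAILED"
  fi

  # HTTPS reachability (relevant to "Failed to install provider")
  if curl -sSI --max-time 10 "https://$host" >/dev/null 2>&1; then
    echo "https: ok"
  else
    echo "https: FAILED"
  fi

  # Show the proxy setting so a missing or wrong value is visible
  echo "HTTPS_PROXY: ${HTTPS_PROXY:-not set}"
}
```

Run `check_registry` in the same shell session that runs `terraform init`, so the proxy variable it reports is the one Terraform actually sees.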

Phase 2: Authentication Errors

These surface during `terraform plan` when the provider tries to validate credentials against the cloud API.

Error Pattern	Provider	Fix
`NoCredentialProviders`	AWS	`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` not set or expired. Re-export them or check RedC profile
`AuthorizationFailed`	Azure	Service principal lacks permissions on the subscription. Verify `ARM_SUBSCRIPTION_ID` matches, check role assignments
`googleapi: Error 403`	GCP	Service account doesn't have required permissions. Check IAM roles in GCP Console, verify `GOOGLE_APPLICATION_CREDENTIALS` path
`InvalidAccessKeyId`	AWS/Alibaba	Access key deleted or rotated. Generate a new key pair in the console
`AuthFailure`	Tencent	`TENCENTCLOUD_SECRET_ID` wrong (note: it's "SecretId" not "AccessKey")

Debugging tip: When a user reports an auth error, ask them to verify the environment variables are set in the current shell session. A common mistake is setting them in one terminal and running RedC in another.
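A quick way to run that check is to list which variables are missing from the current shell's environment. The sketch below uses a hypothetical `missing_vars` helper. It relies on `printenv`, which only sees exported variables, matching what a child process such as Terraform would inherit.

```shell
# Hypothetical helper: print the name of every given variable that is
# unset, empty, or not exported in the current shell session.
missing_vars() {
  for name in "$@"; do
    # printenv prints nothing for unset or non-exported variables
    if [ -z "$(printenv "$name")" ]; then
      echo "$name"
    fi
  done
}

# Example: check the AWS variables named in the table above
missing_vars AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
```

No output means every listed variable is exported and non-empty. The same call works for `ARM_SUBSCRIPTION_ID`, `GOOGLE_APPLICATION_CREDENTIALS`, or `TENCENTCLOUD_SECRET_ID`.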

Phase 3: Resource Creation Failures

These happen during `terraform apply` when the cloud provider rejects a resource creation request.

Error Pattern	Root Cause	Fix
`InstanceLimitExceeded`	Account quota reached for this instance type	Request a quota increase via support ticket, or use a different instance type/region
`VpcLimitExceeded`	AWS default limit is 5 VPCs per region	Clean up unused VPCs in the console, or request a limit increase
`InvalidParameterValue` for instance type	Instance type not available in the selected AZ	Check availability with `aws ec2 describe-instance-type-offerings`, try a different AZ or type
`InsufficientInstanceCapacity`	AWS capacity constraints in that AZ	Retry in a different AZ of the same region (the zone suffix, such as `a`, `b`, or `c`), or try a different instance family
`Insufficient balance`	Prepaid account ran out of credit	Top up the account. Use `get_balances` to check current balance
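For the AWS availability check, the command needs a location type and a filter to answer "which AZs offer this type". The wrapper below is a sketch: `azs_offering_type` is a hypothetical name, and it assumes the AWS CLI v2 is installed and has credentials configured.

```shell
# Hypothetical helper: list the availability zones in a region that
# offer the given instance type.
# Usage: azs_offering_type <instance-type> <region>
azs_offering_type() {
  instance_type="$1"
  region="$2"
  aws ec2 describe-instance-type-offerings \
    --region "$region" \
    --location-type availability-zone \
    --filters "Name=instance-type,Values=$instance_type" \
    --query 'InstanceTypeOfferings[].Location' \
    --output text
}
```

For example, `azs_offering_type t3.micro us-east-1` prints the zone names that can be used in place of the rejected AZ. Empty output suggests the type is not offered anywhere in that region.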

Phase 4: Network and Connectivity Issues

These typically appear after instances are created but the user can't reach them.

Symptom	Likely Cause	Investigation Steps
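SSH connection refused	The host is reachable but nothing accepts connections on the port: sshd has not started yet (instance still booting) or listens on a non-default port	Wait for boot to finish and retry. Confirm the SSH port in the instance configuration and that the security group allows that port
SSH connection timed out	Security group doesn't allow inbound SSH from the user's IP, the instance has no public IP, or it is in a private subnet without NAT	Check the security group ingress rules and verify the user's current public IP matches the allowed CIDR. Verify the instance has a public IP in the console. Check that the subnet route table has an internet gateway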
`timeout awaiting response` during apply	Security group blocks outbound HTTPS (443)	The instance needs outbound access to download packages. Check egress rules
Instance created but tools don't work	`user_data` script failed silently	SSH in and check `/var/log/cloud-init-output.log` for errors
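Users often paraphrase SSH failures, so it helps to capture the exact client message and map it to a row in the table above. A minimal sketch, assuming the OpenSSH client wording ("Connection refused", "Connection timed out" or "Operation timed out"). `ssh_symptom` is a hypothetical helper.

```shell
# Hypothetical helper: reduce an SSH client error message to the symptom
# names used in the table above.
ssh_symptom() {
  case "$1" in
    *"Connection refused"*) echo "refused" ;;
    *"timed out"*)          echo "timed out" ;;
    *)                      echo "other" ;;
  esac
}
```

One way to capture the message is `ssh_symptom "$(ssh -o BatchMode=yes -o ConnectTimeout=10 user@host true 2>&1)"`, where `user@host` is a placeholder for the real login and address.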

Phase 5: State Issues

State problems are dangerous because they can cause Terraform to lose track of real infrastructure, leading to orphaned resources you're still paying for.

Error Pattern	Root Cause	Fix
`Error acquiring the state lock`	Another `terraform apply` is running, or a previous run crashed without releasing the lock	Wait for the other process to finish. If it crashed, force-unlock: `terraform force-unlock <LOCK_ID>`
`Resource already exists`	Resource was created outside Terraform (e.g., manually in console)	Import it: `terraform import <resource_address> <resource_id>`
`Unsupported attribute`	Provider version upgraded and the attribute name changed	Pin provider version in `required_providers`, or update your `.tf` to use the new attribute name
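Drift between state and reality	Manual changes in cloud console	Run `terraform plan` to see the diff, then decide: apply to overwrite the manual changes, or keep them by running `terraform apply -refresh-only` to update state (the standalone `terraform refresh` command is deprecated) and updating your `.tf` files to match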
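Drift checks can be scripted with the `-detailed-exitcode` flag of `terraform plan`, which distinguishes "no changes" from "changes present" through the exit code. The interpreter below is a sketch, and `drift_status` is a hypothetical helper name. Exit code 2 covers any pending change, both drift and unapplied configuration edits.

```shell
# Hypothetical helper: translate the exit code of
# `terraform plan -detailed-exitcode` into a readable status.
drift_status() {
  case "$1" in
    0) echo "no changes" ;;       # plan succeeded, state matches configuration
    1) echo "plan failed" ;;      # an error occurred, read the plan output
    2) echo "changes pending" ;;  # plan succeeded and found a non-empty diff
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Intended use, left commented out because it needs an initialized
# Terraform working directory:
#   terraform plan -detailed-exitcode >/dev/null
#   drift_status "$?"
```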

Phase 6: User Data and Provisioning

Cloud-init runs on first boot, and Terraform does not see its failures: the instance is reported as created even when it is not properly configured.

Symptom	Investigation	Fix
Packages not installed	Check `/var/log/cloud-init-output.log`	Usually DNS or proxy issues. Add `apt update` retry logic to the script
Script didn't run at all	Check `/var/log/cloud-init.log` for YAML parse errors	Validate the cloud-init YAML syntax. Common issue: wrong indentation in `write_files`
Script timed out	Long-running operations (compiling, large downloads)	Break into smaller scripts, or increase timeout. Consider using RedC's `exec_command` for post-deploy setup instead
Wrong permissions on files	`write_files` defaults to root ownership	Set `owner` and `permissions` explicitly in the cloud-init config
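The retry logic suggested for package installation can be as small as one function placed at the top of the user data script. This is a sketch rather than the template's actual implementation. The `retry` name and its argument order are assumptions.

```shell
# Hypothetical helper: run a command up to <attempts> times, sleeping
# <delay> seconds between tries.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

In a user data script this would be called as `retry 5 10 apt-get update`, which tolerates a slow DNS or network start on first boot. The failure message goes to stderr, which cloud-init normally captures in `/var/log/cloud-init-output.log`.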

Repository

wgpsec/redc-template
