troubleshooting-efs
Troubleshooting EFS
Overview
Domain expertise for diagnosing and resolving Amazon EFS issues. Covers mount failures, NFS connectivity, IAM and POSIX permissions, throughput and performance, and encryption problems.
For authoritative guidance, see EFS Troubleshooting.
Common Tasks
0. Verify Dependencies
- You MUST verify the aws CLI is available
- You MUST check if amazon-efs-utils or nfs-utils is installed on the instance
- You MUST ONLY check for tool existence and version; you MUST NOT execute destructive or mutating commands during verification
- You MUST inform the user if any required tools are missing
- You MUST respect the user's decision to abort if tools are unavailable
- You SHOULD explain what each step does and why before executing it
- You SHOULD display write commands and wait for user confirmation before executing
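A read-only existence check along the lines described above could be sketched as follows. The helper name check_tool is hypothetical; it relies only on the shell builtin command -v, which looks a tool up on PATH without executing it.

```shell
# check_tool NAME: report whether NAME is on PATH without running it.
# Hypothetical helper; read-only by design, so it is safe during verification.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}
```

Typical calls would be check_tool aws and check_tool mount.efs (the mount helper shipped with amazon-efs-utils), followed by aws --version for the version. Mount helpers usually live in /sbin, which may not be on a non-root user's PATH, so a "missing" result is worth confirming with the package manager.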
1. Classify the Issue
| Symptom | Category |
|---|---|
| "wrong fs type" or mount command fails | A: Missing NFS Client |
| Connection timed out (hangs 2+ min) | B: Network/Security Group |
| "access denied by server" | C: IAM/Permissions |
| Slow throughput or high latency | D: Performance |
| NFS server error on encrypted FS | E: Encryption/KMS |
| DNS name resolution fails | F: VPC DNS |
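As a rough illustration of the table, the sketch below maps an error message to a category. The function name and the match patterns are assumptions; real messages vary by distribution and NFS client. Categories D and E are left out because they cannot be told apart from the message text alone.

```shell
# classify_efs_symptom MESSAGE: map an error message to a category from the table.
# Patterns are illustrative assumptions, not an exhaustive list of client messages.
classify_efs_symptom() {
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"wrong fs type"*)             echo "A: Missing NFS Client" ;;
    *"timed out"*)                 echo "B: Network/Security Group" ;;
    *"access denied"*)             echo "C: IAM/Permissions" ;;
    *"failed to resolve"*)         echo "F: VPC DNS" ;;
    *"name or service not known"*) echo "F: VPC DNS" ;;
    *)                             echo "unclassified" ;;
  esac
}
```

An "unclassified" result means the symptom needs the manual checks for performance (D) or encryption (E).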
2. Category A — Missing NFS Client
# Amazon Linux
sudo yum -y install amazon-efs-utils # preferred (includes mount helper + TLS)
# OR, also on RHEL / CentOS, where amazon-efs-utils is not in the default repos
sudo yum -y install nfs-utils
# Ubuntu / Debian
sudo apt-get update && sudo apt-get -y install nfs-common
3. Category B — Network/Security Group
Connection timeout is the #1 EFS mount failure — almost always security groups.
- Verify mount target exists in the instance's AZ:
aws efs describe-mount-targets --file-system-id fs-ID --region REGION
- Verify security groups, checking BOTH directions:
  - Mount target SG (MUST have inbound TCP 2049 from the compute SG):
    aws ec2 describe-security-groups --group-ids sg-MT
  - Compute SG: MUST have outbound TCP 2049 to the mount target SG
  - Quick fix:
    aws ec2 authorize-security-group-ingress --group-id sg-MT --protocol tcp --port 2049 --source-group sg-COMPUTE
- Test connectivity:
nc -zv fs-ID.efs.REGION.amazonaws.com 2049
Note: These security group troubleshooting steps also apply to S3 Files. The only difference is that S3 Files uses aws s3files list-mount-targets instead of aws efs describe-mount-targets.
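The connectivity test can hang for minutes when a security group silently drops the packets, so a bounded variant may be more convenient. The sketch below is one way to do it; both function names are hypothetical, and it assumes an nc build that supports -z, -v and -w.

```shell
# efs_dns_name FS_ID REGION: print the file system DNS name in the form used above.
efs_dns_name() {
  printf '%s.efs.%s.amazonaws.com\n' "$1" "$2"
}

# efs_port_check FS_ID REGION: try TCP 2049 and give up after 5 seconds.
# Hypothetical helper; the 5-second timeout is an arbitrary choice.
efs_port_check() {
  nc -zv -w 5 "$(efs_dns_name "$1" "$2")" 2049
}
```

A timeout from efs_port_check points at Category B, while a name resolution error points at Category F.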
4. Category C — IAM/Permissions
"access denied by server" with -o iam:
- Check that the identity-based IAM policy allows elasticfilesystem:ClientMount
- Check the file system resource policy:
aws efs describe-file-system-policy --file-system-id fs-ID --region REGION
Note: IAM authorization is only enforced when a file system policy exists that requires it. Without a file system policy, any client in the VPC with port 2049 access can mount — even with -o iam. To enforce IAM, you MUST create a file system policy that denies anonymous access.
POSIX permission denied (not IAM):
- Check file/directory ownership:
ls -la /mnt/efs/
- Use access points to enforce UID/GID for consistent permissions
5. Category D — Performance
Check throughput mode:
aws efs describe-file-systems --file-system-id fs-ID --region REGION --query 'FileSystems[0].ThroughputMode'
Burst credit exhaustion (Bursting mode only):
aws cloudwatch get-metric-statistics --namespace AWS/EFS --metric-name BurstCreditBalance --dimensions Name=FileSystemId,Value=fs-ID --period 3600 --statistics Average --start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%S) --end-time $(date -u +%Y-%m-%dT%H:%M:%S)
If credits near zero, switch to Elastic throughput:
aws efs update-file-system --file-system-id fs-ID --throughput-mode elastic --region REGION
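To judge whether credits are near zero, one option is to reduce the hourly averages to their lowest value. The sketch below assumes the get-metric-statistics command above is run with --query 'Datapoints[].Average' --output text appended, which prints the averages as whitespace-separated numbers (in bytes). The helper name lowest_value is hypothetical.

```shell
# lowest_value: read whitespace-separated numbers on stdin, print the smallest.
# Hypothetical helper for scanning BurstCreditBalance datapoints.
lowest_value() {
  awk '{ for (i = 1; i <= NF; i++) if (!seen || $i + 0 < min + 0) { min = $i; seen = 1 } }
       END { if (seen) print min }'
}
```

Piping the CLI output into lowest_value gives the lowest hourly average in the window. No output at all means CloudWatch returned no datapoints, which is worth checking before drawing conclusions.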
General Purpose vs Max I/O:
- Check the PercentIOLimit metric; if consistently >80%, consider Max I/O
- Note: performance mode is IMMUTABLE; you must create a new FS and migrate
6. Category E — Encryption/KMS
NFS server error on encrypted FS = KMS key issue.
- Verify key is enabled in KMS console
- Verify EFS service-linked role has KMS permissions
- If key deleted: cancel deletion if within grace period
7. Category F — VPC DNS
DNS resolution failure = VPC DNS settings disabled.
aws ec2 describe-vpc-attribute --vpc-id vpc-ID --attribute enableDnsHostnames
aws ec2 describe-vpc-attribute --vpc-id vpc-ID --attribute enableDnsSupport
Both MUST be true. If not:
aws ec2 modify-vpc-attribute --vpc-id vpc-ID --enable-dns-hostnames Value=true
aws ec2 modify-vpc-attribute --vpc-id vpc-ID --enable-dns-support Value=true
Troubleshooting
Mount hangs then times out
Most common cause: security group. Verify TCP 2049 is open between compute and mount target.
Auto-mount fails on reboot
/etc/fstab entry MUST include _netdev option to wait for network before mounting.
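A sketch of what such an entry can look like, assuming amazon-efs-utils is installed so that the efs file system type is available. The function name is hypothetical, and the noresvport and tls options are optional additions rather than requirements.

```shell
# efs_fstab_line FS_ID MOUNT_POINT: print an /etc/fstab entry for the EFS mount helper.
# Assumes amazon-efs-utils (fs type "efs"); _netdev delays the mount until the network is up.
efs_fstab_line() {
  printf '%s:/ %s efs _netdev,noresvport,tls 0 0\n' "$1" "$2"
}
```

The function only prints the line. Appending it to /etc/fstab is a write operation and should be confirmed with the user first; sudo mount -fav can then validate the entry without actually mounting.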
"nfs not responding" after reconnect
Old kernel bug with TCP port reuse. Update kernel or add noresvport mount option.
Enable Debug Logs
Set logging_level = DEBUG in /etc/amazon/efs/efs-utils.conf. Logs at /var/log/amazon/efs/mount.log.
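The config change can be scripted along these lines. This is a minimal sketch with a hypothetical function name; it assumes the file already contains a logging_level line and rewrites only that line.

```shell
# set_efs_debug_logging [CONF]: rewrite logging_level to DEBUG in an efs-utils config.
# CONF defaults to the path named above; the real file needs a root shell to modify.
set_efs_debug_logging() {
  conf=${1:-/etc/amazon/efs/efs-utils.conf}
  tmp=$(mktemp) || return 1
  # Replace only the logging_level line; every other line passes through unchanged.
  sed 's/^logging_level[[:space:]]*=.*/logging_level = DEBUG/' "$conf" > "$tmp" &&
    cat "$tmp" > "$conf"   # cat keeps the original file's owner and mode
  rc=$?
  rm -f "$tmp"
  return $rc
}
```

Because this modifies a system file, it counts as a write command and should be shown to the user before running. Retry the mount afterwards so that the new log level produces output.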
Collect Logs for AWS Support
sudo tar -czf /tmp/efs-logs.tar.gz /var/log/amazon/efs/ /etc/amazon/efs/efs-utils.conf
Security Considerations
- IAM authorization is only enforced when a file system policy exists — without one, any VPC client with port 2049 access can mount
- When troubleshooting access denied, verify both identity-based and resource-based policies
- Use -o tls for encryption in transit; unencrypted NFS traffic is visible on the network
- Restrict /var/log/amazon/efs/ access; logs may contain file system IDs and mount target IPs