managing-dns
npx machina-cli add skill ancoleman/ai-design-components/managing-dns --openclaw
DNS Management
Configure and automate DNS records with proper TTL strategies, DNS-as-code patterns, and troubleshooting techniques.
Purpose
Guide DNS configuration for applications, infrastructure, and services with focus on:
- Record type selection (A, AAAA, CNAME, MX, TXT, SRV, CAA)
- TTL strategies for propagation and caching
- DNS-as-code automation (external-dns, OctoDNS, DNSControl)
- Cloud DNS services comparison and selection
- DNS-based load balancing patterns
- Troubleshooting tools and techniques
When to Use This Skill
Apply DNS management patterns when:
- Setting up DNS for new applications or services
- Automating DNS updates from Kubernetes workloads
- Configuring DNS-based failover or load balancing
- Troubleshooting DNS propagation or resolution issues
- Migrating DNS between providers
- Planning DNS changes with minimal downtime
- Implementing GeoDNS for global users
Record Type Selection
Quick Reference
Address Resolution:
- A Record: Map hostname to IPv4 address (example.com → 192.0.2.1)
- AAAA Record: Map hostname to IPv6 address (example.com → 2001:db8::1)
- CNAME Record: Alias to another domain (www.example.com → example.com)
  - Cannot use at zone apex (@)
  - Cannot coexist with other records at same name
Email Configuration:
- MX Record: Direct email to mail servers with priority
- TXT Record: Email authentication (SPF, DKIM, DMARC) and verification
Service Discovery:
- SRV Record: Specify service location (protocol, priority, weight, port, target)
Delegation and Security:
- NS Record: Delegate subdomain to different nameservers
- CAA Record: Restrict which Certificate Authorities can issue certificates
Cloud-Specific:
- ALIAS Record: Like CNAME but works at zone apex (Route53 ALIAS records; Cloudflare provides the same effect through CNAME flattening)
Decision Tree
Need to point domain to:
├─ IPv4 Address? → A record
├─ IPv6 Address? → AAAA record
├─ Another Domain?
│ ├─ Zone apex (@) → ALIAS/ANAME or A record
│ └─ Subdomain → CNAME
├─ Mail Server? → MX record (with priority)
├─ Email Authentication? → TXT record (SPF/DKIM/DMARC)
├─ Service Discovery? → SRV record
├─ Domain Verification? → TXT record
├─ Certificate Control? → CAA record
└─ Subdomain Delegation? → NS record
For detailed record type examples and patterns, see references/record-types.md.
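The address branches of the decision tree can be sketched in a few lines of Python. This is only an illustration: the function name `choose_record_type` and its `at_apex` parameter are hypothetical, and the mail, TXT, SRV, CAA, and NS branches are left out because they depend on intent rather than on the shape of the target.

```python
import ipaddress


def choose_record_type(target: str, at_apex: bool = False) -> str:
    """Pick a record type for pointing a name at `target` (address branches only)."""
    try:
        address = ipaddress.ip_address(target)
    except ValueError:
        # Not an IP literal, so the target is another hostname.
        # CNAME is not allowed at the zone apex, so use ALIAS/ANAME there.
        return "ALIAS/ANAME" if at_apex else "CNAME"
    return "A" if address.version == 4 else "AAAA"


print(choose_record_type("192.0.2.1"))                     # A
print(choose_record_type("2001:db8::1"))                   # AAAA
print(choose_record_type("example.com"))                   # CNAME
print(choose_record_type("lb.example.net", at_apex=True))  # ALIAS/ANAME
```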
TTL Strategy
Standard TTL Values
By Change Frequency:
- Stable records: 3600-86400s (1-24 hours) - NS, stable A/AAAA
- Normal operation: 3600s (1 hour) - Standard websites, MX
- Moderate changes: 300-1800s (5-30 min) - Development, A/B testing
- Failover scenarios: 60-300s (1-5 min) - Critical records needing fast updates
Key Principle: Lower TTL = faster propagation but higher DNS query load
Pre-Change Process
When planning DNS changes:
T-48h: Lower TTL to 300s
T-24h: Verify TTL propagated globally
T-0h: Make DNS change
T+1h: Verify new records propagating
T+6h: Confirm global propagation
T+24h: Raise TTL back to normal (3600s)
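As a rough aid, the timeline above can be turned into concrete timestamps for a planned change. The sketch below assumes the offsets exactly as listed; `change_schedule` and the example date are hypothetical.

```python
from datetime import datetime, timedelta

# Offsets in hours relative to the change, with the actions from the timeline above.
PRE_CHANGE_STEPS = [
    (-48, "Lower TTL to 300s"),
    (-24, "Verify TTL propagated globally"),
    (0, "Make DNS change"),
    (1, "Verify new records propagating"),
    (6, "Confirm global propagation"),
    (24, "Raise TTL back to normal (3600s)"),
]


def change_schedule(change_time: datetime) -> list[tuple[datetime, str]]:
    """Return (when, action) pairs for a change planned at `change_time`."""
    return [
        (change_time + timedelta(hours=offset), action)
        for offset, action in PRE_CHANGE_STEPS
    ]


# Example: a change planned for 15 January 2030 at 12:00.
for when, action in change_schedule(datetime(2030, 1, 15, 12, 0)):
    print(when.isoformat(timespec="minutes"), action)
```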
Propagation Formula: Max Time = Old TTL + New TTL + Query Time
Example: Changing a record whose old and new TTLs are both 3600s can take up to 2 hours to fully propagate (3600s + 3600s, plus query time).
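The propagation formula is simple enough to check by hand, but a small helper makes it easy to compare scenarios. This is a minimal sketch of the formula as stated here; the function name `max_propagation_seconds` is hypothetical and query time defaults to zero.

```python
def max_propagation_seconds(old_ttl: int, new_ttl: int, query_time: int = 0) -> int:
    """Upper bound from the formula above: old TTL + new TTL + query time."""
    return old_ttl + new_ttl + query_time


# Unprepared change: both TTLs at 3600s.
print(max_propagation_seconds(3600, 3600) / 3600)  # 2.0 (hours)

# Prepared change: TTL lowered to 300s beforehand and kept at 300s during the change.
print(max_propagation_seconds(300, 300) / 60)      # 10.0 (minutes)
```

The second call shows why the pre-change process lowers the TTL first: under the same formula the worst case drops from hours to minutes.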
TTL by Use Case
| Use Case | TTL | Rationale |
|---|---|---|
| Production (stable) | 3600s | Balance speed and load |
| Before planned change | 300s | Fast propagation |
| Development/staging | 300-600s | Frequent changes |
| DNS-based failover | 60-300s | Fast recovery |
| Mail servers | 3600s | Rarely change |
| NS records | 86400s | Very stable |
For detailed TTL scenarios and calculations, see references/ttl-strategies.md.
DNS-as-Code Tools
Tool Selection by Use Case
Kubernetes DNS Automation → external-dns
- Annotation-based configuration on Services/Ingresses
- Automatic sync to DNS providers (20+ supported)
- No manual DNS updates required
- See examples/external-dns/
Multi-Provider DNS Management → OctoDNS or DNSControl
- Version control for DNS records
- Sync configuration across multiple providers
- Preview changes before applying
- OctoDNS (Python/YAML): see examples/octodns/
- DNSControl (JavaScript): see examples/dnscontrol/
Infrastructure-as-Code → Terraform
- Manage DNS alongside cloud resources
- Provider-specific resources (aws_route53_record, etc.)
- See examples/terraform/
Tool Comparison
| Tool | Language | Best For | Kubernetes | Multi-Provider |
|---|---|---|---|---|
| external-dns | Go | K8s automation | ★★★★★ | ★★★★ |
| OctoDNS | Python/YAML | Version control | ★★★ | ★★★★★ |
| DNSControl | JavaScript | Complex logic | ★★ | ★★★★★ |
| Terraform | HCL | IaC integration | ★★★ | ★★★★ |
Quick Start: external-dns
# Kubernetes Service with DNS annotation
apiVersion: v1
kind: Service
metadata:
  name: app
  annotations:
    external-dns.alpha.kubernetes.io/hostname: app.example.com
    external-dns.alpha.kubernetes.io/ttl: "300"
spec:
  type: LoadBalancer
  ports:
    - port: 80
Deploy the external-dns controller once; it then creates DNS records automatically for every annotated Service or Ingress.
For complete examples, see examples/external-dns/ and references/dns-as-code-comparison.md.
Cloud DNS Provider Selection
Provider Characteristics
AWS Route53
- Best for AWS-heavy infrastructure
- Advanced routing policies (weighted, latency, geolocation, failover)
- Health checks with automatic failover
- ALIAS records for AWS resources (ELB, CloudFront, S3)
- Pricing: $0.50/month per zone + $0.40 per million queries
Google Cloud DNS
- Best for GCP-native applications
- Strong DNSSEC support with automatic key rotation
- Private zones for VPC internal DNS
- Split-horizon DNS (different internal/external records)
- Pricing: $0.20/month per zone + $0.40 per million queries
Azure DNS
- Best for Azure-native applications
- Integration with Azure Traffic Manager
- Azure Private DNS zones
- Azure RBAC for access control
- Pricing: $0.50/month per zone + $0.40 per million queries
Cloudflare
- Best for multi-cloud or cloud-agnostic
- Very low DNS query latency worldwide (anycast network)
- Built-in DDoS protection
- Free tier with unlimited queries
- CDN integration
- Pricing: Free tier, $20/month Pro, $200/month Business
Selection Decision Tree
Choose based on:
├─ AWS-heavy? → Route53
├─ GCP-native? → Cloud DNS
├─ Azure-native? → Azure DNS
├─ Multi-cloud? → Cloudflare or OctoDNS/DNSControl
├─ Need fastest global DNS? → Cloudflare
├─ Need DDoS protection? → Cloudflare
└─ Budget-conscious? → Cloudflare (free tier) or Cloud DNS (lowest zone cost)
For detailed provider comparisons and examples, see references/cloud-providers.md.
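For the budget branch of the tree, a quick estimate from the per-zone and per-query prices listed above can help. The sketch below treats those prices as flat rates and ignores any volume tiers or free allowances; the `PRICING` keys and the `monthly_cost` helper are hypothetical names.

```python
# (USD per zone per month, USD per million queries), copied from the provider list above.
PRICING = {
    "route53": (0.50, 0.40),
    "cloud-dns": (0.20, 0.40),
    "azure-dns": (0.50, 0.40),
}


def monthly_cost(provider: str, zones: int, million_queries: float) -> float:
    """Estimate a monthly bill as zones * zone price + queries * query price."""
    zone_price, query_price = PRICING[provider]
    return round(zones * zone_price + million_queries * query_price, 2)


# Example: 5 hosted zones answering 10 million queries per month.
print(monthly_cost("route53", zones=5, million_queries=10))    # 6.5
print(monthly_cost("cloud-dns", zones=5, million_queries=10))  # 5.0
```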
DNS-Based Load Balancing
GeoDNS (Geographic Routing)
Return different IP addresses based on client location to:
- Reduce latency (route to nearest data center)
- Comply with data residency requirements
- Distribute load across regions
Example Pattern:
Client Location → DNS Response
├─ North America → 192.0.2.1 (US data center)
├─ Europe → 192.0.2.10 (EU data center)
└─ Default → CloudFront edge (global CDN)
Weighted Routing
Distribute traffic by percentage for:
- Blue-green deployments
- Canary releases (10% to new version)
- A/B testing
Example Pattern:
DNS Responses:
├─ 90% → 192.0.2.1 (stable version)
└─ 10% → 192.0.2.2 (canary version)
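To see what a 90/10 split means in practice, the pattern can be simulated locally. This is only a model of the idea, not how any provider implements weighted records; `WEIGHTED_TARGETS` and `pick_target` are hypothetical names, and the IPs are the documentation addresses from the pattern above.

```python
import random
from collections import Counter

# Weights from the example pattern: 90% stable, 10% canary.
WEIGHTED_TARGETS = {"192.0.2.1": 90, "192.0.2.2": 10}


def pick_target(rng: random.Random) -> str:
    """Choose one target with probability proportional to its weight."""
    targets = list(WEIGHTED_TARGETS)
    weights = list(WEIGHTED_TARGETS.values())
    return rng.choices(targets, weights=weights, k=1)[0]


# Simulate 10,000 DNS responses with a fixed seed so runs are repeatable.
rng = random.Random(0)
counts = Counter(pick_target(rng) for _ in range(10_000))
for target, count in counts.most_common():
    print(target, f"{count / 10_000:.1%}")
```

Real traffic rarely matches the weights this closely, because resolvers cache answers for the TTL and each resolver can sit in front of many clients.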
Health Check-Based Failover
Automatically route traffic away from unhealthy endpoints.
Pattern:
Primary: 192.0.2.1 (health checked every 30s)
├─ Healthy → Return primary IP
└─ Unhealthy → Return secondary IP (192.0.2.2)
Failover time: ~2-3 minutes
= Health check failures (3 failed checks × 30s = 90s) + TTL expiration (60s)
For complete load balancing examples, see examples/load-balancing/.
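The failover estimate above is detection time plus record TTL, which can be written out as a worked calculation. This sketch assumes a check must fail a fixed number of consecutive times before the endpoint is marked unhealthy; `failover_seconds` is a hypothetical helper.

```python
def failover_seconds(request_interval: int, failure_threshold: int, ttl: int) -> int:
    """Worst-case failover time: failed health checks, then cached answers expiring."""
    detection = request_interval * failure_threshold
    return detection + ttl


# 30s checks, 3 consecutive failures, 60s TTL.
total = failover_seconds(request_interval=30, failure_threshold=3, ttl=60)
print(total, "seconds =", total / 60, "minutes")  # 150 seconds = 2.5 minutes
```

Lowering either the check interval or the TTL shortens failover, at the cost of more health-check traffic or more DNS queries.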
Troubleshooting
Essential Commands
Check DNS Resolution:
# Basic query
dig example.com
# Clean output (just IP)
dig example.com +short
# Query specific DNS server
dig @8.8.8.8 example.com
dig @1.1.1.1 example.com
# Trace resolution path
dig +trace example.com
Check TTL:
dig example.com | grep -A1 "ANSWER SECTION"
# Look for TTL value (number before IN A)
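When the TTL needs to be read by a script rather than by eye, the answer lines can be parsed directly. The sketch below assumes the usual answer-line layout of name, TTL, class, type, and data, as printed by `dig +noall +answer`; the function name `answer_ttls` and the sample text are hypothetical.

```python
def answer_ttls(dig_answer: str) -> dict[str, int]:
    """Map 'name type' to TTL for each answer line in dig output."""
    ttls = {}
    for line in dig_answer.splitlines():
        if line.startswith(";"):
            continue  # dig comment lines such as ";; ANSWER SECTION:"
        fields = line.split()
        # An answer line looks like: name TTL class type rdata...
        if len(fields) >= 5 and fields[1].isdigit():
            ttls[f"{fields[0]} {fields[3]}"] = int(fields[1])
    return ttls


sample = """;; ANSWER SECTION:
example.com.\t\t300\tIN\tA\t192.0.2.1
"""
print(answer_ttls(sample))  # {'example.com. A': 300}
```

Note that a caching resolver reports the remaining TTL, so the number counts down between queries; query an authoritative nameserver to see the configured value.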
Check Propagation:
# Multiple resolvers
dig @8.8.8.8 example.com +short # Google
dig @1.1.1.1 example.com +short # Cloudflare
dig @208.67.222.222 example.com +short # OpenDNS
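The same comparison can be scripted. The sketch below wraps the `dig ... +short` calls shown above and reports whether every resolver returned the same answers; it assumes `dig` is installed, and the names `RESOLVERS`, `query`, `is_consistent`, and `check_propagation` are hypothetical.

```python
import subprocess

# Public resolvers from the commands above.
RESOLVERS = {"Google": "8.8.8.8", "Cloudflare": "1.1.1.1", "OpenDNS": "208.67.222.222"}


def query(resolver_ip: str, name: str) -> frozenset[str]:
    """Run `dig @resolver name +short` and return the set of answers."""
    result = subprocess.run(
        ["dig", f"@{resolver_ip}", name, "+short"],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return frozenset(result.stdout.split())


def is_consistent(answers: dict[str, frozenset[str]]) -> bool:
    """True when every resolver returned the same non-empty answer set."""
    distinct = set(answers.values())
    return len(distinct) == 1 and all(distinct)


def check_propagation(name: str) -> bool:
    """Query every resolver for `name` (needs network access and dig)."""
    answers = {label: query(ip, name) for label, ip in RESOLVERS.items()}
    return is_consistent(answers)


# Offline demonstration with canned answers: one resolver still serves the old IP.
stale = {
    "Google": frozenset({"192.0.2.2"}),
    "Cloudflare": frozenset({"192.0.2.2"}),
    "OpenDNS": frozenset({"192.0.2.1"}),
}
print(is_consistent(stale))  # False
```

Calling `check_propagation("example.com")` runs the live version. Records that intentionally return different answers per resolver, such as GeoDNS or weighted records, will report as inconsistent.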
Flush Local DNS Cache:
# macOS
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
# Windows
ipconfig /flushdns
# Linux (systemd-resolved)
sudo resolvectl flush-caches # older systemd releases: sudo systemd-resolve --flush-caches
Common Problems
Slow Propagation:
- Check current TTL (old TTL must expire first)
- Lower TTL 24-48 hours before changes
- Use propagation checkers: whatsmydns.net, dnschecker.org
CNAME at Zone Apex:
- Error: Cannot use CNAME at @ (zone apex)
- Solution: Use an ALIAS record (Route53), CNAME flattening (Cloudflare), or an A record
external-dns Not Creating Records:
- Verify annotation spelling: external-dns.alpha.kubernetes.io/hostname
- Check domain filter matches: --domain-filter=example.com
- Review external-dns logs for errors
- Confirm provider credentials configured
For detailed troubleshooting, see references/troubleshooting.md.
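The first two external-dns checks above (annotation spelling and domain filter) are easy to automate against a Service's annotations. This is a rough sketch, not part of external-dns itself: `annotation_problems` is a hypothetical helper, it assumes a single hostname per annotation, and it mimics the domain filter with a simple suffix match.

```python
HOSTNAME_KEY = "external-dns.alpha.kubernetes.io/hostname"
TTL_KEY = "external-dns.alpha.kubernetes.io/ttl"


def annotation_problems(annotations: dict, domain_filter: str) -> list[str]:
    """Return human-readable problems found in a Service's annotations."""
    problems = []
    hostname = annotations.get(HOSTNAME_KEY)
    if hostname is None:
        # A misspelled key ends up here, because the exact key is absent.
        problems.append(f"missing annotation {HOSTNAME_KEY}")
    elif not (hostname == domain_filter or hostname.endswith("." + domain_filter)):
        problems.append(f"{hostname} is outside --domain-filter={domain_filter}")
    ttl = annotations.get(TTL_KEY)
    if ttl is not None and not isinstance(ttl, str):
        # Kubernetes annotation values must be strings, so the TTL needs quotes in YAML.
        problems.append(f"{TTL_KEY} must be a quoted string")
    return problems


# A misspelled key and a hostname outside the filter are both reported.
print(annotation_problems({"external-dns.alpha.kubernetes.io/hostnme": "api.example.com"}, "example.com"))
print(annotation_problems({HOSTNAME_KEY: "api.example.org"}, "example.com"))
```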
Common Patterns
Pattern 1: Kubernetes DNS Automation
# Deploy external-dns (once per cluster)
helm install external-dns external-dns/external-dns \
  --set provider=aws \
  --set "domainFilters[0]=example.com" \
  --set policy=sync

# Then annotate Services
apiVersion: v1
kind: Service
metadata:
  name: api # added: a Service needs a name; "api" is a placeholder
  annotations:
    external-dns.alpha.kubernetes.io/hostname: api.example.com
    external-dns.alpha.kubernetes.io/ttl: "300"
spec:
  type: LoadBalancer
  ports:
    - port: 80 # added: a Service needs at least one port; 80 is a placeholder
Pattern 2: Multi-Provider Sync with OctoDNS
# octodns-config.yaml
providers:
  config:
    class: octodns.provider.yaml.YamlProvider
    directory: ./config
  route53:
    class: octodns_route53.Route53Provider
  cloudflare:
    class: octodns_cloudflare.CloudflareProvider
zones:
  example.com.:
    sources: [config]
    targets: [route53, cloudflare]
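A common slip in a configuration like this is a zone that references a provider name that was never defined. The sketch below checks for that on a plain dictionary shaped like the YAML above, so it needs no YAML library; it is an illustration only, not part of OctoDNS, and `undefined_providers` is a hypothetical helper.

```python
def undefined_providers(config: dict) -> dict[str, set[str]]:
    """Map each zone to the provider names it references but never defines."""
    defined = set(config.get("providers", {}))
    missing = {}
    for zone, settings in config.get("zones", {}).items():
        referenced = set(settings.get("sources", [])) | set(settings.get("targets", []))
        unknown = referenced - defined
        if unknown:
            missing[zone] = unknown
    return missing


# Same structure as the configuration above, with provider settings omitted.
config = {
    "providers": {"config": {}, "route53": {}, "cloudflare": {}},
    "zones": {
        "example.com.": {"sources": ["config"], "targets": ["route53", "cloudflare"]},
    },
}
print(undefined_providers(config))  # {}

# A typo in a target name is reported against its zone.
config["zones"]["example.com."]["targets"] = ["route53", "cloudflair"]
print(undefined_providers(config))  # {'example.com.': {'cloudflair'}}
```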
Pattern 3: DNS-Based Failover
# Route53 with health checks
resource "aws_route53_health_check" "primary" {
  fqdn              = "primary.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "primary" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "primary"

  failover_routing_policy {
    type = "PRIMARY"
  }

  health_check_id = aws_route53_health_check.primary.id
  records         = ["192.0.2.1"]
}

resource "aws_route53_record" "secondary" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "secondary"

  failover_routing_policy {
    type = "SECONDARY"
  }

  records = ["192.0.2.2"]
}
Integration with Other Skills
infrastructure-as-code:
- Manage DNS via Terraform/Pulumi alongside other resources
- Zone configuration in IaC repositories
kubernetes-operations:
- external-dns automates DNS for Kubernetes workloads
- Ingress controller integration for automatic DNS
load-balancing-patterns:
- DNS-based load balancing (GeoDNS, weighted routing)
- Health checks and failover configurations
security-hardening:
- DNSSEC for DNS integrity
- CAA records for certificate authority control
- DNS-based DDoS mitigation
secret-management:
- Store DNS provider API credentials in vaults
- Secure DDNS update mechanisms
Additional Resources
Reference Documentation:
- references/record-types.md - Detailed record type guide with examples
- references/ttl-strategies.md - TTL scenarios and propagation calculations
- references/cloud-providers.md - Provider comparison and detailed features
- references/troubleshooting.md - Common problems and solutions
- references/dns-as-code-comparison.md - Tool comparison matrix
Examples:
- examples/external-dns/ - Kubernetes DNS automation
- examples/octodns/ - Multi-provider sync with YAML
- examples/dnscontrol/ - Multi-provider with JavaScript DSL
- examples/terraform/ - Cloud provider configurations
- examples/load-balancing/ - GeoDNS and failover patterns
Scripts:
- scripts/check-dns-propagation.sh - Verify propagation across resolvers
- scripts/validate-dns-config.py - Validate DNS configuration
- scripts/export-dns-records.sh - Export existing DNS records
- scripts/calculate-ttl-propagation.py - Calculate propagation time
Quick Reference
Record Types Cheat Sheet
| Record | Purpose | Example |
|---|---|---|
| A | IPv4 address | example.com → 192.0.2.1 |
| AAAA | IPv6 address | example.com → 2001:db8::1 |
| CNAME | Alias to domain | www → example.com |
| MX | Mail server | 10 mail.example.com |
| TXT | Text/verification | "v=spf1 include:_spf.google.com ~all" |
| SRV | Service location | 10 60 5060 sip.example.com |
| NS | Nameserver delegation | ns1.provider.com |
| CAA | CA authorization | 0 issue "letsencrypt.org" |
TTL Cheat Sheet
| Scenario | TTL | Why |
|---|---|---|
| Stable production | 3600s | Balance speed/load |
| Before change | 300s | Fast propagation |
| Failover | 60-300s | Fast recovery |
| NS records | 86400s | Very stable |
Provider Cheat Sheet
| Provider | Best For | Key Feature |
|---|---|---|
| Route53 | AWS | Advanced routing, health checks |
| Cloud DNS | GCP | DNSSEC, private zones |
| Azure DNS | Azure | Traffic Manager integration |
| Cloudflare | Multi-cloud | Fastest, DDoS protection, free tier |
Tool Cheat Sheet
| Tool | Use When |
|---|---|
| external-dns | Kubernetes DNS automation |
| OctoDNS | Multi-provider, Python shop |
| DNSControl | Multi-provider, JavaScript preference |
| Terraform | Managing DNS with other infrastructure |
Source
https://github.com/ancoleman/ai-design-components/blob/main/skills/managing-dns/SKILL.md
Overview
Manage DNS records, TTL strategies, and DNS-as-code automation for infrastructure. It covers record types, TTL planning, and automation patterns across cloud providers such as Route53, Cloud DNS, Azure DNS, and Cloudflare.
How This Skill Works
You choose appropriate record types (A, AAAA, CNAME, MX, TXT, SRV, CAA), apply TTL strategies to balance propagation speed and load, and implement DNS-as-code automation with tools like external-dns, OctoDNS, and DNSControl. The approach also includes cloud-provider comparisons and DNS-based load balancing patterns, plus troubleshooting techniques.
When to Use It
- Setting up DNS for a new application or service
- Automating DNS updates from Kubernetes workloads via external-dns
- Configuring DNS-based failover or load balancing
- Troubleshooting DNS propagation or resolution issues
- Migrating DNS between providers with minimal downtime
Quick Start
- Step 1: Identify required records (A/AAAA/CNAME/MX/TXT/SRV/CAA) for your app
- Step 2: Choose a TTL strategy based on change frequency and propagation needs
- Step 3: Implement with DNS-as-code tools (external-dns, OctoDNS, DNSControl) and verify propagation
Best Practices
- Match record types to service needs (A/AAAA for hosts, MX for mail, TXT for auth)
- Plan TTLs by change frequency and propagation goals
- Use DNS-as-code tools (external-dns, OctoDNS, DNSControl) for reproducible configurations
- Test changes in staging and monitor global propagation
- Document provider capabilities (ALIAS/ANAME, CAA, NS) and adapt accordingly
Example Use Cases
- Point example.com to an IPv4 address with an A record
- Map the zone apex to another hostname with an ALIAS/ANAME record (Route53) or CNAME flattening (Cloudflare)
- Publish SPF/DKIM/DMARC via TXT records for email authentication
- Automate DNS updates from Kubernetes workloads using external-dns
- Configure DNS-based load balancing and geo-based routing with low TTLs