Intro
We rotated an auth API secret in production. Error rates spiked within 30 seconds. Some pods had the new secret, others kept the old one. The external API had already invalidated the old token. Half our authentication requests failed.
We reverted in 5 minutes, but the incident exposed a fundamental problem: updating secrets in place doesn't work at scale.
Here's the versioned rotation pattern we built. No downtime. No emergency rollbacks. Just controlled deployments.
The Problem with In-Place Updates
Traditional approach:
```shell
# Update secret value
kubectl create secret generic auth-api-secrets \
  --from-literal=api_token=<new-value> \
  --dry-run=client -o yaml | kubectl apply -f -

# Restart deployment
kubectl rollout restart deployment/auth-service
```
This fails because:
- Rolling restarts aren't instant - Pods restart gradually, creating a mixed state
- External dependencies invalidate old tokens immediately - No grace period
- No version tracking - Which secret version is running in production?
- No rollback plan - Old value is gone, requires scrambling to find backups
In dev, this works. In production with gradual rollouts and external dependencies, it breaks.
Versioned Secret Pattern
Treat secrets like immutable deployments. Each rotation creates a new versioned secret.
Naming Convention
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: auth-api-secrets-20250723110000  # YYYYMMDDHHMMSS
data:
  api_token: <base64-encoded-value>
```
Timestamp format gives you:
- Clear history - see every version with `kubectl get secrets | grep auth-api-secrets`
- Instant rollback - keep the old secret and revert the deployment reference
- Audit trail - match timestamps to deployment history
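The naming convention can be produced in a couple of lines of shell. A minimal sketch, using the `auth-api-secrets` base name from this post:

```bash
#!/usr/bin/env bash
# Build a versioned secret name using the YYYYMMDDHHMMSS convention.
# "auth-api-secrets" is the example base name used throughout this post.
base="auth-api-secrets"
version="$(date -u +%Y%m%d%H%M%S)"   # UTC, so timestamps sort consistently
secret_name="${base}-${version}"
echo "$secret_name"
```

A side benefit of the fixed-width timestamp: lexical sort order is chronological order, so `kubectl get secrets | sort` lists versions oldest to newest.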
Progressive Rollout Strategy
Step 1: Create New Secret
```shell
kubectl create secret generic auth-api-secrets-20250723110000 \
  --from-literal=api_token=<new-token> \
  -n production
```
Old secret stays. Both exist simultaneously.
Step 2: Deploy to Staging
```yaml
# staging/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service
spec:
  template:
    spec:
      containers:
      - name: app
        envFrom:
        - secretRef:
            name: auth-api-secrets-20250723110000
```
Test thoroughly. If it breaks, old secret is still there.
Step 3: Production Rollout
Option A: Blue-Green Deployment
Run both versions temporarily:
```yaml
# Blue (old secret)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service-blue
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: app
        envFrom:
        - secretRef:
            name: auth-api-secrets-20250715100000
```
```yaml
# Green (new secret)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service-green
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: app
        envFrom:
        - secretRef:
            name: auth-api-secrets-20250723110000
```
Route 10% traffic to green. Monitor. Increase gradually.
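The ramp itself can be scripted. A sketch of the loop, where `check_error_rate` and `set_green_weight` are placeholders you would back with your metrics API and your ingress or service-mesh weight mechanism:

```bash
#!/usr/bin/env bash
# Progressive traffic ramp sketch. check_error_rate and set_green_weight
# are stubs: wire them to your metrics API and ingress/mesh of choice.
check_error_rate() { echo 0; }                      # stub: errors/min from metrics
set_green_weight() { echo "green weight -> $1%"; }  # stub: e.g. patch an Ingress weight

for pct in 10 25 50 100; do
  set_green_weight "$pct"
  sleep 1                                           # in practice: minutes, not seconds
  errors="$(check_error_rate)"
  if [ "$errors" -gt 10 ]; then
    echo "error spike at ${pct}%; aborting ramp"
    exit 1
  fi
done
echo "ramp complete: 100% on green"
```

The error threshold (10/min here) should match whatever your monitoring already alerts on, so the script and the pager agree.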
Option B: Rolling Update with Monitoring
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    spec:
      containers:
      - name: app
        envFrom:
        - secretRef:
            name: auth-api-secrets-20250723110000
```
Watch error rates. Roll back immediately if they spike.
Step 4: Cleanup (1-2 Weeks Later)
```shell
# Verify no references to old secret
kubectl get deployments --all-namespaces -o yaml | grep "auth-api-secrets-20250715100000"

# Delete old secret
kubectl delete secret auth-api-secrets-20250715100000 -n production
```
Keep old secrets for at least two weeks. Weekly cronjobs might still reference them.
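Because the version is a timestamp, the retention window can be checked mechanically from the name alone. A sketch, assuming GNU `date` and a fixed "now" for illustration (in practice you'd use `date -u +%s`):

```bash
#!/usr/bin/env bash
# Compute a versioned secret's age from its name (GNU date assumed).
secret="auth-api-secrets-20250715100000"
stamp="${secret##*-}"   # 20250715100000
created="$(date -u -d "${stamp:0:4}-${stamp:4:2}-${stamp:6:2} ${stamp:8:2}:${stamp:10:2}:${stamp:12:2}" +%s)"
now="$(date -u -d "2025-08-01 00:00:00" +%s)"   # fixed here for illustration
age_days=$(( (now - created) / 86400 ))
if [ "$age_days" -ge 14 ]; then
  echo "older than two weeks (${age_days} days): candidate for deletion"
else
  echo "within retention window (${age_days} days): keep"
fi
```

A cron job running this check against `kubectl get secrets` output gives you a deletion candidate list without anyone having to remember dates.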
Instant Rollback
When new secret breaks production:
```shell
# Revert deployment
kubectl rollout undo deployment/auth-service -n production

# Verify old secret exists
kubectl get secret auth-api-secrets-20250715100000 -n production

# Check status
kubectl get pods -n production -l app=auth-service
```
Because old secret still exists, rollback is instant. No scrambling for old token values.
Environment-Specific Handling
Development
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: auth-api-secrets  # No versioning
data:
  api_token: <dev-token>
```
No versioning needed. Break things and learn.
Staging
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: auth-api-secrets-20250723110000  # Versioned
data:
  api_token: <staging-token>
```
Full rotation process. Test failures here.
Production
- Versioned secrets with timestamps
- Progressive rollout
- Monitoring at every step
- Rollback procedures documented
- Old secrets kept 1-2 weeks
Common Pitfalls
1. Forgetting References
Problem: Rotated secret but forgot one deployment still referenced old name.
Solution: Search before cleanup:
```shell
kubectl get deployments --all-namespaces -o yaml | grep "old-secret-name"
kubectl get pods --all-namespaces -o yaml | grep "old-secret-name"
```
2. Skipping Staging
Problem: "It's just a token rotation." Updated production directly. Token format changed. Auth broke.
Solution: No exceptions. Every secret change goes through staging.
3. Deleting Old Secrets Too Soon
Problem: Deleted old secret immediately. Weekly cronjob failed three days later.
Solution: Keep old secrets for 2+ weeks. Monthly cronjobs need longer.
4. No Monitoring
Problem: Checked logs manually. Looked fine. Got paged hours later. Errors were sporadic.
Solution: Set up monitors for secret rotations:
```yaml
# Datadog monitor (pseudo-config)
monitor:
  name: 'Auth API - High Error Rate'
  query: 'avg(last_5m):sum:api.auth.errors{env:production} > 10'
  message: 'Auth errors spiked. Check recent secret rotation.'
```
Watch error rates for 24 hours after rotation.
GitOps Integration
Secrets don't belong in Git. Secret references do.
What Goes in Git
```yaml
# manifests/production/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service
spec:
  template:
    spec:
      containers:
      - name: app
        envFrom:
        - secretRef:
            name: auth-api-secrets-20250723110000  # Reference only
```
Git contains the name, not the value.
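Bumping that reference is then a one-line edit in Git. A `sed` sketch (a structured tool like `yq` or Kustomize is more robust; the manifest file below is a minimal stand-in for your real one):

```bash
#!/usr/bin/env bash
# Bump the versioned secret reference in a GitOps manifest with sed.
# The manifest written here is a minimal stand-in for illustration.
manifest="$(mktemp)"
cat > "$manifest" <<'EOF'
        envFrom:
        - secretRef:
            name: auth-api-secrets-20250715100000
EOF

new_version="20250723110000"
# Replace the 14-digit timestamp suffix, leaving the base name intact.
sed -E -i "s/(auth-api-secrets-)[0-9]{14}/\1${new_version}/" "$manifest"
grep "auth-api-secrets-${new_version}" "$manifest"
```

Committing that change gives you the audit trail for free: `git log` on the manifest shows exactly when each secret version went live.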
What Stays Out of Git
Secret creation happens outside GitOps:
```shell
kubectl create secret generic auth-api-secrets-20250723110000 \
  --from-literal=api_token=<value> \
  -n production
```
Or use External Secrets Operator:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: auth-api-secrets-20250723110000
spec:
  secretStoreRef:
    name: aws-secretsmanager
  target:
    name: auth-api-secrets-20250723110000
  data:
  - secretKey: api_token
    remoteRef:
      key: /production/auth-api/token
      version: 20250723110000
```
Operator syncs from AWS Secrets Manager. Value never touches Git.
Rotation Checklist
Before Rotation
- Create new secret with timestamped name
- Document rollback procedure
- Verify monitoring is in place
- Schedule during low-traffic window
Staging Deployment
- Update staging deployment to new secret
- Deploy and verify functionality
- Check logs for auth errors
- Run integration tests
- Let it run for 24 hours
Production Deployment
- Verify old secret still exists
- Update production deployment
- Monitor error rates during rollout
- Check authentication logs
- Verify all pods restarted
- Watch for 24 hours
Cleanup (1-2 Weeks Later)
- Search for all references to old secret
- Verify no pods use old secret
- Delete old secret from Kubernetes
- Delete old secret from external store
- Update documentation
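The cleanup steps above can be wrapped in a guard that refuses to delete while references remain. A sketch against a local manifests directory (the forgotten cronjob reference is simulated here, and the delete is printed rather than executed):

```bash
#!/usr/bin/env bash
# Cleanup guard sketch: only delete once nothing references the old secret.
# The manifests directory and cronjob file are simulated for illustration.
old="auth-api-secrets-20250715100000"
workdir="$(mktemp -d)"
mkdir -p "$workdir/manifests"
# Simulate a forgotten reference in a weekly report cronjob.
echo "            name: $old" > "$workdir/manifests/report-cronjob.yaml"

refs="$(grep -rl "$old" "$workdir/manifests" | wc -l)"
if [ "$refs" -eq 0 ]; then
  # Safe to delete; printed rather than executed in this sketch.
  echo "kubectl delete secret $old -n production"
else
  echo "still referenced in $refs file(s); aborting cleanup"
fi
```

The guard is cheap insurance against Pitfall 1 above: the script, not a human memory, decides whether cleanup is safe.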
Key Takeaways
Versioning prevents incidents. Timestamped secret names give instant rollback and clear history.
Progressive rollout matters. Staging → Production → Monitor → Cleanup. No shortcuts.
Keep old secrets longer than you think. Weekly cronjobs, monthly jobs, obscure scripts all need secrets.
Automate monitoring. Can't watch logs manually for every rotation. Set up alerts.
Secrets aren't config. They expire. They break unexpectedly. Treat them with extra caution.
TL;DR
- Use versioned secret names with timestamps (`secret-name-20250723110000`)
- Create new secret, keep old one during transition
- Test in staging first, always
- Deploy to production gradually with monitoring
- Keep old secrets for 1-2 weeks minimum
- Document rollback procedures
- Use External Secrets Operator for GitOps
- Set up monitoring for secret rotations
This pattern eliminated our secret rotation incidents. No emergency rollbacks since implementation.
