Kubernetes Secrets Rotation: Versioned Pattern with Instant Rollback

Intro

We rotated an auth API secret in production. Error rates spiked within 30 seconds. Some pods had the new secret, others kept the old one. The external API had already invalidated the old token. Half our authentication requests failed.

We reverted in 5 minutes, but the incident exposed a fundamental problem: updating secrets in place doesn't work at scale.

Here's the versioned rotation pattern we built. No downtime. No emergency rollbacks. Just controlled deployments.

The Problem with In-Place Updates

Traditional approach:

# Update secret value
kubectl create secret generic auth-api-secrets \
  --from-literal=api_token=<new-value> \
  --dry-run=client -o yaml | kubectl apply -f -

# Restart deployment
kubectl rollout restart deployment/auth-service

This fails because:

  • Rolling restarts aren't instant - Pods restart gradually, creating a mixed state
  • External dependencies invalidate old tokens immediately - No grace period
  • No version tracking - Which secret version is running in production?
  • No rollback plan - Old value is gone, requires scrambling to find backups

In dev, this works. In production with gradual rollouts and external dependencies, it breaks.

Versioned Secret Pattern

Treat secrets like immutable deployments. Each rotation creates a new versioned secret.

Naming Convention

apiVersion: v1
kind: Secret
metadata:
  name: auth-api-secrets-20250723110000 # YYYYMMDDHHMMSS
data:
  api_token: <base64-encoded-value>

Timestamp format gives you:

  • Clear history - See all versions with kubectl get secrets | grep auth-api-secrets
  • Instant rollback - Keep old secret, revert deployment reference
  • Audit trail - Match timestamps to deployment history

Progressive Rollout Strategy

Step 1: Create New Secret

kubectl create secret generic auth-api-secrets-20250723110000 \
  --from-literal=api_token=<new-token> \
  -n production

Old secret stays. Both exist simultaneously, which only helps if the external API accepts both tokens during the overlap.
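The creation step can be sketched as a small bash helper. The function names `versioned_secret_name` and `create_versioned_secret` are illustrative, and reading the token from a file instead of `--from-literal` is an assumption made here to keep the value out of shell history.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a UTC timestamp (YYYYMMDDHHMMSS) to a base name.
versioned_secret_name() {
  printf '%s-%s\n' "$1" "$(date -u +%Y%m%d%H%M%S)"
}

# Hypothetical helper: create the new versioned secret and print its name.
# The token file must hold the raw value with no trailing newline, because
# --from-file stores the file contents byte for byte.
create_versioned_secret() {
  local base="$1" namespace="$2" token_file="$3" name
  name="$(versioned_secret_name "$base")"
  # kubectl's own output goes to stderr so stdout carries only the name.
  kubectl create secret generic "$name" \
    --from-file=api_token="$token_file" \
    -n "$namespace" >&2 || return 1
  printf '%s\n' "$name"
}
```

A call such as `new_secret="$(create_versioned_secret auth-api-secrets production ./token.txt)"` would capture the generated name for the later steps.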

Step 2: Deploy to Staging

# staging/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service
spec:
  template:
    spec:
      containers:
        - name: app
          envFrom:
            - secretRef:
                name: auth-api-secrets-20250723110000

Test thoroughly. If it breaks, old secret is still there.
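If the reference is switched on a live cluster rather than through a manifest commit, one option is a JSON Patch. This is a sketch, not the only way to do it: the helper names are hypothetical, and the patch path assumes the secret is the first `envFrom` entry of the first container, as in the excerpt above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a JSON Patch that points the first envFrom entry
# of the first container at a new secret name. The indexes (containers/0,
# envFrom/0) are assumptions about the manifest layout.
secret_ref_patch() {
  local new_secret="$1"
  printf '[{"op":"replace","path":"/spec/template/spec/containers/0/envFrom/0/secretRef/name","value":"%s"}]\n' "$new_secret"
}

# Apply the patch. Changing the pod template triggers a new rollout,
# so pods restart and read the new secret at startup.
switch_secret() {
  local deployment="$1" namespace="$2" new_secret="$3"
  kubectl patch deployment "$deployment" -n "$namespace" \
    --type=json -p "$(secret_ref_patch "$new_secret")"
}
```

Because the secret name is part of the pod template, the switch itself causes the restart. No separate `kubectl rollout restart` is needed.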

Step 3: Production Rollout

Option A: Blue-Green Deployment

Run both versions temporarily:

# Blue (old secret)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service-blue
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: app
          envFrom:
            - secretRef:
                name: auth-api-secrets-20250715100000
---
# Green (new secret)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service-green
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: app
          envFrom:
            - secretRef:
                name: auth-api-secrets-20250723110000

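Route 10% of traffic to green. Monitor. Increase gradually. With a plain Service the split follows pod count, so an exact 10% needs weighted routing at the ingress or service mesh layer.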

Option B: Rolling Update with Monitoring

apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    spec:
      containers:
        - name: app
          envFrom:
            - secretRef:
                name: auth-api-secrets-20250723110000

Watch error rates. Roll back immediately if they spike.
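Part of that watch can be scripted. The sketch below waits for the rollout and undoes it if pods never become ready. The wrapper name and the default timeout are assumptions. It only catches failures that readiness probes can see, so error-rate monitoring is still needed on top.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: wait for a rollout and undo it if it does not
# complete within the timeout (default 120s is an arbitrary choice).
rollout_or_undo() {
  local deployment="$1" namespace="$2" timeout="${3:-120s}"
  if kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout="$timeout"; then
    echo "rollout complete: $deployment"
  else
    echo "rollout failed, reverting: $deployment" >&2
    kubectl rollout undo "deployment/$deployment" -n "$namespace"
    return 1
  fi
}
```

For example, `rollout_or_undo auth-service production 180s` returns non-zero after reverting, which lets a pipeline stop at that step.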

Step 4: Cleanup (1-2 Weeks Later)

# Verify no workloads reference the old secret (CronJobs included)
kubectl get deployments,statefulsets,daemonsets,cronjobs --all-namespaces -o yaml | grep "auth-api-secrets-20250715100000"

# Delete old secret
kubectl delete secret auth-api-secrets-20250715100000 -n production

Keep old secrets for at least two weeks. Weekly cronjobs might still reference them.
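The check and the delete can be tied together so the delete never runs while a reference remains. This is a sketch with a hypothetical function name. The list of workload kinds is an assumption, and a plain text match can produce false positives (for example, a name that appears only in an annotation).

```shell
#!/usr/bin/env bash
# Hypothetical guard: delete a secret only if no workload manifest mentions it.
# The list of kinds is an assumption; extend it for custom workload types.
safe_delete_secret() {
  local secret="$1" namespace="$2" manifests
  # Abort if the lookup itself fails, so an API error is never read as
  # "no references found".
  manifests="$(kubectl get deployments,statefulsets,daemonsets,cronjobs,jobs \
    --all-namespaces -o yaml)" || return 2
  if printf '%s\n' "$manifests" | grep -qF -- "$secret"; then
    echo "still referenced, not deleting: $secret" >&2
    return 1
  fi
  kubectl delete secret "$secret" -n "$namespace"
}
```

Erring toward false positives is deliberate here: a blocked delete costs a minute of investigation, a wrong delete costs an incident.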

Instant Rollback

When new secret breaks production:

# Revert deployment
kubectl rollout undo deployment/auth-service -n production

# Verify old secret exists
kubectl get secret auth-api-secrets-20250715100000 -n production

# Check status
kubectl get pods -n production -l app=auth-service

Because old secret still exists, rollback is instant. No scrambling for old token values.
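When the rollback target has to be looked up rather than remembered, the timestamped names make that mechanical. A minimal sketch, assuming secret names arrive on stdin one per line (`previous_version` is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Hypothetical helper: given secret names on stdin, print the version that
# precedes the newest one for a base name. Fixed-width timestamps sort
# chronologically, so the second-to-last entry is the rollback target.
# Returns non-zero when fewer than two versions exist.
previous_version() {
  local base="$1"
  grep -E "^${base}-[0-9]{14}\$" | sort |
    awk '{ prev = last; last = $0 } END { if (prev == "") exit 1; print prev }'
}
```

It could be fed from the cluster with `kubectl get secrets -n production -o name | sed 's|^secret/||' | previous_version auth-api-secrets`.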

Environment-Specific Handling

Development

apiVersion: v1
kind: Secret
metadata:
  name: auth-api-secrets # No versioning
data:
  api_token: <base64-encoded-dev-token>

No versioning needed. Break things and learn.

Staging

apiVersion: v1
kind: Secret
metadata:
  name: auth-api-secrets-20250723110000 # Versioned
data:
  api_token: <base64-encoded-staging-token>

Full rotation process. Test failures here.

Production

  • Versioned secrets with timestamps
  • Progressive rollout
  • Monitoring at every step
  • Rollback procedures documented
  • Old secrets kept 1-2 weeks

Common Pitfalls

1. Forgetting References

Problem: Rotated secret but forgot one deployment still referenced old name.

Solution: Search before cleanup:

kubectl get deployments,statefulsets,daemonsets,cronjobs --all-namespaces -o yaml | grep "old-secret-name"
kubectl get pods --all-namespaces -o yaml | grep "old-secret-name"

2. Skipping Staging

Problem: "It's just a token rotation." Updated production directly. Token format changed. Auth broke.

Solution: No exceptions. Every secret change goes through staging.

3. Deleting Old Secrets Too Soon

Problem: Deleted old secret immediately. Weekly cronjob failed three days later.

Solution: Keep old secrets for 2+ weeks. Monthly cronjobs need longer.
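Since the creation time is encoded in the name, the retention window can be checked before any delete. A sketch under the assumption that the cutoff is passed in the same YYYYMMDDHHMMSS format (`past_retention` is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical check: is a versioned secret older than a cutoff timestamp?
# Both values use YYYYMMDDHHMMSS, so a numeric comparison orders them by time.
# Returns 0 if older, 1 if not, 2 if the name carries no 14-digit timestamp.
past_retention() {
  local name="$1" cutoff="$2" stamp
  stamp="${name##*-}"            # text after the last hyphen
  case "$stamp" in
    *[!0-9]*|'') return 2 ;;     # suffix is empty or not purely digits
  esac
  [ "${#stamp}" -eq 14 ] || return 2
  [ "$stamp" -lt "$cutoff" ]
}
```

A cutoff for a 14-day window could come from `date -u -d '14 days ago' +%Y%m%d%H%M%S` on GNU date (BSD and macOS use `date -u -v-14d +%Y%m%d%H%M%S`).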

4. No Monitoring

Problem: Checked logs manually. Looked fine. Got paged hours later. Errors were sporadic.

Solution: Set up monitors for secret rotations:

# Datadog monitor (pseudo-config)
monitor:
  name: 'Auth API - High Error Rate'
  query: 'avg(last_5m):sum:api.auth.errors{env:production} > 10'
  message: 'Auth errors spiked. Check recent secret rotation.'

Watch error rates for 24 hours after rotation.

GitOps Integration

Secrets don't belong in Git. Secret references do.

What Goes in Git

# manifests/production/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service
spec:
  template:
    spec:
      containers:
        - name: app
          envFrom:
            - secretRef:
                name: auth-api-secrets-20250723110000 # Reference only

Git contains the name, not the value.
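In this setup a rotation becomes a one-line change to the manifest. The helper below is a sketch of automating that edit. Its name is hypothetical, and it assumes every reference in the file follows the `<base>-<14 digits>` naming convention.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite every versioned reference to a base secret name
# in a manifest file. The result is built in a temp file first, so a failed
# sed never leaves a half-written manifest behind.
bump_secret_ref() {
  local file="$1" base="$2" new_name="$3" tmp
  tmp="$(mktemp)" || return 1
  if sed -E "s/${base}-[0-9]{14}/${new_name}/g" "$file" > "$tmp"; then
    # Copy contents back instead of moving, to keep the file's permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

After `bump_secret_ref manifests/production/deployment.yaml auth-api-secrets auth-api-secrets-20250723110000`, the diff is a single changed line to commit, and reverting that commit is the rollback.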

What Stays Out of Git

Secret creation happens outside GitOps:

kubectl create secret generic auth-api-secrets-20250723110000 \
  --from-literal=api_token=<value> \
  -n production

Or use External Secrets Operator:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: auth-api-secrets-20250723110000
spec:
  secretStoreRef:
    name: aws-secretsmanager
  target:
    name: auth-api-secrets-20250723110000
  data:
    - secretKey: api_token
      remoteRef:
        key: /production/auth-api/token
        version: "20250723110000" # quoted so YAML reads it as a string, not an integer

Operator syncs from AWS Secrets Manager. Value never touches Git.
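For the `version` reference to resolve, a matching version has to exist in AWS Secrets Manager. One way this might be done, assuming the operator treats `version` as a version stage label and the secret already exists, is to attach the timestamp as a staging label when writing the value. The function name is hypothetical, and the token is read from a file to keep it off the command line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: store a new token value in AWS Secrets Manager and tag
# that version with a timestamp label. Assumes the AWS CLI is configured and
# that the secret (e.g. /production/auth-api/token) already exists.
put_versioned_value() {
  local secret_id="$1" token_file="$2" label="$3"
  aws secretsmanager put-secret-value \
    --secret-id "$secret_id" \
    --secret-string "file://$token_file" \
    --version-stages "$label"
}
```

Passing only the timestamp label leaves `AWSCURRENT` on the previous version, so consumers that read the default stage are not switched by this write.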

Rotation Checklist

Before Rotation

  • Create new secret with timestamped name
  • Document rollback procedure
  • Verify monitoring is in place
  • Schedule during low-traffic window

Staging Deployment

  • Update staging deployment to new secret
  • Deploy and verify functionality
  • Check logs for auth errors
  • Run integration tests
  • Let run for 24 hours

Production Deployment

  • Verify old secret still exists
  • Update production deployment
  • Monitor error rates during rollout
  • Check authentication logs
  • Verify all pods restarted
  • Watch for 24 hours
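The "all pods restarted" item can be checked directly by asking each pod which secret it references. A sketch, with a hypothetical function name and an assumed `app=<name>` label selector. Pods with several `envFrom` secrets are reported as mismatches, which is a limitation of this simple comparison.

```shell
#!/usr/bin/env bash
# Hypothetical check: print pods that do NOT reference the expected secret.
# Returns 0 when every pod matches, 1 when any pod differs, 2 on lookup failure.
pods_on_wrong_secret() {
  local app="$1" namespace="$2" expected="$3" rows
  # One line per pod: <pod name><TAB><secret name(s) from envFrom>
  rows="$(kubectl get pods -n "$namespace" -l "app=$app" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].envFrom[*].secretRef.name}{"\n"}{end}')" || return 2
  printf '%s\n' "$rows" |
    awk -F '\t' -v want="$expected" \
      'NF && $2 != want { print $1; bad = 1 } END { exit bad + 0 }'
}
```

An empty result from `pods_on_wrong_secret auth-service production auth-api-secrets-20250723110000` means the mixed state described in the intro is over.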

Cleanup (1-2 Weeks Later)

  • Search for all references to old secret
  • Verify no pods use old secret
  • Delete old secret from Kubernetes
  • Delete old secret from external store
  • Update documentation

Key Takeaways

Versioning prevents incidents. Timestamped secret names give instant rollback and clear history.

Progressive rollout matters. Staging → Production → Monitor → Cleanup. No shortcuts.

Keep old secrets longer than you think. Weekly cronjobs, monthly jobs, obscure scripts all need secrets.

Automate monitoring. Can't watch logs manually for every rotation. Set up alerts.

Secrets aren't config. They expire. They break unexpectedly. Treat them with extra caution.

TL;DR

  1. Use versioned secret names with timestamps (secret-name-20250723110000)
  2. Create new secret, keep old one during transition
  3. Test in staging first, always
  4. Deploy to production gradually with monitoring
  5. Keep old secrets for 1-2 weeks minimum
  6. Document rollback procedures
  7. Use External Secrets Operator for GitOps
  8. Set up monitoring for secret rotations

This pattern eliminated our secret rotation incidents. No emergency rollbacks since implementation.
