
Kubernetes Secrets Rotation: Versioned Pattern with Instant Rollback

Kubernetes, AWS, Secrets Manager


Intro

We rotated an auth API secret in production. Error rates spiked within 30 seconds. Some pods had the new secret, others kept the old one. The external API had already invalidated the old token. Half our authentication requests failed.

We reverted in 5 minutes, but it exposed a fundamental problem: updating secrets in place doesn't work at scale.

Here's the versioned rotation pattern we built. No downtime. No emergency rollbacks. Just controlled deployments.

The Problem with In-Place Updates

Traditional approach:

```shell
# Update secret value
kubectl create secret generic auth-api-secrets \
  --from-literal=api_token=<new-value> \
  --dry-run=client -o yaml | kubectl apply -f -

# Restart deployment
kubectl rollout restart deployment/auth-service
```

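This fails because:

  1. Pods read Secret values into environment variables only when their containers start. During the restart, old and new pods run side by side with different tokens, and the external API may already have invalidated the old one.
  2. The old value is gone. kubectl rollout undo restores the previous pod template, but that template points at the same Secret name, so the "rolled back" pods start with the new token anyway.

In dev, this works. In production with gradual rollouts and external dependencies, it breaks.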

Versioned Secret Pattern

Treat secrets like immutable deployments. Each rotation creates a new versioned secret.

Naming Convention

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: auth-api-secrets-20250723110000 # YYYYMMDDHHMMSS
data:
  api_token: <base64-encoded-value>
```

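Timestamp format gives you:

  1. A unique name for every rotation, so old and new values can coexist
  2. Names that sort chronologically, so the newest version is obvious
  3. A built-in history of when each rotation happened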
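If you script rotations, the name can be generated rather than typed. A minimal sketch, assuming a POSIX-style shell; versioned_secret_name is a hypothetical helper, and its validation mirrors the Kubernetes rule that Secret names must be DNS subdomain names (lowercase alphanumerics, '-' and '.', at most 253 characters).

```shell
# Hypothetical helper: build a versioned Secret name in YYYYMMDDHHMMSS format.
# UTC keeps names ordered even when operators sit in different time zones.
versioned_secret_name() {
  local base="$1" ts="${2:-$(date -u +%Y%m%d%H%M%S)}" name
  # Optional second argument pins the timestamp (useful for tests and reruns).
  name="${base}-${ts}"
  # Secret names must be DNS subdomain names: <= 253 chars, lowercase
  # alphanumerics, '-' or '.', starting and ending with an alphanumeric.
  if [ "${#name}" -gt 253 ] ||
     ! printf '%s\n' "$name" | grep -Eq '^[a-z0-9]([-a-z0-9.]*[a-z0-9])?$'; then
    echo "invalid secret name: $name" >&2
    return 1
  fi
  printf '%s\n' "$name"
}
```

For example, versioned_secret_name auth-api-secrets prints a name like auth-api-secrets-20250723110000.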

Progressive Rollout Strategy

Step 1: Create New Secret

```shell
kubectl create secret generic auth-api-secrets-20250723110000 \
  --from-literal=api_token=<new-token> \
  -n production
```

Old secret stays. Both exist simultaneously.
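The same step as a guarded helper, sketched under assumptions: create_versioned_secret and the NEW_API_TOKEN variable are hypothetical names. It refuses to reuse an existing name, and the final patch sets the standard Secret field immutable: true, an optional extra that enforces "never edit in place" at the API level. Immutable Secrets can still be deleted, so cleanup is unaffected.

```shell
# Hypothetical helper: create a new versioned Secret next to the old one.
# Reads the token from the NEW_API_TOKEN environment variable (an assumption).
create_versioned_secret() {
  local name="$1" ns="$2"
  if [ -z "${NEW_API_TOKEN:-}" ]; then
    echo "NEW_API_TOKEN is not set" >&2
    return 1
  fi
  # 'kubectl get' exits non-zero when the Secret does not exist yet.
  if kubectl get secret "$name" -n "$ns" >/dev/null 2>&1; then
    echo "secret $name already exists in $ns; pick a new version" >&2
    return 1
  fi
  kubectl create secret generic "$name" \
    --from-literal=api_token="$NEW_API_TOKEN" \
    -n "$ns" || return 1
  # Optional: mark it immutable so its data can never be edited in place.
  kubectl patch secret "$name" -n "$ns" -p '{"immutable":true}'
}
```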

Step 2: Deploy to Staging

```yaml
# staging/deployment.yaml (selector, labels, and image omitted for brevity)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service
spec:
  template:
    spec:
      containers:
        - name: app
          envFrom:
            - secretRef:
                name: auth-api-secrets-20250723110000
```

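Test thoroughly. Staging needs its own copy of the versioned secret, holding the staging token, since step 1 only created it in production. If anything breaks, the old secret is still there.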
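Before promoting, it helps to confirm what the staging Deployment actually points at. A sketch, with assert_secret_ref as a hypothetical helper; it only inspects envFrom references, so Secrets consumed through env valueFrom or volumes need their own check.

```shell
# Hypothetical helper: fail unless the Deployment references the expected
# version and no other version of the same base name.
assert_secret_ref() {
  local deploy="$1" ns="$2" expected="$3" base refs ref found=no
  base="${expected%-*}"   # auth-api-secrets-20250723110000 -> auth-api-secrets
  refs="$(kubectl get deployment "$deploy" -n "$ns" \
    -o jsonpath='{.spec.template.spec.containers[*].envFrom[*].secretRef.name}')" || return 1
  for ref in $refs; do
    case "$ref" in
      "$expected") found=yes ;;
      "$base"-*) echo "$deploy ($ns) still references $ref" >&2; return 1 ;;
    esac
  done
  [ "$found" = yes ] || { echo "$deploy ($ns) does not reference $expected" >&2; return 1; }
}
```

For example, assert_secret_ref auth-service staging auth-api-secrets-20250723110000 returns non-zero if the template still names an older auth-api-secrets version. Other Secrets in the same pod spec are ignored.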

Step 3: Production Rollout

Option A: Blue-Green Deployment

Run both versions temporarily:

```yaml
# Blue (old secret)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service-blue
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: app
          envFrom:
            - secretRef:
                name: auth-api-secrets-20250715100000
```

```yaml
# Green (new secret)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service-green
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: app
          envFrom:
            - secretRef:
                name: auth-api-secrets-20250723110000
```

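Route about 10% of traffic to green. Monitor. Increase gradually. If both Deployments sit behind one Service, traffic splits roughly by ready pod count, so 3 + 3 replicas means about 50/50. To start near 10%, use weighted routing in your ingress controller or service mesh, or give green only a small share of the total replicas.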
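If you take the replica route rather than weighted routing, the split can be computed instead of guessed. A rough sketch, assuming both Deployments' pods carry labels matched by one shared Service; split_replicas and shift_to_green are hypothetical helpers, and the resulting traffic share is approximate.

```shell
# Hypothetical helper: split a fixed replica budget between blue and green.
# Prints "<blue> <green>". Rounds to the nearest replica, but keeps at least
# one green pod whenever pct > 0.
split_replicas() {
  local total="$1" pct="$2" green
  green=$(( (total * pct + 50) / 100 ))
  if [ "$pct" -gt 0 ] && [ "$green" -eq 0 ]; then green=1; fi
  printf '%s %s\n' "$(( total - green ))" "$green"
}

# Hypothetical helper: apply the split to auth-service-blue/-green.
shift_to_green() {
  local ns="$1" total="$2" pct="$3" blue green
  set -- $(split_replicas "$total" "$pct")
  blue="$1" green="$2"
  # Scale green first; when shifting toward green this adds capacity
  # before blue shrinks.
  kubectl scale deployment/auth-service-green --replicas="$green" -n "$ns" &&
    kubectl scale deployment/auth-service-blue --replicas="$blue" -n "$ns"
}
```

For example, shift_to_green production 10 10 runs 1 green and 9 blue pods; rerun with 25, 50, and 100 as error rates stay flat.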

Option B: Rolling Update with Monitoring

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    spec:
      containers:
        - name: app
          envFrom:
            - secretRef:
                name: auth-api-secrets-20250723110000
```

Watch error rates. Roll back immediately if they spike.
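Switching the reference on an existing Deployment can be wrapped so a failed rollout undoes itself. A sketch under stated assumptions: switch_secret is hypothetical, the JSON Patch path assumes the secret is the first envFrom entry of the first container, and rollout status only confirms that pods became ready, not that the external API accepts the token, so error-rate monitoring still applies. In a GitOps setup you would change the manifest in Git instead, since a controller with self-heal enabled may revert a manual patch.

```shell
# Hypothetical helper: point a Deployment at a new Secret version, wait for
# the rolling update, and undo automatically if it does not become ready.
switch_secret() {
  local deploy="$1" ns="$2" new_secret="$3" patch
  # Assumes containers[0].envFrom[0] is the secretRef being rotated.
  patch=$(printf '[{"op":"replace","path":"/spec/template/spec/containers/0/envFrom/0/secretRef/name","value":"%s"}]' "$new_secret")
  kubectl patch deployment "$deploy" -n "$ns" --type=json -p "$patch" || return 1
  if ! kubectl rollout status deployment/"$deploy" -n "$ns" --timeout=300s; then
    echo "rollout did not complete; rolling back" >&2
    kubectl rollout undo deployment/"$deploy" -n "$ns"
    return 1
  fi
}
```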

Step 4: Cleanup (2+ Weeks Later)

```shell
# Verify nothing still references the old secret (CronJobs and Jobs included)
kubectl get deployments,statefulsets,daemonsets,cronjobs,jobs --all-namespaces -o yaml \
  | grep "auth-api-secrets-20250715100000"

# Delete old secret
kubectl delete secret auth-api-secrets-20250715100000 -n production
```

Keep old secrets for at least two weeks. Weekly cronjobs might still reference them.
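The check-then-delete steps can be combined so that any remaining reference blocks the deletion. A sketch; delete_if_unreferenced is a hypothetical helper. It fails closed: if listing workloads fails, nothing is deleted.

```shell
# Hypothetical helper: delete an old versioned Secret only if no workload
# or running pod still mentions it.
delete_if_unreferenced() {
  local secret="$1" ns="$2" manifests
  # Fail closed: if the listing itself fails, do not delete anything.
  manifests="$(kubectl get deployments,statefulsets,daemonsets,cronjobs,jobs,pods \
    --all-namespaces -o yaml)" || {
    echo "could not list workloads; not deleting $secret" >&2
    return 1
  }
  # -F matches the name literally rather than as a regex.
  if printf '%s\n' "$manifests" | grep -qF -- "$secret"; then
    echo "refusing to delete $secret: still referenced" >&2
    return 1
  fi
  kubectl delete secret "$secret" -n "$ns"
}
```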

Instant Rollback

When new secret breaks production:

```shell
# Revert deployment
kubectl rollout undo deployment/auth-service -n production

# Verify old secret exists
kubectl get secret auth-api-secrets-20250715100000 -n production

# Check status
kubectl get pods -n production -l app=auth-service
```

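Because the old secret still exists, rollback is instant. No scrambling for old token values. This does rely on the external API still accepting the old token until cleanup: keeping the old value around cannot revive a token the provider has already revoked.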
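The rollback commands above, sketched as one hypothetical function (rollback_secret). It checks for the old secret before undoing rather than after: if the old secret had already been deleted, pods from the restored template would fail with CreateContainerConfigError instead of starting.

```shell
# Hypothetical helper: verify the old Secret, undo the rollout, and wait.
rollback_secret() {
  local deploy="$1" ns="$2" old_secret="$3"
  kubectl get secret "$old_secret" -n "$ns" >/dev/null || {
    echo "old secret $old_secret is gone; cannot roll back to it" >&2
    return 1
  }
  kubectl rollout undo deployment/"$deploy" -n "$ns" &&
    kubectl rollout status deployment/"$deploy" -n "$ns" --timeout=300s
}
```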

Environment-Specific Handling

Development

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: auth-api-secrets # No versioning
data:
  api_token: <dev-token>
```

No versioning needed. Break things and learn.

Staging

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: auth-api-secrets-20250723110000 # Versioned
data:
  api_token: <staging-token>
```

Full rotation process. Test failures here.

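Production

Versioned, like staging. Roll out gradually with monitoring, and keep the previous secret until cleanup.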
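Across the three environments, the naming rule fits in a small helper. A sketch; secret_name_for_env and the environment labels dev, staging, and production are assumptions about how your environments are named.

```shell
# Hypothetical helper: stable name in dev, timestamped version elsewhere.
secret_name_for_env() {
  local env="$1" base="$2" ts="${3:-$(date -u +%Y%m%d%H%M%S)}"
  case "$env" in
    dev|development) printf '%s\n' "$base" ;;
    staging|production) printf '%s-%s\n' "$base" "$ts" ;;
    *) echo "unknown environment: $env" >&2; return 1 ;;
  esac
}
```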

Common Pitfalls

1. Forgetting References

Problem: Rotated the secret but forgot that one deployment still referenced the old name.

Solution: Search before cleanup:

```shell
kubectl get deployments,statefulsets,daemonsets,cronjobs,jobs --all-namespaces -o yaml | grep "old-secret-name"
kubectl get pods --all-namespaces -o yaml | grep "old-secret-name"
```

2. Skipping Staging

Problem: "It's just a token rotation." Updated production directly. Token format changed. Auth broke.

Solution: No exceptions. Every secret change goes through staging.

3. Deleting Old Secrets Too Soon

Problem: Deleted old secret immediately. Weekly cronjob failed three days later.

Solution: Keep old secrets for 2+ weeks. Monthly cronjobs need longer.
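Since the YYYYMMDDHHMMSS suffix sorts chronologically, finding cleanup candidates is a string comparison. A sketch; expired_versions is a hypothetical helper that reads secret names on stdin, assumes the base name contains no regex metacharacters such as '.', and never lists the newest version. It only selects candidates; references still need checking before anything is deleted.

```shell
# Hypothetical helper: print versions of <base> older than <cutoff>,
# where cutoff is also YYYYMMDDHHMMSS. Names are read one per line on stdin.
expired_versions() {
  local base="$1" cutoff="$2"
  # Keep only <base>-<14 digits>, sort oldest first, drop the newest line.
  grep -E "^${base}-[0-9]{14}\$" | sort | sed '$d' |
    while IFS= read -r name; do
      ts="${name##*-}"   # the 14-digit suffix
      if [ "$ts" -lt "$cutoff" ]; then
        printf '%s\n' "$name"
      fi
    done
}

# Usage (GNU date; on macOS use: date -u -v-14d +%Y%m%d%H%M%S):
#   kubectl get secrets -n production \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' |
#     expired_versions auth-api-secrets "$(date -u -d '14 days ago' +%Y%m%d%H%M%S)"
```

Widen the window for monthly jobs.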

4. No Monitoring

Problem: Checked logs manually. Looked fine. Got paged hours later. Errors were sporadic.

Solution: Set up monitors for secret rotations:

```yaml
# Datadog monitor (pseudo-config)
monitor:
  name: 'Auth API - High Error Rate'
  query: 'avg(last_5m):sum:api.auth.errors{env:production} > 10'
  message: 'Auth errors spiked. Check recent secret rotation.'
```

Watch error rates for 24 hours after rotation.

GitOps Integration

Secrets don't belong in Git. Secret references do.

What Goes in Git

```yaml
# manifests/production/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service
spec:
  template:
    spec:
      containers:
        - name: app
          envFrom:
            - secretRef:
                name: auth-api-secrets-20250723110000 # Reference only
```

Git contains the name, not the value.
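Because Git holds only the reference, a rotation in Git is a one-line change. A sketch for bumping it; bump_secret_ref is a hypothetical helper, and it assumes the new name contains no '/' or '&', which sed would interpret in the replacement.

```shell
# Hypothetical helper: rewrite every <base>-<14 digits> reference in a
# manifest to the new versioned name.
bump_secret_ref() {
  local file="$1" base="$2" new_name="$3" tmp rc
  tmp="$(mktemp)" || return 1
  sed -E "s/${base}-[0-9]{14}/${new_name}/g" "$file" >"$tmp" &&
    cat "$tmp" >"$file"   # write back in place so the file keeps its permissions
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

Commit the result and let the normal review and sync deploy it, the same as any other change.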

What Stays Out of Git

Secret creation happens outside GitOps:

```shell
kubectl create secret generic auth-api-secrets-20250723110000 \
  --from-literal=api_token=<value> \
  -n production
```

Or use External Secrets Operator:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: auth-api-secrets-20250723110000
spec:
  secretStoreRef:
    name: aws-secretsmanager
  target:
    name: auth-api-secrets-20250723110000
  data:
    - secretKey: api_token
      remoteRef:
        key: /production/auth-api/token
        version: "20250723110000" # quoted: version is a string field
```

Operator syncs from AWS Secrets Manager. Value never touches Git.
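For the version field to resolve, Secrets Manager needs a version the operator can find under that name. As we understand External Secrets Operator's AWS provider, a version value without a uuid/ prefix is looked up as a staging label, so one option is to tag each new version with the timestamp. A sketch; publish_token_version is hypothetical, while put-secret-value and --version-stages are standard AWS CLI options. Passing only the custom label leaves AWSCURRENT on the previous version, so consumers that read AWSCURRENT keep the old value until you move it.

```shell
# Hypothetical helper: add a new version to an AWS Secrets Manager secret
# and attach a custom staging label (e.g. the rotation timestamp).
publish_token_version() {
  local secret_id="$1" label="$2" token_file="$3"
  # file:// makes the AWS CLI read the value from disk, keeping the token
  # out of shell history and the process list.
  aws secretsmanager put-secret-value \
    --secret-id "$secret_id" \
    --secret-string "file://$token_file" \
    --version-stages "$label"
}
```

For example, publish_token_version /production/auth-api/token 20250723110000 ./new-token.txt matches the ExternalSecret above.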

Rotation Checklist

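Before Rotation

  1. Create the new versioned secret; leave the old one in place
  2. Record the current secret name and the rollback command

Staging Deployment

  1. Point the staging deployment at the new secret name
  2. Test thoroughly before touching production

Production Deployment

  1. Roll out gradually (blue-green or rolling update)
  2. Watch error rates for 24 hours
  3. If errors spike, run kubectl rollout undo

Cleanup (2+ Weeks Later)

  1. Search deployments, cronjobs, and pods for the old secret name
  2. Delete the old secret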

Key Takeaways

Versioning prevents incidents. Timestamped secret names give instant rollback and clear history.

Progressive rollout matters. Staging → Production → Monitor → Cleanup. No shortcuts.

Keep old secrets longer than you think. Weekly cronjobs, monthly jobs, obscure scripts all need secrets.

Automate monitoring. Can't watch logs manually for every rotation. Set up alerts.

Secrets aren't config. They expire. They break unexpectedly. Treat them with extra caution.

TL;DR

  1. Use versioned secret names with timestamps (secret-name-20250723110000)
  2. Create new secret, keep old one during transition
  3. Test in staging first, always
  4. Deploy to production gradually with monitoring
  5. Keep old secrets for at least two weeks (longer for monthly jobs)
  6. Document rollback procedures
  7. Use External Secrets Operator for GitOps
  8. Set up monitoring for secret rotations

This pattern eliminated our secret rotation incidents. No emergency rollbacks since implementation.
