Intro
After adding GCS memory to Gemini, I had a bot that could answer team-specific questions from markdown files. Update a file in GCS, and the bot knows about it. Simple.
Except... it didn't.
I'd update a markdown file in the bucket, ask the bot a question, and it would give me the old answer. Every time. The only way to get it to pick up changes was to redeploy the entire service.
The goal: Make the bot automatically refresh its knowledge when GCS files are updated - without redeploying.
Finding the Problem
Two Layers of Staleness
Looking at the code I wrote in the previous post, the problem was staring right at me. There are two layers of caching, and neither ever refreshes.
Layer 1: MemoryService._cache - A plain Python dict that caches every file after the first GCS download:
```python
class MemoryService:
    def __init__(self) -> None:
        self._cache: dict[str, str] = {}

    def _load_file(self, blob_name: str) -> Optional[str]:
        if blob_name in self._cache:
            return self._cache[blob_name]  # Always returns cached version

        blob = self.bucket.blob(blob_name)
        content = blob.download_as_text()
        self._cache[blob_name] = content  # Cached forever
        return content
```
Layer 2: GeminiService.system_instruction - The memory content gets baked into Gemini's system instruction at startup:
```python
class GeminiService:
    def __init__(self) -> None:
        self.memory_content = self._load_memory()  # Loaded once
        self.system_instruction = self._build_system_instruction()  # Built once
        self.model = GenerativeModel(
            self.model_name, system_instruction=self.system_instruction  # Set once
        )
```
Both services are global singletons. Both are created once when the process starts. Neither ever updates.
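The accessor functions behind those singletons (the `get_memory_service()` / `get_gemini_service()` calls that show up later) aren't shown in the post. A minimal sketch of the pattern, assuming a simple `lru_cache`-based accessor; the stub class stands in for the real GCS-backed service:

```python
from functools import lru_cache

class MemoryService:
    """Stub standing in for the real GCS-backed service."""
    def __init__(self) -> None:
        self._cache: dict[str, str] = {}

@lru_cache(maxsize=1)
def get_memory_service() -> MemoryService:
    # One instance per process: created on the first call, then
    # handed to every request for the life of the container
    return MemoryService()

# Every caller shares the same object, so stale cache state is global
assert get_memory_service() is get_memory_service()
```

This is exactly why the staleness bites: the cache lives as long as the process, and every request reads from it.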
Why I Didn't Notice Earlier
Cloud Run scales to zero when idle. When it scales back up, a fresh container starts, singletons are re-created, and GCS files are re-fetched. So if you update a file and wait long enough, the bot eventually picks it up - whenever the current instance dies and a new one spins up.
The problem shows up when the bot is actively being used. Cloud Run keeps the instance alive, the cache never clears, and you're stuck with whatever was loaded at startup.
The Fix: A Reload Endpoint
I already had MemoryService.refresh() and GeminiService.reload_memory() methods from the previous implementation. They just weren't wired up to anything.
Bug in reload_memory()
First, a subtle bug. The existing reload_memory() called _load_memory(), which calls load_all_memory(), which calls _load_file() - which checks the cache. So even "reloading" would return stale data.
```python
# Before: doesn't actually refresh because _load_memory() hits the cache
def reload_memory(self) -> None:
    self.memory_content = self._load_memory()
    # ...
```
Fixed by calling memory_service.refresh() directly, which clears the cache first:
```python
# After: clears cache before re-fetching
def reload_memory(self) -> None:
    memory_service = get_memory_service()
    self.memory_content = memory_service.refresh()
    self.system_instruction = self._build_system_instruction()
    self.model = GenerativeModel(
        self.model_name, system_instruction=self.system_instruction
    )
    self.chat_sessions.clear()
    logger.info("Memory reloaded and chat sessions cleared")
```
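`refresh()` itself isn't shown in the post. Under the assumption that it simply clears the dict and re-walks the bucket, here's a toy reproduction of both the bug and the fix; `FakeBucket` is a stand-in for GCS, and the cache-first behavior mirrors `_load_file()`:

```python
class FakeBucket:
    """Stand-in for the GCS bucket; .files is what's 'really' stored."""
    def __init__(self) -> None:
        self.files = {"notes.md": "v1"}

class MemoryService:
    def __init__(self, bucket: FakeBucket) -> None:
        self.bucket = bucket
        self._cache: dict[str, str] = {}

    def load_all_memory(self) -> str:
        for name, content in self.bucket.files.items():
            self._cache.setdefault(name, content)  # cache-first, like _load_file()
        return "\n\n".join(self._cache.values())

    def refresh(self) -> str:
        self._cache.clear()            # drop the stale entries first...
        return self.load_all_memory()  # ...then re-fetch everything

bucket = FakeBucket()
svc = MemoryService(bucket)
svc.load_all_memory()                 # cache now holds "v1"
bucket.files["notes.md"] = "v2"       # file "updated in GCS"
assert svc.load_all_memory() == "v1"  # stale: cache wins without a refresh
assert svc.refresh() == "v2"          # refresh clears first, so it sees v2
```

The order matters: clear, then re-fetch. Re-fetching through the cache-first path without clearing is the bug described above.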
Adding the HTTP Endpoint
Then I added a POST endpoint in main.py:
```python
@app.route("/admin/reload-memory", methods=["POST"])
def reload_memory():
    # ... auth check (see next section) ...

    try:
        gemini_service = get_gemini_service()
        gemini_service.reload_memory()
        logger.info("Memory reload triggered successfully")
        return jsonify({"status": "ok", "message": "Memory reloaded"}), 200
    except Exception as e:
        logger.error(f"Memory reload failed: {e}")
        return jsonify({"status": "error", "message": str(e)}), 500
```
Securing the Endpoint
My Cloud Run service uses --allow-unauthenticated because Slack needs to send webhooks to it. That means anyone on the internet could curl /admin/reload-memory.
The endpoint only reloads cached data (not destructive), but I still don't want random people triggering it.
Why Not an API Key?
My first thought was a simple Bearer token check. But I wanted to trigger this from GCS via Pub/Sub, and Pub/Sub push subscriptions don't support custom Authorization headers - they send their own Google-signed OIDC token.
OIDC Token Verification
Since we're in GCP land, the proper approach is to verify Google-signed OIDC tokens. When Pub/Sub pushes a message, it includes a JWT signed by Google with the service account's identity. We can verify that.
google-auth is already installed as a dependency of google-cloud-storage, so no new packages needed:
```python
from google.oauth2 import id_token
from google.auth.transport import requests as google_auth_requests

@app.route("/admin/reload-memory", methods=["POST"])
def reload_memory():
    auth_header = request.headers.get("Authorization", "")
    if not auth_header.startswith("Bearer "):
        return jsonify({"status": "error", "message": "Unauthorized"}), 401

    token = auth_header.split(" ", 1)[1]
    try:
        audience = os.getenv("CLOUD_RUN_URL")
        claim = id_token.verify_oauth2_token(
            token, google_auth_requests.Request(), audience=audience
        )
        allowed_emails = [
            e.strip()
            for e in os.getenv("WIF_SERVICE_ACCOUNT", "").split(",")
            if e.strip()
        ]
        if claim.get("email") not in allowed_emails:
            logger.warning(f"Unauthorized reload attempt from: {claim.get('email')}")
            return jsonify({"status": "error", "message": "Unauthorized"}), 403
    except ValueError as e:
        logger.warning(f"Token verification failed: {e}")
        return jsonify({"status": "error", "message": "Invalid token"}), 401

    # ... reload logic
```
How it works:
- Extract the Bearer token from the `Authorization` header
- Verify it's a valid Google-signed JWT using `verify_oauth2_token()`
- Check the `audience` matches our Cloud Run URL (prevents token reuse across services)
- Check the `email` claim matches our allowed service account
- Reject everything else with 401/403
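One detail worth pulling out of that handler: the `WIF_SERVICE_ACCOUNT` allowlist is comma-separated and whitespace-tolerant, and an unset or empty variable means nobody is allowed. Extracted into a standalone helper for illustration (`allowed_emails` is my name for it, not the post's):

```python
import os

def allowed_emails() -> list[str]:
    # Comma-separated allowlist; strip whitespace, drop empty entries
    raw = os.getenv("WIF_SERVICE_ACCOUNT", "")
    return [e.strip() for e in raw.split(",") if e.strip()]

os.environ["WIF_SERVICE_ACCOUNT"] = "sa@proj.iam.gserviceaccount.com, me@example.com"
assert allowed_emails() == ["sa@proj.iam.gserviceaccount.com", "me@example.com"]

os.environ["WIF_SERVICE_ACCOUNT"] = ""
assert allowed_emails() == []  # unset or empty means reject everyone
```

Failing closed on an empty list is the right default here: a misconfigured deployment locks the endpoint rather than opening it.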
Environment Variables
Two new env vars in the Cloud Run deployment:
```shell
--set-env-vars "...,CLOUD_RUN_URL=https://your-service-url,WIF_SERVICE_ACCOUNT=your-sa@project.iam.gserviceaccount.com"
```
Wiring Up GCS to Pub/Sub
Now for the automatic part. The pipeline:
```
GCS file updated → Pub/Sub notification → Push to Cloud Run → Memory reload
```
Step 1: Create a Pub/Sub Topic
```shell
gcloud pubsub topics create gcs-memory-updates --project=your-project
```
Verify: Cloud Console > Pub/Sub > Topics
Step 2: Set Up GCS Bucket Notification
```shell
gsutil notification create \
  -t gcs-memory-updates \
  -f json \
  -e OBJECT_FINALIZE \
  gs://your-bot-memory
```
`OBJECT_FINALIZE` fires when an object is created or overwritten - exactly what we want.
Verify: There's no UI for this in Cloud Console. Use the CLI:
```shell
gsutil notification list gs://your-bot-memory
```
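The reload endpoint ignores the push body entirely: any finalized object triggers a full reload. If you wanted to filter (say, only reload when a `.md` file changes), the GCS notification attributes are right there in the Pub/Sub push payload. A sketch under that assumption; `parse_gcs_event` is a hypothetical helper, but the payload shape matches what Pub/Sub push delivers for GCS notifications:

```python
import base64
import json

def parse_gcs_event(push_body: dict) -> tuple[str, str]:
    """Pull the event type and object name out of a Pub/Sub push payload."""
    attrs = push_body["message"].get("attributes", {})
    return attrs.get("eventType", ""), attrs.get("objectId", "")

# Payload shaped like what Pub/Sub pushes for a GCS notification
body = {
    "message": {
        "attributes": {"eventType": "OBJECT_FINALIZE", "objectId": "notes.md"},
        "data": base64.b64encode(json.dumps({"name": "notes.md"}).encode()).decode(),
        "messageId": "1234",
    },
    "subscription": "projects/your-project/subscriptions/gcs-memory-push",
}

event, obj = parse_gcs_event(body)
should_reload = event == "OBJECT_FINALIZE" and obj.endswith(".md")
```

Worth remembering either way: Pub/Sub treats any 2xx as an ack, so even a handler that skips the reload should still return 200, or the message will be redelivered.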
Step 3: Grant Token Creator Permission
This is the step that's easy to miss. Pub/Sub needs to generate an OIDC token for your service account when pushing messages. The Pub/Sub service agent (a Google-managed service account) needs permission to do this:
```shell
gcloud iam service-accounts add-iam-policy-binding \
  your-sa@your-project.iam.gserviceaccount.com \
  --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-pubsub.iam.gserviceaccount.com" \
  --role="roles/iam.serviceAccountTokenCreator" \
  --project=your-project
```
The `service-PROJECT_NUMBER@gcp-sa-pubsub.iam.gserviceaccount.com` account is Google's internal Pub/Sub agent. You can find your project number in the Cloud Run URL or on the project dashboard.
Verify: Cloud Console > IAM & Admin > Service Accounts > click your SA > Permissions tab
Also, the service account invoking Cloud Run needs roles/run.invoker:
```shell
gcloud run services add-iam-policy-binding your-service \
  --member="serviceAccount:your-sa@your-project.iam.gserviceaccount.com" \
  --role="roles/run.invoker" \
  --region=your-region \
  --project=your-project
```
Step 4: Create the Push Subscription
```shell
gcloud pubsub subscriptions create gcs-memory-push \
  --topic=gcs-memory-updates \
  --push-endpoint=https://your-cloud-run-url/admin/reload-memory \
  --push-auth-service-account=your-sa@your-project.iam.gserviceaccount.com \
  --push-auth-token-audience=https://your-cloud-run-url \
  --project=your-project
```
Verify: Cloud Console > Pub/Sub > Subscriptions > gcs-memory-push - check delivery type is Push with the correct endpoint and authentication enabled.
Testing the Full Pipeline
Upload a file and check the logs:
```shell
# Trigger a GCS change
echo "test content" | gsutil cp - gs://your-bot-memory/test-reload.md

# Check Cloud Run logs
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=your-service AND textPayload:reload" \
  --project=your-project --limit=5 --freshness=5m \
  --format="table(timestamp, textPayload)"
```
If everything is wired up correctly:
```
TIMESTAMP                    TEXT_PAYLOAD
2026-01-28T00:21:03.879988Z  Memory reload triggered successfully
2026-01-28T00:21:03.879937Z  Memory reloaded and chat sessions cleared
```
If messages aren't getting through, check the Pub/Sub subscription monitoring:
Cloud Console > Pub/Sub > Subscriptions > `gcs-memory-push` > Monitoring tab
Look at "Unacked message count" - if messages are piling up, it's usually an auth issue.
Clean up after testing:
```shell
gsutil rm gs://your-bot-memory/test-reload.md
```
Manual Trigger
You can also invoke the endpoint manually using gcloud to generate an identity token:
```shell
curl -X POST https://your-cloud-run-url/admin/reload-memory \
  -H "Authorization: Bearer $(gcloud auth print-identity-token --audiences=https://your-cloud-run-url)"
```
This works if your GCP user account is in the allowed service accounts list.
Gotchas
1. Pub/Sub Service Agent vs Default Service Account
service-PROJECT_NUMBER@gcp-sa-pubsub.iam.gserviceaccount.com (Pub/Sub service agent) is different from PROJECT_NUMBER-compute@developer.gserviceaccount.com (Compute Engine default SA). The Pub/Sub agent is Google-managed and doesn't show up under IAM > Service Accounts. Check it under IAM > IAM with "Include Google-provided role grants" enabled.
2. GCS Notifications Have No Console UI
Unlike most GCP features, GCS Pub/Sub notifications can't be viewed or managed in the Cloud Console. Use gsutil notification list gs://your-bucket to verify.
Architecture After
```
┌──────────────┐     ┌──────────────┐     ┌──────────────────┐
│ .md file     │────▶│ Pub/Sub      │────▶│ Cloud Run        │
│ updated      │     │ (push)       │     │                  │
│ in GCS       │     │              │     │  POST /admin/    │
└──────────────┘     │ OIDC token   │     │  reload-memory   │
                     └──────────────┘     │                  │
                                          │  ┌────────────┐  │
                                          │  │ Verify JWT │  │
                                          │  │ Check SA   │  │
                                          │  └─────┬──────┘  │
                                          │        │         │
                                          │  ┌─────▼──────┐  │
                                          │  │ Clear cache│  │
                                          │  │ Re-fetch   │  │
                                          │  │ Rebuild    │  │
                                          │  │ model      │  │
                                          │  └────────────┘  │
                                          └──────────────────┘
```
Wrapping Up
The irony: in the previous article, I wrote "No redeploy needed to update knowledge" as a feature. Technically true - you didn't need to redeploy code. But the bot wouldn't actually see the changes until the process restarted. The refresh() and reload_memory() methods were sitting right there, unused.
Key takeaways:
- In-memory caches in long-running processes need invalidation - a Python dict cache with no TTL and no refresh trigger will serve stale data forever
- Singletons amplify caching issues - when there's exactly one instance that lives for the entire process, stale state affects every request
- OIDC token verification is free - `google-auth` is already in your dependency tree if you're using any GCP client library
- GCS + Pub/Sub + Cloud Run is a clean event-driven pattern - file change → notification → HTTP push. No polling, no cron jobs
- The Pub/Sub service agent needs `serviceAccountTokenCreator` - this is the step everyone forgets when setting up authenticated push subscriptions
