Intro
After adding GCS memory to Gemini, I had a bot that could answer team-specific questions from markdown files. Update a file in GCS, and the bot knows about it. Simple.
Except... it didn't.
I'd update a markdown file in the bucket, ask the bot a question, and it would give me the old answer. Every time. The only way to get it to pick up changes was to redeploy the entire service.
The goal: Make the bot automatically refresh its knowledge when GCS files are updated - without redeploying.
Finding the Problem
Two Layers of Staleness
Looking at the code I wrote in the previous post, the problem was staring right at me. There are two layers of caching, and neither ever refreshes.
Layer 1: MemoryService._cache - A plain Python dict that caches every file after the first GCS download:
```python
class MemoryService:
    def __init__(self) -> None:
        self._cache: dict[str, str] = {}

    def _load_file(self, blob_name: str) -> Optional[str]:
        if blob_name in self._cache:
            return self._cache[blob_name]  # Always returns cached version

        blob = self.bucket.blob(blob_name)
        content = blob.download_as_text()
        self._cache[blob_name] = content  # Cached forever
        return content
```
Layer 2: GeminiService.system_instruction - The memory content gets baked into Gemini's system instruction at startup:
```python
class GeminiService:
    def __init__(self) -> None:
        self.memory_content = self._load_memory()  # Loaded once
        self.system_instruction = self._build_system_instruction()  # Built once
        self.model = GenerativeModel(
            self.model_name, system_instruction=self.system_instruction  # Set once
        )
```
Both services are global singletons. Both are created once when the process starts. Neither ever updates.
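The accessor functions behind those singletons (the `get_memory_service()` / `get_gemini_service()` calls that show up later) aren't shown in the post. A minimal sketch of the pattern, assuming a simple `lru_cache`-based accessor; the stub class stands in for the real GCS-backed service:

```python
from functools import lru_cache

class MemoryService:
    """Stub standing in for the real GCS-backed service."""
    def __init__(self) -> None:
        self._cache: dict[str, str] = {}

@lru_cache(maxsize=1)
def get_memory_service() -> MemoryService:
    # One instance per process: created on the first call, then
    # handed to every request for the life of the container
    return MemoryService()

# Every caller shares the same object, so stale cache state is global
assert get_memory_service() is get_memory_service()
```

This is exactly why the staleness bites: the cache lives as long as the process, and every request reads from it.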
Why I Didn't Notice Earlier
Cloud Run scales to zero when idle. When it scales back up, a fresh container starts, singletons are re-created, and GCS files are re-fetched. So if you update a file and wait long enough, the bot eventually picks it up - whenever the current instance dies and a new one spins up.
The problem shows up when the bot is actively being used. Cloud Run keeps the instance alive, the cache never clears, and you're stuck with whatever was loaded at startup.
The Fix: A Reload Endpoint
I already had MemoryService.refresh() and GeminiService.reload_memory() methods from the previous implementation. They just weren't wired up to anything.
Bug in reload_memory()
First, a subtle bug. The existing reload_memory() called _load_memory(), which calls load_all_memory(), which calls _load_file() - which checks the cache. So even "reloading" would return stale data.
```python
# Before: doesn't actually refresh because _load_memory() hits the cache
def reload_memory(self) -> None:
    self.memory_content = self._load_memory()
    # ...
```
Fixed by calling memory_service.refresh() directly, which clears the cache first:
```python
# After: clears cache before re-fetching
def reload_memory(self) -> None:
    memory_service = get_memory_service()
    self.memory_content = memory_service.refresh()
    self.system_instruction = self._build_system_instruction()
    self.model = GenerativeModel(
        self.model_name, system_instruction=self.system_instruction
    )
    self.chat_sessions.clear()
    logger.info("Memory reloaded and chat sessions cleared")
```
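`refresh()` itself isn't shown in the post. Under the assumption that it simply clears the dict and re-walks the bucket, here's a toy reproduction of both the bug and the fix; `FakeBucket` is a stand-in for GCS, and the cache-first behavior mirrors `_load_file()`:

```python
class FakeBucket:
    """Stand-in for the GCS bucket; .files is what's 'really' stored."""
    def __init__(self) -> None:
        self.files = {"notes.md": "v1"}

class MemoryService:
    def __init__(self, bucket: FakeBucket) -> None:
        self.bucket = bucket
        self._cache: dict[str, str] = {}

    def load_all_memory(self) -> str:
        for name, content in self.bucket.files.items():
            self._cache.setdefault(name, content)  # cache-first, like _load_file()
        return "\n\n".join(self._cache.values())

    def refresh(self) -> str:
        self._cache.clear()            # drop the stale entries first...
        return self.load_all_memory()  # ...then re-fetch everything

bucket = FakeBucket()
svc = MemoryService(bucket)
svc.load_all_memory()                 # cache now holds "v1"
bucket.files["notes.md"] = "v2"       # file "updated in GCS"
assert svc.load_all_memory() == "v1"  # stale: cache wins without a refresh
assert svc.refresh() == "v2"          # refresh clears first, so it sees v2
```

The order matters: clear, then re-fetch. Re-fetching through the cache-first path without clearing is the bug described above.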
Adding the HTTP Endpoint
Then I added a POST endpoint in main.py:
```python
@app.route("/admin/reload-memory", methods=["POST"])
def reload_memory():
    # ... auth check (see next section) ...

    try:
        gemini_service = get_gemini_service()
        gemini_service.reload_memory()
        logger.info("Memory reload triggered successfully")
        return jsonify({"status": "ok", "message": "Memory reloaded"}), 200
    except Exception as e:
        logger.error(f"Memory reload failed: {e}")
        return jsonify({"status": "error", "message": str(e)}), 500
```
Securing the Endpoint
My Cloud Run service uses --allow-unauthenticated because Slack needs to send webhooks to it. That means anyone on the internet could curl /admin/reload-memory.
The endpoint only reloads cached data (not destructive), but I still don't want random people triggering it.
Why Not an API Key?
My first thought was a simple Bearer token check. But I wanted to trigger this from GCS via Pub/Sub, and Pub/Sub push subscriptions don't support custom Authorization headers - they send their own Google-signed OIDC token.
OIDC Token Verification
Since we're in GCP land, the proper approach is to verify Google-signed OIDC tokens. When Pub/Sub pushes a message, it includes a JWT signed by Google with the service account's identity. We can verify that.
google-auth is already installed as a dependency of google-cloud-storage, so no new packages needed:
```python
from google.oauth2 import id_token
from google.auth.transport import requests as google_auth_requests

@app.route("/admin/reload-memory", methods=["POST"])
def reload_memory():
    auth_header = request.headers.get("Authorization", "")
    if not auth_header.startswith("Bearer "):
        return jsonify({"status": "error", "message": "Unauthorized"}), 401

    token = auth_header.split(" ", 1)[1]
    try:
        audience = os.getenv("CLOUD_RUN_URL")
        claim = id_token.verify_oauth2_token(
            token, google_auth_requests.Request(), audience=audience
        )
        allowed_emails = [
            e.strip()
            for e in os.getenv("WIF_SERVICE_ACCOUNT", "").split(",")
            if e.strip()
        ]
        if claim.get("email") not in allowed_emails:
            logger.warning(f"Unauthorized reload attempt from: {claim.get('email')}")
            return jsonify({"status": "error", "message": "Unauthorized"}), 403
    except ValueError as e:
        logger.warning(f"Token verification failed: {e}")
        return jsonify({"status": "error", "message": "Invalid token"}), 401

    # ... reload logic
```
How it works:
- Extract the Bearer token from the `Authorization` header
- Verify it's a valid Google-signed JWT using `verify_oauth2_token()`
- Check the `audience` matches our Cloud Run URL (prevents token reuse across services)
- Check the `email` claim matches our allowed service account
- Reject everything else with 401/403
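One detail worth pulling out of that handler: the `WIF_SERVICE_ACCOUNT` allowlist is comma-separated and whitespace-tolerant, and an unset or empty variable means nobody is allowed. Extracted into a standalone helper for illustration (`allowed_emails` is my name for it, not the post's):

```python
import os

def allowed_emails() -> list[str]:
    # Comma-separated allowlist; strip whitespace, drop empty entries
    raw = os.getenv("WIF_SERVICE_ACCOUNT", "")
    return [e.strip() for e in raw.split(",") if e.strip()]

os.environ["WIF_SERVICE_ACCOUNT"] = "sa@proj.iam.gserviceaccount.com, me@example.com"
assert allowed_emails() == ["sa@proj.iam.gserviceaccount.com", "me@example.com"]

os.environ["WIF_SERVICE_ACCOUNT"] = ""
assert allowed_emails() == []  # unset or empty means reject everyone
```

Failing closed on an empty list is the right default here: a misconfigured deployment locks the endpoint rather than opening it.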
Environment Variables
Two new env vars in the Cloud Run deployment:
```shell
--set-env-vars "...,CLOUD_RUN_URL=https://your-service-url,WIF_SERVICE_ACCOUNT=your-sa@project.iam.gserviceaccount.com"
```
Wiring Up GCS to Pub/Sub
Now for the automatic part. The pipeline:
```
GCS file updated → Pub/Sub notification → Push to Cloud Run → Memory reload
```
Step 1: Create a Pub/Sub Topic
```shell
gcloud pubsub topics create gcs-memory-updates --project=your-project
```
Verify: Cloud Console > Pub/Sub > Topics
Step 2: Set Up GCS Bucket Notification
```shell
gsutil notification create \
  -t gcs-memory-updates \
  -f json \
  -e OBJECT_FINALIZE \
  gs://your-bot-memory
```
`OBJECT_FINALIZE` fires when an object is created or overwritten - exactly what we want.
Verify: There's no UI for this in Cloud Console. Use the CLI:
```shell
gsutil notification list gs://your-bot-memory
```
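The reload endpoint ignores the push body entirely: any finalized object triggers a full reload. If you wanted to filter (say, only reload when a `.md` file changes), the GCS notification attributes are right there in the Pub/Sub push payload. A sketch under that assumption; `parse_gcs_event` is a hypothetical helper, but the payload shape matches what Pub/Sub push delivers for GCS notifications:

```python
import base64
import json

def parse_gcs_event(push_body: dict) -> tuple[str, str]:
    """Pull the event type and object name out of a Pub/Sub push payload."""
    attrs = push_body["message"].get("attributes", {})
    return attrs.get("eventType", ""), attrs.get("objectId", "")

# Payload shaped like what Pub/Sub pushes for a GCS notification
body = {
    "message": {
        "attributes": {"eventType": "OBJECT_FINALIZE", "objectId": "notes.md"},
        "data": base64.b64encode(json.dumps({"name": "notes.md"}).encode()).decode(),
        "messageId": "1234",
    },
    "subscription": "projects/your-project/subscriptions/gcs-memory-push",
}

event, obj = parse_gcs_event(body)
should_reload = event == "OBJECT_FINALIZE" and obj.endswith(".md")
```

Worth remembering either way: Pub/Sub treats any 2xx as an ack, so even a handler that skips the reload should still return 200, or the message will be redelivered.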
Step 3: Grant Token Creator Permission
This is the step that's easy to miss. Pub/Sub needs to generate an OIDC token for your service account when pushing messages. The Pub/Sub service agent (a Google-managed service account) needs permission to do this:
```shell
gcloud iam service-accounts add-iam-policy-binding \
  your-sa@your-project.iam.gserviceaccount.com \
  --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-pubsub.iam.gserviceaccount.com" \
  --role="roles/iam.serviceAccountTokenCreator" \
  --project=your-project
```
The `service-PROJECT_NUMBER@gcp-sa-pubsub.iam.gserviceaccount.com` account is Google's internal Pub/Sub agent. You can find your project number in the Cloud Run URL or on the project dashboard.
Verify: Cloud Console > IAM & Admin > Service Accounts > click your SA > Permissions tab
Also, the service account invoking Cloud Run needs roles/run.invoker:
```shell
gcloud run services add-iam-policy-binding your-service \
  --member="serviceAccount:your-sa@your-project.iam.gserviceaccount.com" \
  --role="roles/run.invoker" \
  --region=your-region \
  --project=your-project
```
Step 4: Create the Push Subscription
```shell
gcloud pubsub subscriptions create gcs-memory-push \
  --topic=gcs-memory-updates \
  --push-endpoint=https://your-cloud-run-url/admin/reload-memory \
  --push-auth-service-account=your-sa@your-project.iam.gserviceaccount.com \
  --push-auth-token-audience=https://your-cloud-run-url \
  --project=your-project
```
Verify: Cloud Console > Pub/Sub > Subscriptions > gcs-memory-push - check delivery type is Push with the correct endpoint and authentication enabled.
Testing the Full Pipeline
Upload a file and check the logs:
```shell
# Trigger a GCS change
echo "test content" | gsutil cp - gs://your-bot-memory/test-reload.md

# Check Cloud Run logs
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=your-service AND textPayload:reload" \
  --project=your-project --limit=5 --freshness=5m \
  --format="table(timestamp, textPayload)"
```
If everything is wired up correctly:
```
TIMESTAMP                    TEXT_PAYLOAD
2026-01-28T00:21:03.879988Z  Memory reload triggered successfully
2026-01-28T00:21:03.879937Z  Memory reloaded and chat sessions cleared
```
If messages aren't getting through, check the Pub/Sub subscription monitoring:
Cloud Console > Pub/Sub > Subscriptions > `gcs-memory-push` > Monitoring tab
Look at "Unacked message count" - if messages are piling up, it's usually an auth issue.
Clean up after testing:
```shell
gsutil rm gs://your-bot-memory/test-reload.md
```
Manual Trigger
You can also invoke the endpoint manually using gcloud to generate an identity token:
```shell
curl -X POST https://your-cloud-run-url/admin/reload-memory \
  -H "Authorization: Bearer $(gcloud auth print-identity-token --audiences=https://your-cloud-run-url)"
```
This works if your GCP user account is in the allowed service accounts list.
Gotchas
1. Pub/Sub Service Agent vs Default Service Account
service-PROJECT_NUMBER@gcp-sa-pubsub.iam.gserviceaccount.com (Pub/Sub service agent) is different from PROJECT_NUMBER-compute@developer.gserviceaccount.com (Compute Engine default SA). The Pub/Sub agent is Google-managed and doesn't show up under IAM > Service Accounts. Check it under IAM > IAM with "Include Google-provided role grants" enabled.
2. GCS Notifications Have No Console UI
Unlike most GCP features, GCS Pub/Sub notifications can't be viewed or managed in the Cloud Console. Use gsutil notification list gs://your-bucket to verify.
Architecture After
```
┌──────────────┐     ┌──────────────┐     ┌──────────────────┐
│ .md file     │────▶│ Pub/Sub      │────▶│ Cloud Run        │
│ updated      │     │ (push)       │     │                  │
│ in GCS       │     │              │     │  POST /admin/    │
└──────────────┘     │ OIDC token   │     │  reload-memory   │
                     └──────────────┘     │                  │
                                          │  ┌────────────┐  │
                                          │  │ Verify JWT │  │
                                          │  │ Check SA   │  │
                                          │  └─────┬──────┘  │
                                          │        │         │
                                          │  ┌─────▼──────┐  │
                                          │  │ Clear cache│  │
                                          │  │ Re-fetch   │  │
                                          │  │ Rebuild    │  │
                                          │  │ model      │  │
                                          │  └────────────┘  │
                                          └──────────────────┘
```
Wrapping Up
The irony: in the previous article, I wrote "No redeploy needed to update knowledge" as a feature. Technically true - you didn't need to redeploy code. But the bot wouldn't actually see the changes until the process restarted. The refresh() and reload_memory() methods were sitting right there, unused.
Key takeaways:
- In-memory caches in long-running processes need invalidation - a Python dict cache with no TTL and no refresh trigger will serve stale data forever
- Singletons amplify caching issues - when there's exactly one instance that lives for the entire process, stale state affects every request
- OIDC token verification is free - `google-auth` is already in your dependency tree if you're using any GCP client library
- GCS + Pub/Sub + Cloud Run is a clean event-driven pattern - file change → notification → HTTP push. No polling, no cron jobs
- The Pub/Sub service agent needs `serviceAccountTokenCreator` - this is the step everyone forgets when setting up authenticated push subscriptions
