Intro
After deploying my Gemini-powered Slack bot, everything seemed fine... until users started complaining.
"The bot is sending the same message 4 times!"
At first I thought it was a bug in my code. Maybe an infinite loop? Maybe I was calling say() multiple times?
Turns out, neither. The culprit was Slack's retry mechanism.
1. Duplicate Messages
The Symptom
Users would @mention the bot, and instead of getting one response, they'd get 2-4 identical responses:
```
User: @Bot what is Python?

Bot: Python is a high-level programming language...
Bot: Python is a high-level programming language...
Bot: Python is a high-level programming language...
Bot: Python is a high-level programming language...
```
Finding the Root Cause
I checked Cloud Run logs:
```bash
gcloud logging read 'resource.type="cloud_run_revision" AND resource.labels.service_name="your-service"' \
  --project=your-project \
  --limit=30
```
The logs revealed everything:
```
08:40:10.594 POST /slack/events 200
08:40:10.593 POST /slack/events 200
08:40:10.590 POST /slack/events 200
08:40:10.582 POST /slack/events 200
```
Four requests within 12 milliseconds. All for the same message.
Why This Happens
Slack has a built-in retry mechanism:
- Slack sends an event to your webhook
- If it doesn't receive a 200 response within 3 seconds, it retries
- It retries up to 3 times (4 total requests)
My bot's flow was:
```
Slack Event → Handler → Call Gemini (5+ seconds) → Send Response → Return 200
```
Gemini takes 3-5 seconds to respond (even longer with cross-region latency since Gemini isn't available in asia-northeast1 yet). By the time we return 200 to Slack, it has already sent 3 retry requests. Each retry triggers another Gemini call. Four responses.
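You can confirm the diagnosis from the request headers, too: Slack stamps every retry delivery with X-Slack-Retry-Num and X-Slack-Retry-Reason. Here's a minimal sketch (not my bot's actual code - the route and app names are assumptions) that logs those headers in a Flask adapter setup:

```python
from flask import Flask, request
from slack_bolt import App
from slack_bolt.adapter.flask import SlackRequestHandler

bolt_app = App()  # reads SLACK_BOT_TOKEN / SLACK_SIGNING_SECRET from the environment
flask_app = Flask(__name__)
handler = SlackRequestHandler(bolt_app)


@flask_app.route("/slack/events", methods=["POST"])
def slack_events():
    # Slack sets these headers only on retry deliveries
    retry_num = request.headers.get("X-Slack-Retry-Num")
    if retry_num is not None:
        flask_app.logger.info(
            "Slack retry #%s (reason: %s)",
            retry_num,
            request.headers.get("X-Slack-Retry-Reason"),
        )
    return handler.handle(request)
```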
The Fix: Event Deduplication
The solution is to track which events we've already processed and skip duplicates:
```python
import time
from collections import OrderedDict

_processed_events: OrderedDict[str, float] = OrderedDict()
_CACHE_TTL_SECONDS = 60
_CACHE_MAX_SIZE = 1000


def _is_duplicate(event_id: str) -> bool:
    now = time.time()

    # Clean old entries
    while _processed_events:
        oldest_id, oldest_time = next(iter(_processed_events.items()))
        if now - oldest_time > _CACHE_TTL_SECONDS:
            _processed_events.pop(oldest_id)
        else:
            break

    if event_id in _processed_events:
        return True

    _processed_events[event_id] = now

    # Limit cache size
    while len(_processed_events) > _CACHE_MAX_SIZE:
        _processed_events.popitem(last=False)

    return False


@app.event("app_mention")
def handle_mention(event, say, logger):
    event_ts = event.get("event_ts") or event.get("ts")
    if _is_duplicate(f"mention:{event_ts}"):
        return  # Skip duplicate

    # ... process normally
```
The event_ts is unique per event. If we see the same event_ts twice within 60 seconds, we know it's a retry and skip it.
Downsides:
- Retries still hit your server (wasted CPU)
- Need to maintain an in-memory cache
- Cache doesn't work across multiple instances (would need Redis for that - see the sketch below)
But for a single-instance bot, this works well.
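If you ever do run more than one instance, the same idea ports to a shared store. Here's a sketch with Redis (the connection details and key prefix are my assumptions, not part of the bot) - SET with nx and an expiry gives an atomic "first writer wins" check:

```python
import redis

# Assumed connection details - point this at your Redis instance
_redis = redis.Redis(host="localhost", port=6379, decode_responses=True)
_CACHE_TTL_SECONDS = 60


def _is_duplicate(event_id: str) -> bool:
    # SET NX succeeds only for the first writer; retries see the existing key
    first_writer = _redis.set(
        f"slack:event:{event_id}", "1", nx=True, ex=_CACHE_TTL_SECONDS
    )
    return not first_writer
```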
Alternative: Lazy Listeners
A cleaner solution exists: lazy listeners. Instead of processing the event synchronously, you acknowledge Slack immediately and process in the background.
```python
def _process_mention(event, say, logger):
    """Runs in background after ack."""
    # Heavy processing here...
    gemini = get_gemini_service()
    response = gemini.generate_response_sync(user, clean_text)
    say(text=response, thread_ts=thread_ts)


@app.event("app_mention", lazy=[_process_mention])
def handle_mention():
    pass  # Ack immediately, processing happens in lazy listener
```
How it works:
- Slack sends event
- handle_mention() runs → returns immediately
- Slack gets 200 response → no retries
- _process_mention() runs in background
No deduplication cache needed. Much cleaner.
The catch: Lazy listeners only work with FaaS (Function-as-a-Service) adapters:
- slack_bolt.adapter.aws_lambda
- slack_bolt.adapter.google_cloud_functions
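For reference, the Lambda wiring typically looks something like this (a sketch with placeholder names, not code from my bot; process_before_response=True is required so Bolt acks before the lazy work runs):

```python
from slack_bolt import App
from slack_bolt.adapter.aws_lambda import SlackRequestHandler

# process_before_response=True makes Bolt finish the ack before Lambda freezes the execution
app = App(process_before_response=True)


def _process_mention(event, say):
    # Heavy processing (e.g. the Gemini call) happens here, after the ack
    say(text="...", thread_ts=event.get("thread_ts") or event.get("ts"))


@app.event("app_mention", lazy=[_process_mention])
def handle_mention():
    pass  # Ack immediately


def lambda_handler(event, context):
    return SlackRequestHandler(app=app).handle(event, context)
```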
If you're using Flask on Cloud Run (like me), lazy listeners won't work. The Flask adapter doesn't support the lazy parameter - you'll get a runtime error:
```
TypeError: App.event() got an unexpected keyword argument 'lazy'
```
For more details, see the Slack Bolt lazy listeners documentation.
Type Errors with Lazy Listeners
Even if you're using a supported adapter (Lambda, Cloud Functions), you might hit a mypy error:
```
error: Unexpected keyword argument "lazy" for "event" of "App" [call-arg]
```
The lazy parameter works at runtime but isn't included in slack-bolt's type stubs. You'll need to suppress the error:
1@app.event("app_mention", lazy=[_process_mention]) # type: ignore[call-arg]2def handle_mention():3 pass
I opened an issue for this: slackapi/bolt-python#1412
Which Should You Use?
| Setup | Solution |
|---|---|
| AWS Lambda | Lazy listeners |
| Google Cloud Functions | Lazy listeners |
| Flask + Cloud Run | Event deduplication |
| Flask + any server | Event deduplication |
If your app is stateful (like mine, with chat sessions in memory), Cloud Run with deduplication is actually better - Cloud Functions would lose state between invocations.
2. Cold Start Problem
Even after fixing duplicates, I noticed the first request after idle was extremely slow (5-10 seconds).
This is the classic cold start problem with Cloud Run.
What's Happening
Cloud Run scales to zero when idle. When a new request comes in:
- Cloud Run spins up a new container (~2-3 seconds)
- Python loads all modules
- GeminiService initializes and loads memory from GCS (see the caching sketch after this list)
- Gemini API call happens (~3-5 seconds)
Total: 5-10 seconds for the first request.
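The container spin-up itself needs an infrastructure fix (next section), but step 3 should at least be a one-time cost per instance, not a per-request one. A sketch of the caching I mean - the import path and GeminiService constructor are assumptions:

```python
from functools import lru_cache

from services.gemini import GeminiService  # assumed module path


@lru_cache(maxsize=1)
def get_gemini_service() -> GeminiService:
    # Built once per container instance; later requests reuse the same object,
    # so the GCS memory load and client setup aren't repeated
    return GeminiService()
```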
The Fix: Min Instances
Tell Cloud Run to keep at least one instance warm:
```bash
gcloud run services update your-service-name \
  --min-instances=1 \
  --region=asia-northeast1 \
  --project=your-project
```
What this does:
- Cloud Run always keeps 1 instance running
- No cold starts for normal traffic
- Costs ~$5-10/month depending on instance size (with CPU throttling, idle instances cost almost nothing)
Before:
```
Cold start + Gemini = 5-10 seconds
```
After:
```
Just Gemini = 2-4 seconds
```
3. Gemini Regional Latency
Even with deduplication and warm instances, responses still take 3-5 seconds. Why?
The Problem
My Cloud Run service runs in asia-northeast1 (Tokyo) to be close to users. But Gemini via Vertex AI isn't available in that region yet. I have to call us-central1 (Iowa):
```python
# In config.py
GEMINI_LOCATION = "us-central1"  # Gemini not available in asia-northeast1
```
This adds ~200-400ms round-trip latency on top of Gemini's processing time.
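Concretely, the only thing that changes in code is the location handed to the SDK. A sketch with the Vertex AI Python SDK (the project ID and model name are placeholders):

```python
import vertexai
from vertexai.generative_models import GenerativeModel

# Cloud Run runs in asia-northeast1, but the Gemini endpoint we call is in us-central1
vertexai.init(project="your-project", location="us-central1")

model = GenerativeModel("gemini-1.5-flash")  # placeholder model name
response = model.generate_content("Hello from Tokyo")
print(response.text)
```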
Why Not Move Everything to us-central1?
I considered moving Cloud Run to us-central1 too, but:
- Users are in Japan - Slack webhook latency would increase
- GCS bucket for memory is in asia-northeast1
- The Gemini API call is the bottleneck, not the network hop
The Reality
There's no fix for this until Google expands Gemini's regional availability. The cross-region latency is unavoidable for now.
Current response times:
```
Cloud Run (Tokyo) → Gemini API (Iowa) → Response
Total: 3-5 seconds typical
```
This is acceptable for a chatbot, but something to be aware of if you're building latency-sensitive applications with Gemini.
Wrapping Up
The duplicate message issue was frustrating because the code looked correct. The lesson: always check your infrastructure's timeout and retry behavior.
Key takeaways:
- Slack retries events if it doesn't get 200 within 3 seconds
- Gemini is slow (3-5 seconds typical, worse with cross-region calls)
- Use event deduplication for Flask/Cloud Run setups
- Use lazy listeners if you're on Lambda or Cloud Functions
- Use min-instances=1 to eliminate cold starts
- Check logs first - gcloud logging read is your friend
Quick Reference
Set min instances:
```bash
gcloud run services update SERVICE_NAME \
  --min-instances=1 \
  --region=REGION
```
Check Cloud Run logs:
```bash
gcloud logging read 'resource.type="cloud_run_revision"' \
  --project=PROJECT_ID \
  --limit=50
```
Deduplication pattern:
```python
def _is_duplicate(event_id: str) -> bool:
    if event_id in _processed_events:
        return True
    _processed_events[event_id] = time.time()
    return False


@app.event("app_mention")
def handle_mention(event, say, logger):
    if _is_duplicate(event.get("event_ts")):
        return
    # process...
```
Lazy listener pattern (Lambda/Cloud Functions only):
```python
def _process_event(event, say, logger):
    # Heavy processing here
    ...


@app.event("event_type", lazy=[_process_event])  # type: ignore[call-arg]
def handle_event():
    pass  # Ack immediately
```
