Intro
After deploying my Gemini-powered Slack bot, everything seemed fine... until users started complaining.
"The bot is sending the same message 4 times!"
At first I thought it was a bug in my code. Maybe an infinite loop? Maybe I was calling say() multiple times?
Turns out, neither. The culprit was Slack's retry mechanism.
1. Duplicate Messages
The Symptom
Users would @mention the bot, and instead of getting one response, they'd get 2-4 identical responses:
```
User: @Bot what is Python?

Bot: Python is a high-level programming language...
Bot: Python is a high-level programming language...
Bot: Python is a high-level programming language...
Bot: Python is a high-level programming language...
```
Finding the Root Cause
I checked Cloud Run logs:
```bash
gcloud logging read 'resource.type="cloud_run_revision" AND resource.labels.service_name="your-service"' \
  --project=your-project \
  --limit=30
```
The logs revealed everything:
```
08:40:10.594 POST /slack/events 200
08:40:10.593 POST /slack/events 200
08:40:10.590 POST /slack/events 200
08:40:10.582 POST /slack/events 200
```
Four requests within 12 milliseconds. All for the same message.
Why This Happens
Slack has a built-in retry mechanism:
- Slack sends an event to your webhook
- If it doesn't receive a 200 response within 3 seconds, it retries
- It retries up to 3 times (4 total requests)
My bot's flow was:
```
Slack Event → Handler → Call Gemini (5+ seconds) → Send Response → Return 200
```
Gemini takes 3-5 seconds to respond (even longer with cross-region latency since Gemini isn't available in asia-northeast1 yet). By the time we return 200 to Slack, it has already sent 3 retry requests. Each retry triggers another Gemini call. Four responses.
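You can confirm the diagnosis from the request headers, too: Slack stamps every retry delivery with X-Slack-Retry-Num and X-Slack-Retry-Reason. Here's a minimal sketch (not my bot's actual code - the route and app names are assumptions) that logs those headers in a Flask adapter setup:

```python
from flask import Flask, request
from slack_bolt import App
from slack_bolt.adapter.flask import SlackRequestHandler

bolt_app = App()  # reads SLACK_BOT_TOKEN / SLACK_SIGNING_SECRET from the environment
flask_app = Flask(__name__)
handler = SlackRequestHandler(bolt_app)


@flask_app.route("/slack/events", methods=["POST"])
def slack_events():
    # Slack sets these headers only on retry deliveries
    retry_num = request.headers.get("X-Slack-Retry-Num")
    if retry_num is not None:
        flask_app.logger.info(
            "Slack retry #%s (reason: %s)",
            retry_num,
            request.headers.get("X-Slack-Retry-Reason"),
        )
    return handler.handle(request)
```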
The Fix: Event Deduplication
The solution is to track which events we've already processed and skip duplicates:
```python
import time
from collections import OrderedDict

_processed_events: OrderedDict[str, float] = OrderedDict()
_CACHE_TTL_SECONDS = 60
_CACHE_MAX_SIZE = 1000


def _is_duplicate(event_id: str) -> bool:
    now = time.time()

    # Clean old entries
    while _processed_events:
        oldest_id, oldest_time = next(iter(_processed_events.items()))
        if now - oldest_time > _CACHE_TTL_SECONDS:
            _processed_events.pop(oldest_id)
        else:
            break

    if event_id in _processed_events:
        return True

    _processed_events[event_id] = now

    # Limit cache size
    while len(_processed_events) > _CACHE_MAX_SIZE:
        _processed_events.popitem(last=False)

    return False


@app.event("app_mention")
def handle_mention(event, say, logger):
    event_ts = event.get("event_ts") or event.get("ts")
    if _is_duplicate(f"mention:{event_ts}"):
        return  # Skip duplicate

    # ... process normally
```
The event_ts is unique per event. If we see the same event_ts twice within 60 seconds, we know it's a retry and skip it.
Downsides:
- Retries still hit your server (wasted CPU)
- Need to maintain an in-memory cache
- Cache doesn't work across multiple instances (would need Redis for that - see the sketch below)
But for a single-instance bot, this works well.
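If you ever do run more than one instance, the same idea ports to a shared store. Here's a sketch with Redis (the connection details and key prefix are my assumptions, not part of the bot) - SET with nx and an expiry gives an atomic "first writer wins" check:

```python
import redis

# Assumed connection details - point this at your Redis instance
_redis = redis.Redis(host="localhost", port=6379, decode_responses=True)
_CACHE_TTL_SECONDS = 60


def _is_duplicate(event_id: str) -> bool:
    # SET NX succeeds only for the first writer; retries see the existing key
    first_writer = _redis.set(
        f"slack:event:{event_id}", "1", nx=True, ex=_CACHE_TTL_SECONDS
    )
    return not first_writer
```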
Alternative: Lazy Listeners
A cleaner solution exists: lazy listeners. Instead of processing the event synchronously, you acknowledge Slack immediately and process in the background.
```python
def _process_mention(event, say, logger):
    """Runs in background after ack."""
    # Heavy processing here...
    gemini = get_gemini_service()
    response = gemini.generate_response_sync(user, clean_text)
    say(text=response, thread_ts=thread_ts)


@app.event("app_mention", lazy=[_process_mention])
def handle_mention():
    pass  # Ack immediately, processing happens in lazy listener
```
How it works:
- Slack sends event
- handle_mention() runs → returns immediately
- Slack gets 200 response → no retries
- _process_mention() runs in background
No deduplication cache needed. Much cleaner.
The catch: Lazy listeners only work with FaaS (Function-as-a-Service) adapters:
- slack_bolt.adapter.aws_lambda
- slack_bolt.adapter.google_cloud_functions
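For reference, the Lambda wiring typically looks something like this (a sketch with placeholder names, not code from my bot; process_before_response=True is required so Bolt acks before the lazy work runs):

```python
from slack_bolt import App
from slack_bolt.adapter.aws_lambda import SlackRequestHandler

# process_before_response=True makes Bolt finish the ack before Lambda freezes the execution
app = App(process_before_response=True)


def _process_mention(event, say):
    # Heavy processing (e.g. the Gemini call) happens here, after the ack
    say(text="...", thread_ts=event.get("thread_ts") or event.get("ts"))


@app.event("app_mention", lazy=[_process_mention])
def handle_mention():
    pass  # Ack immediately


def lambda_handler(event, context):
    return SlackRequestHandler(app=app).handle(event, context)
```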
If you're using Flask on Cloud Run (like me), lazy listeners won't work. The Flask adapter doesn't support the lazy parameter - you'll get a runtime error:
```
TypeError: App.event() got an unexpected keyword argument 'lazy'
```
For more details, see the Slack Bolt lazy listeners documentation.
Type Errors with Lazy Listeners
Even if you're using a supported adapter (Lambda, Cloud Functions), you might hit a mypy error:
```
error: Unexpected keyword argument "lazy" for "event" of "App" [call-arg]
```
The lazy parameter works at runtime but isn't included in slack-bolt's type stubs. You'll need to suppress the error:
1@app.event("app_mention", lazy=[_process_mention]) # type: ignore[call-arg]2def handle_mention():3 pass
I opened an issue for this: slackapi/bolt-python#1412
Which Should You Use?
| Setup | Solution |
|---|---|
| AWS Lambda | Lazy listeners |
| Google Cloud Functions | Lazy listeners |
| Flask + Cloud Run | Event deduplication |
| Flask + any server | Event deduplication |
If your app is stateful (like mine, with chat sessions in memory), Cloud Run with deduplication is actually better - Cloud Functions would lose state between invocations.
2. Cold Start Problem
Even after fixing duplicates, I noticed the first request after idle was extremely slow (5-10 seconds).
This is the classic cold start problem with Cloud Run.
What's Happening
Cloud Run scales to zero when idle. When a new request comes in:
- Cloud Run spins up a new container (~2-3 seconds)
- Python loads all modules
- GeminiService initializes and loads memory from GCS (see the caching sketch after this list)
- Gemini API call happens (~3-5 seconds)
Total: 5-10 seconds for the first request.
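The container spin-up itself needs an infrastructure fix (next section), but step 3 should at least be a one-time cost per instance, not a per-request one. A sketch of the caching I mean - the import path and GeminiService constructor are assumptions:

```python
from functools import lru_cache

from services.gemini import GeminiService  # assumed module path


@lru_cache(maxsize=1)
def get_gemini_service() -> GeminiService:
    # Built once per container instance; later requests reuse the same object,
    # so the GCS memory load and client setup aren't repeated
    return GeminiService()
```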
The Fix: Min Instances
Tell Cloud Run to keep at least one instance warm:
```bash
gcloud run services update your-service-name \
  --min-instances=1 \
  --region=asia-northeast1 \
  --project=your-project
```
What this does:
- Cloud Run always keeps 1 instance running
- No cold starts for normal traffic
- Costs ~$5-10/month depending on instance size (with CPU throttling, idle instances cost almost nothing)
Before:
```
Cold start + Gemini = 5-10 seconds
```
After:
```
Just Gemini = 2-4 seconds
```
3. Gemini Regional Latency
Even with deduplication and warm instances, responses still take 3-5 seconds. Why?
The Problem
My Cloud Run service runs in asia-northeast1 (Tokyo) to be close to users. But Gemini via Vertex AI isn't available in that region yet. I have to call us-central1 (Iowa):
```python
# In config.py
GEMINI_LOCATION = "us-central1"  # Gemini not available in asia-northeast1
```
This adds ~200-400ms round-trip latency on top of Gemini's processing time.
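Concretely, the only thing that changes in code is the location handed to the SDK. A sketch with the Vertex AI Python SDK (the project ID and model name are placeholders):

```python
import vertexai
from vertexai.generative_models import GenerativeModel

# Cloud Run runs in asia-northeast1, but the Gemini endpoint we call is in us-central1
vertexai.init(project="your-project", location="us-central1")

model = GenerativeModel("gemini-1.5-flash")  # placeholder model name
response = model.generate_content("Hello from Tokyo")
print(response.text)
```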
Why Not Move Everything to us-central1?
I considered moving Cloud Run to us-central1 too, but:
- Users are in Japan - Slack webhook latency would increase
- GCS bucket for memory is in asia-northeast1
- The Gemini API call is the bottleneck, not the network hop
The Reality
There's no fix for this until Google expands Gemini's regional availability. The cross-region latency is unavoidable for now.
Current response times:
```
Cloud Run (Tokyo) → Gemini API (Iowa) → Response
Total: 3-5 seconds typical
```
This is acceptable for a chatbot, but something to be aware of if you're building latency-sensitive applications with Gemini.
Wrapping Up
The duplicate message issue was frustrating because the code looked correct. The lesson: always check your infrastructure's timeout and retry behavior.
Key takeaways:
- Slack retries events if it doesn't get 200 within 3 seconds
- Gemini is slow (3-5 seconds typical, worse with cross-region calls)
- Use event deduplication for Flask/Cloud Run setups
- Use lazy listeners if you're on Lambda or Cloud Functions
- Use min-instances=1 to eliminate cold starts
- Check logs first - gcloud logging read is your friend
Quick Reference
Set min instances:
```bash
gcloud run services update SERVICE_NAME \
  --min-instances=1 \
  --region=REGION
```
Check Cloud Run logs:
```bash
gcloud logging read 'resource.type="cloud_run_revision"' \
  --project=PROJECT_ID \
  --limit=50
```
Deduplication pattern:
```python
def _is_duplicate(event_id: str) -> bool:
    if event_id in _processed_events:
        return True
    _processed_events[event_id] = time.time()
    return False


@app.event("app_mention")
def handle_mention(event, say, logger):
    if _is_duplicate(event.get("event_ts")):
        return
    # process...
```
Lazy listener pattern (Lambda/Cloud Functions only):
```python
def _process_event(event, say, logger):
    # Heavy processing here
    ...


@app.event("event_type", lazy=[_process_event])  # type: ignore[call-arg]
def handle_event():
    pass  # Ack immediately
```
