
Slack Bot Troubleshooting: Duplicate Messages, Cold Starts, and Gemini Latency

7 min read
Python · GCP · Slack


Intro

After deploying my Gemini-powered Slack bot, everything seemed fine... until users started complaining.

"The bot is sending the same message 4 times!"

At first I thought it was a bug in my code. Maybe an infinite loop? Maybe I was calling say() multiple times?

Turns out, neither. The culprit was Slack's retry mechanism.

1. Duplicate Messages

The Symptom

Users would @mention the bot, and instead of getting one response, they'd get 2-4 identical responses:

User: @Bot what is Python?

Bot: Python is a high-level programming language...
Bot: Python is a high-level programming language...
Bot: Python is a high-level programming language...
Bot: Python is a high-level programming language...

Finding the Root Cause

I checked Cloud Run logs:

gcloud logging read 'resource.type="cloud_run_revision" AND resource.labels.service_name="your-service"' \
  --project=your-project \
  --limit=30

The logs revealed everything:

08:40:10.594 POST /slack/events 200
08:40:10.593 POST /slack/events 200
08:40:10.590 POST /slack/events 200
08:40:10.582 POST /slack/events 200

Four requests within 12 milliseconds. All for the same message.

Why This Happens

Slack has a built-in retry mechanism:

  1. Slack sends an event to your webhook
  2. If it doesn't receive a 200 response within 3 seconds, it retries
  3. It retries up to 3 times (4 total requests)

My bot's flow was:

Slack Event → Handler → Call Gemini (3-5+ seconds) → Send Response → Return 200

Gemini takes 3-5 seconds to respond (even longer with cross-region latency since Gemini isn't available in asia-northeast1 yet). By the time we return 200 to Slack, it has already sent 3 retry requests. Each retry triggers another Gemini call. Four responses.
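Slack also marks retries explicitly: retry deliveries carry an X-Slack-Retry-Num header ("1" through "3") and an X-Slack-Retry-Reason header (e.g. "http_timeout"), so you can detect a retry before doing any work. A minimal sketch of that guard (an observation, not the fix I ended up shipping):

```python
def is_slack_retry(headers: dict) -> bool:
    # Slack sets X-Slack-Retry-Num only on retry deliveries,
    # never on the first attempt.
    return "X-Slack-Retry-Num" in headers


print(is_slack_retry({"Content-Type": "application/json"}))  # False
print(is_slack_retry({"X-Slack-Retry-Num": "1",
                      "X-Slack-Retry-Reason": "http_timeout"}))  # True
```

The trade-off: dropping all retries means a delivery that genuinely failed the first time never gets reprocessed, which is why deduplication keyed on the event itself is safer.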

The Fix: Event Deduplication

The solution is to track which events we've already processed and skip duplicates:

import time
from collections import OrderedDict

_processed_events: OrderedDict[str, float] = OrderedDict()
_CACHE_TTL_SECONDS = 60
_CACHE_MAX_SIZE = 1000


def _is_duplicate(event_id: str) -> bool:
    now = time.time()

    # Clean old entries
    while _processed_events:
        oldest_id, oldest_time = next(iter(_processed_events.items()))
        if now - oldest_time > _CACHE_TTL_SECONDS:
            _processed_events.pop(oldest_id)
        else:
            break

    if event_id in _processed_events:
        return True

    _processed_events[event_id] = now

    # Limit cache size
    while len(_processed_events) > _CACHE_MAX_SIZE:
        _processed_events.popitem(last=False)

    return False


@app.event("app_mention")
def handle_mention(event, say, logger):
    event_ts = event.get("event_ts") or event.get("ts")
    if _is_duplicate(f"mention:{event_ts}"):
        return  # Skip duplicate

    # ... process normally

The event_ts is unique per event. If we see the same event_ts twice within 60 seconds, we know it's a retry and skip it.

Downsides: the cache lives in process memory, so it's lost on restart and isn't shared across instances - if Cloud Run scales out, a retry routed to a different instance can still slip through. But for a single-instance bot, this works well.

Alternative: Lazy Listeners

A cleaner solution exists: lazy listeners. Instead of processing the event synchronously, you acknowledge Slack immediately and process in the background.

def _process_mention(event, say, logger):
    """Runs in background after ack."""
    # Heavy processing here...
    gemini = get_gemini_service()
    response = gemini.generate_response_sync(user, clean_text)
    say(text=response, thread_ts=thread_ts)


@app.event("app_mention", lazy=[_process_mention])
def handle_mention():
    pass  # Ack immediately, processing happens in lazy listener

How it works:

  1. Slack sends event
  2. handle_mention() runs → returns immediately
  3. Slack gets 200 response → no retries
  4. _process_mention() runs in background

No deduplication cache needed. Much cleaner.

The catch: Lazy listeners only work with FaaS (Function-as-a-Service) adapters such as AWS Lambda and Google Cloud Functions.

If you're using Flask on Cloud Run (like me), lazy listeners won't work. The Flask adapter doesn't support the lazy parameter - you'll get a runtime error:

TypeError: App.event() got an unexpected keyword argument 'lazy'

For more details, see the Slack Bolt lazy listeners documentation.
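If you're stuck on the Flask adapter, one workaround (my sketch, not something slack-bolt provides) is to do the hand-off yourself: return from the listener immediately and run the slow part in a daemon thread. Beware that on Cloud Run this only behaves if CPU is always allocated - with request-based billing, CPU is throttled once the response is sent, so background threads can stall. Stripped of the Bolt wiring, the shape is:

```python
import queue
import threading

results: "queue.Queue[str]" = queue.Queue()


def process_mention(text: str) -> None:
    # Stand-in for the slow part (the Gemini call plus say());
    # runs off the request thread so the HTTP response isn't blocked on it.
    results.put(f"processed: {text}")


def handle_mention(text: str) -> str:
    # Return right away so Slack gets its 200 within the 3-second window.
    threading.Thread(target=process_mention, args=(text,), daemon=True).start()
    return "ok"


print(handle_mention("what is Python?"))  # "ok" comes back immediately
print(results.get(timeout=5))             # "processed: what is Python?"
```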

Type Errors with Lazy Listeners

Even if you're using a supported adapter (Lambda, Cloud Functions), you might hit a mypy error:

error: Unexpected keyword argument "lazy" for "event" of "App" [call-arg]

The lazy parameter works at runtime but isn't included in slack-bolt's type stubs. You'll need to suppress the error:

@app.event("app_mention", lazy=[_process_mention])  # type: ignore[call-arg]
def handle_mention():
    pass

I opened an issue for this: slackapi/bolt-python#1412

Which Should You Use?

| Setup | Solution |
| --- | --- |
| AWS Lambda | Lazy listeners |
| Google Cloud Functions | Lazy listeners |
| Flask + Cloud Run | Event deduplication |
| Flask + any server | Event deduplication |

If your app is stateful (like mine, with chat sessions in memory), Cloud Run with deduplication is actually better - Cloud Functions would lose state between invocations.

2. Cold Start Problem

Even after fixing duplicates, I noticed the first request after idle was extremely slow (5-10 seconds).

This is the classic cold start problem with Cloud Run.

What's Happening

Cloud Run scales to zero when idle. When a new request comes in:

  1. Cloud Run spins up a new container (~2-3 seconds)
  2. Python loads all modules
  3. GeminiService initializes and loads memory from GCS
  4. Gemini API call happens (~3-5 seconds)

Total: 5-10 seconds for the first request.
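Steps 2-3 can also be softened inside the container by constructing the service once per process. A plausible shape for the `get_gemini_service()` helper used in the lazy-listener example earlier (the class body here is a stand-in; assume the real `__init__` does the expensive client setup and GCS load):

```python
import functools


class GeminiService:
    """Stand-in for the real service; the real __init__ is the slow part."""

    init_count = 0

    def __init__(self) -> None:
        GeminiService.init_count += 1


@functools.lru_cache(maxsize=1)
def get_gemini_service() -> GeminiService:
    # Constructed on the first request, then reused for the container's
    # whole lifetime - the cost is paid once per cold start, not per call.
    return GeminiService()


get_gemini_service()
get_gemini_service()
print(GeminiService.init_count)  # 1
```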

The Fix: Min Instances

Tell Cloud Run to keep at least one instance warm:

gcloud run services update your-service-name \
  --min-instances=1 \
  --region=asia-northeast1 \
  --project=your-project

What this does:

Before:

Cold start + Gemini = 5-10 seconds

After:

Just Gemini = 2-4 seconds

3. Gemini Regional Latency

Even with deduplication and warm instances, responses still take 3-5 seconds. Why?

The Problem

My Cloud Run service runs in asia-northeast1 (Tokyo) to be close to users. But Gemini via Vertex AI isn't available in that region yet. I have to call us-central1 (Iowa):

# In config.py
GEMINI_LOCATION = "us-central1"  # Gemini not available in asia-northeast1

This adds ~200-400ms round-trip latency on top of Gemini's processing time.
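To see where the seconds actually go, it helps to time each call and log it. A small helper (mine, not from the project) that wraps any zero-argument call:

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    # Wall-clock the call; in the real bot this would wrap the Gemini
    # request, so logs show network + model time per message.
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


result, seconds = timed(lambda: "stub response")  # stand-in for the API call
print(result)  # stub response
```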

Why Not Move Everything to us-central1?

I considered moving Cloud Run to us-central1 too, but the service lives in asia-northeast1 specifically to be close to its users - so I kept it there and accepted the cross-region Gemini call instead.

The Reality

There's no fix for this until Google expands Gemini's regional availability. The cross-region latency is unavoidable for now.

Current response times:

Cloud Run (Tokyo) → Gemini API (Iowa) → Response
Total: 3-5 seconds typical

This is acceptable for a chatbot, but something to be aware of if you're building latency-sensitive applications with Gemini.

Wrapping Up

The duplicate message issue was frustrating because the code looked correct. The lesson: always check your infrastructure's timeout and retry behavior.

Key takeaways:

  1. Slack retries events if it doesn't get 200 within 3 seconds
  2. Gemini is slow (3-5 seconds typical, worse with cross-region calls)
  3. Use event deduplication for Flask/Cloud Run setups
  4. Use lazy listeners if you're on Lambda or Cloud Functions
  5. Use min-instances=1 to eliminate cold starts
  6. Check logs first - gcloud logging read is your friend

Quick Reference

Set min instances:

gcloud run services update SERVICE_NAME \
  --min-instances=1 \
  --region=REGION

Check Cloud Run logs:

gcloud logging read 'resource.type="cloud_run_revision"' \
  --project=PROJECT_ID \
  --limit=50

Deduplication pattern:

def _is_duplicate(event_id: str) -> bool:
    if event_id in _processed_events:
        return True
    _processed_events[event_id] = time.time()
    return False


@app.event("app_mention")
def handle_mention(event, say, logger):
    if _is_duplicate(event.get("event_ts")):
        return
    # process...

Lazy listener pattern (Lambda/Cloud Functions only):

def _process_event(event, say, logger):
    ...  # Heavy processing here


@app.event("event_type", lazy=[_process_event])  # type: ignore[call-arg]
def handle_event():
    pass  # Ack immediately

Project Navigation

  1. Building My First Flask App: A Next.js Developer's Perspective
  2. From TypeScript to Python: Setting Up a Modern Development Environment
  3. Deploying Python to GCP Cloud Run: A Guide for AWS Developers
  4. Integrating Vertex AI Gemini into Flask: Building an AI-Powered Slack Bot
  5. Adding GCS Memory to Gemini: Teaching Your Bot with Markdown Files
  6. Slack Bot Troubleshooting: Duplicate Messages, Cold Starts, and Gemini Latency
  7. Setting Up Analytics with BigQuery and Looker Studio