Integrating Vertex AI Gemini into Flask: Building an AI-Powered Slack Bot

Python · Flask · GCP · AI

Intro

After deploying my Flask app to Cloud Run, I had a working Slack bot that could receive messages. But it couldn't actually respond intelligently - it just echoed back what you said.

Time to add the AI brain.

I wanted to use Google's Vertex AI Gemini (their latest multimodal model) to make the bot conversational.

The goal: users could @mention the bot or DM it, and it would respond naturally using Gemini.

Here's everything I learned integrating Vertex AI into Flask, including all the permission errors and debugging I had to do along the way.

What We're Building

Before: Bot receives Slack messages but can't respond

After: Bot uses Gemini to generate contextual responses, maintaining conversation history per user

Key features:

  • User-specific chat sessions (conversation context)
  • Works in both DMs and channel mentions
  • Detailed error logging for debugging
  • Local testing support

Installing Dependencies

First, add the Vertex AI SDK:

uv add google-cloud-aiplatform

This installs the google-cloud-aiplatform package which includes support for Vertex AI's Generative Models API.

Updated pyproject.toml:

[project]
dependencies = [
    "flask>=3.1.2",
    "python-dotenv>=1.1.1",
    "slack-bolt>=1.26.0",
    "google-cloud-aiplatform>=1.121.0",  # New!
]

Creating the Gemini Service

I created a service class to encapsulate all Gemini logic in app/services/gemini.py:

import os
import logging
from typing import Optional
import vertexai
from vertexai.generative_models import GenerativeModel, ChatSession

logger = logging.getLogger(__name__)


class GeminiService:
    """Service for interacting with Vertex AI Gemini"""

    def __init__(self):
        self.project_id = os.getenv("GCP_PROJECT_ID")
        # Gemini is not available in all regions as of Nov 2025
        self.location = "us-central1"
        self.model_name = "gemini-2.5-flash"

        if not self.project_id:
            raise ValueError("GCP_PROJECT_ID environment variable is required")

        logger.info(
            f"Initializing GeminiService - Project: {self.project_id}, "
            f"Location: {self.location}, Model: {self.model_name}"
        )

        # Initialize Vertex AI
        vertexai.init(project=self.project_id, location=self.location)

        # Create model instance
        self.model = GenerativeModel(self.model_name)

        # Store chat sessions per user
        self.chat_sessions: dict[str, ChatSession] = {}

        logger.info("GeminiService initialized successfully")

    def get_chat_session(self, user_id: str) -> ChatSession:
        """Get or create chat session for user"""
        if user_id not in self.chat_sessions:
            self.chat_sessions[user_id] = self.model.start_chat()
        return self.chat_sessions[user_id]

    def clear_chat_session(self, user_id: str) -> None:
        """Clear chat history for user"""
        if user_id in self.chat_sessions:
            del self.chat_sessions[user_id]

    def generate_response_sync(
        self, user_id: str, message: str, use_context: bool = True
    ) -> str:
        """
        Generate AI response using Gemini

        Args:
            user_id: Unique identifier for user (preserves conversation context)
            message: User's message
            use_context: If True, uses chat history. If False, one-off question.

        Returns:
            Gemini's response text
        """
        try:
            if use_context:
                # Use chat session to maintain conversation history
                chat = self.get_chat_session(user_id)
                response = chat.send_message(message)
            else:
                # One-off question without context
                response = self.model.generate_content(message)

            return response.text

        except Exception as e:
            import traceback

            error_trace = traceback.format_exc()
            logger.error(
                f"Error generating Gemini response: {e}\n"
                f"Project: {self.project_id}, Location: {self.location}, "
                f"Model: {self.model_name}, User: {user_id}\n"
                f"Full traceback:\n{error_trace}"
            )
            return (
                "Sorry, I encountered an error processing your request. "
                "Please try again."
            )


# Singleton instance
_gemini_service: Optional[GeminiService] = None


def get_gemini_service() -> GeminiService:
    """Get or create the Gemini service singleton"""
    global _gemini_service
    if _gemini_service is None:
        _gemini_service = GeminiService()
    return _gemini_service

Key Design Decisions

1. User-Specific Chat Sessions

The chat sessions dictionary (self.chat_sessions) maintains separate conversation histories for each user:

self.chat_sessions: dict[str, ChatSession] = {}

Why this matters:

  • User A asks "What's Python?" → Gemini explains Python
  • User A asks "What's it used for?" → Gemini knows "it" refers to Python
  • User B asks "What's it used for?" → Doesn't see User A's history

Each Slack user gets their own context. No cross-user contamination.
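
To make that concrete, here's how the service above behaves for two hypothetical Slack user IDs:

svc = get_gemini_service()

# User A builds up context across calls
svc.generate_response_sync("U_ALICE", "What's Python?")
svc.generate_response_sync("U_ALICE", "What's it used for?")  # "it" resolves via Alice's chat history

# User B starts a separate session - Alice's history never leaks in
svc.generate_response_sync("U_BOB", "What's it used for?")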

2. Singleton Pattern

_gemini_service: Optional[GeminiService] = None


def get_gemini_service() -> GeminiService:
    global _gemini_service
    if _gemini_service is None:
        _gemini_service = GeminiService()
    return _gemini_service

This ensures we only initialize Vertex AI once per app lifecycle, not on every request. Faster responses, fewer API calls.

3. Regional Model Availability

self.location = "us-central1"

Gemini isn't available in all GCP regions. I initially tried asia-northeast1 (Tokyo) but got 404 errors.

Available regions (as of 2025):

  • us-central1 (Iowa) ✅
  • us-east4 (Virginia) ✅
  • europe-west1 (Belgium) ✅
  • asia-southeast1 (Singapore) ✅

Always check Vertex AI region availability before choosing a region.

4. Detailed Error Logging

logger.error(
    f"Error generating Gemini response: {e}\n"
    f"Project: {self.project_id}, Location: {self.location}, "
    f"Model: {self.model_name}, User: {user_id}\n"
    f"Full traceback:\n{error_trace}"
)

This saved me hours of debugging. When errors occur in Cloud Run, you see exactly:

  • What went wrong
  • Which project/region/model
  • Full stack trace

Critical for troubleshooting permission issues (more on that later).

Integrating with Slack Handlers

Now update the Slack event handlers to use Gemini in app/handlers/slack_events.py:

from app.services.gemini import get_gemini_service
import re


@app.event("app_mention")
def handle_mention(event, say, logger):
    """
    When someone @mentions the bot
    Example: "@PromoBot hello"
    Replies in a thread using Gemini AI
    """
    user = event.get("user")
    text = event.get("text", "")
    thread_ts = event.get("thread_ts") or event.get("ts")

    logger.info(f"Mention from {user}: {text}")

    # Remove @bot mention from message
    clean_text = re.sub(r"<@[A-Z0-9]+>", "", text).strip()

    try:
        gemini = get_gemini_service()
        response = gemini.generate_response_sync(user, clean_text)
        say(text=response, thread_ts=thread_ts)
    except Exception as e:
        logger.error(f"Error with Gemini: {e}")
        say(
            text=f"Hi <@{user}>! I'm having trouble connecting to my AI service. "
            f"Please try again later.",
            thread_ts=thread_ts,
        )


@app.event("message")
def handle_message(event, say, logger):
    """
    When someone DMs the bot
    Only responds to direct messages, not channel messages
    Uses Gemini AI to generate contextual responses
    """
    # Ignore bot messages to prevent loops
    if event.get("bot_id"):
        return

    # Only respond to DMs
    channel_type = event.get("channel_type")
    if channel_type != "im":
        return

    user = event.get("user")
    text = event.get("text", "")

    logger.info(f"DM from {user}: {text}")

    try:
        gemini = get_gemini_service()
        response = gemini.generate_response_sync(user, text, use_context=True)
        say(response)
    except Exception as e:
        logger.error(f"Error with Gemini: {e}")
        say(
            "I'm having trouble connecting to my AI service. "
            "Please try again later."
        )

Important: The regex re.sub(r"<@[A-Z0-9]+>", "", text) removes the bot mention (@PromoBot) from the message before sending to Gemini. Otherwise, Gemini would see messages like "@U12345ABC what is Python?" instead of "what is Python?".

Configuration: Environment Variables

Add these to your .env for local development:

FLASK_ENV=development
PORT=3000

# Slack credentials
SLACK_BOT_TOKEN=xoxb-your-slack-token
SLACK_SIGNING_SECRET=your-signing-secret

# GCP Configuration
GCP_PROJECT_ID=your-project-id
GCP_LOCATION=us-central1

And store GCP_PROJECT_ID as a secret for Cloud Run:

# Store GCP_PROJECT_ID as secret
echo -n "your-project-id" | \
  gcloud secrets create GCP_PROJECT_ID --data-file=-

Update your GitHub Actions deployment to include the secret:

--set-secrets "SLACK_BOT_TOKEN=SLACK_BOT_TOKEN:latest,SLACK_SIGNING_SECRET=SLACK_SIGNING_SECRET:latest,GCP_PROJECT_ID=GCP_PROJECT_ID:latest"

Local Testing

Create a test endpoint in app/main.py for easy local testing without Slack:

@app.route("/test/chat", methods=["POST"])
def test_chat():
    """
    Test endpoint to simulate Slack messages locally
    Bypasses Slack signature verification for easy testing

    Usage:
        curl -X POST http://localhost:3000/test/chat \
          -H "Content-Type: application/json" \
          -d '{"user_id": "test-user", "message": "What is Python?"}'
    """
    if os.getenv("FLASK_ENV") != "development":
        return jsonify({"error": "Test endpoint only available in development"}), 403

    from app.services.gemini import get_gemini_service

    data = request.get_json()
    user_id = data.get("user_id", "test-user")
    message = data.get("message", "")
    use_context = data.get("use_context", True)

    if not message:
        return jsonify({"error": "message is required"}), 400

    try:
        gemini = get_gemini_service()
        response = gemini.generate_response_sync(user_id, message, use_context)
        return jsonify({
            "user_id": user_id,
            "message": message,
            "response": response
        }), 200
    except Exception as e:
        import traceback
        error_details = traceback.format_exc()
        logger.error(f"Error in test endpoint: {e}\n{error_details}")
        return jsonify({"error": "Internal server error"}), 500

Test it:

# Start dev server
make dev

# Test Gemini integration
curl -X POST http://localhost:3000/test/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is Python?"}'

Example response:

{
  "user_id": "test-user",
  "message": "What is Python?",
  "response": "Python is a high-level, interpreted programming language known for its readability and versatility..."
}

The Permission Nightmare (and How to Fix It)

When I first deployed to Cloud Run, I got this error:

403 Permission 'aiplatform.endpoints.predict' denied on resource
'//aiplatform.googleapis.com/projects/your-project/locations/us-central1/publishers/google/models/gemini-2.5-flash'
[reason: "IAM_PERMISSION_DENIED"]

The problem: Cloud Run's service account didn't have permission to use Vertex AI.

Why it worked locally: My personal Google account has owner permissions. Cloud Run uses a service account with limited permissions.

Solution: Grant Vertex AI Permissions

First, find which service account Cloud Run uses:

gcloud run services describe your-service-name \
  --region=asia-northeast1 \
  --format="value(spec.template.spec.serviceAccountName)"

Example output:

123456789-compute@developer.gserviceaccount.com

Then grant the Vertex AI User role:

gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:123456789-compute@developer.gserviceaccount.com" \
  --role="roles/aiplatform.user"

What this role grants:

  • aiplatform.endpoints.predict - Call Gemini models
  • aiplatform.models.get - Access model metadata
  • Full access to Vertex AI prediction APIs

After granting this permission, the bot started working immediately - no redeploy needed.

Common Gotchas

1. Model Not Found (404 Error)

Error:

404 Publisher Model 'projects/your-project/locations/asia-northeast1/publishers/google/models/gemini-1.5-flash' was not found

Causes:

  1. Wrong region - Gemini not available in that region
  2. Wrong model name - Model version doesn't exist
  3. Terms of service not accepted - Need to enable Generative AI

Solutions:

# 1. Use supported region
self.location = "us-central1"  # Not "asia-northeast1"

# 2. Check model name
# ✅ gemini-2.5-flash
# ✅ gemini-1.5-flash
# ❌ gemini-flash (missing version)

# 3. Enable the Vertex AI API and accept ToS
gcloud services enable aiplatform.googleapis.com
# Then visit: https://console.cloud.google.com/vertex-ai/generative/language
# Click "Enable" and accept terms of service

2. Works Locally but Fails in Cloud Run

This usually means:

  1. Service account permissions - Grant roles/aiplatform.user
  2. Environment variable missing - Check GCP_PROJECT_ID is in secrets
  3. Region mismatch - Hardcode us-central1 instead of using env var
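
A cheap safeguard for cause #2 is logging, at startup, whether the container actually received the variable. Here's a minimal sketch for app/main.py (the helper name and log wording are mine, not part of the app above):

import logging
import os

logger = logging.getLogger(__name__)


def log_startup_config() -> None:
    """Log whether Cloud Run injected the expected config - missing secrets show up immediately."""
    project_id = os.getenv("GCP_PROJECT_ID")
    logger.info(
        "Startup config - GCP_PROJECT_ID set: %s, FLASK_ENV: %s",
        bool(project_id),  # log presence only, not the value
        os.getenv("FLASK_ENV", "unset"),
    )
    if not project_id:
        logger.error("GCP_PROJECT_ID is missing - check the --set-secrets flag on the Cloud Run service")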

3. Chat History Not Persisting

Problem: Every message is treated as a new conversation.

Check:

# ❌ Creates new service every time
def handle_message():
    gemini = GeminiService()  # Don't do this!

# ✅ Use singleton
def handle_message():
    gemini = get_gemini_service()  # Reuses same instance

The singleton pattern ensures chat sessions survive across requests.

4. Removing All User Mentions

The current regex removes all user mentions:

clean_text = re.sub(r"<@[A-Z0-9]+>", "", text).strip()

If a user writes:

@Bot hey can you help @john with this?

Gemini receives:

hey can you help with this?

Both @Bot and @john are removed. If you want to keep other mentions, you'd need to only remove the bot's specific mention.
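
A minimal sketch of that narrower cleanup, assuming you look up the bot's own user ID once via Bolt's auth_test() (the BOT_USER_ID variable and helper name are my own):

import re

# app is the slack_bolt App from slack_events.py; auth_test() returns the bot's own user ID
BOT_USER_ID = app.client.auth_test()["user_id"]  # e.g. "U12345ABC"


def strip_bot_mention(text: str) -> str:
    """Remove only the bot's own mention; keep mentions of other users intact."""
    return re.sub(rf"<@{BOT_USER_ID}>", "", text).strip()

# "<@U12345ABC> hey can you help <@U67890XYZ> with this?"
# -> "hey can you help <@U67890XYZ> with this?"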

Logging Configuration

Add proper logging to app/main.py:

import logging

# Configure logging for Cloud Run
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)

Why this matters: Cloud Run automatically sends stdout/stderr to Cloud Logging. Using Python's logging module ensures proper log levels and formatting.

View logs in Cloud Run:

gcloud run services logs tail your-service-name --region asia-northeast1

Or in the console: https://console.cloud.google.com/run

Testing the Full Flow

1. Local Testing (with real Gemini)

# Authenticate with your Google account
gcloud auth application-default login

# Run locally
make dev

# Test endpoint
curl -X POST http://localhost:3000/test/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is Python?"}'

2. Test in Slack (Development)

  1. Set up ngrok to expose the local server:

ngrok http 3000

  2. Update the Slack Event URL with the ngrok URL:

https://abc123.ngrok.io/slack/events

  3. Send a test message in Slack - @mention your bot or DM it

3. Production Testing (Cloud Run)

Deploy and test:

git add .
git commit -m "Add Gemini integration"
git push origin main

GitHub Actions deploys automatically. Test by messaging the bot in Slack.

Cost Considerations

Vertex AI Gemini Pricing (as of 2025):

For gemini-1.5-flash and gemini-2.5-flash:

  • Input: $0.075 per 1M characters
  • Output: $0.30 per 1M characters
  • Free tier: 1,500 requests/day (limited quota)

Example calculation for a small team Slack bot:

  • 100 messages/day
  • Average 50 characters input + 200 characters output per message

Monthly cost:

Input:  100 × 50 × 30 = 150,000 chars → $0.01
Output: 100 × 200 × 30 = 600,000 chars → $0.18
Total:  ~$0.20/month

Nearly free for small teams. Even at 1000+ messages/day, you're looking at less than $5/month.

Compare to OpenAI:

  • GPT-4: ~$0.03 per message (15x more expensive)
  • GPT-3.5: ~$0.002 per message (similar to Gemini)

Gemini Flash is extremely cost-effective for conversational bots.

Monitoring and Debugging

Check if Gemini is being called:

gcloud logging read "resource.type=cloud_run_revision AND textPayload:GeminiService" \
  --project=your-project-id \
  --limit=10

Check for errors:

gcloud logging read "resource.type=cloud_run_revision AND severity>=ERROR" \
  --project=your-project-id \
  --limit=20

Useful log queries:

# All Gemini-related logs
textPayload:"Gemini"

# Permission errors
severity>=ERROR AND textPayload:"Permission"

# Model not found errors
severity>=ERROR AND textPayload:"404"

Complete File Structure

After adding Gemini integration:

slack-bot/
├── app/
│   ├── __init__.py
│   ├── main.py              # Flask app with /test/chat endpoint
│   ├── handlers/
│   │   ├── __init__.py
│   │   └── slack_events.py  # Slack handlers using Gemini
│   └── services/
│       ├── __init__.py
│       └── gemini.py        # Vertex AI Gemini service
├── .env                     # Local environment variables
├── .vscode/
│   └── settings.json        # VSCode Python config
├── Dockerfile
├── .dockerignore
├── .github/workflows/deploy.yml
├── pyproject.toml
└── uv.lock

Next Steps

Now that you have Gemini working:

  1. Add system prompts to customize bot personality (see the sketch after this list)
  2. Implement conversation reset (e.g., /reset command) - also covered in the sketch below
  3. Add rate limiting to prevent abuse
  4. Store conversation history in a database (currently in-memory)
  5. Add multimodal support (images, PDFs) - Gemini supports it!
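
For the first two items, here's a rough sketch of what they could look like with the existing GeminiService. The system prompt wording and the /reset command name are placeholders, and system_instruction availability depends on your google-cloud-aiplatform version:

from vertexai.generative_models import GenerativeModel

from app.services.gemini import get_gemini_service

# 1. System prompt: pass system_instruction where GeminiService.__init__ creates the model
model = GenerativeModel(
    "gemini-2.5-flash",
    system_instruction="You are PromoBot, a friendly Slack assistant. Keep answers short.",
)


# 2. Conversation reset: a slash command handler in slack_events.py
@app.command("/reset")
def handle_reset(ack, command, respond):
    ack()  # Slack requires an acknowledgement within 3 seconds
    get_gemini_service().clear_chat_session(command["user_id"])
    respond("Conversation history cleared - starting fresh!")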

Wrapping Up

Integrating Vertex AI Gemini into Flask was easier than expected, once I got past the permission issues. The hardest part was understanding:

  1. Regional availability - Not all regions support Gemini
  2. Service account permissions - Need explicit Vertex AI access
  3. Chat session management - Singleton pattern for shared state

Key takeaways:

  • Use us-central1 for Gemini (or check region availability)
  • Grant roles/aiplatform.user to Cloud Run service account
  • Use singleton pattern for GeminiService to maintain chat sessions
  • Add detailed error logging for debugging in production
  • Create a /test/chat endpoint for local testing
  • Gemini is extremely cost-effective for chatbots

The entire integration took about 2 hours, with most time spent debugging permissions. Once that was solved, everything worked smoothly.

If you're building AI-powered apps on GCP, Vertex AI Gemini is a no-brainer - especially if you're already using Cloud Run.

Pro Tips

Always log project ID, region, and model name in errors. This makes troubleshooting 10x faster.

Use different model versions for dev vs prod. Test with gemini-1.5-flash (stable) before trying newer models.
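
One low-effort way to do that is reading the model name from an environment variable instead of hardcoding it - a small sketch, assuming a GEMINI_MODEL variable (my naming, not part of the setup above):

import os

# Inside GeminiService.__init__, instead of hardcoding the model name:
model_name = os.getenv("GEMINI_MODEL", "gemini-1.5-flash")  # stable default for local dev
# Then set GEMINI_MODEL=gemini-2.5-flash only on the production Cloud Run service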

Implement conversation reset. Users need a way to clear context if the bot gets confused.

Monitor token usage in Cloud Logging. Vertex AI logs include token counts - useful for cost tracking.
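
With the vertexai SDK used above, the response object also carries usage_metadata, so you can log counts yourself. A minimal sketch inside generate_response_sync (verify the exact attribute names against your installed SDK version):

# After response = chat.send_message(message) in generate_response_sync:
usage = response.usage_metadata
logger.info(
    "Gemini usage - prompt: %s, output: %s, total: %s tokens",
    usage.prompt_token_count,
    usage.candidates_token_count,
    usage.total_token_count,
)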

Use use_context=False for one-off questions. Saves chat session memory and prevents context pollution.
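
For example:

# One-off question: no chat session is created or reused for this (hypothetical) user
answer = get_gemini_service().generate_response_sync(
    user_id="U_TEST",
    message="Summarize what Cloud Run is in one sentence",
    use_context=False,
)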