Intro
After deploying my Flask app to Cloud Run, I had a working Slack bot that could receive messages. But it couldn't actually respond intelligently - it just echoed back what you said.
Time to add the AI brain.
I wanted to use Gemini on Google's Vertex AI (their multimodal model family) to make the bot conversational.
The goal: users could @mention the bot or DM it, and it would respond naturally using Gemini.
Here's everything I learned integrating Vertex AI into Flask, including all the permission errors and debugging I had to do along the way.
What We're Building
Before: Bot receives Slack messages but can't respond
After: Bot uses Gemini to generate contextual responses, maintaining conversation history per user
Key features:
- User-specific chat sessions (conversation context)
- Works in both DMs and channel mentions
- Detailed error logging for debugging
- Local testing support
Installing Dependencies
First, add the Vertex AI SDK:
```bash
uv add google-cloud-aiplatform
```
This installs the google-cloud-aiplatform package which includes support for Vertex AI's Generative Models API.
Updated pyproject.toml:
```toml
[project]
dependencies = [
    "flask>=3.1.2",
    "python-dotenv>=1.1.1",
    "slack-bolt>=1.26.0",
    "google-cloud-aiplatform>=1.121.0",  # New!
]
```
Creating the Gemini Service
I created a service class to encapsulate all Gemini logic in app/services/gemini.py:
```python
import os
import logging
from typing import Optional

import vertexai
from vertexai.generative_models import GenerativeModel, ChatSession

logger = logging.getLogger(__name__)


class GeminiService:
    """Service for interacting with Vertex AI Gemini"""

    def __init__(self):
        self.project_id = os.getenv("GCP_PROJECT_ID")
        # Gemini is not available in all regions as of Nov 2025
        self.location = "us-central1"
        self.model_name = "gemini-2.5-flash"

        if not self.project_id:
            raise ValueError("GCP_PROJECT_ID environment variable is required")

        logger.info(
            f"Initializing GeminiService - Project: {self.project_id}, "
            f"Location: {self.location}, Model: {self.model_name}"
        )

        # Initialize Vertex AI
        vertexai.init(project=self.project_id, location=self.location)

        # Create model instance
        self.model = GenerativeModel(self.model_name)

        # Store chat sessions per user
        self.chat_sessions: dict[str, ChatSession] = {}

        logger.info("GeminiService initialized successfully")

    def get_chat_session(self, user_id: str) -> ChatSession:
        """Get or create chat session for user"""
        if user_id not in self.chat_sessions:
            self.chat_sessions[user_id] = self.model.start_chat()
        return self.chat_sessions[user_id]

    def clear_chat_session(self, user_id: str) -> None:
        """Clear chat history for user"""
        if user_id in self.chat_sessions:
            del self.chat_sessions[user_id]

    def generate_response_sync(
        self, user_id: str, message: str, use_context: bool = True
    ) -> str:
        """
        Generate AI response using Gemini

        Args:
            user_id: Unique identifier for user (preserves conversation context)
            message: User's message
            use_context: If True, uses chat history. If False, one-off question.

        Returns:
            Gemini's response text
        """
        try:
            if use_context:
                # Use chat session to maintain conversation history
                chat = self.get_chat_session(user_id)
                response = chat.send_message(message)
            else:
                # One-off question without context
                response = self.model.generate_content(message)

            return response.text

        except Exception as e:
            import traceback

            error_trace = traceback.format_exc()
            logger.error(
                f"Error generating Gemini response: {e}\n"
                f"Project: {self.project_id}, Location: {self.location}, "
                f"Model: {self.model_name}, User: {user_id}\n"
                f"Full traceback:\n{error_trace}"
            )
            return (
                "Sorry, I encountered an error processing your request. "
                "Please try again."
            )


# Singleton instance
_gemini_service: Optional[GeminiService] = None


def get_gemini_service() -> GeminiService:
    """Get or create the Gemini service singleton"""
    global _gemini_service
    if _gemini_service is None:
        _gemini_service = GeminiService()
    return _gemini_service
```
Key Design Decisions
1. User-Specific Chat Sessions
The chat sessions dictionary (self.chat_sessions) maintains separate conversation histories for each user:
```python
self.chat_sessions: dict[str, ChatSession] = {}
```
Why this matters:
- User A asks "What's Python?" → Gemini explains Python
- User A asks "What's it used for?" → Gemini knows "it" refers to Python
- User B asks "What's it used for?" → Doesn't see User A's history
Each Slack user gets their own context. No cross-user contamination.
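To make the isolation concrete, here's a quick usage sketch against the service above (the user IDs are made up):

```python
from app.services.gemini import get_gemini_service

gemini = get_gemini_service()

# User A builds up context across two turns
gemini.generate_response_sync("U_ALICE", "What's Python?")
gemini.generate_response_sync("U_ALICE", "What's it used for?")  # "it" resolves to Python

# User B gets a fresh ChatSession - none of Alice's history leaks in
gemini.generate_response_sync("U_BOB", "What's it used for?")  # ambiguous to Gemini
```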
2. Singleton Pattern
```python
_gemini_service: Optional[GeminiService] = None


def get_gemini_service() -> GeminiService:
    global _gemini_service
    if _gemini_service is None:
        _gemini_service = GeminiService()
    return _gemini_service
```
This ensures we only initialize Vertex AI once per app lifecycle, not on every request - faster responses and no repeated setup work. It also keeps the in-memory chat sessions alive between requests.
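One caveat worth flagging: if your Flask server handles requests on multiple threads, two requests can race through the None check and initialize the service twice. A minimal thread-safe variant using a module-level lock (the lock is my addition, not part of the code above):

```python
import threading
from typing import Optional

_gemini_service: Optional[GeminiService] = None
_service_lock = threading.Lock()


def get_gemini_service() -> GeminiService:
    """Thread-safe singleton accessor (double-checked locking)."""
    global _gemini_service
    if _gemini_service is None:
        with _service_lock:
            # Re-check inside the lock: another thread may have won the race
            if _gemini_service is None:
                _gemini_service = GeminiService()
    return _gemini_service
```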
3. Regional Model Availability
```python
self.location = "us-central1"
```
Gemini isn't available in all GCP regions. I initially tried asia-northeast1 (Tokyo) but got 404 errors.
Available regions (as of 2025):
- `us-central1` (Iowa) ✅
- `us-east4` (Virginia) ✅
- `europe-west1` (Belgium) ✅
- `asia-southeast1` (Singapore) ✅
Always check Vertex AI region availability before choosing a region.
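I don't know of a single gcloud command that lists which regions serve a given publisher model, so a pragmatic check is a tiny probe: initialize Vertex AI in a candidate region and send a minimal request - a 404 means the model isn't served there. A sketch (assumes GCP_PROJECT_ID is set and application-default credentials are available; each probe costs one small API call):

```python
import os

import vertexai
from vertexai.generative_models import GenerativeModel


def gemini_available(region: str, model_name: str = "gemini-2.5-flash") -> bool:
    """Return True if the model answers in this region; 404s surface as exceptions."""
    try:
        vertexai.init(project=os.environ["GCP_PROJECT_ID"], location=region)
        GenerativeModel(model_name).generate_content("ping")
        return True
    except Exception as e:
        print(f"{region}: {e}")
        return False


for region in ["us-central1", "asia-northeast1"]:
    print(region, "->", gemini_available(region))
```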
4. Detailed Error Logging
```python
logger.error(
    f"Error generating Gemini response: {e}\n"
    f"Project: {self.project_id}, Location: {self.location}, "
    f"Model: {self.model_name}, User: {user_id}\n"
    f"Full traceback:\n{error_trace}"
)
```
This saved me hours of debugging. When errors occur in Cloud Run, you see exactly:
- What went wrong
- Which project/region/model
- Full stack trace
Critical for troubleshooting permission issues (more on that later).
Integrating with Slack Handlers
Now update the Slack event handlers to use Gemini in app/handlers/slack_events.py:
```python
import re

from app.services.gemini import get_gemini_service


@app.event("app_mention")
def handle_mention(event, say, logger):
    """
    When someone @mentions the bot
    Example: "@PromoBot hello"
    Replies in a thread using Gemini AI
    """
    user = event.get("user")
    text = event.get("text", "")
    thread_ts = event.get("thread_ts") or event.get("ts")

    logger.info(f"Mention from {user}: {text}")

    # Remove @bot mention from message
    clean_text = re.sub(r"<@[A-Z0-9]+>", "", text).strip()

    try:
        gemini = get_gemini_service()
        response = gemini.generate_response_sync(user, clean_text)
        say(text=response, thread_ts=thread_ts)
    except Exception as e:
        logger.error(f"Error with Gemini: {e}")
        say(
            text=f"Hi <@{user}>! I'm having trouble connecting to my AI service. "
            f"Please try again later.",
            thread_ts=thread_ts,
        )


@app.event("message")
def handle_message(event, say, logger):
    """
    When someone DMs the bot
    Only responds to direct messages, not channel messages
    Uses Gemini AI to generate contextual responses
    """
    # Ignore bot messages to prevent loops
    if event.get("bot_id"):
        return

    # Only respond to DMs
    channel_type = event.get("channel_type")
    if channel_type != "im":
        return

    user = event.get("user")
    text = event.get("text", "")

    logger.info(f"DM from {user}: {text}")

    try:
        gemini = get_gemini_service()
        response = gemini.generate_response_sync(user, text, use_context=True)
        say(response)
    except Exception as e:
        logger.error(f"Error with Gemini: {e}")
        say(
            "I'm having trouble connecting to my AI service. "
            "Please try again later."
        )
```
Important: The regex `re.sub(r"<@[A-Z0-9]+>", "", text)` removes the bot mention (@PromoBot) from the message before it goes to Gemini. Otherwise, Gemini would see messages like "<@U12345ABC> what is Python?" instead of "what is Python?".
Configuration: Environment Variables
Add these to your .env for local development:
```bash
FLASK_ENV=development
PORT=3000

# Slack credentials
SLACK_BOT_TOKEN=xoxb-your-slack-token
SLACK_SIGNING_SECRET=your-signing-secret

# GCP Configuration
GCP_PROJECT_ID=your-project-id
GCP_LOCATION=us-central1
```
And configure as secrets in Cloud Run:
```bash
# Store GCP_PROJECT_ID as secret
echo -n "your-project-id" | \
  gcloud secrets create GCP_PROJECT_ID --data-file=-
```
Update your GitHub Actions deployment to include the secret:
```bash
--set-secrets "SLACK_BOT_TOKEN=SLACK_BOT_TOKEN:latest,SLACK_SIGNING_SECRET=SLACK_SIGNING_SECRET:latest,GCP_PROJECT_ID=GCP_PROJECT_ID:latest"
```
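For reference, the same flag works in a manual deploy too; roughly (the service name, region, and $IMAGE placeholder are from my setup and may differ in yours):

```bash
gcloud run deploy your-service-name \
  --image "$IMAGE" \
  --region asia-northeast1 \
  --set-secrets "SLACK_BOT_TOKEN=SLACK_BOT_TOKEN:latest,SLACK_SIGNING_SECRET=SLACK_SIGNING_SECRET:latest,GCP_PROJECT_ID=GCP_PROJECT_ID:latest"
```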
Local Testing
Create a test endpoint in app/main.py for easy local testing without Slack:
```python
import os

from flask import jsonify, request

# (the Flask `app` and module-level `logger` are defined earlier in app/main.py)


@app.route("/test/chat", methods=["POST"])
def test_chat():
    """
    Test endpoint to simulate Slack messages locally
    Bypasses Slack signature verification for easy testing

    Usage:
        curl -X POST http://localhost:3000/test/chat \
          -H "Content-Type: application/json" \
          -d '{"user_id": "test-user", "message": "What is Python?"}'
    """
    if os.getenv("FLASK_ENV") != "development":
        return jsonify({"error": "Test endpoint only available in development"}), 403

    from app.services.gemini import get_gemini_service

    data = request.get_json()
    user_id = data.get("user_id", "test-user")
    message = data.get("message", "")
    use_context = data.get("use_context", True)

    if not message:
        return jsonify({"error": "message is required"}), 400

    try:
        gemini = get_gemini_service()
        response = gemini.generate_response_sync(user_id, message, use_context)
        return jsonify({
            "user_id": user_id,
            "message": message,
            "response": response
        }), 200
    except Exception as e:
        import traceback

        error_details = traceback.format_exc()
        logger.error(f"Error in test endpoint: {e}\n{error_details}")
        return jsonify({"error": "Internal server error"}), 500
```
Test it:
```bash
# Start dev server
make dev

# Test Gemini integration
curl -X POST http://localhost:3000/test/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is Python?"}'
```
Example response:
```json
{
  "user_id": "test-user",
  "message": "What is Python?",
  "response": "Python is a high-level, interpreted programming language known for its readability and versatility..."
}
```
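To confirm the conversation context actually carries over, send a follow-up with the same user_id (the IDs here are arbitrary):

```bash
# Turn 1: establish context
curl -X POST http://localhost:3000/test/chat \
  -H "Content-Type: application/json" \
  -d '{"user_id": "alice", "message": "What is Python?"}'

# Turn 2: "it" should resolve to Python, since alice's chat session is reused
curl -X POST http://localhost:3000/test/chat \
  -H "Content-Type: application/json" \
  -d '{"user_id": "alice", "message": "What is it used for?"}'
```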
The Permission Nightmare (and How to Fix It)
When I first deployed to Cloud Run, I got this error:
```
403 Permission 'aiplatform.endpoints.predict' denied on resource
'//aiplatform.googleapis.com/projects/your-project/locations/us-central1/publishers/google/models/gemini-2.5-flash'
[reason: "IAM_PERMISSION_DENIED"]
```
The problem: Cloud Run's service account didn't have permission to use Vertex AI.
Why it worked locally: My personal Google account has owner permissions. Cloud Run uses a service account with limited permissions.
Solution: Grant Vertex AI Permissions
First, find which service account Cloud Run uses:
```bash
gcloud run services describe your-service-name \
  --region=asia-northeast1 \
  --format="value(spec.template.spec.serviceAccountName)"
```
Example output:
```
123456789-compute@developer.gserviceaccount.com
```
Then grant the Vertex AI User role:
```bash
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:123456789-compute@developer.gserviceaccount.com" \
  --role="roles/aiplatform.user"
```
What this role grants:
- `aiplatform.endpoints.predict` - Call Gemini models
- `aiplatform.models.get` - Access model metadata
- Full access to Vertex AI prediction APIs
After granting this permission, the bot started working immediately - no redeploy needed.
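If you want to double-check the binding rather than wait for the next error, this query lists the roles granted to that service account (a standard gcloud IAM pattern):

```bash
gcloud projects get-iam-policy your-project-id \
  --flatten="bindings[].members" \
  --filter="bindings.members:123456789-compute@developer.gserviceaccount.com" \
  --format="value(bindings.role)"
```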
Common Gotchas
1. Model Not Found (404 Error)
Error:
```
404 Publisher Model 'projects/your-project/locations/asia-northeast1/publishers/google/models/gemini-1.5-flash' was not found
```
Causes:
- Wrong region - Gemini not available in that region
- Wrong model name - Model version doesn't exist
- Terms of service not accepted - Need to enable Generative AI
Solutions:
```
# 1. Use supported region
self.location = "us-central1"  # Not "asia-northeast1"

# 2. Check model name
# ✅ gemini-2.5-flash
# ✅ gemini-1.5-flash
# ❌ gemini-flash (missing version)

# 3. Enable the Vertex AI API and accept ToS
gcloud services enable aiplatform.googleapis.com
# Then visit: https://console.cloud.google.com/vertex-ai/generative/language
# Click "Enable" and accept terms of service
```
2. Works Locally but Fails in Cloud Run
This usually means:
- Service account permissions - Grant `roles/aiplatform.user`
- Environment variable missing - Check that `GCP_PROJECT_ID` is set as a secret (you can inspect the deployed config with the command below)
- Region mismatch - Hardcode `us-central1` instead of relying on an env var
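To rule out the second cause quickly, you can dump the container's environment and secret mounts straight from the deployed service. A sketch (service name and region are from my setup; the projection syntax may need adjusting for your gcloud version):

```bash
gcloud run services describe your-service-name \
  --region asia-northeast1 \
  --format="yaml(spec.template.spec.containers[0].env)"
```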
3. Chat History Not Persisting
Problem: Every message is treated as a new conversation.
Check:
```python
# ❌ Creates new service every time
def handle_message():
    gemini = GeminiService()  # Don't do this!

# ✅ Use singleton
def handle_message():
    gemini = get_gemini_service()  # Reuses same instance
```
The singleton pattern ensures chat sessions survive across requests within a single Cloud Run instance (instances don't share memory, which is why Next Steps suggests moving history to a database).
4. Removing All User Mentions
The current regex removes all user mentions:
```python
clean_text = re.sub(r"<@[A-Z0-9]+>", "", text).strip()
```
If a user writes:
```
@Bot hey can you help @john with this?
```
Gemini receives:
```
hey can you help with this?
```
Both @Bot and @john are removed. If you want to keep other mentions, you'd need to only remove the bot's specific mention.
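A minimal sketch of that bot-only filtering: in slack-bolt, listeners can receive a `context` argument that carries the bot's own user ID (verify the field name against your Bolt version):

```python
import re


def strip_bot_mention(text: str, bot_user_id: str) -> str:
    """Remove only the bot's own mention, keeping mentions of other users."""
    return re.sub(rf"<@{re.escape(bot_user_id)}>", "", text).strip()


# In handle_mention, Bolt can inject a `context` argument:
#   clean_text = strip_bot_mention(text, context["bot_user_id"])
# "@Bot hey can you help @john with this?" -> "hey can you help <@U_JOHN> with this?"
```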
Logging Configuration
Add proper logging to app/main.py:
```python
import logging

# Configure logging for Cloud Run
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)
```
Why this matters: Cloud Run automatically sends stdout/stderr to Cloud Logging. Using Python's logging module ensures proper log levels and formatting.
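Related: Cloud Run also parses JSON-formatted log lines into structured entries, so severity filters in Cloud Logging line up with your Python log levels. A minimal standard-library sketch (the `severity`/`message` keys are the ones Cloud Logging recognizes):

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line; Cloud Run turns these into structured logs."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "severity": record.levelname,  # mapped to Cloud Logging severity
            "message": record.getMessage(),
            "logger": record.name,
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
```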
View logs in Cloud Run:
```bash
gcloud run services logs tail your-service-name --region asia-northeast1
```
Or in the console: https://console.cloud.google.com/run
Testing the Full Flow
1. Local Testing (with real Gemini)
```bash
# Authenticate with your Google account
gcloud auth application-default login

# Run locally
make dev

# Test endpoint
curl -X POST http://localhost:3000/test/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is Python?"}'
```
2. Test in Slack (Development)
- Set up ngrok to expose local server:
```bash
ngrok http 3000
```
- Update Slack Event URL with ngrok URL:
```
https://abc123.ngrok.io/slack/events
```
- Send test message in Slack - @mention your bot or DM it
3. Production Testing (Cloud Run)
Deploy and test:
```bash
git add .
git commit -m "Add Gemini integration"
git push origin main
```
GitHub Actions deploys automatically. Test by messaging the bot in Slack.
Cost Considerations
Vertex AI Gemini Pricing (as of 2025):
For gemini-1.5-flash and gemini-2.5-flash:
- Input: $0.075 per 1M characters
- Output: $0.30 per 1M characters
- Free tier: 1,500 requests/day (limited quota)
Example calculation for a small team Slack bot:
- 100 messages/day
- Average 50 characters input + 200 characters output per message
Monthly cost:
```
Input:  100 × 50 × 30  = 150,000 chars → $0.01
Output: 100 × 200 × 30 = 600,000 chars → $0.18
Total:  ~$0.20/month
```
Nearly free for small teams. Even at 1000+ messages/day, you're looking at less than $5/month.
Compare to OpenAI:
- GPT-4: ~$0.03 per message (15x more expensive)
- GPT-3.5: ~$0.002 per message (similar to Gemini)
Gemini Flash is extremely cost-effective for conversational bots.
Monitoring and Debugging
Check if Gemini is being called:
```bash
gcloud logging read "resource.type=cloud_run_revision AND textPayload:GeminiService" \
  --project=your-project-id \
  --limit=10
```
Check for errors:
```bash
gcloud logging read "resource.type=cloud_run_revision AND severity>=ERROR" \
  --project=your-project-id \
  --limit=20
```
Useful log queries:
```
# All Gemini-related logs
textPayload:"Gemini"

# Permission errors
severity>=ERROR AND textPayload:"Permission"

# Model not found errors
severity>=ERROR AND textPayload:"404"
```
Complete File Structure
After adding Gemini integration:
```
slack-bot/
├── app/
│   ├── __init__.py
│   ├── main.py              # Flask app with /test/chat endpoint
│   ├── handlers/
│   │   ├── __init__.py
│   │   └── slack_events.py  # Slack handlers using Gemini
│   └── services/
│       ├── __init__.py
│       └── gemini.py        # Vertex AI Gemini service
├── .env                     # Local environment variables
├── .vscode/
│   └── settings.json        # VSCode Python config
├── Dockerfile
├── .dockerignore
├── .github/workflows/deploy.yml
├── pyproject.toml
└── uv.lock
```
Next Steps
Now that you have Gemini working:
- Add system prompts to customize bot personality
- Implement conversation reset (e.g., a `/reset` command - see the sketch after this list)
- Add rate limiting to prevent abuse
- Store conversation history in a database (currently in-memory)
- Add multimodal support (images, PDFs) - Gemini supports it!
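The reset item is smaller than it sounds, because GeminiService already has clear_chat_session. A sketch using Bolt's slash-command listener (you'd also need to register /reset in your Slack app config):

```python
from app.services.gemini import get_gemini_service


@app.command("/reset")
def handle_reset(ack, command, respond):
    """Clear the caller's Gemini chat history."""
    ack()  # Slack requires an ack within 3 seconds
    get_gemini_service().clear_chat_session(command["user_id"])
    respond("Conversation history cleared - starting fresh!")
```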
Wrapping Up
Integrating Vertex AI Gemini into Flask was easier than expected, once I got past the permission issues. The hardest part was understanding:
- Regional availability - Not all regions support Gemini
- Service account permissions - Need explicit Vertex AI access
- Chat session management - Singleton pattern for shared state
Key takeaways:
- Use `us-central1` for Gemini (or check region availability)
- Grant `roles/aiplatform.user` to the Cloud Run service account
- Use a singleton pattern for GeminiService to maintain chat sessions
- Add detailed error logging for debugging in production
- Create a `/test/chat` endpoint for local testing
- Gemini is extremely cost-effective for chatbots
The entire integration took about 2 hours, with most time spent debugging permissions. Once that was solved, everything worked smoothly.
If you're building AI-powered apps on GCP, Vertex AI Gemini is a no-brainer - especially if you're already using Cloud Run.
Pro Tips
Always log project ID, region, and model name in errors. This makes troubleshooting 10x faster.
Use different model versions for dev vs prod. Test with gemini-1.5-flash (stable) before trying newer models.
Implement conversation reset. Users need a way to clear context if the bot gets confused.
Monitor token usage in Cloud Logging. Vertex AI logs include token counts - useful for cost tracking.
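You can also log counts yourself straight from the SDK: the response returned by generate_content / send_message exposes a usage_metadata field (field names per the vertexai SDK; worth verifying against your installed version):

```python
# Inside generate_response_sync, after `response` is obtained:
usage = response.usage_metadata
logger.info(
    f"Tokens - prompt: {usage.prompt_token_count}, "
    f"output: {usage.candidates_token_count}, "
    f"total: {usage.total_token_count}"
)
```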
Use use_context=False for one-off questions. Saves chat session memory and prevents context pollution.
