## Intro
After integrating Gemini into my Slack bot, I had a conversational AI that could chat with users. But it only knew general knowledge - it couldn't answer questions specific to our team, like "How do I use the office coffee machine?"
I wanted to give Gemini a knowledge base - documents it could reference when answering questions.
The goal: Store FAQ and documentation as markdown files in GCS, and have Gemini automatically use them as context when responding.
Here's how I set up GCS as Gemini's "memory" - and the IAM permission rabbit hole I fell into along the way.
## What We're Building
Before: Bot answers from general knowledge only
After: Bot loads markdown files from GCS and uses them as context for answers
Key features:
- Markdown files in GCS as knowledge base
- System prompt stored separately from code
- No redeploy needed to update knowledge
- Automatic caching for performance
## Why Markdown?
I considered several formats for storing knowledge:
| Format | Pros | Cons |
|---|---|---|
| Markdown | Human-readable, hierarchical, LLM-friendly | None really |
| Spreadsheets | Good for tabular data | Flat structure, parsing overhead |
| JSON | Structured | Less readable, hard to add context |
| Plain text | Simple | No structure |
Markdown won because:
- Gemini naturally understands markdown structure (headers, lists, tables)
- Easy to edit without special tools
- Version controllable with git
- Can include explanations and context
Example knowledge file:
```markdown
# Office Coffee Machine Guide

## Common Questions

### Q: How do I use the coffee machine?

### Q: The coffee tastes weird, what's wrong?

## Answer

### Making Coffee

1. Fill the water tank (left side)
2. Add beans to the hopper (not pre-ground!)
3. Select your drink size on the touchscreen
4. Place cup on the drip tray

| Drink     | Button           | Time   |
| --------- | ---------------- | ------ |
| Espresso  | Single cup icon  | 25 sec |
| Americano | Cup + water drop | 45 sec |
| Latte     | Cup + milk icon  | 60 sec |

### Troubleshooting

**Bitter taste?** Beans might be stale. Check the roast date on the bag.

**No water coming out?** The tank is empty. Dave from accounting always forgets to refill it.

**Weird grinding noise?** Someone put flavored beans in again. Please don't.
```
Gemini can parse this structure and provide accurate answers.
## Architecture Overview

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   User asks     │────▶│  GeminiService  │────▶│     Gemini      │
│   question      │     │                 │     │                 │
└─────────────────┘     │  System prompt  │     │  Answers using  │
                        │  + GCS memory   │     │  memory context │
                        └────────┬────────┘     └─────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │  MemoryService  │
                        │  (loads from    │
                        │   GCS bucket)   │
                        └─────────────────┘
```
Two sources of context:
- System prompt (in codebase) - Defines bot personality, requires deploy to change
- GCS memory (in bucket) - Knowledge base, editable without deploy
## Setting Up GCS

### 1. Create the Bucket

```bash
gsutil mb -l asia-northeast1 gs://your-bot-memory
```
I used the same region as Cloud Run for lower latency.
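If you want to double-check where the bucket actually lives, the storage client exposes it. A quick sketch - run it locally with your own gcloud credentials, since `roles/storage.objectViewer` alone doesn't cover bucket metadata reads:

```python
import google.cloud.storage as storage

# Uses Application Default Credentials, same as the MemoryService below
client = storage.Client()
bucket = client.get_bucket("your-bot-memory")
print(bucket.location)  # Should print ASIA-NORTHEAST1 for this setup
```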
### 2. Grant Permissions
Your Cloud Run service account needs read access:
```bash
gcloud projects add-iam-policy-binding your-project \
  --member="serviceAccount:your-service-account@your-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
```
### 3. Upload Knowledge Files

```bash
gsutil cp knowledge/faq.md gs://your-bot-memory/
gsutil cp knowledge/processes.md gs://your-bot-memory/
```
## Installing Dependencies
Add the GCS client library:
```bash
uv add google-cloud-storage
```
Updated `pyproject.toml`:

```toml
[project]
dependencies = [
    "flask>=3.1.2",
    "google-cloud-aiplatform>=1.121.0",
    "google-cloud-storage>=2.18.0",  # New!
    "python-dotenv>=1.1.1",
]
```
### Mypy Gotcha
If you're using mypy for type checking, you might get:
```
Module "google.cloud" has no attribute "storage"  [attr-defined]
```
Fix: Use direct module import instead of attribute access:
```python
# ❌ mypy can't resolve this
from google.cloud import storage

# ✅ This works
import google.cloud.storage as storage
```

The namespace package `google.cloud` confuses mypy, but the direct module import works fine.
## Creating the Memory Service

Create `app/services/memory.py`:

```python
import os
import logging
from typing import Optional
import google.cloud.storage as storage

logger = logging.getLogger(__name__)


class MemoryService:
    """Service for loading knowledge from GCS."""

    def __init__(self) -> None:
        self.bucket_name = os.getenv("GCS_MEMORY_BUCKET")

        if not self.bucket_name:
            raise ValueError("GCS_MEMORY_BUCKET environment variable is required")

        # Uses Application Default Credentials (ADC)
        # In Cloud Run: uses service account
        # Locally: uses gcloud auth
        self.client = storage.Client()
        self.bucket = self.client.bucket(self.bucket_name)

        # Cache loaded content to avoid repeated GCS calls
        self._cache: dict[str, str] = {}

        logger.info(f"MemoryService initialized - Bucket: {self.bucket_name}")

    def _load_file(self, blob_name: str) -> Optional[str]:
        """Load single file from GCS with caching."""
        if blob_name in self._cache:
            return self._cache[blob_name]

        try:
            blob = self.bucket.blob(blob_name)
            content = blob.download_as_text()
            self._cache[blob_name] = content
            logger.info(f"Loaded and cached: {blob_name}")
            return content
        except Exception as e:
            logger.error(f"Failed to load {blob_name}: {e}")
            return None

    def load_all_memory(self) -> str:
        """Load all markdown files from bucket."""
        contents = []

        try:
            blobs = self.client.list_blobs(self.bucket_name)

            for blob in blobs:
                if blob.name.endswith(".md"):
                    content = self._load_file(blob.name)
                    if content:
                        filename = blob.name.split("/")[-1]
                        contents.append(f"--- {filename} ---\n{content}")

            logger.info(f"Loaded {len(contents)} memory files")
        except Exception as e:
            logger.error(f"Failed to list memory files: {e}")

        return "\n\n".join(contents)

    def clear_cache(self) -> None:
        """Clear cache to force reload from GCS."""
        self._cache.clear()

    def refresh(self) -> str:
        """Clear cache and reload all memory."""
        self.clear_cache()
        return self.load_all_memory()


# Singleton instance
_memory_service: Optional[MemoryService] = None


def get_memory_service() -> MemoryService:
    """Get or create the MemoryService singleton."""
    global _memory_service
    if _memory_service is None:
        _memory_service = MemoryService()
    return _memory_service
```
### Key Design Decisions
1. Caching: GCS files are cached in memory to avoid network calls on every request. The cache persists for the Cloud Run instance lifetime (a TTL variant is sketched after this list).
2. Graceful degradation: If GCS fails, the bot still works - just without memory context.
3. Singleton pattern: Same as `GeminiService` - ensures one instance per app.
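If you'd rather not wait for an instance restart, the caching in item 1 extends naturally to time-based expiry. A minimal sketch - the `TTLCache` class and its `ttl_seconds` parameter are my illustration, not part of the service above:

```python
import time
from typing import Optional


class TTLCache:
    """Tiny time-based cache: entries expire after ttl_seconds (sketch)."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            # Expired: drop the entry and report a miss
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (time.monotonic(), value)
```

Swapping `MemoryService._cache` for something like this would mean `_load_file` re-fetches from GCS at most every five minutes instead of never.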
## Storing the System Prompt
I moved the system prompt from inline code to a separate file for cleaner separation:
`prompts/system_prompt.md`:

```
You are a helpful support assistant for our team.
When answering questions, refer to the knowledge base provided below.
If relevant information exists in the knowledge base, use it to answer.
If the information is not available, either use general knowledge or let the user know you don't have that information.
Keep responses concise and clear.
```
This defines the bot's personality and how it should use the knowledge base.
## Integrating Memory with Gemini

Update `app/services/gemini.py`:

```python
import os
import logging
from pathlib import Path
from typing import Optional
import vertexai
from vertexai.generative_models import GenerativeModel, ChatSession

from app.services.memory import get_memory_service

PROJECT_ROOT = Path(__file__).parent.parent.parent
SYSTEM_PROMPT_PATH = PROJECT_ROOT / "prompts" / "system_prompt.md"

logger = logging.getLogger(__name__)


class GeminiService:
    """Service for interacting with Vertex AI Gemini"""

    def __init__(self) -> None:
        self.project_id = os.getenv("GCP_PROJECT_ID")
        self.location = "us-central1"
        self.model_name = "gemini-2.5-flash"

        if not self.project_id:
            raise ValueError("GCP_PROJECT_ID environment variable is required")

        vertexai.init(project=self.project_id, location=self.location)

        # Load memory from GCS
        self.memory_content = self._load_memory()

        # Build system instruction with memory
        self.system_instruction = self._build_system_instruction()

        # Initialize model with system instruction
        self.model = GenerativeModel(
            self.model_name,
            system_instruction=self.system_instruction
        )

        self.chat_sessions: dict[str, ChatSession] = {}
        logger.info("GeminiService initialized successfully")

    def _load_memory(self) -> str:
        """Load memory content from GCS."""
        try:
            memory_service = get_memory_service()
            content = memory_service.load_all_memory()
            logger.info(f"Loaded memory: {len(content)} characters")
            return content
        except Exception as e:
            logger.warning(f"Failed to load memory: {e}")
            return ""

    def _load_system_prompt(self) -> str:
        """Load base system prompt from file."""
        try:
            return SYSTEM_PROMPT_PATH.read_text(encoding="utf-8").strip()
        except Exception as e:
            logger.error(f"Failed to load system prompt: {e}")
            raise

    def _build_system_instruction(self) -> str:
        """Build system instruction with memory context."""
        base_instruction = self._load_system_prompt()

        if self.memory_content:
            return f"{base_instruction}\n\n{self.memory_content}"
        else:
            return base_instruction

    def reload_memory(self) -> None:
        """Reload memory from GCS and update model."""
        self.memory_content = self._load_memory()
        self.system_instruction = self._build_system_instruction()
        self.model = GenerativeModel(
            self.model_name,
            system_instruction=self.system_instruction
        )
        self.chat_sessions.clear()
        logger.info("Memory reloaded, chat sessions cleared")

    # ... rest of the methods unchanged
```
### How It Works

- On startup, `GeminiService` loads all `.md` files from GCS
- Memory content is appended to the system prompt
- Gemini receives the combined instruction when generating responses
- Gemini matches the user's question against the knowledge base content (illustrated below)
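Concretely, the instruction Gemini ends up with looks roughly like this (file names from earlier in the post; contents abbreviated with `...`):

```
You are a helpful support assistant for our team.
...

--- faq.md ---
...

--- coffee-machine.md ---
# Office Coffee Machine Guide
...
```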
## Configuration
### Environment Variables

Add to `.env`:

```bash
# GCS Memory Configuration
GCS_MEMORY_BUCKET=your-bot-memory
```
### Cloud Run Deployment

Update `deploy.yml`:

```bash
--set-env-vars "FLASK_ENV=production,GCS_MEMORY_BUCKET=your-bot-memory"
```
### Dockerfile

Don't forget to copy the `prompts/` directory:

```dockerfile
COPY app/ ./app/
# Add this line!
COPY prompts/ ./prompts/
```

(Note the comment on its own line - Docker only treats `#` as a comment at the start of a line, so a trailing comment would break the `COPY` instruction.)
I spent 30 minutes debugging "file not found" errors before realizing this.
## Testing Locally

```bash
# Authenticate
gcloud auth application-default login

# Set environment
export GCS_MEMORY_BUCKET=your-bot-memory

# Run app
make dev

# Test
curl -X POST http://localhost:3000/test/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "How do I make a latte?"}'
```
The response should include information from your markdown files.
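You can also exercise `MemoryService` without touching GCS at all. A minimal pytest sketch with a mocked client - the fake blob name and contents are invented for illustration:

```python
# tests/test_memory.py - sketch using pytest's monkeypatch and unittest.mock
from unittest.mock import MagicMock, patch

from app.services.memory import MemoryService


def test_load_all_memory_concatenates_markdown(monkeypatch):
    monkeypatch.setenv("GCS_MEMORY_BUCKET", "fake-bucket")

    fake_blob = MagicMock()
    fake_blob.name = "coffee-machine.md"
    fake_blob.download_as_text.return_value = "# Coffee\nUse the touchscreen."

    # Patch the GCS client so no network call is made
    with patch("app.services.memory.storage.Client") as mock_client:
        mock_client.return_value.list_blobs.return_value = [fake_blob]
        mock_client.return_value.bucket.return_value.blob.return_value = fake_blob

        service = MemoryService()
        result = service.load_all_memory()

    assert "--- coffee-machine.md ---" in result
    assert "Use the touchscreen." in result
```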
## Updating Knowledge Without Redeploying
The best part: to update the knowledge base, just:
```bash
# Edit your markdown locally
vim knowledge/coffee-machine.md

# Upload to GCS
gsutil cp knowledge/coffee-machine.md gs://your-bot-memory/

# Optional: Force reload (or wait for new Cloud Run instance)
# The cache clears when instances restart
```
No code changes, no deployment, no downtime.
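If you want that force-reload on demand, a small admin endpoint can do it. A sketch - the route path is arbitrary, and it assumes a `get_gemini_service()` singleton accessor mirroring `get_memory_service()` (not shown in this post):

```python
from flask import Blueprint, jsonify

from app.services.memory import get_memory_service
# Hypothetical accessor, mirroring get_memory_service()
from app.services.gemini import get_gemini_service

admin = Blueprint("admin", __name__)


@admin.route("/admin/reload-memory", methods=["POST"])
def reload_memory():
    # Clear the GCS cache first - otherwise GeminiService.reload_memory()
    # would just re-read the same cached content from MemoryService.
    get_memory_service().clear_cache()
    get_gemini_service().reload_memory()
    return jsonify({"status": "reloaded"})
```

Note the ordering: `GeminiService.reload_memory()` goes through `MemoryService.load_all_memory()`, which serves from cache, so clearing the cache first (or calling `refresh()`) is what actually pulls fresh files. In production you'd also want to protect this route.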
## File Structure
After adding GCS memory:
```
slack-bot/
├── app/
│   ├── __init__.py
│   ├── main.py
│   ├── handlers/
│   │   └── slack_events.py
│   └── services/
│       ├── gemini.py          # Updated with memory integration
│       └── memory.py          # New! GCS memory service
├── prompts/
│   └── system_prompt.md       # New! Bot personality
├── memory/                    # Local copies of knowledge files
│   └── coffee-machine.md
├── .env
├── Dockerfile                 # Updated to copy prompts/
├── pyproject.toml
└── uv.lock
```
## Gotchas and Tips

### 1. Dockerfile Must Copy `prompts/`

```dockerfile
COPY prompts/ ./prompts/
```
I forgot this and spent time debugging "No such file" errors in Cloud Run.
### 2. Cache Behavior
Memory is cached for the Cloud Run instance lifetime. To force refresh:
- Scale to zero and back up
- Deploy a new revision
- Call `reload_memory()` programmatically
### 3. Large Knowledge Bases
For small teams (< 50 files), loading everything into context works fine. For larger knowledge bases, consider:
- Vertex AI Search for semantic retrieval
- Vector embeddings for similarity search
- Only loading relevant files based on query (a rough sketch follows this list)
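The last option can start embarrassingly simple. A keyword-overlap sketch - a stand-in for real semantic search, working off the filename-to-content mapping the cache already holds:

```python
def select_relevant_files(
    query: str, files: dict[str, str], top_k: int = 3
) -> list[str]:
    """Rank knowledge files by naive keyword overlap with the query.

    `files` maps filename -> markdown content (e.g. MemoryService._cache).
    A rough heuristic sketch - embeddings would do much better.
    """
    query_terms = set(query.lower().split())
    scored = []
    for name, content in files.items():
        overlap = len(query_terms & set(content.lower().split()))
        scored.append((overlap, name))
    scored.sort(reverse=True)
    return [name for overlap, name in scored[:top_k] if overlap > 0]
```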
### 4. Markdown Best Practices for LLMs
Structure matters:
```markdown
# Main Topic

## Question Section

### Q: Exact question users might ask?

## Answer

Clear, structured answer with:

- Bullet points
- Code blocks
- Tables for structured data

**Important notes** stand out with bold.
```
Gemini picks up on this structure when matching questions to answers.
## Cost Impact
GCS costs for a small knowledge base:
- Storage: < $0.01/month for a few markdown files
- Operations: ~$0.004 per 10,000 reads
Essentially free for this use case.
## Next Steps
Now that the bot has memory:
- Add more knowledge files - FAQ, processes, troubleshooting guides
- Implement memory refresh command - a `/refresh-knowledge` slash command
- Add memory search - semantic search for large knowledge bases
- Version knowledge - Use GCS versioning for rollback
- Monitor usage - Track how often memory is used in responses
## Wrapping Up
Adding GCS as Gemini's memory transformed my bot from a generic AI to a team-specific assistant. The key insight: separate what changes frequently (knowledge) from what changes rarely (code).
Key takeaways:
- Markdown is the ideal format for LLM knowledge bases
- GCS provides cheap, simple storage with easy updates
- System prompts should be in version control, knowledge can be external
- Don't forget to copy all required files in your Dockerfile
Now when someone asks "How do I make a latte?", the bot actually knows the answer - and can even tell you about Dave from accounting.
