Adding GCS Memory to Gemini: Teaching Your Bot with Markdown Files

Python · Flask · GCS · Vertex AI

Intro

After integrating Gemini into my Slack bot, I had a conversational AI that could chat with users. But it only knew general knowledge - it couldn't answer questions specific to our team, like "How do I use the office coffee machine?"

I wanted to give Gemini a knowledge base - documents it could reference when answering questions.

The goal: Store FAQ and documentation as markdown files in GCS, and have Gemini automatically use them as context when responding.

Here's how I set up GCS as Gemini's "memory" - and the IAM permission rabbit hole I fell into along the way.

What We're Building

Before: Bot answers from general knowledge only

After: Bot loads markdown files from GCS and uses them as context for answers

Key features:

  • Markdown files in GCS as knowledge base
  • System prompt stored separately from code
  • No redeploy needed to update knowledge
  • Automatic caching for performance

Why Markdown?

I considered several formats for storing knowledge:

| Format       | Pros                                       | Cons                               |
| ------------ | ------------------------------------------ | ---------------------------------- |
| Markdown     | Human-readable, hierarchical, LLM-friendly | None really                        |
| Spreadsheets | Good for tabular data                      | Flat structure, parsing overhead   |
| JSON         | Structured                                 | Less readable, hard to add context |
| Plain text   | Simple                                     | No structure                       |

Markdown won because:

  1. Gemini naturally understands markdown structure (headers, lists, tables)
  2. Easy to edit without special tools
  3. Version controllable with git
  4. Can include explanations and context

Example knowledge file:

# Office Coffee Machine Guide

## Common Questions

### Q: How do I use the coffee machine?

### Q: The coffee tastes weird, what's wrong?

## Answer

### Making Coffee

1. Fill the water tank (left side)
2. Add beans to the hopper (not pre-ground!)
3. Select your drink size on the touchscreen
4. Place cup on the drip tray

| Drink     | Button           | Time   |
| --------- | ---------------- | ------ |
| Espresso  | Single cup icon  | 25 sec |
| Americano | Cup + water drop | 45 sec |
| Latte     | Cup + milk icon  | 60 sec |

### Troubleshooting

**Bitter taste?** Beans might be stale. Check the roast date on the bag.

**No water coming out?** The tank is empty. Dave from accounting always forgets to refill it.

**Weird grinding noise?** Someone put flavored beans in again. Please don't.

Gemini can parse this structure and provide accurate answers.

Architecture Overview

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ User asks       │────▶│ GeminiService   │────▶│ Gemini          │
│ question        │     │                 │     │                 │
└─────────────────┘     │ System prompt   │     │ Answers using   │
                        │ + GCS memory    │     │ memory context  │
                        └────────┬────────┘     └─────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │  MemoryService  │
                        │  (loads from    │
                        │   GCS bucket)   │
                        └─────────────────┘

Two sources of context:

  1. System prompt (in codebase) - Defines bot personality, requires deploy to change
  2. GCS memory (in bucket) - Knowledge base, editable without deploy

Setting Up GCS

1. Create the Bucket

gsutil mb -l asia-northeast1 gs://your-bot-memory

I used the same region as Cloud Run for lower latency.
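
If you'd rather stay in Python, the same client library used later in this post can create the bucket. A minimal sketch, assuming ADC is already configured:

import google.cloud.storage as storage

client = storage.Client()
# Match the Cloud Run region so reads stay low-latency
bucket = client.create_bucket("your-bot-memory", location="asia-northeast1")
print(f"Created {bucket.name} in {bucket.location}")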

2. Grant Permissions

Your Cloud Run service account needs read access:

gcloud projects add-iam-policy-binding your-project \
  --member="serviceAccount:your-service-account@your-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
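
IAM changes can take a minute to propagate, and a missing role tends to surface later as silently empty memory. To check what the active credentials can actually do against the bucket, the client library's test_iam_permissions helps (a sketch; run it as the service account in question):

import google.cloud.storage as storage

client = storage.Client()
bucket = client.bucket("your-bot-memory")

# Returns the subset of the requested permissions the caller actually holds
granted = bucket.test_iam_permissions(["storage.objects.get", "storage.objects.list"])
print(granted)  # expect both entries once objectViewer has propagated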

3. Upload Knowledge Files

gsutil cp knowledge/faq.md gs://your-bot-memory/
gsutil cp knowledge/processes.md gs://your-bot-memory/
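
The client library can do the same upload from Python, which is handy if you later script knowledge syncing. A sketch, assuming the local knowledge/ directory above:

from pathlib import Path

import google.cloud.storage as storage

client = storage.Client()
bucket = client.bucket("your-bot-memory")

# Mirror every local markdown file into the bucket
for path in Path("knowledge").glob("*.md"):
    bucket.blob(path.name).upload_from_filename(str(path))
    print(f"Uploaded {path.name}")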

Installing Dependencies

Add the GCS client library:

uv add google-cloud-storage

Updated pyproject.toml:

[project]
dependencies = [
    "flask>=3.1.2",
    "google-cloud-aiplatform>=1.121.0",
    "google-cloud-storage>=2.18.0",  # New!
    "python-dotenv>=1.1.1",
]

Mypy Gotcha

If you're using mypy for type checking, you might get:

Module "google.cloud" has no attribute "storage" [attr-defined]

Fix: Use direct module import instead of attribute access:

# ❌ mypy can't resolve this
from google.cloud import storage

# ✅ This works
import google.cloud.storage as storage

google.cloud is a namespace package, which mypy can't always resolve via attribute-style imports; importing the submodule directly sidesteps the issue.

Creating the Memory Service

Create app/services/memory.py:

import os
import logging
from typing import Optional

import google.cloud.storage as storage

logger = logging.getLogger(__name__)


class MemoryService:
    """Service for loading knowledge from GCS."""

    def __init__(self) -> None:
        self.bucket_name = os.getenv("GCS_MEMORY_BUCKET")

        if not self.bucket_name:
            raise ValueError("GCS_MEMORY_BUCKET environment variable is required")

        # Uses Application Default Credentials (ADC)
        # In Cloud Run: uses service account
        # Locally: uses gcloud auth
        self.client = storage.Client()
        self.bucket = self.client.bucket(self.bucket_name)

        # Cache loaded content to avoid repeated GCS calls
        self._cache: dict[str, str] = {}

        logger.info(f"MemoryService initialized - Bucket: {self.bucket_name}")

    def _load_file(self, blob_name: str) -> Optional[str]:
        """Load single file from GCS with caching."""
        if blob_name in self._cache:
            return self._cache[blob_name]

        try:
            blob = self.bucket.blob(blob_name)
            content = blob.download_as_text()
            self._cache[blob_name] = content
            logger.info(f"Loaded and cached: {blob_name}")
            return content
        except Exception as e:
            logger.error(f"Failed to load {blob_name}: {e}")
            return None

    def load_all_memory(self) -> str:
        """Load all markdown files from bucket."""
        contents = []

        try:
            blobs = self.client.list_blobs(self.bucket_name)

            for blob in blobs:
                if blob.name.endswith(".md"):
                    content = self._load_file(blob.name)
                    if content:
                        filename = blob.name.split("/")[-1]
                        contents.append(f"--- {filename} ---\n{content}")

            logger.info(f"Loaded {len(contents)} memory files")
        except Exception as e:
            logger.error(f"Failed to list memory files: {e}")

        return "\n\n".join(contents)

    def clear_cache(self) -> None:
        """Clear cache to force reload from GCS."""
        self._cache.clear()

    def refresh(self) -> str:
        """Clear cache and reload all memory."""
        self.clear_cache()
        return self.load_all_memory()


# Singleton instance
_memory_service: Optional[MemoryService] = None


def get_memory_service() -> MemoryService:
    """Get or create the MemoryService singleton."""
    global _memory_service
    if _memory_service is None:
        _memory_service = MemoryService()
    return _memory_service
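
A quick way to smoke-test the service from a Python REPL (assuming GCS_MEMORY_BUCKET is set and ADC is configured):

from app.services.memory import get_memory_service

memory = get_memory_service()
knowledge = memory.load_all_memory()  # first call hits GCS; repeats are served from cache
print(f"{len(knowledge)} characters of knowledge loaded")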

Key Design Decisions

1. Caching: GCS files are cached in memory to avoid network calls on every request. The cache persists for the Cloud Run instance lifetime.

2. Graceful degradation: If GCS fails, the bot still works - just without memory context.

3. Singleton pattern: Same as GeminiService, ensures one instance per app.

Storing the System Prompt

I moved the system prompt from inline code to a separate file for cleaner separation:

prompts/system_prompt.md:

You are a helpful support assistant for our team.
When answering questions, refer to the knowledge base provided below.
If relevant information exists in the knowledge base, use it to answer.
If the information is not available, either use general knowledge or let the user know you don't have that information.
Keep responses concise and clear.

This defines the bot's personality and how it should use the knowledge base.

Integrating Memory with Gemini

Update app/services/gemini.py:

import os
import logging
from pathlib import Path
from typing import Optional

import vertexai
from vertexai.generative_models import GenerativeModel, ChatSession

from app.services.memory import get_memory_service

PROJECT_ROOT = Path(__file__).parent.parent.parent
SYSTEM_PROMPT_PATH = PROJECT_ROOT / "prompts" / "system_prompt.md"

logger = logging.getLogger(__name__)


class GeminiService:
    """Service for interacting with Vertex AI Gemini."""

    def __init__(self) -> None:
        self.project_id = os.getenv("GCP_PROJECT_ID")
        self.location = "us-central1"
        self.model_name = "gemini-2.5-flash"

        if not self.project_id:
            raise ValueError("GCP_PROJECT_ID environment variable is required")

        vertexai.init(project=self.project_id, location=self.location)

        # Load memory from GCS
        self.memory_content = self._load_memory()

        # Build system instruction with memory
        self.system_instruction = self._build_system_instruction()

        # Initialize model with system instruction
        self.model = GenerativeModel(
            self.model_name,
            system_instruction=self.system_instruction,
        )

        self.chat_sessions: dict[str, ChatSession] = {}
        logger.info("GeminiService initialized successfully")

    def _load_memory(self) -> str:
        """Load memory content from GCS."""
        try:
            memory_service = get_memory_service()
            content = memory_service.load_all_memory()
            logger.info(f"Loaded memory: {len(content)} characters")
            return content
        except Exception as e:
            logger.warning(f"Failed to load memory: {e}")
            return ""

    def _load_system_prompt(self) -> str:
        """Load base system prompt from file."""
        try:
            return SYSTEM_PROMPT_PATH.read_text(encoding="utf-8").strip()
        except Exception as e:
            logger.error(f"Failed to load system prompt: {e}")
            raise

    def _build_system_instruction(self) -> str:
        """Build system instruction with memory context."""
        base_instruction = self._load_system_prompt()

        if self.memory_content:
            return f"{base_instruction}\n\n{self.memory_content}"
        else:
            return base_instruction

    def reload_memory(self) -> None:
        """Reload memory from GCS and update the model."""
        # Clear the MemoryService cache first; otherwise _load_memory()
        # would serve stale cached copies instead of re-fetching from GCS
        get_memory_service().clear_cache()
        self.memory_content = self._load_memory()
        self.system_instruction = self._build_system_instruction()
        self.model = GenerativeModel(
            self.model_name,
            system_instruction=self.system_instruction,
        )
        self.chat_sessions.clear()
        logger.info("Memory reloaded, chat sessions cleared")

    # ... rest of the methods unchanged
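
The elided chat methods aren't shown in this post, but for orientation, a per-user session helper might look roughly like this (a sketch, not the actual implementation):

def send_message(self, session_id: str, message: str) -> str:
    """Hypothetical helper: send a message in a per-user chat session."""
    if session_id not in self.chat_sessions:
        # start_chat() inherits the model's system instruction (and thus the memory)
        self.chat_sessions[session_id] = self.model.start_chat()
    response = self.chat_sessions[session_id].send_message(message)
    return response.text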

How It Works

  1. On startup, GeminiService loads all .md files from GCS
  2. Memory content is appended to the system prompt
  3. Gemini receives the combined instruction when generating responses
  4. Gemini answers the user's question, drawing on the knowledge base in its context

Configuration

Environment Variables

Add to .env:

# GCS Memory Configuration
GCS_MEMORY_BUCKET=your-bot-memory

Cloud Run Deployment

Update deploy.yml:

--set-env-vars "FLASK_ENV=production,GCS_MEMORY_BUCKET=your-bot-memory"

Dockerfile

Don't forget to copy the prompts directory:

COPY app/ ./app/
# Add this line!
COPY prompts/ ./prompts/

(Dockerfile comments must sit on their own line; a trailing # after COPY would be parsed as another argument.)

I spent 30 minutes debugging "file not found" errors before realizing this.

Testing Locally

# Authenticate
gcloud auth application-default login

# Set environment
export GCS_MEMORY_BUCKET=your-bot-memory

# Run app
make dev

# Test
curl -X POST http://localhost:3000/test/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "How do I make a latte?"}'

Expected response should include information from your markdown files.

Updating Knowledge Without Redeploying

The best part: to update the knowledge base, just:

# Edit your markdown locally
vim knowledge/coffee-machine.md

# Upload to GCS
gsutil cp knowledge/coffee-machine.md gs://your-bot-memory/

# Optional: Force reload (or wait for a new Cloud Run instance)
# The cache clears when instances restart

No code changes, no deployment, no downtime.

File Structure

After adding GCS memory:

slack-bot/
├── app/
│   ├── __init__.py
│   ├── main.py
│   ├── handlers/
│   │   └── slack_events.py
│   └── services/
│       ├── gemini.py        # Updated with memory integration
│       └── memory.py        # New! GCS memory service
├── prompts/
│   └── system_prompt.md     # New! Bot personality
├── knowledge/               # Local copies of knowledge files
│   └── coffee-machine.md
├── .env
├── Dockerfile               # Updated to copy prompts/
├── pyproject.toml
└── uv.lock

Gotchas and Tips

1. Dockerfile Must Copy prompts/

COPY prompts/ ./prompts/

Forgot this and spent time debugging "No such file" errors in Cloud Run.

2. Cache Behavior

Memory is cached for the Cloud Run instance lifetime. To force refresh:

  • Scale to zero and back up
  • Deploy a new revision
  • Call reload_memory() programmatically (see the sketch below)
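
For the programmatic option, a small admin endpoint does the job. A minimal Flask sketch; the route path and the get_gemini_service accessor (mirroring get_memory_service) are assumptions, not code from this post:

from flask import Blueprint, jsonify

from app.services.gemini import get_gemini_service  # assumed singleton accessor

admin_bp = Blueprint("admin", __name__)


@admin_bp.post("/admin/reload-memory")
def reload_memory_endpoint():
    # Re-fetches the markdown from GCS and rebuilds the system instruction
    get_gemini_service().reload_memory()
    return jsonify({"status": "reloaded"})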

3. Large Knowledge Bases

For small teams (< 50 files), loading everything into context works fine. For larger knowledge bases, consider:

  • Vertex AI Search for semantic retrieval
  • Vector embeddings for similarity search
  • Only loading relevant files based on the query (sketched below)
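
As a first cut at the last option, even naive keyword overlap beats stuffing everything into context. A toy sketch; the scoring is illustrative, not a real retrieval method:

def select_relevant(files: dict[str, str], query: str, top_k: int = 3) -> list[str]:
    """Rank files by how many query words appear in them (toy heuristic)."""
    words = {w.lower() for w in query.split()}

    def score(item: tuple[str, str]) -> int:
        text = item[1].lower()
        return sum(1 for w in words if w in text)

    ranked = sorted(files.items(), key=score, reverse=True)
    return [name for name, _ in ranked[:top_k]]

# e.g. select_relevant(loaded_files, "How do I make a latte?") -> top 3 filenames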

4. Markdown Best Practices for LLMs

Structure matters:

# Main Topic

## Question Section

### Q: Exact question users might ask?

## Answer

Clear, structured answer with:

- Bullet points
- Code blocks
- Tables for structured data

**Important notes** stand out with bold.

Gemini picks up on this structure when matching questions to answers.

Cost Impact

GCS costs for a small knowledge base:

  • Storage: < $0.01/month for a few markdown files
  • Operations: ~$0.004 per 10,000 reads

Essentially free for this use case.

Next Steps

Now that the bot has memory:

  1. Add more knowledge files - FAQ, processes, troubleshooting guides
  2. Implement memory refresh command - /refresh-knowledge slash command
  3. Add memory search - For large knowledge bases, semantic search
  4. Version knowledge - Use GCS versioning for rollback
  5. Monitor usage - Track how often memory is used in responses

Wrapping Up

Adding GCS as Gemini's memory transformed my bot from a generic AI to a team-specific assistant. The key insight: separate what changes frequently (knowledge) from what changes rarely (code).

Key takeaways:

  • Markdown is the ideal format for LLM knowledge bases
  • GCS provides cheap, simple storage with easy updates
  • System prompts should be in version control, knowledge can be external
  • Don't forget to copy all required files in your Dockerfile

Now when someone asks "How do I make a latte?", the bot actually knows the answer - and can even tell you about Dave from accounting.