
Migrating from Gemini to Claude: Swapping LLMs on Vertex AI


Intro

After months of running my Slack bot on Gemini 2.5 Flash Lite, I started noticing a problem: hallucinations.

The bot would confidently answer questions with information that sounded plausible but was completely wrong. It would cite knowledge documents that didn't exist, mix up details between different documents, or just make things up entirely. For a support bot that people rely on for accurate answers, this was a dealbreaker.

I wanted to try Claude to see if it would do better. Spoiler: it did.

Why Switch?

The specific issues with Gemini 2.5 Flash Lite:

  1. False information - It would fabricate answers instead of saying "I don't know"
  2. Hallucinated sources - It would reference documents that weren't in the knowledge base
  3. Mixed-up details - It would combine information from different documents into wrong answers

For a general chatbot, some hallucination is tolerable. For an internal support bot where people trust the answers to make decisions, it's not.

I'd heard Claude was better at staying grounded in provided context and admitting when it doesn't know something. Worth a try.

The Good News: Claude is on Vertex AI

I was dreading the idea of setting up a completely separate API, managing new credentials, dealing with a different auth flow. But it turns out Anthropic's Claude models are available directly on Vertex AI through Model Garden.

This means no separate API to set up, no new credentials to manage, and no different auth flow - the same GCP project, Application Default Credentials, and billing cover both models.

Enabling Claude on Vertex AI

Before you can use Claude, you need to enable it in Model Garden:

  1. Go to Vertex AI > Model Garden in GCP Console
  2. Search for "Claude Sonnet"
  3. Click Enable - you'll need to fill out a form with your business info
  4. Accept Anthropic's terms of service

The form asks about your business, intended use cases, and whether your use case falls under Anthropic's Acceptable Use Policy. For an internal Slack bot, it's straightforward - just describe what you're building.

Approval was quick in my case.

The Migration

1. Swap the SDK

Out with google-cloud-aiplatform, in with anthropic[vertex]:

```toml
# pyproject.toml
dependencies = [
    "anthropic[vertex]>=0.52.0",  # was google-cloud-aiplatform>=1.121.0
    # ... rest unchanged
]
```

2. Update Config

```python
# app/utils/config.py

# Before
GEMINI_LOCATION = "us-central1"
GEMINI_MODEL = "gemini-2.5-flash-lite"

# After
CLAUDE_LOCATION = "global"
CLAUDE_MODEL = "claude-sonnet-4-6"
```

Two things to note: the location changes from a specific region to global (Claude on Vertex AI is served through the global endpoint), and the model ID follows Anthropic's naming scheme rather than Gemini's.

3. Rewrite the Service

This was the biggest change. The Gemini SDK and Anthropic SDK have fundamentally different APIs.

Gemini approach:

```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project=project_id, location=location)
model = GenerativeModel(model_name, system_instruction=system_prompt)
chat = model.start_chat()
response = chat.send_message("Hello")
print(response.text)
```

Claude approach:

```python
from anthropic import AnthropicVertex

client = AnthropicVertex(project_id=project_id, region=location)
response = client.messages.create(
    model=model_name,
    max_tokens=4096,
    system=system_prompt,
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.content[0].text)
```

Key differences:

|               | Gemini                                  | Claude                                    |
|---------------|-----------------------------------------|-------------------------------------------|
| Init          | `vertexai.init()` + `GenerativeModel()` | `AnthropicVertex()`                       |
| System prompt | Baked into model instance               | Passed per API call                       |
| Chat history  | Managed by `ChatSession`                | You manage it yourself (list of messages) |
| Response      | `response.text`                         | `response.content[0].text`                |
| Statefulness  | Stateful chat sessions                  | Stateless - send full history each time   |

The biggest difference is chat history management. With Gemini, ChatSession tracks the conversation for you. With Claude, you maintain a list of {"role": "user/assistant", "content": "..."} messages and send the full list on every request.
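The pattern looks roughly like this - a minimal sketch with a plain `TypedDict` standing in for the SDK's `MessageParam`:

```python
from typing import TypedDict


class Message(TypedDict):
    # Stand-in for anthropic's MessageParam: role is "user" or "assistant"
    role: str
    content: str


def record_turn(history: list[Message], user_text: str, assistant_text: str) -> None:
    """Append one completed user/assistant exchange to the history."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})


# Every request sends the FULL history plus the new user message:
history: list[Message] = []
record_turn(history, "What is our refund policy?", "Refunds are honored within 30 days.")
next_request = history + [{"role": "user", "content": "What about after 30 days?"}]
# next_request now holds three messages: user, assistant, user
```

The API only sees what you send it, so forgetting to append the assistant's reply silently erases that turn from the model's view of the conversation.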

Here's the core of the new service:

```python
from anthropic import AnthropicVertex
from anthropic.types import MessageParam, TextBlock


class ClaudeService:
    def __init__(
        self, project_id: str, location: str, model_name: str, system_instruction: str
    ) -> None:
        self.project_id = project_id
        self.location = location
        self.model_name = model_name
        self.system_instruction = system_instruction
        self.client = AnthropicVertex(
            project_id=self.project_id, region=self.location
        )
        # One conversation history per Slack user
        self.chat_histories: dict[str, list[MessageParam]] = {}

    def get_chat_history(self, user_id: str) -> list[MessageParam]:
        return self.chat_histories.setdefault(user_id, [])

    def _send_message(self, messages: list[MessageParam]) -> str:
        response = self.client.messages.create(
            model=self.model_name,
            max_tokens=4096,
            system=self.system_instruction,
            messages=messages,
        )
        block = response.content[0]
        assert isinstance(block, TextBlock)
        return block.text

    def generate_response_sync(
        self, user_id: str, message: str, use_context: bool = True
    ) -> str:
        if use_context:
            history = self.get_chat_history(user_id)
            history.append({"role": "user", "content": message})
            reply = self._send_message(history)
            history.append({"role": "assistant", "content": reply})
        else:
            reply = self._send_message([{"role": "user", "content": message}])
        # markdown_to_slack (defined elsewhere in the app) converts
        # Markdown formatting to Slack's mrkdwn
        return markdown_to_slack(reply)
```
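One consequence of the stateless design: the resent history grows without bound. A simple cap keeps long-running conversations from blowing past the context window - a sketch where the `MAX_MESSAGES` value and truncation strategy are my own choices, not from the original service:

```python
from typing import TypedDict


class Message(TypedDict):
    role: str
    content: str


MAX_MESSAGES = 40  # arbitrary cap: 20 user/assistant pairs


def trim_history(history: list[Message], max_messages: int = MAX_MESSAGES) -> list[Message]:
    """Keep only the most recent messages, making sure the kept slice
    starts on a user turn so it still alternates as the API expects."""
    if len(history) <= max_messages:
        return history
    trimmed = history[-max_messages:]
    # Drop any leading assistant message so the list starts with "user"
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed
```

Calling `trim_history` before each `_send_message` bounds both token cost and latency; a fancier version could summarize the dropped turns instead of discarding them.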

4. Update All Imports

Since I renamed the file from gemini.py to claude.py and the service from GeminiService to ClaudeService, I had to update imports everywhere:

```python
# Before
from app.services.gemini import get_gemini_service
gemini = get_gemini_service()
response = gemini.generate_response_sync(user, text)

# After
from app.services.claude import get_claude_service
claude = get_claude_service()
response = claude.generate_response_sync(user, text)
```

Files touched: main.py, slack_events.py, workflow_handler.py, and all test files.

Type Safety Gotcha

This one caught me off guard. Locally, mypy passed fine. But in GitHub Actions, it failed with 23 errors:

```
error: Argument "messages" to "create" of "Messages" has incompatible type
    "list[dict[Any, Any]]"; expected "Iterable[MessageParam]"

error: Item "ThinkingBlock" of "TextBlock | ThinkingBlock | RedactedThinkingBlock |
    ToolUseBlock | ..." has no attribute "text"
```

Two issues:

1. Message typing - Claude's API expects MessageParam (a TypedDict), not plain dict:

```python
# Before
self.chat_histories: dict[str, list[dict]] = {}

# After
from anthropic.types import MessageParam
self.chat_histories: dict[str, list[MessageParam]] = {}
```

2. Response block type - response.content[0] could be a TextBlock, ThinkingBlock, ToolUseBlock, etc. You need to narrow the type:

```python
# Before - mypy doesn't like this
return response.content[0].text

# After - explicit type narrowing
from anthropic.types import TextBlock

block = response.content[0]
assert isinstance(block, TextBlock)
return block.text
```

The assert serves double duty: it satisfies mypy AND acts as a runtime safety check. If Claude ever returns a non-text block, you'll get a clear AssertionError instead of a confusing AttributeError.
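If the bot ever enables tools or extended thinking, the first content block may legitimately not be text, and the assert would fire. A more defensive variant (my own alternative, not from the migration) filters on each block's `type` discriminator and concatenates every text block:

```python
from typing import Any, Iterable


def extract_text(blocks: Iterable[Any]) -> str:
    """Join the text of every text block, skipping thinking/tool-use blocks."""
    return "".join(b.text for b in blocks if getattr(b, "type", None) == "text")
```

With this, `_send_message` would return `extract_text(response.content)` instead of asserting on `response.content[0]`. For a plain chat bot the assert is fine; the filter only earns its keep once non-text blocks become possible.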

What Stayed the Same

Everything outside the LLM service was untouched.

The service interface (generate_response_sync, generate_document, reload_memory) stayed identical. The handlers don't care which LLM is behind the service - they just call the same methods and get strings back.
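That swap-friendliness can be made explicit with a `typing.Protocol` the handlers depend on - a sketch where the `LLMService` name and method signatures are my own, based on the interface described above:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLMService(Protocol):
    """What the Slack handlers actually need - any backend that
    provides these methods can sit behind the bot."""

    def generate_response_sync(
        self, user_id: str, message: str, use_context: bool = True
    ) -> str: ...

    def generate_document(self, prompt: str) -> str: ...

    def reload_memory(self) -> None: ...
```

Annotating handler parameters as `LLMService` lets mypy verify that whichever concrete service you wire in - Gemini, Claude, or the next one - still satisfies the contract.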

First Impressions

After deploying and testing:

The good: the hallucination issue that motivated the switch hasn't come back. Answers stay grounded in the knowledge base, and the bot admits when it doesn't know something instead of fabricating an answer.

The different: responses take longer than Gemini 2.5 Flash Lite's, so there's a latency trade-off to accept.

The verdict: for a knowledge-base bot where accuracy matters more than speed, Claude has been a clear improvement.

Wrapping Up

The migration took about an hour of actual coding. Most of the changes were mechanical - swap SDK, rewrite one service file, update imports. The hardest part was the mypy type issues, which only showed up in CI.

If you're considering the switch:

  1. Claude is available on Vertex AI - no need to leave GCP
  2. Enable it in Model Garden first (requires approval form)
  3. The Anthropic Vertex SDK is well-documented and straightforward
  4. Your system prompts will likely work as-is
  5. Budget time for type annotation fixes if you use mypy

The fact that both models are available on the same platform made this much less scary than it could have been. If Claude doesn't work out, switching back would be just as easy.
