
Migrating from Gemini to Claude: Swapping LLMs on Vertex AI


Intro

After months of running my Slack bot on Gemini 2.5 Flash Lite, I started noticing a problem: hallucinations.

The bot would confidently answer questions with information that sounded plausible but was completely wrong. It would cite knowledge documents that didn't exist, mix up details between different documents, or just make things up entirely. For a support bot that people rely on for accurate answers, this was a dealbreaker.

I wanted to try Claude to see if it would do better. Spoiler: it did.

Why Switch?

The specific issues with Gemini 2.5 Flash Lite:

  1. False information - It would fabricate answers instead of saying "I don't know"
  2. Hallucinated sources - It would reference documents that weren't in the knowledge base
  3. Mixed-up details - It would combine information from different documents into wrong answers

For a general chatbot, some hallucination is tolerable. For an internal support bot where people trust the answers to make decisions, it's not.

I'd heard Claude was better at staying grounded in provided context and admitting when it doesn't know something. Worth a try.

The Good News: Claude is on Vertex AI

I was dreading the idea of setting up a completely separate API, managing new credentials, dealing with a different auth flow. But it turns out Anthropic's Claude models are available directly on Vertex AI through Model Garden.

This means no separate API to set up, no new credentials to manage, and no different auth flow - the same GCP project, Application Default Credentials, and billing cover both models.

Enabling Claude on Vertex AI

Before you can use Claude, you need to enable it in Model Garden:

  1. Go to Vertex AI > Model Garden in GCP Console
  2. Search for "Claude Sonnet"
  3. Click Enable - you'll need to fill out a form with your business info
  4. Accept Anthropic's terms of service

The form asks about your business, intended use cases, and whether your use case falls under Anthropic's Acceptable Use Policy. For an internal Slack bot, it's straightforward - just describe what you're building.

Approval was quick in my case.

The Migration

1. Swap the SDK

Out with google-cloud-aiplatform, in with anthropic[vertex]:

```toml
# pyproject.toml
dependencies = [
    "anthropic[vertex]>=0.52.0",  # was google-cloud-aiplatform>=1.121.0
    # ... rest unchanged
]
```

2. Update Config

```python
# app/utils/config.py

# Before
GEMINI_LOCATION = "us-central1"
GEMINI_MODEL = "gemini-2.5-flash-lite"

# After
CLAUDE_LOCATION = "global"
CLAUDE_MODEL = "claude-sonnet-4-6"
```

Two things to note: the location changes from a specific region to global (Claude on Vertex AI is served through the global endpoint), and the model ID follows Anthropic's naming scheme rather than Gemini's.

3. Rewrite the Service

This was the biggest change. The Gemini SDK and Anthropic SDK have fundamentally different APIs.

Gemini approach:

```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project=project_id, location=location)
model = GenerativeModel(model_name, system_instruction=system_prompt)
chat = model.start_chat()
response = chat.send_message("Hello")
print(response.text)
```

Claude approach:

```python
from anthropic import AnthropicVertex

client = AnthropicVertex(project_id=project_id, region=location)
response = client.messages.create(
    model=model_name,
    max_tokens=4096,
    system=system_prompt,
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.content[0].text)
```

Key differences:

|               | Gemini                                  | Claude                                    |
|---------------|-----------------------------------------|-------------------------------------------|
| Init          | `vertexai.init()` + `GenerativeModel()` | `AnthropicVertex()`                       |
| System prompt | Baked into model instance               | Passed per API call                       |
| Chat history  | Managed by `ChatSession`                | You manage it yourself (list of messages) |
| Response      | `response.text`                         | `response.content[0].text`                |
| Statefulness  | Stateful chat sessions                  | Stateless - send full history each time   |

The biggest difference is chat history management. With Gemini, ChatSession tracks the conversation for you. With Claude, you maintain a list of {"role": "user/assistant", "content": "..."} messages and send the full list on every request.
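The pattern looks roughly like this - a minimal sketch with a plain `TypedDict` standing in for the SDK's `MessageParam`:

```python
from typing import TypedDict


class Message(TypedDict):
    # Stand-in for anthropic's MessageParam: role is "user" or "assistant"
    role: str
    content: str


def record_turn(history: list[Message], user_text: str, assistant_text: str) -> None:
    """Append one completed user/assistant exchange to the history."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})


# Every request sends the FULL history plus the new user message:
history: list[Message] = []
record_turn(history, "What is our refund policy?", "Refunds are honored within 30 days.")
next_request = history + [{"role": "user", "content": "What about after 30 days?"}]
# next_request now holds three messages: user, assistant, user
```

The API only sees what you send it, so forgetting to append the assistant's reply silently erases that turn from the model's view of the conversation.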

Here's the core of the new service:

```python
from anthropic import AnthropicVertex
from anthropic.types import MessageParam, TextBlock


class ClaudeService:
    def __init__(
        self, project_id: str, location: str, model_name: str, system_instruction: str
    ) -> None:
        self.project_id = project_id
        self.location = location
        self.model_name = model_name
        self.system_instruction = system_instruction
        self.client = AnthropicVertex(
            project_id=self.project_id, region=self.location
        )
        # One conversation history per Slack user
        self.chat_histories: dict[str, list[MessageParam]] = {}

    def get_chat_history(self, user_id: str) -> list[MessageParam]:
        return self.chat_histories.setdefault(user_id, [])

    def _send_message(self, messages: list[MessageParam]) -> str:
        response = self.client.messages.create(
            model=self.model_name,
            max_tokens=4096,
            system=self.system_instruction,
            messages=messages,
        )
        block = response.content[0]
        assert isinstance(block, TextBlock)
        return block.text

    def generate_response_sync(
        self, user_id: str, message: str, use_context: bool = True
    ) -> str:
        if use_context:
            history = self.get_chat_history(user_id)
            history.append({"role": "user", "content": message})
            reply = self._send_message(history)
            history.append({"role": "assistant", "content": reply})
        else:
            reply = self._send_message([{"role": "user", "content": message}])
        # markdown_to_slack (defined elsewhere in the app) converts
        # Markdown formatting to Slack's mrkdwn
        return markdown_to_slack(reply)
```
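One consequence of the stateless design: the resent history grows without bound. A simple cap keeps long-running conversations from blowing past the context window - a sketch where the `MAX_MESSAGES` value and truncation strategy are my own choices, not from the original service:

```python
from typing import TypedDict


class Message(TypedDict):
    role: str
    content: str


MAX_MESSAGES = 40  # arbitrary cap: 20 user/assistant pairs


def trim_history(history: list[Message], max_messages: int = MAX_MESSAGES) -> list[Message]:
    """Keep only the most recent messages, making sure the kept slice
    starts on a user turn so it still alternates as the API expects."""
    if len(history) <= max_messages:
        return history
    trimmed = history[-max_messages:]
    # Drop any leading assistant message so the list starts with "user"
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed
```

Calling `trim_history` before each `_send_message` bounds both token cost and latency; a fancier version could summarize the dropped turns instead of discarding them.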

4. Update All Imports

Since I renamed the file from gemini.py to claude.py and the service from GeminiService to ClaudeService, I had to update imports everywhere:

```python
# Before
from app.services.gemini import get_gemini_service
gemini = get_gemini_service()
response = gemini.generate_response_sync(user, text)

# After
from app.services.claude import get_claude_service
claude = get_claude_service()
response = claude.generate_response_sync(user, text)
```

Files touched: main.py, slack_events.py, workflow_handler.py, and all test files.

Type Safety Gotcha

This one caught me off guard. Locally, mypy passed fine. But in GitHub Actions, it failed with 23 errors:

```
error: Argument "messages" to "create" of "Messages" has incompatible type
    "list[dict[Any, Any]]"; expected "Iterable[MessageParam]"

error: Item "ThinkingBlock" of "TextBlock | ThinkingBlock | RedactedThinkingBlock |
    ToolUseBlock | ..." has no attribute "text"
```

Two issues:

1. Message typing - Claude's API expects MessageParam (a TypedDict), not plain dict:

```python
# Before
self.chat_histories: dict[str, list[dict]] = {}

# After
from anthropic.types import MessageParam
self.chat_histories: dict[str, list[MessageParam]] = {}
```

2. Response block type - response.content[0] could be a TextBlock, ThinkingBlock, ToolUseBlock, etc. You need to narrow the type:

```python
# Before - mypy doesn't like this
return response.content[0].text

# After - explicit type narrowing
from anthropic.types import TextBlock

block = response.content[0]
assert isinstance(block, TextBlock)
return block.text
```

The assert serves double duty: it satisfies mypy AND acts as a runtime safety check. If Claude ever returns a non-text block, you'll get a clear AssertionError instead of a confusing AttributeError.
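If the bot ever enables tools or extended thinking, the first content block may legitimately not be text, and the assert would fire. A more defensive variant (my own alternative, not from the migration) filters on each block's `type` discriminator and concatenates every text block:

```python
from typing import Any, Iterable


def extract_text(blocks: Iterable[Any]) -> str:
    """Join the text of every text block, skipping thinking/tool-use blocks."""
    return "".join(b.text for b in blocks if getattr(b, "type", None) == "text")
```

With this, `_send_message` would return `extract_text(response.content)` instead of asserting on `response.content[0]`. For a plain chat bot the assert is fine; the filter only earns its keep once non-text blocks become possible.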

What Stayed the Same

Everything outside the LLM service was untouched.

The service interface (generate_response_sync, generate_document, reload_memory) stayed identical. The handlers don't care which LLM is behind the service - they just call the same methods and get strings back.
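That swap-friendliness can be made explicit with a `typing.Protocol` the handlers depend on - a sketch where the `LLMService` name and method signatures are my own, based on the interface described above:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLMService(Protocol):
    """What the Slack handlers actually need - any backend that
    provides these methods can sit behind the bot."""

    def generate_response_sync(
        self, user_id: str, message: str, use_context: bool = True
    ) -> str: ...

    def generate_document(self, prompt: str) -> str: ...

    def reload_memory(self) -> None: ...
```

Annotating handler parameters as `LLMService` lets mypy verify that whichever concrete service you wire in - Gemini, Claude, or the next one - still satisfies the contract.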

First Impressions

After deploying and testing:

The good: the hallucination issue that motivated the switch hasn't come back. Answers stay grounded in the knowledge base, and the bot admits when it doesn't know something instead of fabricating an answer.

The different: responses take longer than Gemini 2.5 Flash Lite's, so there's a latency trade-off to accept.

The verdict: for a knowledge-base bot where accuracy matters more than speed, Claude has been a clear improvement.

Wrapping Up

The migration took about an hour of actual coding. Most of the changes were mechanical - swap SDK, rewrite one service file, update imports. The hardest part was the mypy type issues, which only showed up in CI.

If you're considering the switch:

  1. Claude is available on Vertex AI - no need to leave GCP
  2. Enable it in Model Garden first (requires approval form)
  3. The Anthropic Vertex SDK is well-documented and straightforward
  4. Your system prompts will likely work as-is
  5. Budget time for type annotation fixes if you use mypy

The fact that both models are available on the same platform made this much less scary than it could have been. If Claude doesn't work out, switching back would be just as easy.
