
Adding Knowledge via Slack Workflow: Automating Documentation with Gemini

12 min read
Python · Slack · GCP · GitHub · Vertex AI


Intro

After setting up the knowledge base in GCS, I had a bot that could answer questions from our documentation. But adding new knowledge required:

  1. Writing markdown manually
  2. Following our document structure (frontmatter, sections, etc.)
  3. Uploading to the GCS bucket
  4. Getting someone to review it

This friction meant knowledge rarely got added. People would answer questions in Slack threads, but that knowledge never made it back into the docs.

The goal: Let users paste raw notes into a Slack form, have AI structure it into proper documentation, and automatically create a PR for review.

What We're Building

┌─────────────────────────────────────────────────────────────────────┐
│ Slack Workflow Form                                                 │
│ ┌────────────────────────────────────────────────────────────────┐  │
│ │ Requester: @john.doe                                           │  │
│ │ Knowledge to add: [raw notes here]                             │  │
│ │ Reference URL: https://...                                     │  │
│ │ Contact: #support-channel                                      │  │
│ └────────────────────────────────────────────────────────────────┘  │
└───────────────────────────────┬─────────────────────────────────────┘
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│ Bot receives message in workflow channel                            │
│ └─▶ Posts "Received! Formatting your knowledge..."                  │
└───────────────────────────────┬─────────────────────────────────────┘
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│ Gemini + DCP (Data Control Protocol)                                │
│ └─▶ Structures raw data into proper markdown with frontmatter       │
│ └─▶ Assigns doc_id, target_path automatically                       │
└───────────────────────────────┬─────────────────────────────────────┘
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│ GitHub Service                                                      │
│ └─▶ Creates branch: knowledge/{doc_id}                              │
│ └─▶ Commits structured document                                     │
│ └─▶ Opens PR for review                                             │
└───────────────────────────────┬─────────────────────────────────────┘
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│ Bot posts success message                                           │
│ "Done! @ops-team Please review this PR"                             │
│ └─▶ Links to PR                                                     │
└─────────────────────────────────────────────────────────────────────┘

The DCP Prompt

The magic happens in the DCP (Data Control Protocol), a system prompt stored in GCS that tells Gemini exactly how to structure documents.

I won't include the full prompt here, but the key is that it outputs complete markdown with frontmatter:

---
doc_id: manual-42
target_path: docs/manuals/manual-42_coffee-machine-guide.md
title: Coffee Machine Guide
description: How to use the office coffee machine
created: 2026-03-23
creator: "@john.doe"
source_url: https://slack.com/archives/C123/p456
contact_info: "#faq-support @ops-team"
tags:
  - office
  - equipment
---

# Coffee Machine Guide

## Overview
...

Implementation

1. Configuration

First, I added the new environment variables:

# app/utils/config.py

WORKFLOW_CHANNEL_ID = os.getenv("WORKFLOW_CHANNEL_ID", "")
WORKFLOW_ID = os.getenv("WORKFLOW_ID", "")
DEFAULT_CONTACT_INFO = os.getenv("DEFAULT_CONTACT_INFO", "#support @team")
REVIEW_TEAM_ID = os.getenv("REVIEW_TEAM_ID", "")
REVIEW_TEAM_MENTION = f"<!subteam^{REVIEW_TEAM_ID}>" if REVIEW_TEAM_ID else "@team"
SLACK_WORKSPACE_DOMAIN = os.getenv("SLACK_WORKSPACE_DOMAIN", "")
DCP_PROMPT_PATH = "prompts/dcp_prompt.md"

WF_FIELDS = {
    "creator": "Requester",
    "raw_data": "Knowledge to add",
    "source_url": "Reference URL",
    "contact_info": "Contact",
}

WORKFLOW_PROCESSING_MESSAGE = "Received! Formatting your knowledge, please wait..."
WORKFLOW_SUCCESS_MESSAGE = """Done!
<@{user_id}> submitted new knowledge for review.

📄 *{title}*
🔗 {pr_url}

{review_team} Please review this PR!"""
WORKFLOW_ERROR_MESSAGE = """Error occurred while formatting!
<@{user_id}>, please check the following and try again.

Reason: {error_details}"""

A few notes: REVIEW_TEAM_MENTION uses Slack's <!subteam^ID> syntax so the success message pings the whole user group; WF_FIELDS maps internal field names to the exact labels the workflow form posts; and the message templates are format strings that get filled in at send time.

2. Detecting Workflow Messages

I updated the message handler to detect workflow submissions:

# app/handlers/slack_events.py

@app.event("message")
def handle_message(event, say, client, logger):
    bot_id = event.get("bot_id")
    channel = event.get("channel")

    # Detect workflow submission
    if bot_id and WORKFLOW_CHANNEL_ID and channel == WORKFLOW_CHANNEL_ID:
        metadata = event.get("metadata", {})
        event_payload = metadata.get("event_payload", {})
        workflow_id = event_payload.get("workflow_id")

        logger.info(
            f"Workflow channel message: workflow_id={workflow_id}, "
            f"expected={WORKFLOW_ID}"
        )

        # Only process messages from our specific workflow
        if WORKFLOW_ID and workflow_id and workflow_id != WORKFLOW_ID:
            logger.info(f"Ignoring message from different workflow {workflow_id}")
            return

        event_ts = event.get("event_ts") or event.get("ts")
        if _is_duplicate(f"workflow:{event_ts}"):
            return

        handle_workflow_submission(event, say, client, logger)
        return

    # ... existing DM handling

Key points: workflow submissions arrive with a bot_id (the workflow posts as a bot), so they're matched by channel first, then filtered by the workflow_id in the event metadata; the _is_duplicate check guards against Slack's retry deliveries.

3. Parsing the Workflow Form

The workflow posts a message with labeled fields. I parse it into a dataclass:

# app/handlers/workflow_handler.py

@dataclass
class WorkflowFormData:
    raw_data: str
    contact_info: str
    creator: str
    source_url: Optional[str] = None


def parse_workflow_message(text: str) -> Optional[WorkflowFormData]:
    if not text:
        return None

    raw_data = None
    source_url = None
    contact_info = None
    creator = None

    lines = text.strip().split("\n")
    current_field = None
    raw_data_lines = []
    in_raw_data = False

    for line in lines:
        if WF_FIELDS["creator"] in line:
            in_raw_data = False
            current_field = "creator"
        elif WF_FIELDS["raw_data"] in line:
            in_raw_data = True
            current_field = "raw_data"
        elif WF_FIELDS["source_url"] in line:
            in_raw_data = False
            current_field = "source_url"
        elif WF_FIELDS["contact_info"] in line:
            in_raw_data = False
            current_field = "contact_info"
        elif current_field == "creator" and not creator:
            if line.strip():
                creator = line.strip()
        elif in_raw_data and current_field == "raw_data":
            raw_data_lines.append(line)
        elif current_field == "source_url" and not source_url:
            url_match = re.search(r"https?://[^\s<>]+", line)
            if url_match:
                source_url = url_match.group(0)
        elif current_field == "contact_info" and not contact_info:
            if line.strip():
                contact_info = line.strip()

    # ... handle multiline raw_data and fallbacks

The tricky part is raw_data - it can span multiple lines, so I collect lines until hitting the next field label.
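The elided finalization step joins those collected lines and fills in defaults. A minimal sketch of that tail, assuming the dataclass and default value shown earlier (finalize_form_data is a hypothetical helper name, not the bot's actual code):

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_CONTACT_INFO = "#support @team"  # assumed default from config


@dataclass
class WorkflowFormData:
    raw_data: str
    contact_info: str
    creator: str
    source_url: Optional[str] = None


def finalize_form_data(raw_data_lines, creator, source_url, contact_info):
    """Join the multiline raw_data block and fall back to defaults."""
    raw_data = "\n".join(raw_data_lines).strip()
    if not raw_data or not creator:
        return None  # required fields missing
    return WorkflowFormData(
        raw_data=raw_data,
        contact_info=contact_info or DEFAULT_CONTACT_INFO,
        creator=creator,
        source_url=source_url,
    )
```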

4. Resolving User Mentions

Slack converts @john.doe typed in the form to <@U12345ABCDE> before the bot receives it. I wanted the actual display name in the documentation, so I added a resolver:

def resolve_user_mention(text: str, client) -> str:
    match = re.search(r"<@([A-Z0-9]+)>", text)
    if not match:
        return text

    user_id = match.group(1)
    try:
        result = client.users_info(user=user_id)
        if result.get("ok"):
            profile = result.get("user", {}).get("profile", {})
            display_name = (
                profile.get("display_name") or profile.get("real_name") or user_id
            )
            return f"@{display_name}"
    except Exception:
        pass
    return text

Now <@U12345ABCDE> becomes @john.doe in the generated docs.
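One limitation worth noting: the function above only resolves the first mention in the text. If a field could contain several, re.sub with a callback handles them all; a sketch, with the users_info call abstracted into a lookup callable (an assumption for illustration):

```python
import re


def resolve_all_mentions(text: str, lookup) -> str:
    """Replace every <@UXXXX> token using lookup(user_id) -> display name."""
    return re.sub(r"<@([A-Z0-9]+)>", lambda m: "@" + lookup(m.group(1)), text)
```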

5. Building Slack Message URLs

If the user doesn't provide a source URL, I fall back to linking the Slack message itself:

def build_slack_message_url(channel: str, ts: str) -> str:
    ts_without_dot = ts.replace(".", "")
    return f"https://{SLACK_WORKSPACE_DOMAIN}/archives/{channel}/p{ts_without_dot}"

This creates URLs like https://workspace.slack.com/archives/C123/p1234567890123456.
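Going the other direction is occasionally useful, e.g. recovering the channel and ts from a pasted permalink. A sketch that reverses the transformation, assuming the standard /archives/<channel>/p<digits> shape where the ts regains its dot six digits from the end:

```python
import re


def parse_slack_message_url(url: str):
    """Recover (channel, ts) from a Slack archives permalink, or None."""
    m = re.search(r"/archives/([A-Z0-9]+)/p(\d+)", url)
    if not m:
        return None
    channel, digits = m.group(1), m.group(2)
    # The message ts has its dot six digits from the end
    ts = f"{digits[:-6]}.{digits[-6:]}"
    return channel, ts
```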

6. Generating Documents with Gemini

I added a new method to the Gemini service:

# app/services/gemini.py

def generate_document(
    self,
    raw_data: str,
    source_url: str,
    contact_info: str,
    creator: str,
) -> str:
    dcp_prompt = self._load_gcs_prompt(DCP_PROMPT_PATH)
    today = date.today().isoformat()

    prompt = f"""{dcp_prompt}

[INPUT]
today: {today}
creator: {creator}
data: {raw_data}
source_url: {source_url}
contact_info: {contact_info}
"""

    # Use a standalone model without chat history
    standalone_model = GenerativeModel(self.model_name)
    response = standalone_model.generate_content(prompt)

    result = response.text.strip()
    logger.info("Generated document via DCP")
    return result

Using a standalone model (not a chat session) ensures consistent results: no conversation history affects the output.

7. Creating GitHub PRs

The GitHub service handles branch creation, commits, and PR opening:

# app/services/github.py

class GitHubService:
    def __init__(self) -> None:
        self.token = os.getenv("GITHUB_TOKEN")
        self.repo_name = os.getenv("GITHUB_MEMORY_REPO")

        if not self.token:
            raise ValueError("GITHUB_TOKEN environment variable is required")
        if not self.repo_name:
            raise ValueError("GITHUB_MEMORY_REPO environment variable is required")

        self.github = Github(self.token)
        self.repo: Repository = self.github.get_repo(self.repo_name)

    def create_document_pr(
        self,
        doc_id: str,
        file_path: str,
        content: str,
        title: str,
        source_url: str,
        creator: str,
    ) -> str:
        branch_name = f"knowledge/{doc_id}"

        # Get the default branch SHA
        default_branch = self.repo.default_branch
        default_ref = self.repo.get_git_ref(f"heads/{default_branch}")
        sha = default_ref.object.sha

        # Create the new branch
        self.repo.create_git_ref(f"refs/heads/{branch_name}", sha)

        # Create/update the file on that branch
        self.repo.create_file(
            path=file_path,
            message=f"Add {title}",
            content=content,
            branch=branch_name,
        )

        # Open the PR
        pr = self.repo.create_pull(
            title=f"[Knowledge] {title}",
            body=f"Source: {source_url}\nRequested by: {creator}",
            head=branch_name,
            base=default_branch,
        )

        return pr.html_url

8. Extracting Frontmatter

After Gemini generates the document, I extract the frontmatter to get doc_id, target_path, and title:

def extract_frontmatter(content: str) -> dict:
    result = {
        "doc_id": None,
        "target_path": None,
        "title": "Untitled",
    }

    doc_id_match = re.search(
        r'^doc_id:\s*["\']?(.+?)["\']?\s*$', content, re.MULTILINE
    )
    if doc_id_match:
        result["doc_id"] = doc_id_match.group(1).strip()

    target_path_match = re.search(
        r'^target_path:\s*["\']?(.+?)["\']?\s*$', content, re.MULTILINE
    )
    if target_path_match:
        result["target_path"] = target_path_match.group(1).strip()

    title_match = re.search(
        r'^title:\s*["\']?(.+?)["\']?\s*$', content, re.MULTILINE
    )
    if title_match:
        result["title"] = title_match.group(1).strip()

    return result
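As a quick sanity check, here is the same regex idea in compact, parameterized form, run against a trimmed version of the sample frontmatter from earlier (read_frontmatter_key is an illustrative helper, not part of the bot):

```python
import re


def read_frontmatter_key(content: str, key: str):
    # Same pattern the extraction function uses, parameterized by key
    m = re.search(rf'^{key}:\s*["\']?(.+?)["\']?\s*$', content, re.MULTILINE)
    return m.group(1).strip() if m else None


sample = """---
doc_id: manual-42
target_path: docs/manuals/manual-42_coffee-machine-guide.md
title: Coffee Machine Guide
---
# Coffee Machine Guide
"""
```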

9. Handling DCP Errors

If the user's input is missing required fields, DCP returns a JSON error:

{"status": "error", "missing": ["title", "description"]}

I detect and handle this:

def is_dcp_error(content: str) -> Tuple[bool, Optional[list]]:
    content_stripped = content.strip()

    # Strip markdown code block wrapper if present
    if content_stripped.startswith("```json"):
        content_stripped = content_stripped[7:]
    if content_stripped.endswith("```"):
        content_stripped = content_stripped[:-3]
    content_stripped = content_stripped.strip()

    if content_stripped.startswith("{") and "status" in content_stripped:
        try:
            data = json.loads(content_stripped)
            if data.get("status") == "error":
                return True, data.get("missing", [])
        except json.JSONDecodeError:
            pass

    return False, None

10. The Main Handler

Putting it all together:

def handle_workflow_submission(event: dict, say, client, logger) -> None:
    text = event.get("text", "")
    channel = event.get("channel", "")
    thread_ts = event.get("ts", "")

    # Extract user_id for mentions
    user_id = None
    if "metadata" in event:
        metadata = event.get("metadata", {})
        if "event_payload" in metadata:
            user_id = metadata["event_payload"].get("user_id")

    if not user_id:
        user_match = re.search(r"<@([A-Z0-9]+)>", text)
        if user_match:
            user_id = user_match.group(1)

    if not user_id:
        user_id = "unknown"

    # Post processing message
    say(text=WORKFLOW_PROCESSING_MESSAGE, thread_ts=thread_ts)

    # Parse form data
    form_data = parse_workflow_message(text)
    if not form_data:
        say(text=WORKFLOW_ERROR_MESSAGE.format(...), thread_ts=thread_ts)
        return

    # Resolve user mention to display name
    creator = resolve_user_mention(form_data.creator, client)

    # Fall back to the Slack message URL if no source URL
    source_url = form_data.source_url
    if not source_url:
        source_url = build_slack_message_url(channel, thread_ts)

    try:
        # Generate the structured document
        gemini = get_gemini_service()
        document_content = gemini.generate_document(
            raw_data=form_data.raw_data,
            source_url=source_url,
            contact_info=form_data.contact_info,
            creator=creator,
        )

        # Check for a DCP error
        is_error, missing_fields = is_dcp_error(document_content)
        if is_error:
            say(text=WORKFLOW_ERROR_MESSAGE.format(...), thread_ts=thread_ts)
            return

        # Extract frontmatter
        frontmatter = extract_frontmatter(document_content)
        doc_id = frontmatter["doc_id"]
        target_path = frontmatter["target_path"]
        title = frontmatter["title"]

        if not doc_id or not target_path:
            say(text=WORKFLOW_ERROR_MESSAGE.format(...), thread_ts=thread_ts)
            return

        # Create the PR
        github = get_github_service()
        pr_url = github.create_document_pr(
            doc_id=doc_id,
            file_path=target_path,
            content=document_content,
            title=title,
            source_url=source_url,
            creator=creator,
        )

        # Post the success message
        success_msg = WORKFLOW_SUCCESS_MESSAGE.format(
            user_id=user_id,
            title=title,
            pr_url=pr_url,
            review_team=REVIEW_TEAM_MENTION,
        )
        say(text=success_msg, thread_ts=thread_ts)

    except Exception as e:
        say(text=WORKFLOW_ERROR_MESSAGE.format(...), thread_ts=thread_ts)

Testing

I wrote comprehensive tests for each component:

# tests/test_workflow_handler.py

class TestResolveUserMention:
    def test_resolves_user_mention_to_display_name(self):
        mock_client = MagicMock()
        mock_client.users_info.return_value = {
            "ok": True,
            "user": {
                "profile": {"display_name": "john.doe", "real_name": "John Doe"}
            },
        }

        result = resolve_user_mention("<@U12345678>", mock_client)

        assert result == "@john.doe"
        mock_client.users_info.assert_called_once_with(user="U12345678")

    def test_falls_back_to_real_name_when_no_display_name(self):
        mock_client = MagicMock()
        mock_client.users_info.return_value = {
            "ok": True,
            "user": {"profile": {"display_name": "", "real_name": "John Doe"}},
        }

        result = resolve_user_mention("<@U12345678>", mock_client)

        assert result == "@John Doe"


class TestParseWorkflowMessage:
    def test_parses_full_workflow_message(self):
        text = """Requester
@john.doe
Knowledge to add
This is the raw knowledge content.
Reference URL
https://example.com/doc
Contact
#support-channel"""

        result = parse_workflow_message(text)

        assert result is not None
        assert result.creator == "@john.doe"
        assert "raw knowledge content" in result.raw_data
        assert result.source_url == "https://example.com/doc"
        assert result.contact_info == "#support-channel"


class TestIsDcpError:
    def test_detects_error_json(self):
        content = '{"status": "error", "missing": ["title"]}'

        is_error, missing = is_dcp_error(content)

        assert is_error is True
        assert missing == ["title"]

    def test_returns_false_for_valid_content(self):
        content = """---
doc_id: manual-1
title: Test Document
---
# Content here"""

        is_error, missing = is_dcp_error(content)

        assert is_error is False
        assert missing is None

For the Slack event tests, I captured the registered handlers and tested them in isolation:

# tests/test_slack_events.py

class TestWorkflowMessageFiltering:
    def test_processes_message_from_matching_workflow_id(
        self, mock_say, mock_client, mock_logger
    ):
        event = {
            "bot_id": "B123",
            "channel": "C_WORKFLOW",
            "ts": "123.456",
            "text": "Requester\n@user\nKnowledge to add\ntest",
            "metadata": {"event_payload": {"workflow_id": "Wf_CORRECT"}},
        }

        with (
            patch("app.handlers.slack_events.WORKFLOW_CHANNEL_ID", "C_WORKFLOW"),
            patch("app.handlers.slack_events.WORKFLOW_ID", "Wf_CORRECT"),
            patch("app.handlers.slack_events.handle_workflow_submission") as mock_handler,
            patch("app.handlers.slack_events._is_duplicate", return_value=False),
        ):
            # ... register and call handler

            mock_handler.assert_called_once()

    def test_ignores_message_from_different_workflow_id(
        self, mock_say, mock_client, mock_logger
    ):
        event = {
            "bot_id": "B123",
            "channel": "C_WORKFLOW",
            "ts": "123.456",
            "text": "Some message",
            "metadata": {"event_payload": {"workflow_id": "Wf_DIFFERENT"}},
        }

        with (
            patch("app.handlers.slack_events.WORKFLOW_CHANNEL_ID", "C_WORKFLOW"),
            patch("app.handlers.slack_events.WORKFLOW_ID", "Wf_CORRECT"),
            patch("app.handlers.slack_events.handle_workflow_submission") as mock_handler,
        ):
            # ... register and call handler

            mock_handler.assert_not_called()

Deployment

I added the new dependencies and secrets:

# pyproject.toml
[project]
dependencies = [
    # ... existing deps
    "PyGithub>=2.1.1",  # New!
]

And updated the Cloud Run deployment:

# .github/workflows/deploy.yml

env:
  WORKFLOW_CHANNEL_ID: C0123456789
  WORKFLOW_ID: Wf0123456789
  REVIEW_TEAM_ID: S0123456789
  SLACK_WORKSPACE_DOMAIN: your-workspace.slack.com
  GITHUB_MEMORY_REPO: your-org/your-bot-memory

# In the deploy step
--set-env-vars "\
  WORKFLOW_CHANNEL_ID=${{ env.WORKFLOW_CHANNEL_ID }},\
  WORKFLOW_ID=${{ env.WORKFLOW_ID }},\
  REVIEW_TEAM_ID=${{ env.REVIEW_TEAM_ID }},\
  SLACK_WORKSPACE_DOMAIN=${{ env.SLACK_WORKSPACE_DOMAIN }},\
  GITHUB_MEMORY_REPO=${{ env.GITHUB_MEMORY_REPO }}\
" \
--set-secrets "\
  GITHUB_TOKEN=GITHUB_TOKEN:latest\
"

Gotchas

1. Workflow ID Filtering

The workflow channel might have multiple workflows posting to it. Without filtering by workflow_id, the bot would try to process every workflow message. The workflow ID is in event.metadata.event_payload.workflow_id.
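For reference, the lookup is just two nested dict hops; a minimal illustration with an abridged event payload (get_workflow_id is an illustrative helper):

```python
def get_workflow_id(event: dict):
    """Pull the workflow_id out of a Slack message event's metadata, if any."""
    return event.get("metadata", {}).get("event_payload", {}).get("workflow_id")


# Abridged shape of a workflow-posted message event
event = {
    "bot_id": "B123",
    "channel": "C_WORKFLOW",
    "metadata": {"event_payload": {"workflow_id": "Wf0123456789"}},
}
```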

2. Slack Converts Mentions

When a user types @john.doe in a Slack form, it arrives as <@U12345ABCDE>. I had to add the resolve_user_mention function to convert it back to a readable name using the Slack API.

Important: This requires the users:read OAuth scope. Without it, you'll get a missing_scope error:

{'ok': False, 'error': 'missing_scope', 'needed': 'users:read', ...}

Add the scope in your Slack app settings under OAuth & Permissions → Bot Token Scopes, then reinstall the app to your workspace.

3. Source URL Fallback

If the user leaves the source URL empty or enters something that's not a URL, the bot falls back to linking the Slack message itself. This ensures we always have a reference.
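A sketch of that fallback decision, with the workspace domain taken as a placeholder parameter rather than the SLACK_WORKSPACE_DOMAIN config value (pick_source_url is an illustrative helper, not the bot's actual code):

```python
import re


def pick_source_url(user_input, channel, ts, workspace="your-workspace.slack.com"):
    """Use the user's URL if it looks like one, else link the Slack message."""
    if user_input and re.match(r"https?://", user_input.strip()):
        return user_input.strip()
    # Fall back to the message permalink (ts loses its dot)
    return f"https://{workspace}/archives/{channel}/p{ts.replace('.', '')}"
```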

4. DCP Prompt in GCS

I store the DCP prompt in GCS rather than the codebase. This lets me iterate on the prompt without redeploying the bot.
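One trade-off with this approach is a GCS read per submission. If that ever matters, a small TTL cache keeps prompt iteration easy while avoiding repeated fetches; a sketch with the fetch injected as a callable (in the bot it would wrap google-cloud-storage's blob.download_as_text(); PromptCache is a hypothetical name):

```python
import time


class PromptCache:
    """Cache a remotely stored prompt for ttl seconds before refetching.

    `fetch` is any zero-arg callable returning the prompt text; in practice it
    could wrap storage.Client().bucket(BUCKET).blob(path).download_as_text().
    """

    def __init__(self, fetch, ttl: float = 300.0):
        self.fetch = fetch
        self.ttl = ttl
        self._value = None
        self._loaded_at = 0.0

    def get(self) -> str:
        now = time.time()
        if self._value is None or now - self._loaded_at > self.ttl:
            self._value = self.fetch()
            self._loaded_at = now
        return self._value
```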

5. Thread Replies

All bot messages use thread_ts=thread_ts to reply in the same thread as the workflow form submission. This keeps the channel clean and groups related messages together.

6. Branch Name Collisions

If Gemini generates the same doc_id twice (e.g., for similar content), the bot will try to create a branch that already exists:

Reference already exists: 422 {"message": "Reference already exists", ...}

I fixed this by adding a Unix timestamp suffix to branch names:

timestamp = int(time.time())
branch_name = f"knowledge/{doc_id}-{timestamp}"

Now branches like knowledge/ref-20260323-abc-1774228173 are always unique.

Wrapping Up

The complete flow:

  1. User fills out Slack Workflow form with raw notes
  2. Bot detects the message, posts "Processing..."
  3. Gemini + DCP structures the content into proper documentation
  4. Bot creates a GitHub PR with the formatted doc
  5. Bot posts success message with PR link and @mentions the review team

What used to require manual markdown writing, file uploading, and PR creation now happens in seconds with a Slack form. The knowledge actually gets documented.


Project Navigation

  1. Building My First Flask App: A Next.js Developer's Perspective
  2. From TypeScript to Python: Setting Up a Modern Development Environment
  3. Deploying Python to GCP Cloud Run: A Guide for AWS Developers
  4. Integrating Vertex AI Gemini into Flask: Building an AI-Powered Slack Bot
  5. Adding GCS Memory to Gemini: Teaching Your Bot with Markdown Files
  6. Slack Bot Troubleshooting: Duplicate Messages, Cold Starts, and Gemini Latency
  7. Setting Up Analytics with BigQuery and Looker Studio
  8. Auto-Refreshing GCS Memory with Pub/Sub: Fixing the Stale Cache Problem
  9. Adding Thumbs Up/Down Feedback Buttons to Slack Bot Responses
  10. Adding Knowledge via Slack Workflow: Automating Documentation with Gemini