Intro
After setting up the knowledge base in GCS, I had a bot that could answer questions from our documentation. But adding new knowledge required:
- Writing markdown manually
- Following our document structure (frontmatter, sections, etc.)
- Uploading to the GCS bucket
- Getting someone to review it
This friction meant knowledge rarely got added. People would answer questions in Slack threads, but that knowledge never made it back into the docs.
The goal: Let users paste raw notes into a Slack form, have AI structure it into proper documentation, and automatically create a PR for review.
What We're Building
```
┌─────────────────────────────────────────────────────────────────────┐
│                         Slack Workflow Form                         │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │ Requester: @john.doe                                           │ │
│  │ Knowledge to add: [raw notes here]                             │ │
│  │ Reference URL: https://...                                     │ │
│  │ Contact: #support-channel                                      │ │
│  └────────────────────────────────────────────────────────────────┘ │
└───────────────────────────────┬─────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│              Bot receives message in workflow channel               │
│   └─▶ Posts "Received! Formatting your knowledge..."                │
└───────────────────────────────┬─────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                Gemini + DCP (Data Control Protocol)                 │
│   └─▶ Structures raw data into proper markdown with frontmatter     │
│   └─▶ Assigns doc_id, target_path automatically                     │
└───────────────────────────────┬─────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                           GitHub Service                            │
│   └─▶ Creates branch: knowledge/{doc_id}                            │
│   └─▶ Commits structured document                                   │
│   └─▶ Opens PR for review                                           │
└───────────────────────────────┬─────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      Bot posts success message                      │
│   "Done! @ops-team Please review this PR"                           │
│   └─▶ Links to PR                                                   │
└─────────────────────────────────────────────────────────────────────┘
```
The DCP Prompt
The magic happens in the DCP (Data Control Protocol) - a system prompt stored in GCS that tells Gemini exactly how to structure documents. It handles:
- Inferring document type (manual, workflow, FAQ, etc.)
- Generating unique doc_id by scanning the repository
- Creating proper YAML frontmatter
- Structuring content with headers and sections
- Returning errors if required fields are missing
I won't include the full prompt here, but the key is that it outputs complete markdown with frontmatter:
```yaml
---
doc_id: manual-42
target_path: docs/manuals/manual-42_coffee-machine-guide.md
title: Coffee Machine Guide
description: How to use the office coffee machine
created: 2026-03-23
creator: "@john.doe"
source_url: https://slack.com/archives/C123/p456
contact_info: "#faq-support @ops-team"
tags:
  - office
  - equipment
---

# Coffee Machine Guide

## Overview
...
```
Implementation
1. Configuration
First, I added the new environment variables:
```python
# app/utils/config.py
import os

WORKFLOW_CHANNEL_ID = os.getenv("WORKFLOW_CHANNEL_ID", "")
WORKFLOW_ID = os.getenv("WORKFLOW_ID", "")
DEFAULT_CONTACT_INFO = os.getenv("DEFAULT_CONTACT_INFO", "#support @team")
REVIEW_TEAM_ID = os.getenv("REVIEW_TEAM_ID", "")
REVIEW_TEAM_MENTION = f"<!subteam^{REVIEW_TEAM_ID}>" if REVIEW_TEAM_ID else "@team"
SLACK_WORKSPACE_DOMAIN = os.getenv("SLACK_WORKSPACE_DOMAIN", "")
DCP_PROMPT_PATH = "prompts/dcp_prompt.md"

WF_FIELDS = {
    "creator": "Requester",
    "raw_data": "Knowledge to add",
    "source_url": "Reference URL",
    "contact_info": "Contact",
}

WORKFLOW_PROCESSING_MESSAGE = "Received! Formatting your knowledge, please wait..."
WORKFLOW_SUCCESS_MESSAGE = """Done!
<@{user_id}> submitted new knowledge for review.

📄 *{title}*
🔗 {pr_url}

{review_team} Please review this PR!"""
WORKFLOW_ERROR_MESSAGE = """Error occurred while formatting!
<@{user_id}>, please check the following and try again.

Reason: {error_details}"""
```
A few notes:
- `WORKFLOW_ID` filters messages so we only process submissions from our specific workflow
- `WF_FIELDS` maps internal field names to the labels used in the workflow form
- `REVIEW_TEAM_MENTION` uses Slack's subteam mention syntax `<!subteam^ID>`
2. Detecting Workflow Messages
I updated the message handler to detect workflow submissions:
```python
# app/handlers/slack_events.py

@app.event("message")
def handle_message(event, say, client, logger):
    bot_id = event.get("bot_id")
    channel = event.get("channel")

    # Detect workflow submission
    if bot_id and WORKFLOW_CHANNEL_ID and channel == WORKFLOW_CHANNEL_ID:
        metadata = event.get("metadata", {})
        event_payload = metadata.get("event_payload", {})
        workflow_id = event_payload.get("workflow_id")

        logger.info(
            f"Workflow channel message: workflow_id={workflow_id}, "
            f"expected={WORKFLOW_ID}"
        )

        # Only process messages from our specific workflow
        if WORKFLOW_ID and workflow_id and workflow_id != WORKFLOW_ID:
            logger.info(f"Ignoring message from different workflow {workflow_id}")
            return

        event_ts = event.get("event_ts") or event.get("ts")
        if _is_duplicate(f"workflow:{event_ts}"):
            return

        handle_workflow_submission(event, say, client, logger)
        return

    # ... existing DM handling
```
Key points:
- Bot messages only - Workflow posts come from a bot, not a user
- Channel filter - Only process messages in the designated workflow channel
- Workflow ID filter - The channel might have multiple workflows posting; we only want ours
- Duplicate check - Slack sometimes sends duplicate events
3. Parsing the Workflow Form
The workflow posts a message with labeled fields. I parse it into a dataclass:
```python
# app/handlers/workflow_handler.py
import re
from dataclasses import dataclass
from typing import Optional

from app.utils.config import WF_FIELDS


@dataclass
class WorkflowFormData:
    raw_data: str
    contact_info: str
    creator: str
    source_url: Optional[str] = None


def parse_workflow_message(text: str) -> Optional[WorkflowFormData]:
    if not text:
        return None

    raw_data = None
    source_url = None
    contact_info = None
    creator = None

    lines = text.strip().split("\n")
    current_field = None
    raw_data_lines = []
    in_raw_data = False

    for line in lines:
        if WF_FIELDS["creator"] in line:
            in_raw_data = False
            current_field = "creator"
        elif WF_FIELDS["raw_data"] in line:
            in_raw_data = True
            current_field = "raw_data"
        elif WF_FIELDS["source_url"] in line:
            in_raw_data = False
            current_field = "source_url"
        elif WF_FIELDS["contact_info"] in line:
            in_raw_data = False
            current_field = "contact_info"
        elif current_field == "creator" and not creator:
            if line.strip():
                creator = line.strip()
        elif in_raw_data and current_field == "raw_data":
            raw_data_lines.append(line)
        elif current_field == "source_url" and not source_url:
            url_match = re.search(r"https?://[^\s<>]+", line)
            if url_match:
                source_url = url_match.group(0)
        elif current_field == "contact_info" and not contact_info:
            if line.strip():
                contact_info = line.strip()

    # ... handle multiline raw_data and fallbacks
```
The tricky part is raw_data - it can span multiple lines, so I collect lines until hitting the next field label.
4. Resolving User Mentions
Slack converts @john.doe typed in the form to <@U12345ABCDE> before the bot receives it. I wanted the actual display name in the documentation, so I added a resolver:
```python
def resolve_user_mention(text: str, client) -> str:
    match = re.search(r"<@([A-Z0-9]+)>", text)
    if not match:
        return text

    user_id = match.group(1)
    try:
        result = client.users_info(user=user_id)
        if result.get("ok"):
            profile = result.get("user", {}).get("profile", {})
            display_name = (
                profile.get("display_name") or profile.get("real_name") or user_id
            )
            return f"@{display_name}"
    except Exception:
        pass
    return text
```
Now <@U12345ABCDE> becomes @john.doe in the generated docs.
5. Building Slack Message URLs
If the user doesn't provide a source URL, I fall back to linking the Slack message itself:
```python
def build_slack_message_url(channel: str, ts: str) -> str:
    ts_without_dot = ts.replace(".", "")
    return f"https://{SLACK_WORKSPACE_DOMAIN}/archives/{channel}/p{ts_without_dot}"
```
This creates URLs like https://workspace.slack.com/archives/C123/p1234567890123456.
6. Generating Documents with Gemini
I added a new method to the Gemini service:
```python
# app/services/gemini.py

def generate_document(
    self,
    raw_data: str,
    source_url: str,
    contact_info: str,
    creator: str,
) -> str:
    dcp_prompt = self._load_gcs_prompt(DCP_PROMPT_PATH)
    today = date.today().isoformat()

    prompt = f"""{dcp_prompt}

[INPUT]
today: {today}
creator: {creator}
data: {raw_data}
source_url: {source_url}
contact_info: {contact_info}
"""

    # Use a standalone model without chat history
    standalone_model = GenerativeModel(self.model_name)
    response = standalone_model.generate_content(prompt)

    result = response.text.strip()
    logger.info("Generated document via DCP")
    return result
```
Using a standalone model (rather than a chat session) keeps the output independent of any conversation history that could otherwise influence it.
7. Creating GitHub PRs
The GitHub service handles branch creation, commits, and PR opening:
```python
# app/services/github.py
import os

from github import Github
from github.Repository import Repository


class GitHubService:
    def __init__(self) -> None:
        self.token = os.getenv("GITHUB_TOKEN")
        self.repo_name = os.getenv("GITHUB_MEMORY_REPO")

        if not self.token:
            raise ValueError("GITHUB_TOKEN environment variable is required")
        if not self.repo_name:
            raise ValueError("GITHUB_MEMORY_REPO environment variable is required")

        self.github = Github(self.token)
        self.repo: Repository = self.github.get_repo(self.repo_name)

    def create_document_pr(
        self,
        doc_id: str,
        file_path: str,
        content: str,
        title: str,
        source_url: str,
        creator: str,
    ) -> str:
        branch_name = f"knowledge/{doc_id}"

        # Get default branch SHA
        default_branch = self.repo.default_branch
        default_ref = self.repo.get_git_ref(f"heads/{default_branch}")
        sha = default_ref.object.sha

        # Create new branch
        self.repo.create_git_ref(f"refs/heads/{branch_name}", sha)

        # Create file on the new branch
        self.repo.create_file(
            path=file_path,
            message=f"Add {title}",
            content=content,
            branch=branch_name,
        )

        # Open the PR
        pr = self.repo.create_pull(
            title=f"[Knowledge] {title}",
            body=f"Source: {source_url}\nRequested by: {creator}",
            head=branch_name,
            base=default_branch,
        )

        return pr.html_url
```
8. Extracting Frontmatter
After Gemini generates the document, I extract the frontmatter to get doc_id, target_path, and title:
```python
def extract_frontmatter(content: str) -> dict:
    result = {
        "doc_id": None,
        "target_path": None,
        "title": "Untitled",
    }

    doc_id_match = re.search(
        r'^doc_id:\s*["\']?(.+?)["\']?\s*$', content, re.MULTILINE
    )
    if doc_id_match:
        result["doc_id"] = doc_id_match.group(1).strip()

    target_path_match = re.search(
        r'^target_path:\s*["\']?(.+?)["\']?\s*$', content, re.MULTILINE
    )
    if target_path_match:
        result["target_path"] = target_path_match.group(1).strip()

    title_match = re.search(
        r'^title:\s*["\']?(.+?)["\']?\s*$', content, re.MULTILINE
    )
    if title_match:
        result["title"] = title_match.group(1).strip()

    return result
```
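As a quick sanity check, here is the same extraction logic restated compactly (one loop instead of three near-identical blocks) so the snippet runs standalone against the sample frontmatter shown earlier:

```python
import re


def extract_frontmatter(content: str) -> dict:
    """Compact restatement of the extraction above: same regex, looped per field."""
    result = {"doc_id": None, "target_path": None, "title": "Untitled"}
    for key in ("doc_id", "target_path", "title"):
        match = re.search(
            rf'^{key}:\s*["\']?(.+?)["\']?\s*$', content, re.MULTILINE
        )
        if match:
            result[key] = match.group(1).strip()
    return result


sample = """---
doc_id: manual-42
target_path: docs/manuals/manual-42_coffee-machine-guide.md
title: Coffee Machine Guide
---
# Coffee Machine Guide
"""

print(extract_frontmatter(sample))
```

Regex extraction is deliberately forgiving here: it works whether or not Gemini quotes the values, without pulling in a YAML parser.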
9. Handling DCP Errors
If the user's input is missing required fields, DCP returns a JSON error:
```json
{"status": "error", "missing": ["title", "description"]}
```
I detect and handle this:
````python
def is_dcp_error(content: str) -> Tuple[bool, Optional[list]]:
    content_stripped = content.strip()

    # Strip markdown code block wrapper if present
    if content_stripped.startswith("```json"):
        content_stripped = content_stripped[7:]
    if content_stripped.endswith("```"):
        content_stripped = content_stripped[:-3]
    content_stripped = content_stripped.strip()

    if content_stripped.startswith("{") and "status" in content_stripped:
        try:
            data = json.loads(content_stripped)
            if data.get("status") == "error":
                return True, data.get("missing", [])
        except json.JSONDecodeError:
            pass

    return False, None
````
10. The Main Handler
Putting it all together:
```python
def handle_workflow_submission(event: dict, say, client, logger) -> None:
    text = event.get("text", "")
    channel = event.get("channel", "")
    thread_ts = event.get("ts", "")

    # Extract user_id for mentions
    user_id = None
    if "metadata" in event:
        metadata = event.get("metadata", {})
        if "event_payload" in metadata:
            user_id = metadata["event_payload"].get("user_id")

    if not user_id:
        user_match = re.search(r"<@([A-Z0-9]+)>", text)
        if user_match:
            user_id = user_match.group(1)

    if not user_id:
        user_id = "unknown"

    # Post processing message
    say(text=WORKFLOW_PROCESSING_MESSAGE, thread_ts=thread_ts)

    # Parse form data
    form_data = parse_workflow_message(text)
    if not form_data:
        say(text=WORKFLOW_ERROR_MESSAGE.format(...), thread_ts=thread_ts)
        return

    # Resolve user mention to display name
    creator = resolve_user_mention(form_data.creator, client)

    # Fall back to Slack message URL if no source URL
    source_url = form_data.source_url
    if not source_url:
        source_url = build_slack_message_url(channel, thread_ts)

    try:
        # Generate structured document
        gemini = get_gemini_service()
        document_content = gemini.generate_document(
            raw_data=form_data.raw_data,
            source_url=source_url,
            contact_info=form_data.contact_info,
            creator=creator,
        )

        # Check for DCP error
        is_error, missing_fields = is_dcp_error(document_content)
        if is_error:
            say(text=WORKFLOW_ERROR_MESSAGE.format(...), thread_ts=thread_ts)
            return

        # Extract frontmatter
        frontmatter = extract_frontmatter(document_content)
        doc_id = frontmatter["doc_id"]
        target_path = frontmatter["target_path"]
        title = frontmatter["title"]

        if not doc_id or not target_path:
            say(text=WORKFLOW_ERROR_MESSAGE.format(...), thread_ts=thread_ts)
            return

        # Create PR
        github = get_github_service()
        pr_url = github.create_document_pr(
            doc_id=doc_id,
            file_path=target_path,
            content=document_content,
            title=title,
            source_url=source_url,
            creator=creator,
        )

        # Post success message
        success_msg = WORKFLOW_SUCCESS_MESSAGE.format(
            user_id=user_id,
            title=title,
            pr_url=pr_url,
            review_team=REVIEW_TEAM_MENTION,
        )
        say(text=success_msg, thread_ts=thread_ts)

    except Exception as e:
        say(text=WORKFLOW_ERROR_MESSAGE.format(...), thread_ts=thread_ts)
```
Testing
I wrote comprehensive tests for each component:
```python
# tests/test_workflow_handler.py

class TestResolveUserMention:
    def test_resolves_user_mention_to_display_name(self):
        mock_client = MagicMock()
        mock_client.users_info.return_value = {
            "ok": True,
            "user": {
                "profile": {"display_name": "john.doe", "real_name": "John Doe"}
            },
        }

        result = resolve_user_mention("<@U12345678>", mock_client)

        assert result == "@john.doe"
        mock_client.users_info.assert_called_once_with(user="U12345678")

    def test_falls_back_to_real_name_when_no_display_name(self):
        mock_client = MagicMock()
        mock_client.users_info.return_value = {
            "ok": True,
            "user": {"profile": {"display_name": "", "real_name": "John Doe"}},
        }

        result = resolve_user_mention("<@U12345678>", mock_client)

        assert result == "@John Doe"


class TestParseWorkflowMessage:
    def test_parses_full_workflow_message(self):
        text = """Requester
@john.doe
Knowledge to add
This is the raw knowledge content.
Reference URL
https://example.com/doc
Contact
#support-channel"""

        result = parse_workflow_message(text)

        assert result is not None
        assert result.creator == "@john.doe"
        assert "raw knowledge content" in result.raw_data
        assert result.source_url == "https://example.com/doc"
        assert result.contact_info == "#support-channel"


class TestIsDcpError:
    def test_detects_error_json(self):
        content = '{"status": "error", "missing": ["title"]}'

        is_error, missing = is_dcp_error(content)

        assert is_error is True
        assert missing == ["title"]

    def test_returns_false_for_valid_content(self):
        content = """---
doc_id: manual-1
title: Test Document
---
# Content here"""

        is_error, missing = is_dcp_error(content)

        assert is_error is False
        assert missing is None
```
For the Slack event tests, I captured the registered handlers and tested them in isolation:
```python
# tests/test_slack_events.py

class TestWorkflowMessageFiltering:
    def test_processes_message_from_matching_workflow_id(
        self, mock_say, mock_client, mock_logger
    ):
        event = {
            "bot_id": "B123",
            "channel": "C_WORKFLOW",
            "ts": "123.456",
            "text": "Requester\n@user\nKnowledge to add\ntest",
            "metadata": {"event_payload": {"workflow_id": "Wf_CORRECT"}},
        }

        with (
            patch("app.handlers.slack_events.WORKFLOW_CHANNEL_ID", "C_WORKFLOW"),
            patch("app.handlers.slack_events.WORKFLOW_ID", "Wf_CORRECT"),
            patch("app.handlers.slack_events.handle_workflow_submission") as mock_handler,
            patch("app.handlers.slack_events._is_duplicate", return_value=False),
        ):
            # ... register and call handler

            mock_handler.assert_called_once()

    def test_ignores_message_from_different_workflow_id(
        self, mock_say, mock_client, mock_logger
    ):
        event = {
            "bot_id": "B123",
            "channel": "C_WORKFLOW",
            "ts": "123.456",
            "text": "Some message",
            "metadata": {"event_payload": {"workflow_id": "Wf_DIFFERENT"}},
        }

        with (
            patch("app.handlers.slack_events.WORKFLOW_CHANNEL_ID", "C_WORKFLOW"),
            patch("app.handlers.slack_events.WORKFLOW_ID", "Wf_CORRECT"),
            patch("app.handlers.slack_events.handle_workflow_submission") as mock_handler,
        ):
            # ... register and call handler

            mock_handler.assert_not_called()
```
Deployment
I added the new dependencies and secrets:
```toml
# pyproject.toml
[project]
dependencies = [
    # ... existing deps
    "PyGithub>=2.1.1",  # New!
]
```
And updated the Cloud Run deployment:
```yaml
# .github/workflows/deploy.yml

env:
  WORKFLOW_CHANNEL_ID: C0123456789
  WORKFLOW_ID: Wf0123456789
  REVIEW_TEAM_ID: S0123456789
  SLACK_WORKSPACE_DOMAIN: your-workspace.slack.com
  GITHUB_MEMORY_REPO: your-org/your-bot-memory

# In the deploy step
--set-env-vars "\
  WORKFLOW_CHANNEL_ID=${{ env.WORKFLOW_CHANNEL_ID }},\
  WORKFLOW_ID=${{ env.WORKFLOW_ID }},\
  REVIEW_TEAM_ID=${{ env.REVIEW_TEAM_ID }},\
  SLACK_WORKSPACE_DOMAIN=${{ env.SLACK_WORKSPACE_DOMAIN }},\
  GITHUB_MEMORY_REPO=${{ env.GITHUB_MEMORY_REPO }}\
" \
--set-secrets "\
  GITHUB_TOKEN=GITHUB_TOKEN:latest\
"
```
Gotchas
1. Workflow ID Filtering
The workflow channel might have multiple workflows posting to it. Without filtering by workflow_id, the bot would try to process every workflow message. The workflow ID is in event.metadata.event_payload.workflow_id.
2. Slack Converts Mentions
When a user types @john.doe in a Slack form, it arrives as <@U12345ABCDE>. I had to add the resolve_user_mention function to convert it back to a readable name using the Slack API.
Important: This requires the users:read OAuth scope. Without it, you'll get a missing_scope error:
```
{'ok': False, 'error': 'missing_scope', 'needed': 'users:read', ...}
```
Add the scope in your Slack app settings under OAuth & Permissions → Bot Token Scopes, then reinstall the app to your workspace.
3. Source URL Fallback
If the user leaves the source URL empty or enters something that's not a URL, the bot falls back to linking the Slack message itself. This ensures we always have a reference.
4. DCP Prompt in GCS
I store the DCP prompt in GCS rather than the codebase. This lets me iterate on the prompt without redeploying the bot.
5. Thread Replies
All bot messages use thread_ts=thread_ts to reply in the same thread as the workflow form submission. This keeps the channel clean and groups related messages together.
6. Branch Name Collisions
If Gemini generates the same doc_id twice (e.g., for similar content), the bot will try to create a branch that already exists:
```
Reference already exists: 422 {"message": "Reference already exists", ...}
```
I fixed this by adding a Unix timestamp suffix to branch names:
```python
timestamp = int(time.time())
branch_name = f"knowledge/{doc_id}-{timestamp}"
```
Now branch names like knowledge/ref-20260323-abc-1774228173 are unique for all practical purposes (two submissions would have to land within the same second to collide).
Wrapping Up
The complete flow:
- User fills out Slack Workflow form with raw notes
- Bot detects the message, posts "Processing..."
- Gemini + DCP structures the content into proper documentation
- Bot creates a GitHub PR with the formatted doc
- Bot posts success message with PR link and @mentions the review team
What used to require manual markdown writing, file uploading, and PR creation now happens in seconds with a Slack form. The knowledge actually gets documented.
Files changed:
- `app/handlers/workflow_handler.py` - New orchestration handler
- `app/services/github.py` - New GitHub API service
- `app/services/gemini.py` - Added `generate_document()` method
- `app/handlers/slack_events.py` - Added workflow detection
- `app/utils/config.py` - New configuration constants
