Intro
After setting up the knowledge base in GCS, I had a bot that could answer questions from our documentation. But adding new knowledge required:
- Writing markdown manually
- Following our document structure (frontmatter, sections, etc.)
- Uploading to the GCS bucket
- Getting someone to review it
This friction meant knowledge rarely got added. People would answer questions in Slack threads, but that knowledge never made it back into the docs.
The goal: Let users paste raw notes into a Slack form, have AI structure it into proper documentation, and automatically create a PR for review.
What We're Building
```
┌─────────────────────────────────────────────────────────────────────┐
│                         Slack Workflow Form                         │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │ Requester: @john.doe                                           │ │
│  │ Knowledge to add: [raw notes here]                             │ │
│  │ Reference URL: https://...                                     │ │
│  │ Contact: #support-channel                                      │ │
│  └────────────────────────────────────────────────────────────────┘ │
└───────────────────────────────┬─────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│              Bot receives message in workflow channel               │
│   └─▶ Posts "Received! Formatting your knowledge..."                │
└───────────────────────────────┬─────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                Gemini + DCP (Data Control Protocol)                 │
│   └─▶ Structures raw data into proper markdown with frontmatter     │
│   └─▶ Assigns doc_id, target_path automatically                     │
└───────────────────────────────┬─────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                           GitHub Service                            │
│   └─▶ Creates branch: knowledge/{doc_id}                            │
│   └─▶ Commits structured document                                   │
│   └─▶ Opens PR for review                                           │
└───────────────────────────────┬─────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      Bot posts success message                      │
│   "Done! @ops-team Please review this PR"                           │
│   └─▶ Links to PR                                                   │
└─────────────────────────────────────────────────────────────────────┘
```
The DCP Prompt
The magic happens in the DCP (Data Control Protocol) - a system prompt stored in GCS that tells Gemini exactly how to structure documents. It handles:
- Inferring document type (manual, workflow, FAQ, etc.)
- Generating unique doc_id by scanning the repository
- Creating proper YAML frontmatter
- Structuring content with headers and sections
- Returning errors if required fields are missing
I won't include the full prompt here, but the key is that it outputs complete markdown with frontmatter:
```yaml
---
doc_id: manual-42
target_path: docs/manuals/manual-42_coffee-machine-guide.md
title: Coffee Machine Guide
description: How to use the office coffee machine
created: 2026-03-23
creator: "@john.doe"
source_url: https://slack.com/archives/C123/p456
contact_info: "#faq-support @ops-team"
tags:
  - office
  - equipment
---

# Coffee Machine Guide

## Overview
...
```
Implementation
1. Configuration
First, I added the new environment variables:
```python
# app/utils/config.py
import os

WORKFLOW_CHANNEL_ID = os.getenv("WORKFLOW_CHANNEL_ID", "")
WORKFLOW_ID = os.getenv("WORKFLOW_ID", "")
DEFAULT_CONTACT_INFO = os.getenv("DEFAULT_CONTACT_INFO", "#support @team")
REVIEW_TEAM_ID = os.getenv("REVIEW_TEAM_ID", "")
REVIEW_TEAM_MENTION = f"<!subteam^{REVIEW_TEAM_ID}>" if REVIEW_TEAM_ID else "@team"
SLACK_WORKSPACE_DOMAIN = os.getenv("SLACK_WORKSPACE_DOMAIN", "")
DCP_PROMPT_PATH = "prompts/dcp_prompt.md"

WF_FIELDS = {
    "creator": "Requester",
    "raw_data": "Knowledge to add",
    "source_url": "Reference URL",
    "contact_info": "Contact",
}

WORKFLOW_PROCESSING_MESSAGE = "Received! Formatting your knowledge, please wait..."
WORKFLOW_SUCCESS_MESSAGE = """Done!
<@{user_id}> submitted new knowledge for review.

📄 *{title}*
🔗 {pr_url}

{review_team} Please review this PR!"""
WORKFLOW_ERROR_MESSAGE = """Error occurred while formatting!
<@{user_id}>, please check the following and try again.

Reason: {error_details}"""
```
A few notes:
- `WORKFLOW_ID` filters messages so we only process submissions from our specific workflow
- `WF_FIELDS` maps internal field names to the labels used in the workflow form
- `REVIEW_TEAM_MENTION` uses Slack's subteam mention syntax `<!subteam^ID>`
2. Detecting Workflow Messages
I updated the message handler to detect workflow submissions:
```python
# app/handlers/slack_events.py

@app.event("message")
def handle_message(event, say, client, logger):
    bot_id = event.get("bot_id")
    channel = event.get("channel")

    # Detect workflow submission
    if bot_id and WORKFLOW_CHANNEL_ID and channel == WORKFLOW_CHANNEL_ID:
        metadata = event.get("metadata", {})
        event_payload = metadata.get("event_payload", {})
        workflow_id = event_payload.get("workflow_id")

        logger.info(
            f"Workflow channel message: workflow_id={workflow_id}, "
            f"expected={WORKFLOW_ID}"
        )

        # Only process messages from our specific workflow
        if WORKFLOW_ID and workflow_id and workflow_id != WORKFLOW_ID:
            logger.info(f"Ignoring message from different workflow {workflow_id}")
            return

        event_ts = event.get("event_ts") or event.get("ts")
        if _is_duplicate(f"workflow:{event_ts}"):
            return

        handle_workflow_submission(event, say, client, logger)
        return

    # ... existing DM handling
```
Key points:
- Bot messages only - Workflow posts come from a bot, not a user
- Channel filter - Only process messages in the designated workflow channel
- Workflow ID filter - The channel might have multiple workflows posting; we only want ours
- Duplicate check - Slack sometimes sends duplicate events
3. Parsing the Workflow Form
The workflow posts a message with labeled fields. I parse it into a dataclass:
```python
# app/handlers/workflow_handler.py
import re
from dataclasses import dataclass
from typing import Optional

from app.utils.config import WF_FIELDS


@dataclass
class WorkflowFormData:
    raw_data: str
    contact_info: str
    creator: str
    source_url: Optional[str] = None


def parse_workflow_message(text: str) -> Optional[WorkflowFormData]:
    if not text:
        return None

    raw_data = None
    source_url = None
    contact_info = None
    creator = None

    lines = text.strip().split("\n")
    current_field = None
    raw_data_lines = []
    in_raw_data = False

    for line in lines:
        if WF_FIELDS["creator"] in line:
            in_raw_data = False
            current_field = "creator"
        elif WF_FIELDS["raw_data"] in line:
            in_raw_data = True
            current_field = "raw_data"
        elif WF_FIELDS["source_url"] in line:
            in_raw_data = False
            current_field = "source_url"
        elif WF_FIELDS["contact_info"] in line:
            in_raw_data = False
            current_field = "contact_info"
        elif current_field == "creator" and not creator:
            if line.strip():
                creator = line.strip()
        elif in_raw_data and current_field == "raw_data":
            raw_data_lines.append(line)
        elif current_field == "source_url" and not source_url:
            url_match = re.search(r"https?://[^\s<>]+", line)
            if url_match:
                source_url = url_match.group(0)
        elif current_field == "contact_info" and not contact_info:
            if line.strip():
                contact_info = line.strip()

    # ... handle multiline raw_data and fallbacks
```
The tricky part is raw_data - it can span multiple lines, so I collect lines until hitting the next field label.
4. Resolving User Mentions
Slack converts @john.doe typed in the form to <@U12345ABCDE> before the bot receives it. I wanted the actual display name in the documentation, so I added a resolver:
```python
def resolve_user_mention(text: str, client) -> str:
    match = re.search(r"<@([A-Z0-9]+)>", text)
    if not match:
        return text

    user_id = match.group(1)
    try:
        result = client.users_info(user=user_id)
        if result.get("ok"):
            profile = result.get("user", {}).get("profile", {})
            display_name = (
                profile.get("display_name") or profile.get("real_name") or user_id
            )
            return f"@{display_name}"
    except Exception:
        pass
    return text
```
Now <@U12345ABCDE> becomes @john.doe in the generated docs.
5. Building Slack Message URLs
If the user doesn't provide a source URL, I fall back to linking the Slack message itself:
```python
def build_slack_message_url(channel: str, ts: str) -> str:
    ts_without_dot = ts.replace(".", "")
    return f"https://{SLACK_WORKSPACE_DOMAIN}/archives/{channel}/p{ts_without_dot}"
```
This creates URLs like https://workspace.slack.com/archives/C123/p1234567890123456.
6. Generating Documents with Gemini
I added a new method to the Gemini service:
```python
# app/services/gemini.py

def generate_document(
    self,
    raw_data: str,
    source_url: str,
    contact_info: str,
    creator: str,
) -> str:
    dcp_prompt = self._load_gcs_prompt(DCP_PROMPT_PATH)
    today = date.today().isoformat()

    prompt = f"""{dcp_prompt}

[INPUT]
today: {today}
creator: {creator}
data: {raw_data}
source_url: {source_url}
contact_info: {contact_info}
"""

    # Use a standalone model without chat history
    standalone_model = GenerativeModel(self.model_name)
    response = standalone_model.generate_content(prompt)

    result = response.text.strip()
    logger.info("Generated document via DCP")
    return result
```
Using a standalone model (rather than a chat session) keeps the output independent of any conversation history that could otherwise influence it.
7. Creating GitHub PRs
The GitHub service handles branch creation, commits, and PR opening:
```python
# app/services/github.py
import os

from github import Github
from github.Repository import Repository


class GitHubService:
    def __init__(self) -> None:
        self.token = os.getenv("GITHUB_TOKEN")
        self.repo_name = os.getenv("GITHUB_MEMORY_REPO")

        if not self.token:
            raise ValueError("GITHUB_TOKEN environment variable is required")
        if not self.repo_name:
            raise ValueError("GITHUB_MEMORY_REPO environment variable is required")

        self.github = Github(self.token)
        self.repo: Repository = self.github.get_repo(self.repo_name)

    def create_document_pr(
        self,
        doc_id: str,
        file_path: str,
        content: str,
        title: str,
        source_url: str,
        creator: str,
    ) -> str:
        branch_name = f"knowledge/{doc_id}"

        # Get default branch SHA
        default_branch = self.repo.default_branch
        default_ref = self.repo.get_git_ref(f"heads/{default_branch}")
        sha = default_ref.object.sha

        # Create new branch
        self.repo.create_git_ref(f"refs/heads/{branch_name}", sha)

        # Create file on the new branch
        self.repo.create_file(
            path=file_path,
            message=f"Add {title}",
            content=content,
            branch=branch_name,
        )

        # Open the PR
        pr = self.repo.create_pull(
            title=f"[Knowledge] {title}",
            body=f"Source: {source_url}\nRequested by: {creator}",
            head=branch_name,
            base=default_branch,
        )

        return pr.html_url
```
8. Extracting Frontmatter
After Gemini generates the document, I extract the frontmatter to get doc_id, target_path, and title:
```python
def extract_frontmatter(content: str) -> dict:
    result = {
        "doc_id": None,
        "target_path": None,
        "title": "Untitled",
    }

    doc_id_match = re.search(
        r'^doc_id:\s*["\']?(.+?)["\']?\s*$', content, re.MULTILINE
    )
    if doc_id_match:
        result["doc_id"] = doc_id_match.group(1).strip()

    target_path_match = re.search(
        r'^target_path:\s*["\']?(.+?)["\']?\s*$', content, re.MULTILINE
    )
    if target_path_match:
        result["target_path"] = target_path_match.group(1).strip()

    title_match = re.search(
        r'^title:\s*["\']?(.+?)["\']?\s*$', content, re.MULTILINE
    )
    if title_match:
        result["title"] = title_match.group(1).strip()

    return result
```
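As a quick sanity check, here is the same extraction logic restated compactly (one loop instead of three near-identical blocks) so the snippet runs standalone against the sample frontmatter shown earlier:

```python
import re


def extract_frontmatter(content: str) -> dict:
    """Compact restatement of the extraction above: same regex, looped per field."""
    result = {"doc_id": None, "target_path": None, "title": "Untitled"}
    for key in ("doc_id", "target_path", "title"):
        match = re.search(
            rf'^{key}:\s*["\']?(.+?)["\']?\s*$', content, re.MULTILINE
        )
        if match:
            result[key] = match.group(1).strip()
    return result


sample = """---
doc_id: manual-42
target_path: docs/manuals/manual-42_coffee-machine-guide.md
title: Coffee Machine Guide
---
# Coffee Machine Guide
"""

print(extract_frontmatter(sample))
```

Regex extraction is deliberately forgiving here: it works whether or not Gemini quotes the values, without pulling in a YAML parser.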
9. Handling DCP Errors
If the user's input is missing required fields, DCP returns a JSON error:
```json
{"status": "error", "missing": ["title", "description"]}
```
I detect and handle this:
````python
def is_dcp_error(content: str) -> Tuple[bool, Optional[list]]:
    content_stripped = content.strip()

    # Strip markdown code block wrapper if present
    if content_stripped.startswith("```json"):
        content_stripped = content_stripped[7:]
    if content_stripped.endswith("```"):
        content_stripped = content_stripped[:-3]
    content_stripped = content_stripped.strip()

    if content_stripped.startswith("{") and "status" in content_stripped:
        try:
            data = json.loads(content_stripped)
            if data.get("status") == "error":
                return True, data.get("missing", [])
        except json.JSONDecodeError:
            pass

    return False, None
````
10. The Main Handler
Putting it all together:
```python
def handle_workflow_submission(event: dict, say, client, logger) -> None:
    text = event.get("text", "")
    channel = event.get("channel", "")
    thread_ts = event.get("ts", "")

    # Extract user_id for mentions
    user_id = None
    if "metadata" in event:
        metadata = event.get("metadata", {})
        if "event_payload" in metadata:
            user_id = metadata["event_payload"].get("user_id")

    if not user_id:
        user_match = re.search(r"<@([A-Z0-9]+)>", text)
        if user_match:
            user_id = user_match.group(1)

    if not user_id:
        user_id = "unknown"

    # Post processing message
    say(text=WORKFLOW_PROCESSING_MESSAGE, thread_ts=thread_ts)

    # Parse form data
    form_data = parse_workflow_message(text)
    if not form_data:
        say(text=WORKFLOW_ERROR_MESSAGE.format(...), thread_ts=thread_ts)
        return

    # Resolve user mention to display name
    creator = resolve_user_mention(form_data.creator, client)

    # Fall back to Slack message URL if no source URL
    source_url = form_data.source_url
    if not source_url:
        source_url = build_slack_message_url(channel, thread_ts)

    try:
        # Generate structured document
        gemini = get_gemini_service()
        document_content = gemini.generate_document(
            raw_data=form_data.raw_data,
            source_url=source_url,
            contact_info=form_data.contact_info,
            creator=creator,
        )

        # Check for DCP error
        is_error, missing_fields = is_dcp_error(document_content)
        if is_error:
            say(text=WORKFLOW_ERROR_MESSAGE.format(...), thread_ts=thread_ts)
            return

        # Extract frontmatter
        frontmatter = extract_frontmatter(document_content)
        doc_id = frontmatter["doc_id"]
        target_path = frontmatter["target_path"]
        title = frontmatter["title"]

        if not doc_id or not target_path:
            say(text=WORKFLOW_ERROR_MESSAGE.format(...), thread_ts=thread_ts)
            return

        # Create PR
        github = get_github_service()
        pr_url = github.create_document_pr(
            doc_id=doc_id,
            file_path=target_path,
            content=document_content,
            title=title,
            source_url=source_url,
            creator=creator,
        )

        # Post success message
        success_msg = WORKFLOW_SUCCESS_MESSAGE.format(
            user_id=user_id,
            title=title,
            pr_url=pr_url,
            review_team=REVIEW_TEAM_MENTION,
        )
        say(text=success_msg, thread_ts=thread_ts)

    except Exception as e:
        say(text=WORKFLOW_ERROR_MESSAGE.format(...), thread_ts=thread_ts)
```
Testing
I wrote comprehensive tests for each component:
```python
# tests/test_workflow_handler.py

class TestResolveUserMention:
    def test_resolves_user_mention_to_display_name(self):
        mock_client = MagicMock()
        mock_client.users_info.return_value = {
            "ok": True,
            "user": {
                "profile": {"display_name": "john.doe", "real_name": "John Doe"}
            },
        }

        result = resolve_user_mention("<@U12345678>", mock_client)

        assert result == "@john.doe"
        mock_client.users_info.assert_called_once_with(user="U12345678")

    def test_falls_back_to_real_name_when_no_display_name(self):
        mock_client = MagicMock()
        mock_client.users_info.return_value = {
            "ok": True,
            "user": {"profile": {"display_name": "", "real_name": "John Doe"}},
        }

        result = resolve_user_mention("<@U12345678>", mock_client)

        assert result == "@John Doe"


class TestParseWorkflowMessage:
    def test_parses_full_workflow_message(self):
        text = """Requester
@john.doe
Knowledge to add
This is the raw knowledge content.
Reference URL
https://example.com/doc
Contact
#support-channel"""

        result = parse_workflow_message(text)

        assert result is not None
        assert result.creator == "@john.doe"
        assert "raw knowledge content" in result.raw_data
        assert result.source_url == "https://example.com/doc"
        assert result.contact_info == "#support-channel"


class TestIsDcpError:
    def test_detects_error_json(self):
        content = '{"status": "error", "missing": ["title"]}'

        is_error, missing = is_dcp_error(content)

        assert is_error is True
        assert missing == ["title"]

    def test_returns_false_for_valid_content(self):
        content = """---
doc_id: manual-1
title: Test Document
---
# Content here"""

        is_error, missing = is_dcp_error(content)

        assert is_error is False
        assert missing is None
```
For the Slack event tests, I captured the registered handlers and tested them in isolation:
```python
# tests/test_slack_events.py

class TestWorkflowMessageFiltering:
    def test_processes_message_from_matching_workflow_id(
        self, mock_say, mock_client, mock_logger
    ):
        event = {
            "bot_id": "B123",
            "channel": "C_WORKFLOW",
            "ts": "123.456",
            "text": "Requester\n@user\nKnowledge to add\ntest",
            "metadata": {"event_payload": {"workflow_id": "Wf_CORRECT"}},
        }

        with (
            patch("app.handlers.slack_events.WORKFLOW_CHANNEL_ID", "C_WORKFLOW"),
            patch("app.handlers.slack_events.WORKFLOW_ID", "Wf_CORRECT"),
            patch("app.handlers.slack_events.handle_workflow_submission") as mock_handler,
            patch("app.handlers.slack_events._is_duplicate", return_value=False),
        ):
            # ... register and call handler

            mock_handler.assert_called_once()

    def test_ignores_message_from_different_workflow_id(
        self, mock_say, mock_client, mock_logger
    ):
        event = {
            "bot_id": "B123",
            "channel": "C_WORKFLOW",
            "ts": "123.456",
            "text": "Some message",
            "metadata": {"event_payload": {"workflow_id": "Wf_DIFFERENT"}},
        }

        with (
            patch("app.handlers.slack_events.WORKFLOW_CHANNEL_ID", "C_WORKFLOW"),
            patch("app.handlers.slack_events.WORKFLOW_ID", "Wf_CORRECT"),
            patch("app.handlers.slack_events.handle_workflow_submission") as mock_handler,
        ):
            # ... register and call handler

            mock_handler.assert_not_called()
```
Deployment
I added the new dependencies and secrets:
```toml
# pyproject.toml
[project]
dependencies = [
    # ... existing deps
    "PyGithub>=2.1.1",  # New!
]
```
And updated the Cloud Run deployment:
```yaml
# .github/workflows/deploy.yml

env:
  WORKFLOW_CHANNEL_ID: C0123456789
  WORKFLOW_ID: Wf0123456789
  REVIEW_TEAM_ID: S0123456789
  SLACK_WORKSPACE_DOMAIN: your-workspace.slack.com
  GITHUB_MEMORY_REPO: your-org/your-bot-memory

# In the deploy step
--set-env-vars "\
  WORKFLOW_CHANNEL_ID=${{ env.WORKFLOW_CHANNEL_ID }},\
  WORKFLOW_ID=${{ env.WORKFLOW_ID }},\
  REVIEW_TEAM_ID=${{ env.REVIEW_TEAM_ID }},\
  SLACK_WORKSPACE_DOMAIN=${{ env.SLACK_WORKSPACE_DOMAIN }},\
  GITHUB_MEMORY_REPO=${{ env.GITHUB_MEMORY_REPO }}\
" \
--set-secrets "\
  GITHUB_TOKEN=GITHUB_TOKEN:latest\
"
```
Gotchas
1. Workflow ID Filtering
The workflow channel might have multiple workflows posting to it. Without filtering by workflow_id, the bot would try to process every workflow message. The workflow ID is in event.metadata.event_payload.workflow_id.
2. Slack Converts Mentions
When a user types @john.doe in a Slack form, it arrives as <@U12345ABCDE>. I had to add the resolve_user_mention function to convert it back to a readable name using the Slack API.
Important: This requires the users:read OAuth scope. Without it, you'll get a missing_scope error:
```
{'ok': False, 'error': 'missing_scope', 'needed': 'users:read', ...}
```
Add the scope in your Slack app settings under OAuth & Permissions → Bot Token Scopes, then reinstall the app to your workspace.
3. Source URL Fallback
If the user leaves the source URL empty or enters something that's not a URL, the bot falls back to linking the Slack message itself. This ensures we always have a reference.
4. DCP Prompt in GCS
I store the DCP prompt in GCS rather than the codebase. This lets me iterate on the prompt without redeploying the bot.
5. Thread Replies
All bot messages use thread_ts=thread_ts to reply in the same thread as the workflow form submission. This keeps the channel clean and groups related messages together.
6. Branch Name Collisions
If Gemini generates the same doc_id twice (e.g., for similar content), the bot will try to create a branch that already exists:
```
Reference already exists: 422 {"message": "Reference already exists", ...}
```
I fixed this by adding a Unix timestamp suffix to branch names:
```python
timestamp = int(time.time())
branch_name = f"knowledge/{doc_id}-{timestamp}"
```
Now branch names like knowledge/ref-20260323-abc-1774228173 are unique for all practical purposes (two submissions would have to land within the same second to collide).
Wrapping Up
The complete flow:
- User fills out Slack Workflow form with raw notes
- Bot detects the message, posts "Processing..."
- Gemini + DCP structures the content into proper documentation
- Bot creates a GitHub PR with the formatted doc
- Bot posts success message with PR link and @mentions the review team
What used to require manual markdown writing, file uploading, and PR creation now happens in seconds with a Slack form. The knowledge actually gets documented.
Files changed:
- `app/handlers/workflow_handler.py` - New orchestration handler
- `app/services/github.py` - New GitHub API service
- `app/services/gemini.py` - Added `generate_document()` method
- `app/handlers/slack_events.py` - Added workflow detection
- `app/utils/config.py` - New configuration constants
