Automating Spreadsheet to GitHub PRs: A Production Workflow

Intro

Product managers love spreadsheets. Engineers love Git. Bridging these two worlds can be painful, especially when you're managing weekly releases that involve copying data from a Google Sheet into YAML files, validating everything, and creating PRs.

I built an automation that reads data from Google Sheets every Thursday at 1AM, validates it, generates YAML entries, and creates a pull request automatically. If validation fails, it sends a Slack alert instead. Here's how it works.

The Problem

Every week, PMs would fill out a Google Sheet with release details (start dates, URLs, tags, etc.) for that week. Someone on the team had to:

  1. Manually copy data from the sheet
  2. Transform it into YAML format
  3. Validate all required fields
  4. Create a PR with proper formatting
  5. Notify the team

This took 30-60 minutes every week and was error-prone. Missing fields, incorrect date formats, or malformed URLs would cause production issues.

The Solution: Python + Google Sheets API + GitHub Actions

Architecture Overview

The automation has three main components:

  1. Google Sheets Reader - Fetches spreadsheet data using service account credentials
  2. Python Validation & YAML Generation - Validates data and transforms it into YAML
  3. GitHub Actions Workflow - Orchestrates everything and creates the PR

Setting Up Google Sheets API Access

First, I set up a service account to access the Google Sheet programmatically.

Creating Service Account Credentials

  1. Enable Google Sheets API in Google Cloud Console
  2. Create a service account
  3. Generate JSON key
  4. Share the spreadsheet with the service account email (viewer access)

The credentials are stored as a GitHub secret (GOOGLE_SHEETS_CREDENTIALS) and loaded at runtime:

import json
import os
from google.oauth2.service_account import Credentials
import gspread

class SheetsReader:
    def __init__(self):
        # Load credentials from environment variable
        creds_json = os.getenv("GOOGLE_SHEETS_CREDENTIALS")
        if not creds_json:
            raise ValueError("GOOGLE_SHEETS_CREDENTIALS not set")

        creds_dict = json.loads(creds_json)

        scopes = [
            "https://www.googleapis.com/auth/spreadsheets.readonly",
            "https://www.googleapis.com/auth/drive.readonly",
        ]
        credentials = Credentials.from_service_account_info(creds_dict, scopes=scopes)
        self.client = gspread.authorize(credentials)

Reading and Filtering Data

The script reads all rows starting from row 5 (header row) and filters entries where the start date falls between last Friday and this Thursday:

def get_data(self) -> List[Dict[str, Any]]:
    sheet = self.client.open_by_key(SPREADSHEET_ID).worksheet(SHEET_NAME)

    # Get all values, starting at the header row
    all_values = sheet.get("A5:AZ")

    if not all_values:
        return []

    headers = all_values[0]
    data_rows = all_values[1:]

    # Convert to a list of dictionaries, padding short rows to the header width
    data = []
    for row in data_rows:
        padded_row = row + [""] * (len(headers) - len(row))
        row_dict = dict(zip(headers, padded_row))
        data.append(row_dict)

    return data
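The "last Friday through this Thursday" window itself can be computed from the run date. Here's a minimal sketch (`current_window` and `in_window` are illustrative names; the exact cutoff semantics in the production script may differ):

```python
from datetime import date, timedelta

def current_window(today: date) -> tuple:
    """Return (last Friday, this Thursday) for a Thursday-anchored weekly run."""
    # date.weekday(): Monday=0 ... Thursday=3, Friday=4
    days_since_friday = (today.weekday() - 4) % 7
    last_friday = today - timedelta(days=days_since_friday)
    this_thursday = last_friday + timedelta(days=6)
    return last_friday, this_thursday

def in_window(start: date, today: date) -> bool:
    """Keep an entry only if its start date falls inside the window."""
    lo, hi = current_window(today)
    return lo <= start <= hi
```

Anchoring the window to the run date (rather than hardcoding dates) is what lets the same script work for both the Thursday cron run and ad-hoc manual triggers.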

The date filtering logic handles Japanese date formats like 2024/3/1 (金):

from datetime import date, datetime

def _parse_japanese_date(date_str: str) -> date:
    # Remove day of week in parentheses if present, e.g. "2024/3/1 (金)" -> "2024/3/1"
    date_str = date_str.split("(")[0].strip()

    formats = ["%Y/%m/%d", "%Y-%m-%d", "%Y/%m/%d %H:%M", "%Y-%m-%d %H:%M"]

    for fmt in formats:
        try:
            return datetime.strptime(date_str, fmt).date()
        except ValueError:
            continue

    raise ValueError(f"Unable to parse date: {date_str}")

Data Validation

Before generating YAML, the script validates required fields:

class DataValidator:
    def validate_entry(self, entry_data: Dict[str, Any]) -> List[ValidationError]:
        errors = []
        entry_name = entry_data.get("name", "Unknown Entry")

        # Check required fields
        for field in REQUIRED_FIELDS:
            value = entry_data.get(field, "").strip()
            if not value:
                errors.append(
                    ValidationError(
                        entry_name=entry_name,
                        field=field,
                        message=f"{field} is required",
                    )
                )

        # Validate URL format
        url = entry_data.get("url", "").strip()
        if url and not url.startswith("https://example.com/"):
            errors.append(
                ValidationError(
                    entry_name=entry_name,
                    field="url",
                    message="URL must start with 'https://example.com/'",
                )
            )

        return errors

If validation fails, the script sends a Slack alert and exits without creating a PR. This prevents bad data from reaching production.
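The alert itself can be as small as a webhook POST. Here's a standard-library sketch (`build_alert_payload` and `send_slack_alert` are illustrative names, not the actual module's API):

```python
import json
import os
import urllib.request

def build_alert_payload(errors) -> dict:
    # Render each ValidationError as one line of the Slack message
    lines = [f"- [{e.entry_name}] {e.field}: {e.message}" for e in errors]
    return {"text": "Spreadsheet validation failed:\n" + "\n".join(lines)}

def send_slack_alert(errors) -> None:
    webhook_url = os.getenv("SLACK_WEBHOOK_URL")
    if not webhook_url:
        raise ValueError("SLACK_WEBHOOK_URL not set")

    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(build_alert_payload(errors)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Keeping payload construction separate from the HTTP call makes the message formatting unit-testable without hitting Slack.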

YAML Generation

The script transforms spreadsheet data into YAML entries:

def generate_entry(self, data: Dict[str, Any]) -> Dict[str, Any]:
    name = data.get("name", "").strip()
    category_input = data.get("category", "").strip()
    category = category_input.upper() if category_input else DEFAULT_CATEGORY
    url = data.get("url", "").strip()

    start_datetime = self._parse_datetime(
        data.get("start_date", ""),
        data.get("start_time", ""),
    )

    end_datetime = self._parse_datetime(
        data.get("end_date", ""),
        data.get("end_time", ""),
    )

    project_id = self._parse_project_id(data.get("project_id", ""))

    entry = {
        "name": name,
        "category": category,
        "url": url,
        "start_datetime": start_datetime,
        "end_datetime": end_datetime,
        "project_id": project_id,
    }

    return entry
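Serializing the generated entries is then a one-liner. A sketch assuming PyYAML (likely in the script's requirements, though the post doesn't show this step; `entries_to_yaml` is a hypothetical helper name):

```python
import yaml  # PyYAML

def entries_to_yaml(entries: list) -> str:
    # sort_keys=False preserves the field order defined in generate_entry;
    # allow_unicode=True keeps Japanese text readable instead of escaped
    return yaml.safe_dump(entries, sort_keys=False, allow_unicode=True)
```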

GitHub Actions Workflow

The GitHub Actions workflow ties everything together:

name: Spreadsheet Sync Workflow

on:
  schedule:
    - cron: '0 1 * * 4'  # Every Thursday at 1:00 AM UTC
  workflow_dispatch:      # Manual trigger

jobs:
  create-sync-pull-request:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install -r scripts/automation/requirements.txt

      - name: Run automation script
        env:
          GOOGLE_SHEETS_CREDENTIALS: ${{ secrets.GOOGLE_SHEETS_CREDENTIALS }}
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        run: |
          cd scripts/automation
          python main.py
        continue-on-error: true

      # A "check_data" step (omitted here for brevity) inspects the script's
      # output and sets the has_data / count outputs used below.

      - name: Create branch and commit changes
        if: steps.check_data.outputs.has_data == 'true'
        run: |
          DATE=$(date "+%Y%m%d")
          BRANCH_NAME="${DATE}_update_data"

          # Export for later steps that reference env.DATE / env.BRANCH_NAME
          echo "DATE=${DATE}" >> "$GITHUB_ENV"
          echo "BRANCH_NAME=${BRANCH_NAME}" >> "$GITHUB_ENV"

          git config --global user.email "actions@github.com"
          git config --global user.name "GitHub Actions"

          git checkout -b "${BRANCH_NAME}"

          git add data/master.yaml
          git commit -m "Update data from spreadsheet (${{ steps.check_data.outputs.count }} entries)"

          git push -u origin "${BRANCH_NAME}"

      - name: Create Pull Request
        if: steps.check_data.outputs.has_data == 'true'
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          gh pr create \
            --base main \
            --head "${{ env.BRANCH_NAME }}" \
            --title "Data Update - ${{ env.DATE }}" \
            --body-file /tmp/pr_body.txt

Key Lessons

1. Service Account Permissions Matter

Initially, I tried using OAuth tokens, but service accounts are a much better fit for unattended automation: no interactive consent flow and no token refresh to babysit. Just remember to share the spreadsheet with the service account's email address.

2. Validation Before Automation

The first version didn't validate data and would create PRs with missing fields. Adding validation and Slack alerts saved hours of back-and-forth with PMs.

3. Date Parsing is Messy

Japanese date formats like 2024/3/1 (金) required custom parsing logic. Supporting multiple formats (YYYY/MM/DD, YYYY-MM-DD) made the script more resilient.

4. Indentation Matters in YAML

Appending new entries to existing YAML files is tricky. I had to manually indent each line by 2 spaces to match the existing structure:

indented_lines = []
for line in new_yaml.split('\n'):
    if line.strip():
        indented_lines.append('  ' + line)  # two spaces to match the existing nesting
indented_yaml = '\n'.join(indented_lines)

5. Manual Trigger is Essential

While the scheduled cron job works great, having workflow_dispatch lets PMs manually trigger the workflow if they need to add data outside the regular schedule.

Impact

This automation:

  • Saves 30-60 minutes every week
  • Reduces human error in data entry
  • Provides immediate feedback via Slack when data is incomplete
  • Creates consistent, well-formatted PRs
  • Allows PMs to work in their preferred tool (Google Sheets) without blocking engineering

The workflow has been running in production for several months with zero incidents related to data formatting.

Pro Tips

Use continue-on-error: true for the Python script step, then check if output files exist before proceeding. This prevents the workflow from failing when there's no data to process.
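The check can be a tiny step between the script run and the commit step. A sketch of one way to do it (here diffing the target file; whether you check the diff or a dedicated output file is an implementation choice):

```yaml
- name: Check generated data
  id: check_data
  run: |
    if git diff --quiet data/master.yaml; then
      echo "has_data=false" >> "$GITHUB_OUTPUT"
    else
      echo "has_data=true" >> "$GITHUB_OUTPUT"
      # Rough entry count: one added "- name:" line per new entry
      echo "count=$(git diff data/master.yaml | grep -c '^+ *- name:')" >> "$GITHUB_OUTPUT"
    fi
```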

Store the PR body in a file (/tmp/pr_body.txt) rather than trying to pass it as a string in GitHub Actions. Multi-line strings in workflow YAML are painful.

Default missing times to noon (12:00) instead of midnight. This prevents accidental early releases when exact times aren't specified.
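In `_parse_datetime` terms, that default looks roughly like this (a sketch; the post doesn't show the actual helper, and it supports fewer formats than the real date parser):

```python
from datetime import datetime, time

def parse_datetime(date_str: str, time_str: str) -> datetime:
    """Combine a date and an optional time, defaulting missing times to noon."""
    # Strip a trailing day-of-week like "(金)" before parsing
    day = datetime.strptime(date_str.split("(")[0].strip(), "%Y/%m/%d").date()
    if time_str.strip():
        t = datetime.strptime(time_str.strip(), "%H:%M").time()
    else:
        t = time(12, 0)  # noon, not midnight, to avoid accidental early releases
    return datetime.combine(day, t)
```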

Auto-generate what you can but validate what matters (like URLs and dates). This reduces manual work while ensuring data quality.

Wrapping Up

Automating spreadsheet-to-GitHub workflows isn't glamorous, but it's the kind of operational improvement that compounds over time. By integrating Google Sheets API with GitHub Actions, you can let PMs work in familiar tools while maintaining code-based workflows for engineers.

The full code structure includes separate modules for reading, validating, generating YAML, and sending Slack notifications, making it easy to extend or modify individual components without touching the rest of the system.
