Intro
Product managers love spreadsheets. Engineers love Git. Bridging these two worlds can be painful, especially when you're managing weekly releases that involve copying data from a Google Sheet into YAML files, validating everything, and creating PRs.
I built an automation that reads data from Google Sheets every Thursday at 1:00 AM UTC, validates it, generates YAML entries, and creates a pull request automatically. If validation fails, it sends a Slack alert instead. Here's how it works.
The Problem
Every week, PMs would fill out a Google Sheet with release details (start dates, URLs, tags, etc.) for that week. Someone on the team had to:
- Manually copy data from the sheet
- Transform it into YAML format
- Validate all required fields
- Create a PR with proper formatting
- Notify the team
This took 30-60 minutes every week and was error-prone. Missing fields, incorrect date formats, or malformed URLs would cause production issues.
The Solution: Python + Google Sheets API + GitHub Actions
Architecture Overview
The automation has three main components:
- Google Sheets Reader - Fetches spreadsheet data using service account credentials
- Python Validation & YAML Generation - Validates data and transforms it into YAML
- GitHub Actions Workflow - Orchestrates everything and creates the PR
Setting Up Google Sheets API Access
First, I set up a service account to access the Google Sheet programmatically.
Creating Service Account Credentials
- Enable Google Sheets API in Google Cloud Console
- Create a service account
- Generate JSON key
- Share the spreadsheet with the service account email (viewer access)
The credentials are stored as a GitHub secret (`GOOGLE_SHEETS_CREDENTIALS`) and loaded at runtime:
```python
import json
import os

import gspread
from google.oauth2.service_account import Credentials


class SheetsReader:
    def __init__(self):
        # Load credentials from environment variable
        creds_json = os.getenv("GOOGLE_SHEETS_CREDENTIALS")
        if not creds_json:
            raise ValueError("GOOGLE_SHEETS_CREDENTIALS not set")

        creds_dict = json.loads(creds_json)

        scopes = [
            "https://www.googleapis.com/auth/spreadsheets.readonly",
            "https://www.googleapis.com/auth/drive.readonly",
        ]
        credentials = Credentials.from_service_account_info(creds_dict, scopes=scopes)
        self.client = gspread.authorize(credentials)
```
Reading and Filtering Data
The script reads all rows starting from row 5 (header row) and filters entries where the start date falls between last Friday and this Thursday:
```python
from typing import Any, Dict, List


def get_data(self) -> List[Dict[str, Any]]:
    sheet = self.client.open_by_key(SPREADSHEET_ID).worksheet(SHEET_NAME)

    # Get all values including headers
    all_values = sheet.get("A5:AZ")

    if not all_values:
        return []

    headers = all_values[0]
    data_rows = all_values[1:]

    # Convert to list of dictionaries, padding short rows
    data = []
    for row in data_rows:
        padded_row = row + [""] * (len(headers) - len(row))
        row_dict = dict(zip(headers, padded_row))
        data.append(row_dict)

    return data
```
The date filtering logic handles Japanese date formats like `2024/3/1 (金)`:
```python
import re
from datetime import date, datetime


def _parse_japanese_date(date_str: str) -> date:
    # Remove the day-of-week suffix in parentheses if present,
    # handling both ASCII and full-width opening parentheses
    date_str = re.split(r"[(（]", date_str)[0].strip()

    formats = ["%Y/%m/%d", "%Y-%m-%d", "%Y/%m/%d %H:%M", "%Y-%m-%d %H:%M"]

    for fmt in formats:
        try:
            return datetime.strptime(date_str, fmt).date()
        except ValueError:
            continue

    raise ValueError(f"Unable to parse date: {date_str}")
```
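The window check mentioned above (start date between last Friday and this Thursday) can be sketched like this; the exact boundary rules are an assumption based on the Thursday schedule, and the function names are hypothetical:

```python
from datetime import date, timedelta


def current_window(today: date) -> tuple[date, date]:
    """Return the (last Friday, this Thursday) window for a run on `today`."""
    # weekday(): Monday=0 ... Thursday=3, Friday=4
    days_since_friday = (today.weekday() - 4) % 7
    last_friday = today - timedelta(days=days_since_friday)
    return last_friday, last_friday + timedelta(days=6)


def in_window(start: date, today: date) -> bool:
    """True if `start` falls inside the current sync window."""
    lo, hi = current_window(today)
    return lo <= start <= hi
```

When run on a Thursday, `current_window` yields the preceding Friday through that same Thursday, so each weekly run picks up exactly one week of entries.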
Data Validation
Before generating YAML, the script validates required fields:
```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ValidationError:
    entry_name: str
    field: str
    message: str


class DataValidator:
    def validate_entry(self, entry_data: Dict[str, Any]) -> List[ValidationError]:
        errors = []
        entry_name = entry_data.get("name", "Unknown Entry")

        # Check required fields
        for field in REQUIRED_FIELDS:
            value = entry_data.get(field, "").strip()
            if not value:
                errors.append(
                    ValidationError(
                        entry_name=entry_name,
                        field=field,
                        message=f"{field} is required",
                    )
                )

        # Validate URL format
        url = entry_data.get("url", "").strip()
        if url and not url.startswith("https://example.com/"):
            errors.append(
                ValidationError(
                    entry_name=entry_name,
                    field="url",
                    message="URL must start with 'https://example.com/'",
                )
            )

        return errors
```
If validation fails, the script sends a Slack alert and exits without creating a PR. This prevents bad data from reaching production.
YAML Generation
The script transforms spreadsheet data into YAML entries:
```python
from typing import Any, Dict


def generate_entry(self, data: Dict[str, Any]) -> Dict[str, Any]:
    name = data.get("name", "").strip()
    category_input = data.get("category", "").strip()
    category = category_input.upper() if category_input else DEFAULT_CATEGORY
    url = data.get("url", "").strip()

    start_datetime = self._parse_datetime(
        data.get("start_date", ""),
        data.get("start_time", ""),
    )
    end_datetime = self._parse_datetime(
        data.get("end_date", ""),
        data.get("end_time", ""),
    )

    project_id = self._parse_project_id(data.get("project_id", ""))

    return {
        "name": name,
        "category": category,
        "url": url,
        "start_datetime": start_datetime,
        "end_datetime": end_datetime,
        "project_id": project_id,
    }
```
GitHub Actions Workflow
The GitHub Actions workflow ties everything together:
```yaml
name: Spreadsheet Sync Workflow

on:
  schedule:
    - cron: '0 1 * * 4'  # Every Thursday at 1:00 AM UTC
  workflow_dispatch:      # Manual trigger

jobs:
  create-sync-pull-request:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install -r scripts/automation/requirements.txt

      - name: Run automation script
        env:
          GOOGLE_SHEETS_CREDENTIALS: ${{ secrets.GOOGLE_SHEETS_CREDENTIALS }}
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        run: |
          cd scripts/automation
          python main.py
        continue-on-error: true

      - name: Check for updated data
        id: check_data
        run: |
          if git diff --quiet data/master.yaml; then
            echo "has_data=false" >> "$GITHUB_OUTPUT"
          else
            echo "has_data=true" >> "$GITHUB_OUTPUT"
          fi

      - name: Create branch and commit changes
        if: steps.check_data.outputs.has_data == 'true'
        run: |
          DATE=$(date "+%Y%m%d")
          BRANCH_NAME="${DATE}_update_data"
          echo "DATE=${DATE}" >> "$GITHUB_ENV"
          echo "BRANCH_NAME=${BRANCH_NAME}" >> "$GITHUB_ENV"

          git config --global user.email "actions@github.com"
          git config --global user.name "GitHub Actions"

          git checkout -b "${BRANCH_NAME}"
          git add data/master.yaml
          git commit -m "Update data from spreadsheet"
          git push -u origin "${BRANCH_NAME}"

      - name: Create Pull Request
        if: steps.check_data.outputs.has_data == 'true'
        run: |
          gh pr create \
            --base main \
            --head "${BRANCH_NAME}" \
            --title "Data Update - ${DATE}" \
            --body-file /tmp/pr_body.txt
        env:
          GH_TOKEN: ${{ github.token }}
```

Two details worth noting: the `check_data` step sets its `has_data` output via `$GITHUB_OUTPUT`, and the branch name and date are exported through `$GITHUB_ENV` so later steps can read them; shell variables alone don't survive across steps.
Key Lessons
1. Service Account Permissions Matter
Initially, I tried using OAuth tokens, but service accounts are a much better fit for unattended automation: no token refresh and no human in the loop. Just make sure to share the spreadsheet with the service account email.
2. Validation Before Automation
The first version didn't validate data and would create PRs with missing fields. Adding validation and Slack alerts saved hours of back-and-forth with PMs.
3. Date Parsing is Messy
Japanese date formats like `2024/3/1 (金)` required custom parsing logic. Supporting multiple formats (`YYYY/MM/DD`, `YYYY-MM-DD`) made the script more resilient.
4. Indentation Matters in YAML
Appending new entries to existing YAML files is tricky. I had to manually indent each line by 2 spaces to match the existing structure:
```python
indented_lines = []
for line in new_yaml.split('\n'):
    if line.strip():
        indented_lines.append('  ' + line)
```
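Wrapped up as a helper, the same trick looks like this; the `entries:` key name in the example is an assumption about the target file's layout:

```python
def indent_yaml(new_yaml: str, indent: str = "  ") -> str:
    """Indent every non-blank line by two spaces so new entries
    nest under the file's existing top-level key."""
    return "\n".join(
        indent + line if line.strip() else line
        for line in new_yaml.split("\n")
    )


# e.g. appending a generated entry under a top-level `entries:` key:
snippet = "- name: Spring Sale\n  category: SALE"
print(indent_yaml(snippet))
```

Keeping blank lines unindented (rather than dropping them) avoids trailing whitespace in the output file.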
5. Manual Trigger is Essential
While the scheduled cron job works great, having `workflow_dispatch` lets PMs manually trigger the workflow if they need to add data outside the regular schedule.
Impact
This automation:
- Saves 30-60 minutes every week
- Reduces human error in data entry
- Provides immediate feedback via Slack when data is incomplete
- Creates consistent, well-formatted PRs
- Allows PMs to work in their preferred tool (Google Sheets) without blocking engineering
The workflow has been running in production for several months with zero incidents related to data formatting.
Pro Tips
Use `continue-on-error: true` for the Python script step, then check if output files exist before proceeding. This prevents the workflow from failing when there's no data to process.
Store the PR body in a file (`/tmp/pr_body.txt`) rather than trying to pass it as a string in GitHub Actions. Multi-line strings in workflow YAML are painful.
Default missing times to noon (12:00) instead of midnight. This prevents accidental early releases when exact times aren't specified.
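A sketch of that noon-default rule; the helper name and the accepted formats are assumptions, not the exact production code:

```python
from datetime import datetime, time


def parse_sheet_datetime(date_str: str, time_str: str) -> datetime:
    """Combine a sheet date with an optional time, defaulting to noon."""
    # Strip a trailing day-of-week suffix such as "(金)"
    day = datetime.strptime(date_str.split("(")[0].strip(), "%Y/%m/%d").date()
    if time_str.strip():
        t = datetime.strptime(time_str.strip(), "%H:%M").time()
    else:
        t = time(12, 0)  # noon, not midnight, to avoid accidental early releases
    return datetime.combine(day, t)
```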
Auto-generate what you can but validate what matters (like URLs and dates). This reduces manual work while ensuring data quality.
Wrapping Up
Automating spreadsheet-to-GitHub workflows isn't glamorous, but it's the kind of operational improvement that compounds over time. By integrating Google Sheets API with GitHub Actions, you can let PMs work in familiar tools while maintaining code-based workflows for engineers.
The full code structure includes separate modules for reading, validating, generating YAML, and sending Slack notifications, making it easy to extend or modify individual components without touching the rest of the system.