The Problem
I previously wrote about solving Google Sheets timeout issues by generating static JSON at build time. While that solved the timeout problem, I still had limitations:
- Google Sheets wasn't designed as a database
- Complex queries required loading entire sheets into memory
- Data aggregation happened in JavaScript rather than at the database level (sketched below)
- Limited to 2000 rows due to memory constraints
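For a concrete picture of those last two limitations, the old aggregation step looked roughly like the sketch below: every row loaded into memory, then grouped and summed in JavaScript. The row shape and field names here are hypothetical, simplified for illustration.

```typescript
// Hypothetical shape of a row loaded from the sheet (illustrative only)
type SheetRow = { itemId: string; metricA: number; value1: number }

// In-memory aggregation over every loaded row: workable at ~2000 rows,
// but memory and CPU quickly become the bottleneck as the data grows.
function aggregateByItem(rows: SheetRow[]): Map<string, { metricA: number; value1: number }> {
  const totals = new Map<string, { metricA: number; value1: number }>()
  for (const row of rows) {
    const current = totals.get(row.itemId) ?? { metricA: 0, value1: 0 }
    current.metricA += row.metricA
    current.value1 += row.value1
    totals.set(row.itemId, current)
  }
  return totals
}
```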
The solution? Migrate to BigQuery and fetch data directly from the data warehouse.
Why BigQuery?
BigQuery is Google's serverless data warehouse that offers:
- SQL queries: Complex aggregations and joins happen at the database level
- Scalability: Can handle millions of rows without memory issues
- Performance: Distributed query processing is much faster than processing in Node.js
- Integration: Works seamlessly with Google Cloud authentication
Implementation
Step 1: Install BigQuery Client
```bash
npm install @google-cloud/bigquery
```
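If you want to sanity-check the client before wiring up credentials properly, a minimal query is enough. This sketch assumes Application Default Credentials are already available (for example via gcloud auth application-default login) and uses a placeholder project ID; the next step replaces this with explicit service-account credentials.

```typescript
import { BigQuery } from '@google-cloud/bigquery'

// Smoke test only: relies on Application Default Credentials being present
// in the environment; Step 2 switches to explicit service-account credentials.
const bigquery = new BigQuery({ projectId: 'your-project-id' })

async function smokeTest() {
  const [rows] = await bigquery.query({
    query: 'SELECT 1 AS ok',
    location: 'US',
  })
  console.log(rows) // expected: [ { ok: 1 } ]
}

smokeTest().catch(console.error)
```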
Step 2: Set Up Authentication
Create a BigQuery reader class that handles authentication for both local development and production:
```typescript
import * as fs from 'fs'
import * as path from 'path'
import { BigQuery } from '@google-cloud/bigquery'

type Credentials = {
  client_email: string
  private_key: string
  project_id?: string
}

export class BigQueryReader {
  private credentials: Credentials
  private bigQueryClient: BigQuery

  constructor() {
    this.credentials = this.loadCredentials()
    this.bigQueryClient = new BigQuery({
      projectId: 'your-project-id',
      credentials: {
        client_email: this.credentials.client_email,
        private_key: this.credentials.private_key,
      },
    })
  }

  private loadCredentials(): Credentials {
    const credsJson = this.getCredentialsJson()

    try {
      const parsed = JSON.parse(credsJson)

      if (!parsed.client_email || !parsed.private_key) {
        throw new Error(
          'BIGQUERY_CREDENTIALS must contain client_email and private_key',
        )
      }

      return {
        client_email: parsed.client_email,
        private_key: parsed.private_key,
        project_id: parsed.project_id,
      }
    } catch (e) {
      throw new Error(
        `Invalid JSON in BIGQUERY_CREDENTIALS: ${e instanceof Error ? e.message : String(e)}`,
      )
    }
  }

  private getCredentialsJson(): string {
    // For local development, read from JSON file
    if (process.env.NEXT_PUBLIC_APP_ENV === 'local') {
      const credentialsPath = path.join(
        process.cwd(),
        'bigQueryCredentials.json',
      )

      if (fs.existsSync(credentialsPath)) {
        try {
          return fs.readFileSync(credentialsPath, 'utf-8')
        } catch (e) {
          throw new Error(
            `Failed to read bigQueryCredentials.json: ${e instanceof Error ? e.message : String(e)}`,
          )
        }
      }
    }

    // For production, use environment variable
    const envCreds = process.env.BIGQUERY_CREDENTIALS

    if (!envCreds) {
      throw new Error(
        'BIGQUERY_CREDENTIALS environment variable not set and bigQueryCredentials.json not found',
      )
    }

    return envCreds
  }
}
```
Step 3: Write SQL Queries
Define your queries in a config file. Here's an example that aggregates data:
```sql
WITH items AS (
  SELECT
    item_id,
    item_name,
    reference_key,
    period_start,
    period_end
  FROM `project.dataset.items`
),

metrics_a AS (
  SELECT
    reference_key,
    period_start,
    period_end,
    SUM(metric_value) AS total_metric_a
  FROM `project.dataset.metrics_table_a`
  WHERE record_date >= "2022-01-01"
  GROUP BY reference_key, period_start, period_end
),

metrics_b AS (
  SELECT
    reference_key,
    period_start,
    period_end,
    SUM(value_1) AS total_value_1,
    SUM(value_2) AS total_value_2,
    SUM(value_3) AS total_value_3,
    SUM(amount) AS total_amount
  FROM `project.dataset.metrics_table_b`
  WHERE record_date >= "2022-01-01"
  GROUP BY reference_key, period_start, period_end
)

SELECT
  items.item_id,
  SUM(metrics_a.total_metric_a) AS metric_a,
  SUM(metrics_b.total_value_1) AS value_1,
  SUM(metrics_b.total_value_2) AS value_2,
  SUM(metrics_b.total_value_3) AS value_3,
  SUM(metrics_b.total_amount) AS amount,
  ROUND(SAFE_DIVIDE(SUM(metrics_b.total_value_1), SUM(metrics_a.total_metric_a)) * 100, 1) AS rate_percentage
FROM items
LEFT JOIN metrics_a
  ON items.reference_key = metrics_a.reference_key
  AND items.period_start = metrics_a.period_start
LEFT JOIN metrics_b
  ON items.reference_key = metrics_b.reference_key
  AND items.period_start = metrics_b.period_start
GROUP BY items.item_id
ORDER BY items.item_id DESC
```
You can store this in a TypeScript config file as a string constant:
```typescript
export const DATA_QUERY = `...` // SQL above
```
Key SQL techniques:
- CTEs (Common Table Expressions): Use WITH to break down complex queries into readable chunks
- Aggregation with GROUP BY: Combine multiple time periods per item
- LEFT JOINs: Ensure all items are included even if metrics data is missing
- SAFE_DIVIDE: Prevents division by zero errors
- Percentage calculation: Multiply by 100 and round for readable rates
Step 4: Query BigQuery and Map Results
```typescript
type BigQueryRow = {
  item_id: string
  metric_a: number
  value_1: number
  value_2: number
  value_3: number
  amount: number
  rate_percentage: number
}

export type DataItem = {
  id: string
  metricA: string
  value1: string
  value2: string
  value3: string
  amount: string
  ratePercentage: string
}

export class BigQueryReader {
  // ... previous code ...

  async getAllData(): Promise<DataItem[]> {
    try {
      const [rows] = await this.bigQueryClient.query({
        query: DATA_QUERY,
        location: 'US', // or your dataset location
      })

      return rows.map((row: BigQueryRow) => this.mapRowToData(row))
    } catch (error) {
      console.error('Error fetching data from BigQuery:', error)
      throw error
    }
  }

  private mapRowToData(row: BigQueryRow): DataItem {
    return {
      id: String(row.item_id || ''),
      metricA: row.metric_a !== null ? String(row.metric_a) : '',
      value1: row.value_1 !== null ? String(row.value_1) : '',
      value2: row.value_2 !== null ? String(row.value_2) : '',
      value3: row.value_3 !== null ? String(row.value_3) : '',
      amount: row.amount !== null ? String(row.amount) : '',
      ratePercentage:
        row.rate_percentage !== null ? String(row.rate_percentage) : '',
    }
  }
}

export const fetchAllData = async (): Promise<DataItem[]> => {
  const reader = new BigQueryReader()
  return reader.getAllData()
}
```
Step 5: Generate Static JSON at Build Time
Create a script to fetch BigQuery data and save it as JSON:
```typescript
// scripts/generate-data.ts
import * as fs from 'fs'
import * as path from 'path'
import { fetchAllData } from '../src/libs/bigQuery'

async function generateDataJson() {
  try {
    const data = await fetchAllData()
    const outputPath = path.join(process.cwd(), 'public', 'data.json')

    const publicDir = path.join(process.cwd(), 'public')
    if (!fs.existsSync(publicDir)) {
      fs.mkdirSync(publicDir, { recursive: true })
    }

    fs.writeFileSync(outputPath, JSON.stringify(data, null, 2))
  } catch (error) {
    console.error('Failed to generate data JSON:', error)
    process.exit(1)
  }
}

generateDataJson()
```
Update package.json:
```json
{
  "scripts": {
    "generate-data": "tsx scripts/generate-data.ts",
    "generate-data:local": "NEXT_PUBLIC_APP_ENV=local tsx scripts/generate-data.ts",
    "prebuild": "npm run generate-data",
    "build": "npm run prebuild && next build"
  }
}
```
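With data.json generated into public/, the app never talks to BigQuery at request time. How you consume the file depends on your setup; one possible approach (the paths, helper name, and tsconfig assumption below are illustrative, not the actual app code) is to import it at build time:

```typescript
// Sketch only: assumes "resolveJsonModule": true in tsconfig.json so the
// generated JSON is bundled at build time rather than fetched at runtime.
import data from '../public/data.json'
import type { DataItem } from './libs/bigQuery'

// Return the first `limit` items from the pre-generated dataset.
export function topItems(limit = 10): DataItem[] {
  return (data as DataItem[]).slice(0, limit)
}
```

Fetching /data.json on the client works just as well if you'd rather not bundle the data.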
Step 6: GitHub Actions Setup
For CI/CD, you have two authentication options:
Option A: Service Account Key (simpler)
```yaml
env:
  BIGQUERY_CREDENTIALS: ${{secrets.BIGQUERY_CREDENTIALS}}

steps:
  - name: Checkout
    uses: actions/checkout@v4

  - name: Setup Node.js
    uses: actions/setup-node@v4
    with:
      node-version: 20

  - name: Install Dependencies
    run: npm ci

  - name: Build
    run: npm run build
```
Option B: Workload Identity Federation (more secure)
```yaml
permissions:
  id-token: write
  contents: read

steps:
  - name: Checkout
    uses: actions/checkout@v4

  - name: Authenticate to Google Cloud
    uses: google-github-actions/auth@v2
    with:
      workload_identity_provider: ${{ secrets.WIF_PROVIDER }}
      service_account: ${{ secrets.WIF_SERVICE_ACCOUNT }}

  - name: Setup Node.js
    uses: actions/setup-node@v4
    with:
      node-version: 20

  - name: Install Dependencies
    run: npm ci

  - name: Build
    run: npm run build
```
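One wrinkle with Option B: the reader class above insists on explicit credentials from BIGQUERY_CREDENTIALS, while google-github-actions/auth exposes credentials to client libraries through Application Default Credentials. If you go the Workload Identity route, a fallback along these lines (a sketch, not part of the class shown earlier) keeps both paths working:

```typescript
import { BigQuery } from '@google-cloud/bigquery'

// Sketch only: use explicit credentials when BIGQUERY_CREDENTIALS is set
// (Option A); otherwise let the client library pick up Application Default
// Credentials, e.g. those configured by google-github-actions/auth (Option B).
export function createBigQueryClient(): BigQuery {
  const envCreds = process.env.BIGQUERY_CREDENTIALS

  if (envCreds) {
    const parsed = JSON.parse(envCreds)
    return new BigQuery({
      projectId: parsed.project_id ?? 'your-project-id',
      credentials: {
        client_email: parsed.client_email,
        private_key: parsed.private_key,
      },
    })
  }

  // No explicit credentials: rely on Application Default Credentials.
  return new BigQuery({ projectId: 'your-project-id' })
}
```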
Required IAM Roles
Your service account needs these roles on the BigQuery project:
- BigQuery Job User (roles/bigquery.jobUser) - to create and run queries
- BigQuery Data Viewer (roles/bigquery.dataViewer) - to read table data
Without these roles, you'll get permission errors like:
```
Access Denied: User does not have bigquery.jobs.create permission
```
Common Issues and Solutions
Issue 1: Duplicate IDs with Different Metrics
Problem: Same ID appearing multiple times due to multiple time periods or categories.
Solution: Use GROUP BY to aggregate at the ID level:
```sql
SELECT
  items.item_id,
  SUM(metric_a) AS total_metric_a,
  SUM(value_1) AS total_value_1,
  ROUND(SAFE_DIVIDE(SUM(value_1), SUM(metric_a)) * 100, 1) AS rate
FROM items
LEFT JOIN metrics ON ...
GROUP BY items.item_id -- Aggregate all periods per ID
```
Issue 2: BigQuery API Not Enabled
Error: BigQuery API has not been used in project [ID] before or it is disabled
Solution: Enable the BigQuery API for your service account's project:
```
https://console.developers.google.com/apis/api/bigquery.googleapis.com/overview?project=PROJECT_ID
```
Issue 3: Rate Displaying as Decimal
Problem: Rate shows as 0.042 instead of 4.2%
Solution: Multiply by 100 in the SQL query:
```sql
ROUND(SAFE_DIVIDE(value_1, metric_a) * 100, 1) AS rate_percentage
```
Performance Comparison
Before (Google Sheets):
- Query time: 30-60 seconds (with timeouts)
- Memory usage: High (loading entire sheets)
- Row limit: ~2000 rows
- Aggregation: In JavaScript
After (BigQuery):
- Query time: 3-5 seconds
- Memory usage: Low (streaming results)
- Row limit: Millions of rows
- Aggregation: In SQL (much faster)
Local Development
Test locally with:
```bash
# Make sure bigQueryCredentials.json exists in project root
npm run generate-data:local
```
Add to .gitignore:
```
bigQueryCredentials.json
```
Conclusion
Migrating from Google Sheets to BigQuery brought significant improvements:
- ✅ Faster queries: 3-5 seconds vs 30-60 seconds
- ✅ Better aggregation: SQL handles complex queries efficiently
- ✅ No row limits: Can scale to millions of records
- ✅ Cleaner code: SQL logic separated from application code
- ✅ No more timeouts: BigQuery is built for large datasets
The key insight: Use the right tool for the job. Google Sheets is great for collaboration, but for production data pipelines, a proper data warehouse like BigQuery is the way to go.
