Integrating BigQuery with Node.js - From Google Sheets to Scalable Data Fetching

The Problem

I previously wrote about solving Google Sheets timeout issues by generating static JSON at build time. While that solved the timeout problem, the approach still had limitations:

  • Google Sheets wasn't designed as a database
  • Complex queries required loading entire sheets into memory
  • Data aggregation happened in JavaScript rather than at the database level
  • Limited to 2000 rows due to memory constraints

The solution? Migrate to BigQuery and fetch data directly from the data warehouse.

Why BigQuery?

BigQuery is Google's serverless data warehouse that offers:

  • SQL queries: Complex aggregations and joins happen at the database level
  • Scalability: Can handle millions of rows without memory issues
  • Performance: Distributed query processing is much faster than processing in Node.js
  • Integration: Works seamlessly with Google Cloud authentication

Implementation

Step 1: Install BigQuery Client

npm install @google-cloud/bigquery

Step 2: Set Up Authentication

Create a BigQuery reader class that handles authentication for both local development and production:

import * as fs from 'fs'
import * as path from 'path'
import { BigQuery } from '@google-cloud/bigquery'

type Credentials = {
  client_email: string
  private_key: string
  project_id?: string
}

export class BigQueryReader {
  private credentials: Credentials
  private bigQueryClient: BigQuery

  constructor() {
    this.credentials = this.loadCredentials()
    this.bigQueryClient = new BigQuery({
      projectId: 'your-project-id',
      credentials: {
        client_email: this.credentials.client_email,
        private_key: this.credentials.private_key,
      },
    })
  }

  private loadCredentials(): Credentials {
    const credsJson = this.getCredentialsJson()

    let parsed: Credentials
    try {
      parsed = JSON.parse(credsJson)
    } catch (e) {
      throw new Error(
        `Invalid JSON in BIGQUERY_CREDENTIALS: ${e instanceof Error ? e.message : String(e)}`,
      )
    }

    // Validate outside the try block so a missing field isn't reported as invalid JSON
    if (!parsed.client_email || !parsed.private_key) {
      throw new Error(
        'BIGQUERY_CREDENTIALS must contain client_email and private_key',
      )
    }

    return {
      client_email: parsed.client_email,
      private_key: parsed.private_key,
      project_id: parsed.project_id,
    }
  }

  private getCredentialsJson(): string {
    // For local development, read from a JSON file in the project root
    if (process.env.NEXT_PUBLIC_APP_ENV === 'local') {
      const credentialsPath = path.join(
        process.cwd(),
        'bigQueryCredentials.json',
      )

      if (fs.existsSync(credentialsPath)) {
        try {
          return fs.readFileSync(credentialsPath, 'utf-8')
        } catch (e) {
          throw new Error(
            `Failed to read bigQueryCredentials.json: ${e instanceof Error ? e.message : String(e)}`,
          )
        }
      }
    }

    // For production, use the environment variable
    const envCreds = process.env.BIGQUERY_CREDENTIALS

    if (!envCreds) {
      throw new Error(
        'BIGQUERY_CREDENTIALS environment variable not set and bigQueryCredentials.json not found',
      )
    }

    return envCreds
  }
}
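Before wiring the reader into anything else, it helps to confirm the credentials actually work. A minimal smoke-test sketch, assuming BIGQUERY_CREDENTIALS is set; the standalone client, script name, and query here are illustrative, not part of the app:

// scripts/test-bigquery-auth.ts (illustrative smoke test)
import { BigQuery } from '@google-cloud/bigquery'

async function smokeTest() {
  // Assumes BIGQUERY_CREDENTIALS holds the service account key JSON
  const creds = JSON.parse(process.env.BIGQUERY_CREDENTIALS ?? '{}')

  const client = new BigQuery({
    projectId: 'your-project-id',
    credentials: {
      client_email: creds.client_email,
      private_key: creds.private_key,
    },
  })

  // A trivial query: succeeds as soon as auth and job creation work
  const [rows] = await client.query('SELECT 1 AS ok')
  console.log('BigQuery auth works:', rows)
}

smokeTest().catch((e) => {
  console.error('BigQuery auth failed:', e)
  process.exit(1)
})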

Step 3: Write SQL Queries

Define your queries in a config file. Here's an example that joins two metrics tables to an items table and aggregates the results per item:

WITH items AS (
  SELECT
    item_id,
    item_name,
    reference_key,
    period_start,
    period_end
  FROM `project.dataset.items`
),

metrics_a AS (
  SELECT
    reference_key,
    period_start,
    period_end,
    SUM(metric_value) AS total_metric_a
  FROM `project.dataset.metrics_table_a`
  WHERE record_date >= "2022-01-01"
  GROUP BY reference_key, period_start, period_end
),

metrics_b AS (
  SELECT
    reference_key,
    period_start,
    period_end,
    SUM(value_1) AS total_value_1,
    SUM(value_2) AS total_value_2,
    SUM(value_3) AS total_value_3,
    SUM(amount) AS total_amount
  FROM `project.dataset.metrics_table_b`
  WHERE record_date >= "2022-01-01"
  GROUP BY reference_key, period_start, period_end
)

SELECT
  items.item_id,
  SUM(metrics_a.total_metric_a) AS metric_a,
  SUM(metrics_b.total_value_1) AS value_1,
  SUM(metrics_b.total_value_2) AS value_2,
  SUM(metrics_b.total_value_3) AS value_3,
  SUM(metrics_b.total_amount) AS amount,
  ROUND(SAFE_DIVIDE(SUM(metrics_b.total_value_1), SUM(metrics_a.total_metric_a)) * 100, 1) AS rate_percentage
FROM items
LEFT JOIN metrics_a
  ON items.reference_key = metrics_a.reference_key
  AND items.period_start = metrics_a.period_start
LEFT JOIN metrics_b
  ON items.reference_key = metrics_b.reference_key
  AND items.period_start = metrics_b.period_start
GROUP BY items.item_id
ORDER BY items.item_id DESC

You can store this in a TypeScript config file as a string constant:

export const DATA_QUERY = `...` // SQL above

Key SQL techniques:

  1. CTEs (Common Table Expressions): Use WITH to break down complex queries into readable chunks
  2. Aggregation with GROUP BY: Combine multiple time periods per item
  3. LEFT JOINs: Ensure all items are included even if metrics data is missing
  4. SAFE_DIVIDE: Prevents division by zero errors
  5. Percentage calculation: Multiply by 100 and round for readable rates
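One refinement: the date cutoff is hardcoded as "2022-01-01" inside the SQL string. If it needs to vary, the BigQuery client supports named query parameters via the params option. A minimal sketch; the function name, parameter name, and simplified query are illustrative:

import { BigQuery } from '@google-cloud/bigquery'

// Same aggregation idea, with a named @start_date parameter instead of a literal
const PARAMETERIZED_QUERY = `
  SELECT reference_key, SUM(metric_value) AS total_metric_a
  FROM \`project.dataset.metrics_table_a\`
  WHERE record_date >= @start_date
  GROUP BY reference_key
`

export async function fetchMetricsSince(client: BigQuery, startDate: string) {
  const [rows] = await client.query({
    query: PARAMETERIZED_QUERY,
    location: 'US',
    // Named parameters are substituted server-side, so no string interpolation is needed.
    // If record_date is a DATE column, compare against DATE(@start_date) or declare the
    // parameter type explicitly; passing a plain string is inferred as STRING.
    params: { start_date: startDate },
  })
  return rows
}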

Step 4: Query BigQuery and Map Results

type BigQueryRow = {
  item_id: string
  metric_a: number
  value_1: number
  value_2: number
  value_3: number
  amount: number
  rate_percentage: number
}

export type DataItem = {
  id: string
  metricA: string
  value1: string
  value2: string
  value3: string
  amount: string
  ratePercentage: string
}

export class BigQueryReader {
  // ... previous code ...

  async getAllData(): Promise<DataItem[]> {
    try {
      const [rows] = await this.bigQueryClient.query({
        query: DATA_QUERY,
        location: 'US', // or your dataset location
      })

      return rows.map((row: BigQueryRow) => this.mapRowToData(row))
    } catch (error) {
      console.error('Error fetching data from BigQuery:', error)
      throw error
    }
  }

  private mapRowToData(row: BigQueryRow): DataItem {
    return {
      id: String(row.item_id || ''),
      metricA: row.metric_a !== null ? String(row.metric_a) : '',
      value1: row.value_1 !== null ? String(row.value_1) : '',
      value2: row.value_2 !== null ? String(row.value_2) : '',
      value3: row.value_3 !== null ? String(row.value_3) : '',
      amount: row.amount !== null ? String(row.amount) : '',
      ratePercentage:
        row.rate_percentage !== null ? String(row.rate_percentage) : '',
    }
  }
}

export const fetchAllData = async (): Promise<DataItem[]> => {
  const reader = new BigQueryReader()
  return reader.getAllData()
}
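For reference, this is the shape the mapper produces for a single row. The input values below are invented purely for illustration:

// Hypothetical row as returned by the query above
const sampleRow: BigQueryRow = {
  item_id: 'item-001',
  metric_a: 1200,
  value_1: 50,
  value_2: 30,
  value_3: 10,
  amount: 990,
  rate_percentage: 4.2,
}

// mapRowToData(sampleRow) converts every field to a string:
// {
//   id: 'item-001',
//   metricA: '1200',
//   value1: '50',
//   value2: '30',
//   value3: '10',
//   amount: '990',
//   ratePercentage: '4.2',
// }

Keeping everything as strings makes the JSON output uniform and avoids surprises when a metric is missing (nulls become empty strings).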

Step 5: Generate Static JSON at Build Time

Create a script to fetch BigQuery data and save it as JSON:

// scripts/generate-data.ts
import * as fs from 'fs'
import * as path from 'path'
import { fetchAllData } from '../src/libs/bigQuery'

async function generateDataJson() {
  try {
    const data = await fetchAllData()
    const outputPath = path.join(process.cwd(), 'public', 'data.json')

    const publicDir = path.join(process.cwd(), 'public')
    if (!fs.existsSync(publicDir)) {
      fs.mkdirSync(publicDir, { recursive: true })
    }

    fs.writeFileSync(outputPath, JSON.stringify(data, null, 2))
  } catch (error) {
    console.error('Failed to generate data JSON:', error)
    process.exit(1)
  }
}

generateDataJson()

Update package.json:

{
  "scripts": {
    "generate-data": "tsx scripts/generate-data.ts",
    "generate-data:local": "NEXT_PUBLIC_APP_ENV=local tsx scripts/generate-data.ts",
    "prebuild": "npm run generate-data",
    "build": "next build"
  }
}

npm runs the prebuild script automatically before build, so the JSON is regenerated on every build without chaining it explicitly.
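Because the script writes into public/, the generated file is served as a static asset and the app never talks to BigQuery at runtime. A hedged sketch of a loader on the consuming side; the helper name and import path are illustrative and should be adjusted to your project layout:

import type { DataItem } from '../libs/bigQuery' // adjust to wherever DataItem lives

// Loads the JSON generated at build time; no BigQuery call happens at runtime
export async function loadDataItems(): Promise<DataItem[]> {
  const res = await fetch('/data.json')
  if (!res.ok) {
    throw new Error(`Failed to load data.json: ${res.status}`)
  }
  return (await res.json()) as DataItem[]
}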

Step 6: GitHub Actions Setup

For CI/CD, you have two authentication options:

Option A: Service Account Key (simpler)

env:
  BIGQUERY_CREDENTIALS: ${{ secrets.BIGQUERY_CREDENTIALS }}

steps:
  - name: Checkout
    uses: actions/checkout@v4

  - name: Setup Node.js
    uses: actions/setup-node@v4
    with:
      node-version: 20

  - name: Install Dependencies
    run: npm ci

  - name: Build
    run: npm run build

Option B: Workload Identity Federation (more secure)

permissions:
  id-token: write
  contents: read

steps:
  - name: Checkout
    uses: actions/checkout@v4

  - name: Authenticate to Google Cloud
    uses: google-github-actions/auth@v2
    with:
      workload_identity_provider: ${{ secrets.WIF_PROVIDER }}
      service_account: ${{ secrets.WIF_SERVICE_ACCOUNT }}

  - name: Setup Node.js
    uses: actions/setup-node@v4
    with:
      node-version: 20

  - name: Install Dependencies
    run: npm ci

  - name: Build
    run: npm run build
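One caveat with Option B: there is no JSON key to put into BIGQUERY_CREDENTIALS, because google-github-actions/auth exposes the federated identity as Application Default Credentials. The BigQuery client picks those up automatically when no explicit credentials are passed, so the reader needs a fallback path. A minimal sketch; the helper name is illustrative:

import { BigQuery } from '@google-cloud/bigquery'

// Sketch: use the key JSON when it is provided, otherwise fall back to
// Application Default Credentials (e.g. set up by Workload Identity Federation in CI)
function createBigQueryClient(): BigQuery {
  if (process.env.BIGQUERY_CREDENTIALS) {
    const creds = JSON.parse(process.env.BIGQUERY_CREDENTIALS)
    return new BigQuery({
      projectId: 'your-project-id',
      credentials: {
        client_email: creds.client_email,
        private_key: creds.private_key,
      },
    })
  }

  // No explicit key: let the client discover Application Default Credentials
  return new BigQuery({ projectId: 'your-project-id' })
}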

Required IAM Roles

Your service account needs these roles on the BigQuery project:

  1. BigQuery Job User (roles/bigquery.jobUser) - to create and run queries
  2. BigQuery Data Viewer (roles/bigquery.dataViewer) - to read table data

Without these roles, you'll get permission errors like:

Access Denied: User does not have bigquery.jobs.create permission
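If you want the build to fail with a clearer message in that situation, the error can be inspected before rethrowing. A hedged sketch, assuming the client reports the HTTP status on error.code, as the Google Cloud Node.js clients generally do:

import { BigQuery } from '@google-cloud/bigquery'

// Sketch: wrap the query call to surface missing-IAM-role errors more clearly
async function queryWithClearErrors(client: BigQuery, query: string) {
  try {
    const [rows] = await client.query({ query, location: 'US' })
    return rows
  } catch (error: unknown) {
    const code = (error as { code?: number }).code
    if (code === 403) {
      // Most likely the service account is missing Job User / Data Viewer roles
      console.error(error)
      throw new Error(
        'BigQuery permission denied: check the service account IAM roles listed above',
      )
    }
    throw error
  }
}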

Common Issues and Solutions

Issue 1: Duplicate IDs with Different Metrics

Problem: The same ID appears multiple times because the data spans multiple time periods or categories.

Solution: Use GROUP BY to aggregate at the ID level:

SELECT
  items.item_id,
  SUM(metric_a) AS total_metric_a,
  SUM(value_1) AS total_value_1,
  ROUND(SAFE_DIVIDE(SUM(value_1), SUM(metric_a)) * 100, 1) AS rate
FROM items
LEFT JOIN metrics ON ...
GROUP BY items.item_id -- Aggregate all periods per ID

Issue 2: BigQuery API Not Enabled

Error: BigQuery API has not been used in project [ID] before or it is disabled

Solution: Enable the BigQuery API for your service account's project:

https://console.developers.google.com/apis/api/bigquery.googleapis.com/overview?project=PROJECT_ID

Issue 3: Rate Displaying as Decimal

Problem: Rate shows as 0.042 instead of 4.2%

Solution: Multiply by 100 in the SQL query:

ROUND(SAFE_DIVIDE(value_1, metric_a) * 100, 1) AS rate_percentage

Performance Comparison

Before (Google Sheets):

  • Query time: 30-60 seconds (with timeouts)
  • Memory usage: High (loading entire sheets)
  • Row limit: ~2000 rows
  • Aggregation: In JavaScript

After (BigQuery):

  • Query time: 3-5 seconds
  • Memory usage: Low (streaming results)
  • Row limit: Millions of rows
  • Aggregation: In SQL (much faster)

Local Development

Test locally with:

# Make sure bigQueryCredentials.json exists in the project root
npm run generate-data:local

Add to .gitignore:

bigQueryCredentials.json

Conclusion

Migrating from Google Sheets to BigQuery brought significant improvements:

  • Faster queries: 3-5 seconds vs 30-60 seconds
  • Better aggregation: SQL handles complex queries efficiently
  • No row limits: Can scale to millions of records
  • Cleaner code: SQL logic separated from application code
  • No more timeouts: BigQuery is built for large datasets

The key insight: Use the right tool for the job. Google Sheets is great for collaboration, but for production data pipelines, a proper data warehouse like BigQuery is the way to go.
