Integrating BigQuery with Node.js - From Google Sheets to Scalable Data Fetching

The Problem

I previously wrote about solving Google Sheets timeout issues by generating static JSON at build time. While that solved the timeout problem, the approach still had limitations:

  • Google Sheets wasn't designed as a database
  • Complex queries required loading entire sheets into memory
  • Data aggregation happened in JavaScript rather than at the database level
  • Limited to 2000 rows due to memory constraints

The solution? Migrate to BigQuery and fetch data directly from the data warehouse.

Why BigQuery?

BigQuery is Google's serverless data warehouse that offers:

  • SQL queries: Complex aggregations and joins happen at the database level
  • Scalability: Can handle millions of rows without memory issues
  • Performance: Distributed query processing is much faster than processing in Node.js
  • Integration: Works seamlessly with Google Cloud authentication

Implementation

Step 1: Install BigQuery Client

npm install @google-cloud/bigquery

Step 2: Set Up Authentication

Create a BigQuery reader class that handles authentication for both local development and production:

import * as fs from 'fs'
import * as path from 'path'
import { BigQuery } from '@google-cloud/bigquery'

type Credentials = {
  client_email: string
  private_key: string
  project_id?: string
}

export class BigQueryReader {
  private credentials: Credentials
  private bigQueryClient: BigQuery

  constructor() {
    this.credentials = this.loadCredentials()
    this.bigQueryClient = new BigQuery({
      projectId: 'your-project-id',
      credentials: {
        client_email: this.credentials.client_email,
        private_key: this.credentials.private_key,
      },
    })
  }

  private loadCredentials(): Credentials {
    const credsJson = this.getCredentialsJson()

    let parsed: Credentials
    try {
      parsed = JSON.parse(credsJson)
    } catch (e) {
      throw new Error(
        `Invalid JSON in BIGQUERY_CREDENTIALS: ${e instanceof Error ? e.message : String(e)}`,
      )
    }

    // Validate outside the try block so a missing field isn't reported as invalid JSON
    if (!parsed.client_email || !parsed.private_key) {
      throw new Error(
        'BIGQUERY_CREDENTIALS must contain client_email and private_key',
      )
    }

    return {
      client_email: parsed.client_email,
      private_key: parsed.private_key,
      project_id: parsed.project_id,
    }
  }

  private getCredentialsJson(): string {
    // For local development, read from a JSON file in the project root
    if (process.env.NEXT_PUBLIC_APP_ENV === 'local') {
      const credentialsPath = path.join(
        process.cwd(),
        'bigQueryCredentials.json',
      )

      if (fs.existsSync(credentialsPath)) {
        try {
          return fs.readFileSync(credentialsPath, 'utf-8')
        } catch (e) {
          throw new Error(
            `Failed to read bigQueryCredentials.json: ${e instanceof Error ? e.message : String(e)}`,
          )
        }
      }
    }

    // For production, use the environment variable
    const envCreds = process.env.BIGQUERY_CREDENTIALS

    if (!envCreds) {
      throw new Error(
        'BIGQUERY_CREDENTIALS environment variable not set and bigQueryCredentials.json not found',
      )
    }

    return envCreds
  }
}
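Before wiring the reader into anything else, it helps to confirm the credentials actually work. A minimal smoke-test sketch, assuming BIGQUERY_CREDENTIALS is set; the standalone client, script name, and query here are illustrative, not part of the app:

// scripts/test-bigquery-auth.ts (illustrative smoke test)
import { BigQuery } from '@google-cloud/bigquery'

async function smokeTest() {
  // Assumes BIGQUERY_CREDENTIALS holds the service account key JSON
  const creds = JSON.parse(process.env.BIGQUERY_CREDENTIALS ?? '{}')

  const client = new BigQuery({
    projectId: 'your-project-id',
    credentials: {
      client_email: creds.client_email,
      private_key: creds.private_key,
    },
  })

  // A trivial query: succeeds as soon as auth and job creation work
  const [rows] = await client.query('SELECT 1 AS ok')
  console.log('BigQuery auth works:', rows)
}

smokeTest().catch((e) => {
  console.error('BigQuery auth failed:', e)
  process.exit(1)
})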

Step 3: Write SQL Queries

Define your queries in a config file. Here's an example that joins two metrics tables to an items table and aggregates the results per item:

WITH items AS (
  SELECT
    item_id,
    item_name,
    reference_key,
    period_start,
    period_end
  FROM `project.dataset.items`
),

metrics_a AS (
  SELECT
    reference_key,
    period_start,
    period_end,
    SUM(metric_value) AS total_metric_a
  FROM `project.dataset.metrics_table_a`
  WHERE record_date >= "2022-01-01"
  GROUP BY reference_key, period_start, period_end
),

metrics_b AS (
  SELECT
    reference_key,
    period_start,
    period_end,
    SUM(value_1) AS total_value_1,
    SUM(value_2) AS total_value_2,
    SUM(value_3) AS total_value_3,
    SUM(amount) AS total_amount
  FROM `project.dataset.metrics_table_b`
  WHERE record_date >= "2022-01-01"
  GROUP BY reference_key, period_start, period_end
)

SELECT
  items.item_id,
  SUM(metrics_a.total_metric_a) AS metric_a,
  SUM(metrics_b.total_value_1) AS value_1,
  SUM(metrics_b.total_value_2) AS value_2,
  SUM(metrics_b.total_value_3) AS value_3,
  SUM(metrics_b.total_amount) AS amount,
  ROUND(SAFE_DIVIDE(SUM(metrics_b.total_value_1), SUM(metrics_a.total_metric_a)) * 100, 1) AS rate_percentage
FROM items
LEFT JOIN metrics_a
  ON items.reference_key = metrics_a.reference_key
  AND items.period_start = metrics_a.period_start
LEFT JOIN metrics_b
  ON items.reference_key = metrics_b.reference_key
  AND items.period_start = metrics_b.period_start
GROUP BY items.item_id
ORDER BY items.item_id DESC

You can store this in a TypeScript config file as a string constant:

export const DATA_QUERY = `...` // SQL above

Key SQL techniques:

  1. CTEs (Common Table Expressions): Use WITH to break down complex queries into readable chunks
  2. Aggregation with GROUP BY: Combine multiple time periods per item
  3. LEFT JOINs: Ensure all items are included even if metrics data is missing
  4. SAFE_DIVIDE: Prevents division by zero errors
  5. Percentage calculation: Multiply by 100 and round for readable rates
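One refinement: the date cutoff is hardcoded as "2022-01-01" inside the SQL string. If it needs to vary, the BigQuery client supports named query parameters via the params option. A minimal sketch; the function name, parameter name, and simplified query are illustrative:

import { BigQuery } from '@google-cloud/bigquery'

// Same aggregation idea, with a named @start_date parameter instead of a literal
const PARAMETERIZED_QUERY = `
  SELECT reference_key, SUM(metric_value) AS total_metric_a
  FROM \`project.dataset.metrics_table_a\`
  WHERE record_date >= @start_date
  GROUP BY reference_key
`

export async function fetchMetricsSince(client: BigQuery, startDate: string) {
  const [rows] = await client.query({
    query: PARAMETERIZED_QUERY,
    location: 'US',
    // Named parameters are substituted server-side, so no string interpolation is needed.
    // If record_date is a DATE column, compare against DATE(@start_date) or declare the
    // parameter type explicitly; passing a plain string is inferred as STRING.
    params: { start_date: startDate },
  })
  return rows
}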

Step 4: Query BigQuery and Map Results

type BigQueryRow = {
  item_id: string
  metric_a: number
  value_1: number
  value_2: number
  value_3: number
  amount: number
  rate_percentage: number
}

export type DataItem = {
  id: string
  metricA: string
  value1: string
  value2: string
  value3: string
  amount: string
  ratePercentage: string
}

export class BigQueryReader {
  // ... previous code ...

  async getAllData(): Promise<DataItem[]> {
    try {
      const [rows] = await this.bigQueryClient.query({
        query: DATA_QUERY,
        location: 'US', // or your dataset location
      })

      return rows.map((row: BigQueryRow) => this.mapRowToData(row))
    } catch (error) {
      console.error('Error fetching data from BigQuery:', error)
      throw error
    }
  }

  private mapRowToData(row: BigQueryRow): DataItem {
    return {
      id: String(row.item_id || ''),
      metricA: row.metric_a !== null ? String(row.metric_a) : '',
      value1: row.value_1 !== null ? String(row.value_1) : '',
      value2: row.value_2 !== null ? String(row.value_2) : '',
      value3: row.value_3 !== null ? String(row.value_3) : '',
      amount: row.amount !== null ? String(row.amount) : '',
      ratePercentage:
        row.rate_percentage !== null ? String(row.rate_percentage) : '',
    }
  }
}

export const fetchAllData = async (): Promise<DataItem[]> => {
  const reader = new BigQueryReader()
  return reader.getAllData()
}
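For reference, this is the shape the mapper produces for a single row. The input values below are invented purely for illustration:

// Hypothetical row as returned by the query above
const sampleRow: BigQueryRow = {
  item_id: 'item-001',
  metric_a: 1200,
  value_1: 50,
  value_2: 30,
  value_3: 10,
  amount: 990,
  rate_percentage: 4.2,
}

// mapRowToData(sampleRow) converts every field to a string:
// {
//   id: 'item-001',
//   metricA: '1200',
//   value1: '50',
//   value2: '30',
//   value3: '10',
//   amount: '990',
//   ratePercentage: '4.2',
// }

Keeping everything as strings makes the JSON output uniform and avoids surprises when a metric is missing (nulls become empty strings).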

Step 5: Generate Static JSON at Build Time

Create a script to fetch BigQuery data and save it as JSON:

// scripts/generate-data.ts
import * as fs from 'fs'
import * as path from 'path'
import { fetchAllData } from '../src/libs/bigQuery'

async function generateDataJson() {
  try {
    const data = await fetchAllData()
    const outputPath = path.join(process.cwd(), 'public', 'data.json')

    const publicDir = path.join(process.cwd(), 'public')
    if (!fs.existsSync(publicDir)) {
      fs.mkdirSync(publicDir, { recursive: true })
    }

    fs.writeFileSync(outputPath, JSON.stringify(data, null, 2))
  } catch (error) {
    console.error('Failed to generate data JSON:', error)
    process.exit(1)
  }
}

generateDataJson()

Update package.json:

{
  "scripts": {
    "generate-data": "tsx scripts/generate-data.ts",
    "generate-data:local": "NEXT_PUBLIC_APP_ENV=local tsx scripts/generate-data.ts",
    "prebuild": "npm run generate-data",
    "build": "next build"
  }
}

npm runs the prebuild script automatically before build, so the JSON is regenerated on every build without chaining it explicitly.
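Because the script writes into public/, the generated file is served as a static asset and the app never talks to BigQuery at runtime. A hedged sketch of a loader on the consuming side; the helper name and import path are illustrative and should be adjusted to your project layout:

import type { DataItem } from '../libs/bigQuery' // adjust to wherever DataItem lives

// Loads the JSON generated at build time; no BigQuery call happens at runtime
export async function loadDataItems(): Promise<DataItem[]> {
  const res = await fetch('/data.json')
  if (!res.ok) {
    throw new Error(`Failed to load data.json: ${res.status}`)
  }
  return (await res.json()) as DataItem[]
}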

Step 6: GitHub Actions Setup

For CI/CD, you have two authentication options:

Option A: Service Account Key (simpler)

env:
  BIGQUERY_CREDENTIALS: ${{ secrets.BIGQUERY_CREDENTIALS }}

steps:
  - name: Checkout
    uses: actions/checkout@v4

  - name: Setup Node.js
    uses: actions/setup-node@v4
    with:
      node-version: 20

  - name: Install Dependencies
    run: npm ci

  - name: Build
    run: npm run build

Option B: Workload Identity Federation (more secure)

permissions:
  id-token: write
  contents: read

steps:
  - name: Checkout
    uses: actions/checkout@v4

  - name: Authenticate to Google Cloud
    uses: google-github-actions/auth@v2
    with:
      workload_identity_provider: ${{ secrets.WIF_PROVIDER }}
      service_account: ${{ secrets.WIF_SERVICE_ACCOUNT }}

  - name: Setup Node.js
    uses: actions/setup-node@v4
    with:
      node-version: 20

  - name: Install Dependencies
    run: npm ci

  - name: Build
    run: npm run build
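One caveat with Option B: there is no JSON key to put into BIGQUERY_CREDENTIALS, because google-github-actions/auth exposes the federated identity as Application Default Credentials. The BigQuery client picks those up automatically when no explicit credentials are passed, so the reader needs a fallback path. A minimal sketch; the helper name is illustrative:

import { BigQuery } from '@google-cloud/bigquery'

// Sketch: use the key JSON when it is provided, otherwise fall back to
// Application Default Credentials (e.g. set up by Workload Identity Federation in CI)
function createBigQueryClient(): BigQuery {
  if (process.env.BIGQUERY_CREDENTIALS) {
    const creds = JSON.parse(process.env.BIGQUERY_CREDENTIALS)
    return new BigQuery({
      projectId: 'your-project-id',
      credentials: {
        client_email: creds.client_email,
        private_key: creds.private_key,
      },
    })
  }

  // No explicit key: let the client discover Application Default Credentials
  return new BigQuery({ projectId: 'your-project-id' })
}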

Required IAM Roles

Your service account needs these roles on the BigQuery project:

  1. BigQuery Job User (roles/bigquery.jobUser) - to create and run queries
  2. BigQuery Data Viewer (roles/bigquery.dataViewer) - to read table data

Without these roles, you'll get permission errors like:

Access Denied: User does not have bigquery.jobs.create permission
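If you want the build to fail with a clearer message in that situation, the error can be inspected before rethrowing. A hedged sketch, assuming the client reports the HTTP status on error.code, as the Google Cloud Node.js clients generally do:

import { BigQuery } from '@google-cloud/bigquery'

// Sketch: wrap the query call to surface missing-IAM-role errors more clearly
async function queryWithClearErrors(client: BigQuery, query: string) {
  try {
    const [rows] = await client.query({ query, location: 'US' })
    return rows
  } catch (error: unknown) {
    const code = (error as { code?: number }).code
    if (code === 403) {
      // Most likely the service account is missing Job User / Data Viewer roles
      console.error(error)
      throw new Error(
        'BigQuery permission denied: check the service account IAM roles listed above',
      )
    }
    throw error
  }
}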

Common Issues and Solutions

Issue 1: Duplicate IDs with Different Metrics

Problem: The same ID appears multiple times because the data spans multiple time periods or categories.

Solution: Use GROUP BY to aggregate at the ID level:

SELECT
  items.item_id,
  SUM(metric_a) AS total_metric_a,
  SUM(value_1) AS total_value_1,
  ROUND(SAFE_DIVIDE(SUM(value_1), SUM(metric_a)) * 100, 1) AS rate
FROM items
LEFT JOIN metrics ON ...
GROUP BY items.item_id -- Aggregate all periods per ID

Issue 2: BigQuery API Not Enabled

Error: BigQuery API has not been used in project [ID] before or it is disabled

Solution: Enable the BigQuery API for your service account's project:

https://console.developers.google.com/apis/api/bigquery.googleapis.com/overview?project=PROJECT_ID

Issue 3: Rate Displaying as Decimal

Problem: Rate shows as 0.042 instead of 4.2%

Solution: Multiply by 100 in the SQL query:

ROUND(SAFE_DIVIDE(value_1, metric_a) * 100, 1) AS rate_percentage

Performance Comparison

Before (Google Sheets):

  • Query time: 30-60 seconds (with timeouts)
  • Memory usage: High (loading entire sheets)
  • Row limit: ~2000 rows
  • Aggregation: In JavaScript

After (BigQuery):

  • Query time: 3-5 seconds
  • Memory usage: Low (streaming results)
  • Row limit: Millions of rows
  • Aggregation: In SQL (much faster)

Local Development

Test locally with:

# Make sure bigQueryCredentials.json exists in the project root
npm run generate-data:local

Add to .gitignore:

bigQueryCredentials.json

Conclusion

Migrating from Google Sheets to BigQuery brought significant improvements:

  • Faster queries: 3-5 seconds vs 30-60 seconds
  • Better aggregation: SQL handles complex queries efficiently
  • No row limits: Can scale to millions of records
  • Cleaner code: SQL logic separated from application code
  • No more timeouts: BigQuery is built for large datasets

The key insight: Use the right tool for the job. Google Sheets is great for collaboration, but for production data pipelines, a proper data warehouse like BigQuery is the way to go.
