
Integrating BigQuery with Node.js - From Google Sheets to Scalable Data Fetching

8 min read
Node.js · BigQuery


The Problem

I previously wrote about solving Google Sheets timeout issues by generating static JSON at build time. While that solved the timeout problem, the data still lived in a spreadsheet, subject to Google Sheets API quotas and poorly suited to the multi-table aggregation the site needed.

The solution? Migrate to BigQuery and fetch data directly from the data warehouse.

Why BigQuery?

BigQuery is Google's serverless data warehouse that offers:

  1. Standard SQL over datasets far larger than any spreadsheet
  2. Serverless, autoscaling infrastructure with nothing to manage
  3. Fine-grained access control through Google Cloud IAM
  4. An official Node.js client library (@google-cloud/bigquery)

Implementation

Step 1: Install BigQuery Client

```bash
npm install @google-cloud/bigquery
```

Step 2: Set Up Authentication

Create a BigQuery reader class that handles authentication for both local development and production:

```typescript
import * as fs from 'fs'
import * as path from 'path'
import { BigQuery } from '@google-cloud/bigquery'

type Credentials = {
  client_email: string
  private_key: string
  project_id?: string
}

export class BigQueryReader {
  private credentials: Credentials
  private bigQueryClient: BigQuery

  constructor() {
    this.credentials = this.loadCredentials()
    this.bigQueryClient = new BigQuery({
      // Prefer the project_id from the key file, falling back to a hardcoded ID
      projectId: this.credentials.project_id ?? 'your-project-id',
      credentials: {
        client_email: this.credentials.client_email,
        private_key: this.credentials.private_key,
      },
    })
  }

  private loadCredentials(): Credentials {
    const credsJson = this.getCredentialsJson()

    let parsed: Partial<Credentials>
    try {
      parsed = JSON.parse(credsJson)
    } catch (e) {
      throw new Error(
        `Invalid JSON in BIGQUERY_CREDENTIALS: ${e instanceof Error ? e.message : String(e)}`,
      )
    }

    // Validate outside the try block so this error isn't re-wrapped as a JSON error
    if (!parsed.client_email || !parsed.private_key) {
      throw new Error(
        'BIGQUERY_CREDENTIALS must contain client_email and private_key',
      )
    }

    return {
      client_email: parsed.client_email,
      private_key: parsed.private_key,
      project_id: parsed.project_id,
    }
  }

  private getCredentialsJson(): string {
    // For local development, read from a JSON file in the project root
    if (process.env.NEXT_PUBLIC_APP_ENV === 'local') {
      const credentialsPath = path.join(
        process.cwd(),
        'bigQueryCredentials.json',
      )

      if (fs.existsSync(credentialsPath)) {
        try {
          return fs.readFileSync(credentialsPath, 'utf-8')
        } catch (e) {
          throw new Error(
            `Failed to read bigQueryCredentials.json: ${e instanceof Error ? e.message : String(e)}`,
          )
        }
      }
    }

    // For production, use the environment variable
    const envCreds = process.env.BIGQUERY_CREDENTIALS

    if (!envCreds) {
      throw new Error(
        'BIGQUERY_CREDENTIALS environment variable not set and bigQueryCredentials.json not found',
      )
    }

    return envCreds
  }
}
```
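
Before wiring the reader into anything else, it's worth smoke-testing the credentials with a trivial query. A minimal standalone sketch (the script name and the hardcoded path to bigQueryCredentials.json are assumptions, mirroring the local setup above):

```typescript
// scripts/check-bigquery.ts (hypothetical) - smoke-test BigQuery authentication
import * as fs from 'fs'
import { BigQuery } from '@google-cloud/bigquery'

const creds = JSON.parse(fs.readFileSync('bigQueryCredentials.json', 'utf-8'))
const client = new BigQuery({
  projectId: creds.project_id,
  credentials: {
    client_email: creds.client_email,
    private_key: creds.private_key,
  },
})

client
  .query({ query: 'SELECT 1 AS ok' })
  .then(([rows]) => console.log('BigQuery auth OK:', rows)) // [ { ok: 1 } ]
  .catch((err) => {
    console.error('BigQuery auth failed:', err)
    process.exit(1)
  })
```

If this prints [ { ok: 1 } ], both the credentials and the BigQuery API are working.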

Step 3: Write SQL Queries

Define your queries in a config file. Here's an example that joins two metrics tables and aggregates the results per item:

```sql
WITH items AS (
  SELECT
    item_id,
    item_name,
    reference_key,
    period_start,
    period_end
  FROM `project.dataset.items`
),

metrics_a AS (
  SELECT
    reference_key,
    period_start,
    period_end,
    SUM(metric_value) AS total_metric_a
  FROM `project.dataset.metrics_table_a`
  WHERE record_date >= "2022-01-01"
  GROUP BY reference_key, period_start, period_end
),

metrics_b AS (
  SELECT
    reference_key,
    period_start,
    period_end,
    SUM(value_1) AS total_value_1,
    SUM(value_2) AS total_value_2,
    SUM(value_3) AS total_value_3,
    SUM(amount) AS total_amount
  FROM `project.dataset.metrics_table_b`
  WHERE record_date >= "2022-01-01"
  GROUP BY reference_key, period_start, period_end
)

SELECT
  items.item_id,
  SUM(metrics_a.total_metric_a) AS metric_a,
  SUM(metrics_b.total_value_1) AS value_1,
  SUM(metrics_b.total_value_2) AS value_2,
  SUM(metrics_b.total_value_3) AS value_3,
  SUM(metrics_b.total_amount) AS amount,
  ROUND(SAFE_DIVIDE(SUM(metrics_b.total_value_1), SUM(metrics_a.total_metric_a)) * 100, 1) AS rate_percentage
FROM items
LEFT JOIN metrics_a
  ON items.reference_key = metrics_a.reference_key
  AND items.period_start = metrics_a.period_start
  -- Join on the full period key the CTEs group by, to avoid row fan-out
  AND items.period_end = metrics_a.period_end
LEFT JOIN metrics_b
  ON items.reference_key = metrics_b.reference_key
  AND items.period_start = metrics_b.period_start
  AND items.period_end = metrics_b.period_end
GROUP BY items.item_id
ORDER BY items.item_id DESC
```

You can store this in a TypeScript config file as a string constant:

```typescript
export const DATA_QUERY = `...` // SQL above
```

Key SQL techniques:

  1. CTEs (Common Table Expressions): Use WITH to break down complex queries into readable chunks
  2. Aggregation with GROUP BY: Combine multiple time periods per item
  3. LEFT JOINs: Ensure all items are included even if metrics data is missing
  4. SAFE_DIVIDE: Returns NULL instead of erroring on division by zero (see the snippet after this list)
  5. Percentage calculation: Multiply by 100 and round for readable rates
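
SAFE_DIVIDE deserves a closer look: a plain x / y aborts the whole query when y is zero, while SAFE_DIVIDE returns NULL, which can then be defaulted. A minimal illustration:

```sql
SELECT
  SAFE_DIVIDE(10, 0) AS safe_result,              -- NULL, no error raised
  IFNULL(SAFE_DIVIDE(10, 0), 0) AS with_fallback  -- coerce the NULL to 0 if preferred
```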

Step 4: Query BigQuery and Map Results

```typescript
import { DATA_QUERY } from './config' // wherever the query constant from Step 3 lives

type BigQueryRow = {
  item_id: string
  // SUMs over a LEFT JOIN can come back as NULL, hence the nullable types
  metric_a: number | null
  value_1: number | null
  value_2: number | null
  value_3: number | null
  amount: number | null
  rate_percentage: number | null
}

export type DataItem = {
  id: string
  metricA: string
  value1: string
  value2: string
  value3: string
  amount: string
  ratePercentage: string
}

export class BigQueryReader {
  // ... previous code ...

  async getAllData(): Promise<DataItem[]> {
    try {
      const [rows] = await this.bigQueryClient.query({
        query: DATA_QUERY,
        location: 'US', // or your dataset location
      })

      return rows.map((row: BigQueryRow) => this.mapRowToData(row))
    } catch (error) {
      console.error('Error fetching data from BigQuery:', error)
      throw error
    }
  }

  private mapRowToData(row: BigQueryRow): DataItem {
    // Convert every metric to a string, mapping NULLs to empty strings
    return {
      id: String(row.item_id || ''),
      metricA: row.metric_a !== null ? String(row.metric_a) : '',
      value1: row.value_1 !== null ? String(row.value_1) : '',
      value2: row.value_2 !== null ? String(row.value_2) : '',
      value3: row.value_3 !== null ? String(row.value_3) : '',
      amount: row.amount !== null ? String(row.amount) : '',
      ratePercentage:
        row.rate_percentage !== null ? String(row.rate_percentage) : '',
    }
  }
}

export const fetchAllData = async (): Promise<DataItem[]> => {
  const reader = new BigQueryReader()
  return reader.getAllData()
}
```

Step 5: Generate Static JSON at Build Time

Create a script to fetch BigQuery data and save it as JSON:

```typescript
// scripts/generate-data.ts
import * as fs from 'fs'
import * as path from 'path'
import { fetchAllData } from '../src/libs/bigQuery'

async function generateDataJson() {
  try {
    const data = await fetchAllData()
    const outputPath = path.join(process.cwd(), 'public', 'data.json')

    // Ensure the output directory exists before writing
    const publicDir = path.join(process.cwd(), 'public')
    if (!fs.existsSync(publicDir)) {
      fs.mkdirSync(publicDir, { recursive: true })
    }

    fs.writeFileSync(outputPath, JSON.stringify(data, null, 2))
    console.log(`Wrote ${data.length} items to ${outputPath}`)
  } catch (error) {
    console.error('Failed to generate data JSON:', error)
    process.exit(1)
  }
}

generateDataJson()
```
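
With the JSON generated into public/, the app itself never talks to BigQuery at request time. A minimal consumption sketch (the /data.json URL follows from the output path above; the import path is an assumption):

```typescript
import type { DataItem } from '../src/libs/bigQuery'

// Load the pre-generated JSON; no BigQuery credentials are needed on this path
export async function loadData(): Promise<DataItem[]> {
  const res = await fetch('/data.json')
  if (!res.ok) {
    throw new Error(`Failed to load data.json: ${res.status}`)
  }
  return (await res.json()) as DataItem[]
}
```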

Update package.json:

```json
{
  "scripts": {
    "generate-data": "tsx scripts/generate-data.ts",
    "generate-data:local": "NEXT_PUBLIC_APP_ENV=local tsx scripts/generate-data.ts",
    "prebuild": "npm run generate-data",
    "build": "next build"
  }
}
```

npm runs the prebuild hook automatically before build, so build only needs to call next build; chaining npm run prebuild inside it would fetch the data twice.

Step 6: GitHub Actions Setup

For CI/CD, you have two authentication options:

Option A: Service Account Key (simpler)

```yaml
env:
  BIGQUERY_CREDENTIALS: ${{ secrets.BIGQUERY_CREDENTIALS }}

steps:
  - name: Checkout
    uses: actions/checkout@v4

  - name: Setup Node.js
    uses: actions/setup-node@v4
    with:
      node-version: 20

  - name: Install Dependencies
    run: npm ci

  - name: Build
    run: npm run build
```

Here the BIGQUERY_CREDENTIALS repository secret holds the full service-account key JSON, exactly what the reader class parses.

Option B: Workload Identity Federation (more secure)

```yaml
permissions:
  id-token: write
  contents: read

steps:
  - name: Checkout
    uses: actions/checkout@v4

  - name: Authenticate to Google Cloud
    uses: google-github-actions/auth@v2
    with:
      workload_identity_provider: ${{ secrets.WIF_PROVIDER }}
      service_account: ${{ secrets.WIF_SERVICE_ACCOUNT }}

  - name: Setup Node.js
    uses: actions/setup-node@v4
    with:
      node-version: 20

  - name: Install Dependencies
    run: npm ci

  - name: Build
    run: npm run build
```

Note that Workload Identity Federation provides Application Default Credentials rather than a static key, so the reader above would need a fallback that constructs new BigQuery() without explicit credentials when BIGQUERY_CREDENTIALS is not set.

Required IAM Roles

Your service account needs these roles on the BigQuery project:

  1. BigQuery Job User (roles/bigquery.jobUser) - to create and run queries
  2. BigQuery Data Viewer (roles/bigquery.dataViewer) - to read table data
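
Both can be granted from the command line (the service account email is a placeholder):

```bash
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"
```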

Without these roles, you'll get permission errors like:

```
Access Denied: User does not have bigquery.jobs.create permission
```

Common Issues and Solutions

Issue 1: Duplicate IDs with Different Metrics

Problem: Same ID appearing multiple times due to multiple time periods or categories.

Solution: Use GROUP BY to aggregate at the ID level:

```sql
SELECT
  items.item_id,
  SUM(metric_a) AS total_metric_a,
  SUM(value_1) AS total_value_1,
  ROUND(SAFE_DIVIDE(SUM(value_1), SUM(metric_a)) * 100, 1) AS rate
FROM items
LEFT JOIN metrics ON ...
GROUP BY items.item_id -- Aggregate all periods per ID
```

Issue 2: BigQuery API Not Enabled

Error: BigQuery API has not been used in project [ID] before or it is disabled

Solution: Enable the BigQuery API for your service account's project:

```
https://console.developers.google.com/apis/api/bigquery.googleapis.com/overview?project=PROJECT_ID
```
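
Or enable it with gcloud:

```bash
gcloud services enable bigquery.googleapis.com --project=PROJECT_ID
```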

Issue 3: Rate Displaying as Decimal

Problem: Rate shows as 0.042 instead of 4.2%

Solution: Multiply by 100 in the SQL query:

```sql
ROUND(SAFE_DIVIDE(value_1, metric_a) * 100, 1) AS rate_percentage
```

Performance Comparison

Before (Google Sheets): build-time fetches went through the Sheets API, which was slow enough to cause the timeouts that prompted the static-JSON workaround in the first place.

After (BigQuery): the build runs a single aggregated SQL query against the warehouse and writes the result straight to data.json, with no API timeouts to work around.

Local Development

Test locally with:

```bash
# Make sure bigQueryCredentials.json exists in the project root
npm run generate-data:local
```
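
The credentials file is a standard service-account key downloaded from the Google Cloud console. A redacted sketch showing only the fields the reader actually uses (real keys contain several more):

```json
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "my-sa@your-project-id.iam.gserviceaccount.com"
}
```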

Add to .gitignore:

```
bigQueryCredentials.json
```

Conclusion

Migrating from Google Sheets to BigQuery brought significant improvements: the timeout problem is gone at its source, aggregation now lives in SQL where it belongs, and the build produces its static JSON from a single warehouse query.

The key insight: Use the right tool for the job. Google Sheets is great for collaboration, but for production data pipelines, a proper data warehouse like BigQuery is the way to go.
