Writing about Data Engineering problems and SQL
by Brahmanand Singh
Monitoring and optimizing costs is essential in BigQuery’s serverless environment, where queries can quickly accumulate expenses based on data scanned and compute slots used. In this post, we’ll explore a straightforward SQL query that leverages BigQuery’s INFORMATION_SCHEMA.JOBS view to uncover your most costly queries. By focusing on bytes billed and slot usage, this query helps data engineers and analysts pinpoint optimization opportunities, such as inefficient joins or unpartitioned scans.
This can be scheduled via Cloud Composer or run ad-hoc in the BigQuery console.
Adjust the region qualifier (`region-us`) to match your setup (e.g., `region-eu`). For broader history, add a date filter: `AND DATE(creation_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)`.
```sql
SELECT
  creation_time,
  job_id,
  user_email,
  query,
  total_bytes_billed,
  -- Average slots consumed over the job's lifetime
  SAFE_DIVIDE(total_slot_ms, TIMESTAMP_DIFF(end_time, start_time, MILLISECOND)) AS job_avg_slots,
  -- Rough slot cost, assuming ~$0.04 per slot-hour; adjust to your edition's rate
  ROUND((total_slot_ms / 1000.0 / 60 / 60) * 0.04, 8) AS estimated_slot_cost_usd
FROM `your_project_id`.`region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  job_type = 'QUERY'
  AND (statement_type != 'SCRIPT' OR statement_type IS NULL)
  AND COALESCE(total_bytes_billed, 0) > 0
ORDER BY total_bytes_billed DESC
LIMIT 10;
```
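A useful next step is rolling spend up by user. The sketch below reuses the same project and region placeholders and assumes an on-demand rate of roughly $6.25 per TiB — verify that rate against your region's billing SKU before trusting the estimate:

```sql
-- Daily bytes billed per user over the last 30 days, with a rough
-- on-demand cost estimate ($6.25/TiB is an assumption; check your rate).
SELECT
  DATE(creation_time) AS run_date,
  user_email,
  SUM(total_bytes_billed) AS bytes_billed,
  ROUND(SUM(total_bytes_billed) / POW(2, 40) * 6.25, 2) AS est_on_demand_cost_usd
FROM `your_project_id`.`region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  job_type = 'QUERY'
  AND DATE(creation_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY run_date, user_email
ORDER BY est_on_demand_cost_usd DESC;
```

Sorting by estimated cost surfaces the heaviest users first, which is often the fastest way to start an optimization conversation.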
You will get an output like the one below. Trim it to the top 10 queries by bytes billed (add a `LIMIT` clause if needed), pull the query text for each, and work with the team on optimization.
BigQuery's INFORMATION_SCHEMA views provide complete metadata about jobs, which you can use to proactively monitor and control spend. Pair them with reservations for capacity-based savings, or feed the results into BI dashboards.
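To feed a dashboard, one option is a scheduled query that materializes a daily summary table. The sketch below assumes a `monitoring` dataset in the same region as the jobs view; the dataset and table names are placeholders:

```sql
-- Materialize a daily spend summary for BI tools; run this on a
-- schedule (BigQuery scheduled queries or a Cloud Composer DAG).
CREATE OR REPLACE TABLE `your_project_id.monitoring.daily_query_spend` AS
SELECT
  DATE(creation_time) AS run_date,
  COUNT(*) AS query_count,
  SUM(total_bytes_billed) AS total_bytes_billed,
  SUM(total_slot_ms) AS total_slot_ms
FROM `your_project_id`.`region-us`.INFORMATION_SCHEMA.JOBS
WHERE job_type = 'QUERY'
GROUP BY run_date;
```

Pointing a Looker Studio or Grafana panel at this table avoids re-scanning the jobs view on every dashboard refresh.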
tags: GCP - BigQuery - FinOps