Writing about Data Engineering problems and SQL
by Brahmanand Singh
When working with time-sensitive data across different geographical locations, handling time zones correctly is crucial. Google BigQuery provides powerful functions to manage time zones, making it easier to standardize your data to Coordinated Universal Time (UTC). In this post, we’ll explore the importance of time zones, how BigQuery stores time-related fields, and how to use BigQuery’s timezone functions to convert input timestamps to UTC.
Time zones are critical when dealing with global data. Events recorded in different regions—such as a meeting in New York, a conference in Tokyo, or a transaction in London—occur in local time zones that may differ significantly due to geographic location and daylight saving time (DST) adjustments. These variations can lead to inconsistencies in data analysis if not handled properly. For example, a timestamp like 2024-09-08 14:30:00
in New York (America/New_York) is not equivalent to the same timestamp in Tokyo (Asia/Tokyo) due to the time zone offset.
Why UTC Matters:
By converting all timestamps to UTC in your data pipeline, you create a unified, reliable foundation for reporting, analytics, and auditing.
BigQuery offers several data types for handling time-related data, each with specific use cases and behaviors when dealing with time zones:
2024-09-08 14:30:00 America/New_York
), BigQuery converts it to UTC internally (e.g., 2024-09-08 18:30:00Z
). When querying, you can display it in any time zone using functions like TIMESTAMP
or DATETIME
.
TIMESTAMP("2024-09-08 14:30:00", "America/New_York")
becomes 2024-09-08 18:30:00Z
in UTC.2024-09-08 14:30:00
) but does not inherently know the time zone. You must explicitly specify the time zone when converting to UTC.
DATETIME("2024-09-08 14:30:00", "America/New_York")
can be converted to UTC using DATETIME(..., "UTC")
.2024-09-08
), with no time or time zone information.14:30:00
), with no date or time zone information.TIMESTAMP
, BigQuery always stores it in UTC. Any time zone information provided during ingestion is used to convert to UTC.DATETIME
fields do not store time zone information, so you must explicitly handle time zones during conversions (e.g., using DATETIME(..., timezone)
).PARSE_TIMESTAMP
or PARSE_DATETIME
to convert string-based timestamps (e.g., “2024-09-08 14:30:00”) into TIMESTAMP
or DATETIME
types, specifying the source time zone if known.
PARSE_TIMESTAMP("%Y-%m-%d %H:%M:%S", "2024-09-08 14:30:00", "America/New_York")
yields a UTC TIMESTAMP
.BigQuery provides functions like TIMESTAMP
, DATETIME
, PARSE_TIMESTAMP
, and PARSE_DATETIME
to convert timestamps between time zones. Here’s a quick example using a table with events in different time zones:
BigQuery offers several functions to work with time zones. We’ll focus on two main functions:
TIMESTAMP
: Conevert timestamp from onetimezone to another.DATETIME
: Converts a datetime from one timezone to another .PARSE_TIMESTAMP
: Parses a string to TIMESTAMP and converts to another.PARSE_DATETIME
: Parses a string to DATETIME and converts to another.Let’s say we have timestamps from different time zones, and we want to convert them all to UTC. Here’s how we can do it:
Imagine we have a table global_events
with events from different time zones:
CREATE TABLE global_events (
event_name STRING,
local_timestamp STRING,
timezone STRING
);
INSERT INTO global_events
VALUES
('New York Event', '2024-09-08 14:30:00', 'America/New_York'),
('London Meeting', '2024-09-08 19:30:00', 'Europe/London'),
('Tokyo Conference', '2024-09-09 10:30:00', 'Asia/Tokyo');
('India Conference', '2024-09-09 12:30:00', 'Asia/Calcutta');
Now, let’s convert all these local times to UTC:
SELECT
event_name,
local_timestamp,
timezone,
DATETIME(TIMESTAMP(local_timestamp, timezone), 'UTC') AS utc_timestamp
FROM
`project_id.test_dataset.global_events`;
This query will produce the following result:
## Event Timestamps Converted to UTC
Below is a table displaying event details with their local timestamps and corresponding UTC timestamps, based on the specified timezones.
| Event Name | Local Timestamp | Timezone | UTC Timestamp |
|-------------------|----------------------|-------------------|---------------------|
| New York Event | 2024-09-08 14:30:00 | America/New_York | 2024-09-08T18:30:00 |
| London Meeting | 2024-09-08 19:30:00 | Europe/London | 2024-09-08T18:30:00 |
| Tokyo Conference | 2024-09-09 10:30:00 | Asia/Tokyo | 2024-09-09T01:30:00 |
| India Conference | 2024-09-09 12:30:00 | Asia/Calcutta | 2024-09-09T07:00:00 |
Note: The UTC Timestamp
reflects the conversion of the Local Timestamp
from the specified Timezone
to UTC, accounting for timezone offsets (e.g., America/New_York was in EDT, -4 hours; Europe/London was in BST, +1 hour; Asia/Tokyo, +9 hours; Asia/Calcutta, +5:30 hours).
Handling time zones correctly is essential for maintaining data integrity in global applications. BigQuery’s timezone functions make it easy to standardize your timestamps to UTC, ensuring consistency across your data pipelines and analyses.
By converting all timestamps to UTC at the point of data ingestion or processing, we can simplify data pipelines and avoid the complexities of dealing with multiple time zones in your queries and applications.
Remember, when it comes to time data, UTC is your friend!
I hope this blog post helps you and your readers better understand working with time zones in BigQuery. Happy querying!
tags: bigquery - timezone - utc - sql