Brahma's Personal Blog

Writing about Data Engineering problems and SQL

View the Project on GitHub brahma19/blog

7 September 2024

Mastering Time Zones in BigQuery: Converting to UTC

by Brahmanand Singh

Mastering Time Zones in BigQuery: Converting to UTC

When working with time-sensitive data across different geographical locations, handling time zones correctly is crucial. Google BigQuery provides powerful functions to manage time zones, making it easier to standardize your data to Coordinated Universal Time (UTC). In this post, we’ll explore the importance of time zones, how BigQuery stores time-related fields, and how to use BigQuery’s timezone functions to convert input timestamps to UTC.

The Importance of Time Zones and UTC

Time zones are critical when dealing with global data. Events recorded in different regions—such as a meeting in New York, a conference in Tokyo, or a transaction in London—occur in local time zones that may differ significantly due to geographic location and daylight saving time (DST) adjustments. These variations can lead to inconsistencies in data analysis if not handled properly. For example, a timestamp like 2024-09-08 14:30:00 in New York (America/New_York) is not equivalent to the same timestamp in Tokyo (Asia/Tokyo) due to the time zone offset.

Why UTC Matters:

By converting all timestamps to UTC in your data pipeline, you create a unified, reliable foundation for reporting, analytics, and auditing.

How BigQuery Stores Time Fields

BigQuery offers several data types for handling time-related data, each with specific use cases and behaviors when dealing with time zones:

Key Points About Time Zone Handling in BigQuery

Converting Timestamps to UTC in BigQuery

BigQuery provides functions like TIMESTAMP, DATETIME, PARSE_TIMESTAMP, and PARSE_DATETIME to convert timestamps between time zones. Here’s a quick example using a table with events in different time zones:

BigQuery Timezone Functions

BigQuery offers several functions to work with time zones. We’ll focus on two main functions:

  1. TIMESTAMP: Conevert timestamp from onetimezone to another.
  2. DATETIME: Converts a datetime from one timezone to another .
  3. PARSE_TIMESTAMP: Parses a string to TIMESTAMP and converts to another.
  4. PARSE_DATETIME: Parses a string to DATETIME and converts to another.

Converting Local Time to UTC

Let’s say we have timestamps from different time zones, and we want to convert them all to UTC. Here’s how we can do it:

Imagine we have a table global_events with events from different time zones:

CREATE TABLE global_events (
  event_name STRING,
  local_timestamp STRING,
  timezone STRING
);

INSERT INTO global_events
VALUES
  ('New York Event', '2024-09-08 14:30:00', 'America/New_York'),
  ('London Meeting', '2024-09-08 19:30:00', 'Europe/London'),
  ('Tokyo Conference', '2024-09-09 10:30:00', 'Asia/Tokyo');
  ('India Conference', '2024-09-09 12:30:00', 'Asia/Calcutta');

Now, let’s convert all these local times to UTC:

SELECT
  event_name,
  local_timestamp,
  timezone,
  DATETIME(TIMESTAMP(local_timestamp, timezone), 'UTC') AS utc_timestamp
FROM
  `project_id.test_dataset.global_events`;


This query will produce the following result:

## Event Timestamps Converted to UTC

Below is a table displaying event details with their local timestamps and corresponding UTC timestamps, based on the specified timezones.

| Event Name        | Local Timestamp      | Timezone          | UTC Timestamp       |
|-------------------|----------------------|-------------------|---------------------|
| New York Event    | 2024-09-08 14:30:00 | America/New_York  | 2024-09-08T18:30:00 |
| London Meeting    | 2024-09-08 19:30:00 | Europe/London     | 2024-09-08T18:30:00 |
| Tokyo Conference  | 2024-09-09 10:30:00 | Asia/Tokyo        | 2024-09-09T01:30:00 |
| India Conference  | 2024-09-09 12:30:00 | Asia/Calcutta     | 2024-09-09T07:00:00 |

Note: The UTC Timestamp reflects the conversion of the Local Timestamp from the specified Timezone to UTC, accounting for timezone offsets (e.g., America/New_York was in EDT, -4 hours; Europe/London was in BST, +1 hour; Asia/Tokyo, +9 hours; Asia/Calcutta, +5:30 hours).

Best Practices

  1. Always ask the input timestamp / datetime fields timezone if the value is missing the timezone offset.
  2. Convert to local time zones only when presenting data to users (UI Layer) .
  3. Be aware of daylight saving time changes when working with historical data.
  4. Use the IANA time zone database names (like ‘America/New_York’) instead of abbreviations (like ‘EST’) for more accurate results.
  5. If the time zone is unknown, document assumptions (e.g., assume UTC or a default like America/New_York) and validate with stakeholders.

Conclusion

Handling time zones correctly is essential for maintaining data integrity in global applications. BigQuery’s timezone functions make it easy to standardize your timestamps to UTC, ensuring consistency across your data pipelines and analyses.

By converting all timestamps to UTC at the point of data ingestion or processing, we can simplify data pipelines and avoid the complexities of dealing with multiple time zones in your queries and applications.

Remember, when it comes to time data, UTC is your friend!


I hope this blog post helps you and your readers better understand working with time zones in BigQuery. Happy querying!

tags: bigquery - timezone - utc - sql