Brahma's Personal Blog

Writing about Data Engineering problems and SQL

View the Project on GitHub brahma19/blog

1 February 2026

High-Volume Data Transfer from Teradata to BigQuery Using Teradata Parallel Transporter (TPT)

by Brahmanand Singh

In enterprise environments, migrating or syncing large-scale datasets from legacy on-premises systems like Teradata to cloud-native warehouses like Google BigQuery is a common requirement. When dealing with terabytes or petabytes of data, efficiency and parallelism are critical to minimize downtime and costs. Teradata Parallel Transporter (TPT) is a powerful utility designed for high-throughput data unloading from Teradata databases. In this post, we'll explore how to use TPT to export data at scale, stage it in Google Cloud Storage (GCS), and load it into BigQuery. This approach is ideal for one-time migrations, historical data backfills, or ongoing high-volume transfers where custom control over parallelism and partitioning is needed.

Why This Approach?

TPT stands out for high-volume transfers because it drives many parallel export sessions against Teradata. Compared to JDBC-based extraction or basic exports, it can achieve significantly higher throughput. For fully managed migrations, consider the BigQuery Data Transfer Service with a migration agent (which can use TPT under the hood); for custom, high-volume control, manual TPT is an excellent fit.

Architecture Overview

The pipeline consists of three stages: a parallel TPT export from Teradata, staging of the extracted files in Google Cloud Storage (GCS), and a load into BigQuery.

For optimal performance at high volume, run TPT on a VM with high CPU and fast disk (e.g., n2-standard-32 with persistent SSD), partition the Teradata tables, and export partition by partition.

Implementation Highlights

Prerequisites

- Install Teradata Tools & Utilities (TTU), including TPT, on a Linux server/VM.
- If exporting directly to GCS: install the Teradata Access Module for GCS and configure credentials (a JSON key file for the service account).
- Create a GCS bucket and a BigQuery dataset.
- Grant the necessary IAM roles: Storage Object Creator/Admin for GCS and BigQuery Data Editor for loading.
- Prepare a schema JSON for BigQuery (use bq show --schema on an existing table, or generate it manually from the Teradata table definition).
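
As a rough sketch of the GCP side of these prerequisites, the script below emits the commands that create the bucket and dataset and grant the two roles. The project, bucket, dataset, and service-account names are placeholders (the service account is hypothetical); the commands are printed rather than executed so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: print the GCP provisioning commands for the pipeline.
# All names below are placeholders -- substitute your own.
set -euo pipefail

PROJECT="your-gcp-project-id"
BUCKET="your-bucket-name"
DATASET="your_dataset"
SA="tpt-loader@${PROJECT}.iam.gserviceaccount.com"   # hypothetical service account

emit_setup_cmds() {
  cat <<EOF
gcloud storage buckets create gs://${BUCKET} --project=${PROJECT}
bq --project_id=${PROJECT} mk --dataset ${DATASET}
gcloud projects add-iam-policy-binding ${PROJECT} --member=serviceAccount:${SA} --role=roles/storage.objectAdmin
gcloud projects add-iam-policy-binding ${PROJECT} --member=serviceAccount:${SA} --role=roles/bigquery.dataEditor
EOF
}

emit_setup_cmds   # review the output, then pipe it to bash to execute
```

Printing first and executing second keeps a reviewable record of exactly what was provisioned.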

Code:

TPT Export Script Example

Create a TPT job script (export_table.tpt) for a table export:

DEFINE JOB Export_To_GCS_Direct
DESCRIPTION 'High-volume parallel export from Teradata directly to GCS bucket'
(
  DEFINE OPERATOR Export_Operator
  TYPE EXPORT
  SCHEMA *
  ATTRIBUTES
  (
    VARCHAR TdpId         = 'your_teradata_host',
    VARCHAR UserName      = 'your_user',
    VARCHAR UserPassword  = 'your_password',
    VARCHAR SelectStmt,
    INTEGER MaxSessions   = 32,   -- Tune based on Teradata capacity
    INTEGER MinSessions   = 8
  );

  DEFINE OPERATOR GCS_Writer
  TYPE DATACONNECTOR CONSUMER   -- The DataConnector acts as a consumer when writing files
  SCHEMA *
  ATTRIBUTES
  (
    VARCHAR AccessModuleName   = 'gcsaxsmod',  -- The GCS Access Module
    VARCHAR AccessModuleInitStr = 'GCSBucket=your-bucket-name;GCSCredentialsFile=/path/to/your-service-account-key.json;GCSProjectId=your-gcp-project-id',
    VARCHAR FileName           = 'your_db/your_table/part_@jobid.dat',  -- GCS object path (prefix + filename)
    VARCHAR Format             = 'Delimited',
    VARCHAR TextDelimiter      = '|',
    VARCHAR OpenMode           = 'Write',
    VARCHAR IndicatorMode      = 'N',
    VARCHAR GCSMaxObjectSize   = '5GB'   -- Optional: chunk large files (supports multiple instances)
  );

  APPLY
    TO OPERATOR (GCS_Writer [@num_instances])   -- Parallel writer instances
  SELECT * FROM OPERATOR (Export_Operator [@num_instances]
    ATTRIBUTES (SelectStmt = 'SELECT * FROM your_db.your_table WHERE partition_condition;')
  );
);

Run with: tbuild -f export_table.tpt -u "num_instances=16" -u "SelectStmt=…" (tbuild's -u option passes inline job variables; -v expects a job-variables file).

For partitioned or chunked exports, generate multiple scripts or use substitution variables for the WHERE clause (e.g., date ranges). For daily partitions, loop over the dates and set SelectStmt accordingly.
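
One way to script that daily loop is sketched below. It assumes, hypothetically, that the TPT script reads the date through an @export_date job variable in its SelectStmt (e.g., WHERE load_date = @export_date); it prints one tbuild command per day for review instead of executing them, and the quoting of the printed commands may need adjustment for your shell when run for real. Requires GNU date (use gdate on macOS).

```shell
#!/usr/bin/env bash
# Sketch: one tbuild invocation per daily partition.
# Assumes export_table.tpt uses a hypothetical @export_date job variable.
set -euo pipefail

emit_daily_exports() {
  local d="$1" end="$2"          # start date (inclusive), end date (exclusive)
  while [ "$d" != "$end" ]; do
    # Print the command for review; remove 'echo' to execute.
    echo tbuild -f export_table.tpt \
      -u "num_instances=16, export_date='${d}'" \
      -j "export_${d}"
    d=$(date -d "$d + 1 day" +%F)   # GNU date
  done
}

emit_daily_exports 2026-01-01 2026-01-04   # emits jobs for Jan 1-3
```

Giving each run a distinct -j job name makes per-partition restarts and log lookups straightforward.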

Loading to BigQuery

Use the bq CLI or API:


bq load --source_format=CSV \
  --field_delimiter='|' \
  --skip_leading_rows=0 \
  --max_bad_records=100 \
  --replace \
  your_dataset.your_table \
  gs://your-bucket/path/table_name/*.dat \
  your_schema.json

For partitioned tables: Add --time_partitioning_field=partition_col.
Wildcard * enables parallel load across files.
For schema auto-detection: Use --autodetect.

Handling Multiple Tables / High-Volume Best Practices

Parallelize: run a separate TPT job per table (or per partition), and let BigQuery load each table's files in parallel via wildcards.
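
A minimal sketch of per-table parallelism, assuming a hypothetical @src_table job variable in the TPT script: xargs -P caps the number of concurrent exports. The function prints each tbuild command rather than running it; drop the echo (and adjust quoting as needed) to execute for real.

```shell
#!/usr/bin/env bash
# Sketch: run up to 4 table exports concurrently with GNU xargs.
# Assumes export_table.tpt takes the table via a hypothetical @src_table variable.
set -euo pipefail

run_export() {
  local table="$1"
  # Print the command for review; remove 'echo' to execute.
  echo tbuild -f export_table.tpt \
    -u "num_instances=16, src_table='${table}'" \
    -j "export_${table//./_}"
}
export -f run_export   # make the function visible to the child bash processes

printf '%s\n' your_db.orders your_db.customers > tables.txt   # placeholder table list
xargs -a tables.txt -n1 -P4 bash -c 'run_export "$1"' _
```

Keep the concurrency (-P) below what the Teradata system and your network can sustain; total sessions = jobs x MaxSessions.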

Testing

Export a small partition and load to a test BigQuery table. Validate row counts and sample data. Monitor throughput (GB/hour) and adjust parallelism. Clean up staging files after successful load.
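
The row-count validation step can be sketched as below. The two fetcher functions are stubs standing in for the real count queries on each side (e.g., bteq on Teradata, bq query on BigQuery); the comparison logic is the part that carries over unchanged.

```shell
#!/usr/bin/env bash
# Sketch: compare source and target row counts after a load.
set -euo pipefail

# Stub fetchers -- replace the echoes with real calls, e.g. for BigQuery:
#   bq query --use_legacy_sql=false --format=csv \
#     'SELECT COUNT(*) FROM your_dataset.your_table' | tail -n1
src_count() { echo 1048576; }   # placeholder Teradata count
tgt_count() { echo 1048576; }   # placeholder BigQuery count

validate() {
  local s t
  s=$(src_count); t=$(tgt_count)
  if [ "$s" -eq "$t" ]; then
    echo "OK: $s rows in both systems"
  else
    echo "MISMATCH: source=$s target=$t" >&2
    return 1
  fi
}

validate
```

A nonzero exit on mismatch lets an orchestrator (cron, Composer) fail the pipeline step before staging files are cleaned up.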

Conclusion

Using Teradata Parallel Transporter (TPT) enables efficient, high-volume data transfer to BigQuery by leveraging parallelism at the source and scalable ingestion in GCP. Whether for full migrations or incremental syncs, this method provides control and performance. For production, consider orchestrating with Cloud Composer and monitoring costs. Explore Teradata's Access Module for GCS for direct exports and the BigQuery Data Transfer Service as a managed alternative.

tags: GCP - BigQuery - Teradata - TPT - DataMigration - ETL