About Me
Hello! I’m a Data Engineer specializing in Google Cloud Platform (GCP) and the Hadoop ecosystem. I’m passionate about big data and cloud computing, and I thrive on solving complex data challenges with the right technology.
GCP Certifications
- Professional Data Engineer
- Professional Cloud Database Engineer
- Professional Cloud Architect
Leadership
- Team Leadership: Proven track record of leading high-performing data engineering teams, fostering a culture of excellence and innovation.
- Mentoring: Committed to developing team capabilities through mentorship, enhancing skills in data management and cloud technologies.
Strategic Impact
- Solution Design: Architect and implement scalable data solutions that align with business objectives.
- Migration Strategy: Lead strategic migrations to cloud environments, ensuring cost-efficiency and performance optimization.
Budget Management
- Managing a GCP operating-expense (OpEx) budget of more than $50 million USD, with a focus on cost optimization and resource efficiency.
Passion
- Dedicated to solving complex data challenges, driving technological advancement, and leveraging data for strategic insights.
In my free time, I enjoy playing chess. You can find me on Chess.com or Lichess playing blitz games.
Expertise
- Google Cloud Platform (GCP)
- BigQuery
- Cloud Run
- Cloud Storage
- Dataflow
- Dataproc
- Pub/Sub
- Cloud Composer (Airflow)
- Hadoop Ecosystem
- HDFS
- Hive
- Pig
- Spark
- Oozie
- Legacy Data Warehouse Modernization
- Deep experience migrating and modernizing legacy on-premises data warehouses, including Oracle Exadata and Teradata, to Google Cloud Platform, delivering scalable, cost-efficient, high-performance cloud-native solutions.
- Skills
- Strong SQL skills with BigQuery optimization
- Cloud financial operations (FinOps)
- Data warehousing
- Dimensional modeling
Projects
1. Real-Time Data Pipeline with GCP
Designed and implemented a real-time data pipeline using Google Cloud Pub/Sub, Dataflow, and BigQuery to process and analyze streaming data.
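A minimal sketch of that pattern with the Apache Beam Python SDK (the project, subscription, table, and schema names here are placeholders, not the production pipeline):

```python
# Sketch of a streaming pipeline: Pub/Sub -> parse JSON -> BigQuery.
# Project, subscription, table, and schema are hypothetical placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions(project="my-project", region="us-central1")
options.view_as(StandardOptions).streaming = True  # enable streaming mode

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/events-sub")
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            table="my-project:analytics.events",
            schema="event_id:STRING,user_id:STRING,ts:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        )
    )
```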
2. Hadoop Cluster Optimization
Optimized a large-scale Hadoop cluster, improving performance and reducing costs by fine-tuning resource allocation and implementing best practices.
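Much of that tuning comes down to sizing YARN containers to the hardware. A back-of-the-envelope sketch, with purely illustrative node specs:

```python
# Illustrative YARN container sizing arithmetic for one worker node.
# Numbers are hypothetical; real tuning also weighs workload shape.
node_memory_gb = 128     # physical RAM per worker
node_vcores = 32         # physical cores per worker
reserved_memory_gb = 16  # headroom for OS, DataNode, NodeManager daemons

available_memory_gb = node_memory_gb - reserved_memory_gb
containers_per_node = min(node_vcores, available_memory_gb // 4)  # ~4 GB each
container_memory_mb = int(available_memory_gb * 1024 / containers_per_node)

print(f"yarn.nodemanager.resource.memory-mb = {available_memory_gb * 1024}")
print(f"containers per node = {containers_per_node}")
print(f"per-container memory (MB) = {container_memory_mb}")
```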
3. Data Lake on GCP
Migrated on-premises Hadoop workloads and data to GCP, integrating with BigQuery and Dataproc for analytics and machine learning.
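One pattern from this kind of migration is exposing lake files in Cloud Storage to BigQuery as an external table, so they are queryable without loading. A hedged sketch with the google-cloud-bigquery client (bucket, dataset, and file format are assumptions):

```python
# Sketch: register Parquet files in a GCS data lake as a BigQuery
# external table. Bucket, dataset, and table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

table = bigquery.Table("my-project.lake.events_ext")
external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = ["gs://my-data-lake/events/*.parquet"]
table.external_data_configuration = external_config

client.create_table(table, exists_ok=True)  # queryable via SQL, no data loaded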
4. BigQuery ELT for Wireless Customers
Developed an ELT process in BigQuery that processes 70 TB of data per hour within a 10-minute SLA, aggregating session-level records into hourly KPIs for wireless customers.
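The general shape of such an ELT step, sketched with the BigQuery Python client (table names, columns, and KPIs here are illustrative, not the production schema):

```python
# Sketch of an hourly KPI aggregation run as a BigQuery ELT job.
# Source/target tables and column names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

sql = """
INSERT INTO `my-project.kpi.hourly_wireless`
  (hour, subscriber_id, sessions, total_bytes, avg_session_secs)
SELECT
  TIMESTAMP_TRUNC(session_start, HOUR) AS hour,
  subscriber_id,
  COUNT(*) AS sessions,
  SUM(bytes_up + bytes_down) AS total_bytes,
  AVG(TIMESTAMP_DIFF(session_end, session_start, SECOND)) AS avg_session_secs
FROM `my-project.raw.wireless_sessions`
WHERE session_start >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
GROUP BY hour, subscriber_id
"""

job = client.query(sql)  # the transform runs inside BigQuery itself
job.result()             # block until the ELT statement completes
print(f"Rows written: {job.num_dml_affected_rows}")
```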
5. Cost Optimization with Cloud FinOps
Implemented cost-saving measures across GCP services, reducing project-level costs by 45% by modernizing Spark/Hadoop workloads from long-running Dataproc clusters to Dataproc Serverless and native BigQuery ELT, applying FinOps principles.
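Part of that modernization is replacing always-on clusters with per-job Dataproc Serverless batches, so compute is billed only while a job runs. A hedged sketch using the google-cloud-dataproc client (script path, project, and IDs are placeholders):

```python
# Sketch: submit a Spark job as a Dataproc Serverless batch instead of
# running it on a long-lived cluster. Paths and IDs are placeholders.
from google.cloud import dataproc_v1

project, region = "my-project", "us-central1"

client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

batch = dataproc_v1.Batch(
    pyspark_batch=dataproc_v1.PySparkBatch(
        main_python_file_uri="gs://my-bucket/jobs/etl_job.py",
    )
)

operation = client.create_batch(
    parent=f"projects/{project}/locations/{region}",
    batch=batch,
    batch_id="etl-job-20240101",  # must be unique within the region
)
result = operation.result()  # compute is billed only while the batch runs
print(result.state)
```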
Feel free to reach out for collaboration or if you have questions about data engineering on GCP and Hadoop!