ANIRBAN SANYAL

GCP Data Engineer | BigQuery | Dataflow | Cloud Pipelines
Kolkata, India

About

Senior Data Engineer with 7+ years of experience architecting and deploying production-grade GCP data pipelines for enterprise banking clients. Specializes in BigQuery, Dataflow (Apache Beam), and Cloud Composer, delivering robust batch and streaming ETL/ELT solutions. Proven ability to build fault-tolerant, idempotent systems with rigorous data quality controls and metadata tracking, ensuring high data integrity and operational efficiency in international Agile environments.

Work

Deloitte USI | Data Engineer - Banking & Financial Services

India

Summary

Architected and deployed production-grade GCP data pipelines for enterprise banking clients, enabling efficient data processing and robust data retrieval.

Highlights

Architected and deployed an end-to-end Dataflow pipeline (Apache Beam / Python) that ingested mainframe files into Bigtable, achieving sub-10ms API-based data retrieval for a US banking system processing millions of daily transactions.

Implemented robust idempotency checks within Dataflow pipelines, eliminating duplicate records during re-runs and ensuring 100% data consistency and integrity across all load cycles.

Developed a comprehensive metadata tracking framework (ABC Framework) in BigQuery, capturing pipeline-level record counts, timestamps, and status flags to enable full data lineage and operational observability.

Engineered structured error capture mechanisms into BigQuery, routing failed records with detailed context to facilitate rapid root-cause analysis and significantly reduce incident resolution time.

Delivered an event-driven orchestration layer utilizing Cloud Functions and Cloud Composer (Airflow) to automatically trigger Dataflow jobs, enabling near real-time processing and reducing manual intervention.
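The re-run-safe load pattern described in the highlights above can be sketched as follows. This is an illustrative plain-Python sketch only, not the actual Apache Beam pipeline code: the field names (`source_file`, `row_id`, `amount`), the validation rule, and the in-memory `seen_keys` set are all hypothetical stand-ins (in the real pipeline the committed-key state would live in Bigtable or BigQuery, and dead-letter rows would be written to a structured error table).

```python
import hashlib
import json

def record_key(record):
    """Derive a deterministic composite key so a re-run produces the
    same key for the same source record (hypothetical fields)."""
    raw = f"{record['source_file']}|{record['row_id']}".encode()
    return hashlib.sha256(raw).hexdigest()

def process_batch(records, seen_keys):
    """Idempotent load step: skip records whose key was already
    committed, and route malformed records to a dead-letter list
    with error context for root-cause analysis."""
    loaded, dead_letter = [], []
    for rec in records:
        try:
            key = record_key(rec)
            if key in seen_keys:           # duplicate on re-run: skip
                continue
            if rec.get("amount") is None:  # example validation rule
                raise ValueError("missing amount")
            seen_keys.add(key)
            loaded.append(rec)
        except (KeyError, ValueError) as exc:
            dead_letter.append({"record": json.dumps(rec, default=str),
                                "error": str(exc)})
    return loaded, dead_letter
```

Re-running `process_batch` over the same input with the same key state loads nothing new, which is what makes repeated load cycles safe; failed records carry both the original payload and the error reason, mirroring the structured error capture above.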

IBM India | Data Engineer - Analytics & Production Support

India

Summary

Managed and optimized large-scale analytical datasets and supported production workloads, ensuring high data availability and performance.

Highlights

Managed and optimized large-scale analytical datasets in BigQuery and Cloud Storage, supporting production workloads processing tens of millions of records daily across multiple business domains.

Resolved critical data incidents through deep SQL analysis and root-cause investigation, consistently maintaining SLA compliance and minimizing disruptions to customer-facing analytics and reporting pipelines.

Maintained and enhanced CI/CD pipelines using GitHub Actions and Cloud Build, enabling zero-downtime deployments and supporting comprehensive data investigations for funnel and customer journey analytics.

Tata Consultancy Services | Data Engineer - Pipeline Design & Data Migration

India

Summary

Designed and implemented scalable batch and streaming data pipelines, ensuring reliable data ingestion and processing for enterprise workloads.

Highlights

Designed and implemented scalable batch and streaming data pipelines using Pub/Sub, Python, SQL, and Cloud Composer, ensuring reliable and performant data ingestion across enterprise workloads.

Developed robust data ingestion, transformation, and migration frameworks using Oracle and Oracle Data Integrator (ODI), ensuring high data accuracy and seamless system transitions across upgrades and consolidations.

Collaborated closely with business stakeholders to translate complex requirements into scalable technical pipeline designs and resolved data quality issues through structured root-cause analysis, improving system reliability.
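The Pub/Sub-driven streaming ingestion pattern from the highlights above can be sketched in plain Python, with a standard-library queue standing in for the subscription. This is a hedged illustration, not the production code: the function names and the stop-after-empty-polls behavior are invented for the sketch (a real Pub/Sub subscriber would use the client library's streaming-pull callbacks and explicit acks).

```python
import queue

def consume(sub_queue, transform, sink, max_empty_polls=3):
    """Minimal streaming loop: pull a message, transform it, and
    append it to the sink; stop after several consecutive empty
    polls so the sketch terminates."""
    empty = 0
    while empty < max_empty_polls:
        try:
            msg = sub_queue.get(timeout=0.1)
        except queue.Empty:
            empty += 1
            continue
        empty = 0                  # reset on any successful pull
        sink.append(transform(msg))
    return sink
```

Separating the pull loop, the transform, and the sink keeps the same boundaries a real pipeline has between subscription, processing logic, and storage, which is what makes the batch and streaming variants share most of their transformation code.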

Education

Maulana Abul Kalam Azad University
India

B.Tech

Computer Science Engineering

Grade: 8.18 / 10

Languages

English

Certificates

Google Professional Data Engineer (Issued by Google)

Google Associate Cloud Engineer (Issued by Google)

Skills

GCP Services

BigQuery, Dataflow (Apache Beam), Bigtable, Cloud Storage, Pub/Sub, Cloud Composer (Airflow), Cloud Functions, Cloud Build, Workflows.

Data Engineering Concepts

ETL/ELT, Batch & Streaming Pipelines, Data Modelling, Data Warehousing, Data Lake Architecture, API Integration, Idempotency, Data Quality, Data Lineage, Metadata Frameworks, Parquet / Avro / ORC.

Programming Languages & Tools

Python, SQL (BigQuery, Oracle).

CI/CD & Version Control

GitHub Actions, GitLab CI, Cloud Build, Git.

Methodologies

Agile / Scrum.