ANIRBAN SANYAL
GCP Data Engineer | BigQuery | Dataflow | Cloud Pipelines
Kolkata, IN

About
Senior Data Engineer with over 7 years of expertise in architecting and deploying production-grade GCP data pipelines for enterprise banking clients. Specializes in BigQuery, Dataflow (Apache Beam), and Cloud Composer, driving robust batch and streaming ETL/ELT solutions. Proven ability to build fault-tolerant, idempotent systems with advanced data quality controls and metadata tracking, ensuring high data integrity and operational efficiency within international Agile environments.
Work
Deloitte USI
Data Engineer - Banking & Financial Services
India
Summary
Architected and deployed production-grade GCP data pipelines for enterprise banking clients, enabling efficient data processing and robust data retrieval.
Highlights
Architected and deployed an end-to-end Dataflow pipeline (Apache Beam / Python) that ingested mainframe files into Bigtable, achieving sub-10ms API-based data retrieval for a US banking system processing millions of daily transactions.
Implemented robust idempotency checks within Dataflow pipelines, eliminating duplicate records during re-runs and ensuring 100% data consistency and integrity across all load cycles.
Developed a comprehensive metadata tracking framework (ABC Framework) in BigQuery, capturing pipeline-level record counts, timestamps, and status flags to enable full data lineage and operational observability.
Engineered structured error capture mechanisms into BigQuery, routing failed records with detailed context to facilitate rapid root-cause analysis and significantly reduce incident resolution time.
Delivered an event-driven orchestration layer utilizing Cloud Functions and Cloud Composer (Airflow) to automatically trigger Dataflow jobs, enabling near real-time processing and reducing manual intervention.
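The idempotency check described in the highlights above can be sketched in plain Python. This is an illustrative sketch only: the actual Dataflow/Beam code is not in the source, the record fields and key derivation are hypothetical, and in a real pipeline the set of seen keys would be persisted in Bigtable or BigQuery rather than held in memory.

```python
import hashlib

def record_key(record: dict) -> str:
    """Derive a stable business key for a record (hypothetical fields)."""
    raw = f"{record['account_id']}|{record['txn_id']}|{record['txn_date']}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def idempotent_load(records, seen_keys: set):
    """Yield only records whose key has not been loaded before.

    Re-running the same batch yields nothing, so a duplicate load is a no-op
    and repeated pipeline runs cannot create duplicate rows.
    """
    for record in records:
        key = record_key(record)
        if key in seen_keys:
            continue  # already loaded in a previous run: skip silently
        seen_keys.add(key)
        yield record

batch = [
    {"account_id": "A1", "txn_id": "T1", "txn_date": "2024-01-01", "amount": 10},
    {"account_id": "A1", "txn_id": "T2", "txn_date": "2024-01-01", "amount": 20},
]
seen: set = set()
first_run = list(idempotent_load(batch, seen))   # loads both records
second_run = list(idempotent_load(batch, seen))  # re-run loads nothing
```

Keying on a hash of the business fields (rather than a load timestamp) is what makes re-runs safe: the same input always maps to the same key.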
IBM India
Data Engineer - Analytics & Production Support
India
Summary
Managed and optimized large-scale analytical datasets and supported production workloads, ensuring high data availability and performance.
Highlights
Managed and optimized large-scale analytical datasets in BigQuery and Cloud Storage, supporting production workloads processing tens of millions of records daily across multiple business domains.
Resolved critical data incidents through deep SQL analysis and root-cause investigation, consistently maintaining SLA compliance and minimizing disruptions to customer-facing analytics and reporting pipelines.
Maintained and enhanced CI/CD pipelines using GitHub Actions and Cloud Build, enabling zero-downtime deployments and supporting comprehensive data investigations for funnel and customer journey analytics.
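The count-reconciliation behind the incident investigations above can be sketched as follows. Table names, the `load_date` partition key, and the in-memory row lists are assumptions for illustration; in practice the counts would come from BigQuery queries against source and target tables.

```python
from collections import Counter

def reconcile_counts(source_rows, target_rows, key="load_date"):
    """Compare per-partition record counts between source and target.

    Returns the partitions where the counts disagree, which are the
    candidate starting points for root-cause analysis.
    """
    src = Counter(r[key] for r in source_rows)
    tgt = Counter(r[key] for r in target_rows)
    mismatches = {}
    for partition in src.keys() | tgt.keys():
        if src[partition] != tgt[partition]:
            mismatches[partition] = {
                "source": src[partition],
                "target": tgt[partition],
            }
    return mismatches

source = [{"load_date": "2024-05-01"}] * 3 + [{"load_date": "2024-05-02"}] * 2
target = [{"load_date": "2024-05-01"}] * 3 + [{"load_date": "2024-05-02"}] * 1
diff = reconcile_counts(source, target)
# one row for 2024-05-02 never reached the target
```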
Tata Consultancy Services
Data Engineer - Pipeline Design & Data Migration
India
Summary
Designed and implemented scalable batch and streaming data pipelines, ensuring reliable data ingestion and processing for enterprise workloads.
Highlights
Designed and implemented scalable batch and streaming data pipelines using Pub/Sub, Python, SQL, and Cloud Composer, ensuring reliable and performant data ingestion across enterprise workloads.
Developed robust data ingestion, transformation, and migration frameworks using Oracle and Oracle Data Integrator (ODI), ensuring high data accuracy and seamless system transitions across upgrades and consolidations.
Collaborated closely with business stakeholders to translate complex requirements into scalable technical pipeline designs and resolved data quality issues through structured root-cause analysis, improving system reliability.
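A minimal sketch of the ingest-and-route pattern behind the streaming pipelines above. The message format, required fields, and in-memory error list are assumptions; in the real system messages arrive via Pub/Sub and failed records are routed to an error table with context, as the Deloitte highlights describe.

```python
import json

def parse_message(payload: bytes) -> dict:
    """Parse one Pub/Sub-style JSON message; raise if required fields are missing."""
    record = json.loads(payload)
    for field in ("id", "event_type", "ts"):
        if field not in record:
            raise ValueError(f"missing required field: {field}")
    return record

def ingest(messages):
    """Split a stream of raw messages into clean records and error rows."""
    clean, errors = [], []
    for payload in messages:
        try:
            clean.append(parse_message(payload))
        except (ValueError, json.JSONDecodeError) as exc:
            # Route the failed payload with context for root-cause analysis
            errors.append({
                "payload": payload.decode("utf-8", "replace"),
                "error": str(exc),
            })
    return clean, errors

msgs = [
    b'{"id": 1, "event_type": "login", "ts": "2024-05-01T00:00:00Z"}',
    b'{"id": 2}',          # missing fields -> error row
    b'not even json',      # parse failure -> error row
]
clean, errors = ingest(msgs)
```

Routing bad records aside instead of failing the whole batch keeps ingestion flowing while preserving everything needed to diagnose the failures later.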
Education
Maulana Abul Kalam Azad University
B.Tech
Computer Science & Engineering
Grade: 8.18 / 10
Languages
English
Certificates
Google Professional Data Engineer
Google Associate Cloud Engineer
Skills
GCP Services
BigQuery, Dataflow (Apache Beam), Bigtable, Cloud Storage, Pub/Sub, Cloud Composer (Airflow), Cloud Functions, Cloud Build, Workflows.
Data Engineering Concepts
ETL/ELT, Batch & Streaming Pipelines, Data Modelling, Data Warehousing, Data Lake Architecture, API Integration, Idempotency, Data Quality, Data Lineage, Metadata Frameworks, Parquet / Avro / ORC.
Programming Languages & Tools
Python, SQL (BigQuery, Oracle).
CI/CD & Version Control
GitHub Actions, GitLab CI, Cloud Build, Git.
Methodologies
Agile / Scrum.