Data Engineer with 7+ years of experience in PySpark, SQL, Python, Hive, Airflow, dbt, Databricks, AWS Redshift, S3, Glue, Snowflake, Trino, and Iceberg, transforming complex data into clear, strategic business insights.
Orchestration: Airflow / Databricks Jobs / Step Functions
Monitoring: Datadog / CloudWatch / Databricks Lakehouse Monitoring
Lakehouse: Iceberg / Delta Lake
Transformation: dbt
CRM Tools: Salesforce, HubSpot
ETL & Reverse ETL: Hightouch / Fivetran
Work History
American Airlines
Data Engineer
08.2024 - Current
Job overview
Designed and built end-to-end data pipelines, from ingestion to transformation, quality checks, and loading into the DWH.
Automated ETL/ELT with PySpark, Python, dbt, Spark SQL, Trino, AWS Glue, and Airflow.
Migrated petabyte-scale S3 data to Delta Lake for schema enforcement, time travel, and faster analytics.
Built low-latency PySpark Structured Streaming pipelines with Kafka, landing data in Delta Lake on S3 to drive operational insights (streaming flow sketched after this overview).
Developed dbt models for Redshift and Delta Lake on S3, implementing incremental logic and optimizing SQL queries.
Built scalable Python data pipelines to process and transform JSON, XML, and relational datasets, as well as other semi-structured formats.
Optimized Oracle, PostgreSQL, and Teradata queries with advanced SQL, including CTEs, window functions, and complex joins.
Built CI/CD pipelines using GitHub Actions to automate testing and deployment of data workflows.
Created AI-driven code generators using LLMs served through Ollama to automate dbt model writing and data modeling (generator sketched after this overview).
Monitored logs and metrics via CloudWatch, Datadog, and Lakehouse Monitoring to ensure pipeline reliability.
Enforced data quality via automated checks (AWS Deequ and Databricks DQX) to catch anomalies early.
Implemented real-time CDC ingestion pipelines using Debezium and Kafka to sync changes from OLTP systems to the data lake with minimal latency (connector setup sketched below).
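A minimal sketch of the Kafka-to-Delta streaming flow referenced above, assuming a Spark session with the Kafka and Delta Lake connectors on the classpath; the broker, topic, schema, and S3 paths are illustrative placeholders:

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("ops-events-stream").getOrCreate()

# Hypothetical event schema; the real payload would differ.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("status", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", "ops_events")                 # placeholder topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers bytes; decode the value and parse the JSON payload.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Append to a Delta table on S3; the checkpoint gives exactly-once sink semantics.
(events.writeStream
 .format("delta")
 .option("checkpointLocation", "s3://bucket/checkpoints/ops_events")  # placeholder
 .outputMode("append")
 .start("s3://bucket/delta/ops_events")  # placeholder path
 .awaitTermination())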
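A hedged sketch of the dbt-model generator idea, using the ollama Python client; the model name, prompt, and output path are assumptions, and generated SQL would still be reviewed before merging:

import ollama

PROMPT = ("You are a dbt expert. Write a dbt incremental model for table {table} "
          "keyed on {key}. Return only SQL.")

def generate_model(table: str, key: str, out_path: str) -> None:
    # Ask a locally served model for the model body.
    reply = ollama.chat(
        model="codellama",  # any locally pulled model name works here
        messages=[{"role": "user", "content": PROMPT.format(table=table, key=key)}],
    )
    with open(out_path, "w") as f:
        f.write(reply["message"]["content"])  # lint/review before committing

# Hypothetical usage:
generate_model("stg_bookings", "booking_id", "models/marts/fct_bookings.sql")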
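The CDC setup amounts to registering a Debezium source connector with Kafka Connect's REST API; a sketch assuming a Postgres source, with hostnames, credentials, and table names as placeholders:

import requests

connector = {
    "name": "orders-cdc",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "oltp-db",      # placeholder host
        "database.port": "5432",
        "database.user": "cdc_user",         # placeholder credentials
        "database.password": "REDACTED",     # resolve from a secret store
        "database.dbname": "orders",
        "topic.prefix": "oltp",              # Debezium 2.x topic naming
        "table.include.list": "public.orders,public.payments",
    },
}

# Kafka Connect's standard REST endpoint for creating connectors.
resp = requests.post("http://connect:8083/connectors", json=connector, timeout=30)
resp.raise_for_status()
# Downstream, a streaming job consumes topics like "oltp.public.orders"
# and merges the change events into the data lake.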
Airbnb
Data Engineer
02.2024 - 08.2024
Job overview
Automated the execution of 4,000+ lines of Snowflake/Redshift SQL containing 50+ CTEs using Python, dynamically extracting and executing each CTE to streamline data validation and analysis (approach sketched after this overview).
Ingested API, Google Analytics, and external data into Snowflake/Redshift using PySpark and Airflow.
Orchestrated the end-to-end DWH migration from Snowflake to Amazon Redshift, ensuring seamless data transfer, schema alignment, and performance optimization.
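One plausible shape for the CTE runner described above, using sqlglot to parse the script (an assumption; any SQL parser would do). Each probe keeps the CTE plus its predecessors and samples rows from it; the dialect and LIMIT are illustrative:

import sqlglot
from sqlglot import exp

def cte_probes(big_sql: str, dialect: str = "snowflake"):
    tree = sqlglot.parse_one(big_sql, read=dialect)
    ctes = list(tree.find_all(exp.CTE))
    for i, cte in enumerate(ctes):
        # Rebuild a standalone query: this CTE plus everything before it
        # (its only possible dependencies), then select a sample from it.
        probe = sqlglot.select("*").from_(cte.alias).limit(100)
        for dep in ctes[: i + 1]:
            probe = probe.with_(dep.alias, as_=dep.this)
        yield cte.alias, probe.sql(dialect=dialect)

# Hypothetical usage with an existing DB-API cursor:
# for name, sql in cte_probes(open("model.sql").read()):
#     cursor.execute(sql)  # validate each CTE's output independently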
Pinterest
Data Engineer
02.2023 - 02.2024
Job overview
Engineered scalable, efficient ETL data pipelines using Python, Spark, dbt, Hadoop, and Big Data tools.
Integrated Trino with Apache Iceberg for faster queries and scalable data management (query pattern sketched after this overview).
Containerized PySpark and Python ETL pipelines using Docker, reducing deployment time by 70%.
Successfully migrated legacy databases to AWS using DMS, cutting costs by 20% with zero data loss.
Designed Databricks pipelines on Delta Lake for a 10TB lake, boosting query performance by 40%.
Configured Datadog alerts to detect anomalies in data pipelines, reducing downtime by 30%.
Optimized AWS Redshift to handle complex analytics on 50B+ records, improving dashboard performance by 45%.
Architected data warehousing solutions on Redshift, Snowflake, and Hive for historical and real-time analytics.
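A minimal sketch of querying an Iceberg table through Trino with the trino Python client; host, catalog, schema, and table names are placeholders:

import trino

conn = trino.dbapi.connect(
    host="trino-coordinator",  # placeholder host
    port=8080,
    user="etl",
    catalog="iceberg",   # Trino catalog backed by the Iceberg connector
    schema="analytics",  # placeholder schema
)
cur = conn.cursor()

# Iceberg's partition and file-level metadata lets Trino prune aggressively,
# which is where the query speedups come from.
cur.execute("""
    SELECT event_date, count(*) AS events
    FROM pin_events  -- placeholder table
    WHERE event_date >= DATE '2023-06-01'
    GROUP BY event_date
""")
for row in cur.fetchall():
    print(row)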
PNC Bank
Data Engineer
09.2019 - 02.2023
Job overview
Created a data pipeline to migrate data from Oracle to Redshift, saving $750,000 and improving performance by 23%.
Developed and maintained PySpark scripts to automate ETL workflows between AWS S3, Glue, Hive, Redshift, and other data sources.
Developed and implemented a Python-based automation script for data quality checks, enhancing data integrity across 1M+ records.
Developed batch data pipelines, designing ETL/ELT processes to ingest large, complex, and diverse datasets into Hive tables.
Designed and implemented a data pipeline to process a massive dataset, integrating 150 million raw records from 10+ data sources.
Utilized AWS Athena for seamless querying of large-scale datasets stored in S3, reducing data retrieval times by 30% and enabling cost-effective ad-hoc analysis (query pattern sketched below).
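The ad-hoc Athena pattern, sketched with boto3; the database, output location, and query itself are illustrative:

import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

qid = athena.start_query_execution(
    QueryString="SELECT account_id, sum(amount) FROM txns GROUP BY account_id",
    QueryExecutionContext={"Database": "finance_lake"},            # placeholder
    ResultConfiguration={"OutputLocation": "s3://bucket/athena/"}, # placeholder
)["QueryExecutionId"]

# Poll until the query settles; Athena bills by data scanned, so partitioned,
# columnar layouts in S3 are what keep ad-hoc analysis cheap.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]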
RBC Bank
Data Engineer
01.2018 - 09.2018
Job overview
Constructed a data pipeline using PySpark and Python to process semi-structured data by incorporating 100 million raw records from 15 data sources.
Created a Python library to parse and reformat data from external vendors, reducing errors in the data pipeline by 12%.
Automated ETL process across billions of rows of data, which reduced manual workload by 70% monthly.
Developed and optimized ETL processes using Talend & Informatica for data cleansing, transformation, and aggregation.
Performed data manipulation, loading, and extraction in Python, using libraries such as NumPy and Pandas for data analysis and numerical computation.
Optimized big data workflows using Spark and MapReduce techniques such as partitioning, bucketing, repartition, coalesce, cache, and persist (tuning patterns sketched below).
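Illustrative PySpark versions of those tuning patterns; paths, key columns, and partition counts are placeholders chosen for the example:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tuning-demo").getOrCreate()
df = spark.read.parquet("s3://bucket/raw/txns")  # placeholder input

# Repartition by the join key so shuffle output is evenly distributed.
by_key = df.repartition(200, "account_id")

# Persist a hot intermediate that several downstream actions reuse.
by_key.persist()

# Coalesce before writing to avoid thousands of tiny output files.
by_key.coalesce(50).write.mode("overwrite").parquet("s3://bucket/clean/txns")

# Bucketing pre-shuffles data for future joins; it must go through
# saveAsTable so the bucket metadata lands in the metastore.
(df.write.bucketBy(64, "account_id").sortBy("event_ts")
   .mode("overwrite").saveAsTable("clean.txns_bucketed"))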
Morgan Stanley
Data Engineer
06.2017 - 12.2017
Job overview
Developed PySpark and Python scripts to reconcile credit and debit card transaction data using Data Vault (a simplified reconciliation join is sketched after this overview).
Collaborated with business users to define data requirements for transaction analytics and Data Vault modeling.
Engineered scalable ETL pipelines using PySpark to efficiently process large volumes of transaction data, enabling timely and accurate financial reporting for key stakeholders.
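A simplified version of the reconciliation join behind those scripts; sources, keys, and statuses are illustrative, and the Data Vault loading itself is out of scope here:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("txn-recon").getOrCreate()

cards = spark.read.parquet("s3://bucket/cards")    # placeholder source A
ledger = spark.read.parquet("s3://bucket/ledger")  # placeholder source B

# Full outer join by transaction id, then classify each record.
recon = (cards.alias("c")
         .join(ledger.alias("l"), on="txn_id", how="full_outer")
         .withColumn("status",
             F.when(F.col("c.amount").isNull(), "missing_in_cards")
              .when(F.col("l.amount").isNull(), "missing_in_ledger")
              .when(F.col("c.amount") != F.col("l.amount"), "amount_mismatch")
              .otherwise("matched")))

recon.groupBy("status").count().show()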
Education
GNDU
Bachelor of Science in Electrical, Electronics and Communications Engineering
07.2014
IIT Roorkee
Certificate Program in Big Data Engineering
University Overview
Certified as a Big Data Engineer by IIT Roorkee.
Griffith University
Certificate Program in Big Data Analytics
University Overview
Certified as a Big Data Analyst by Griffith University.
Metropolitan School of Business and Management
Professional Certificate in Business Intelligence & Data Warehousing
University Overview
Professional certificate in Business Intelligence & Data Warehousing from the Metropolitan School of Business and Management, UK.
Certification
Big Data Engineer, IIT Roorkee.
Big Data Analytics, Griffith University.
Business Intelligence and Data Warehousing, Metropolitan School of Business and Management, UK.
Certified Business Analysis Professional, Simplilearn (certification training in progress).
Big Data and Hadoop Training, Eduonix.