Ankur Chopra

Senior Data Engineer
Toronto, ON

Summary

Data Engineer with 7+ years of experience in PySpark, SQL, Python, Hive, Airflow, DBT, Databricks, AWS Redshift, S3, Glue, Snowflake, Trino, and Iceberg, transforming complex data into clear, strategic business insights.

Overview

8 years of professional experience
5 certifications

Skills

  • Python / PySpark
  • Spark Streaming / Kafka
  • AWS: S3, Glue, Redshift, Lambda, Deequ, Step Functions, Athena, EMR
  • Databricks: Delta Tables (Delta Lake), DQX, Unity Catalog, Lakehouse Monitoring
  • Hive / Trino
  • Snowflake
  • RDBMS: Oracle / Postgres
  • Orchestration: Airflow / Databricks Jobs / Step Functions
  • Monitoring: Datadog / CloudWatch / Lakehouse Monitoring
  • Lakehouse: Iceberg / Delta Lake
  • DBT
  • CRM Tools: Salesforce, HubSpot
  • ETL & Reverse ETL: Hightouch / Fivetran

Work History

American Airlines

Data Engineer
08.2024 - Current

Job overview

  • Designed and built end-to-end data pipelines, from ingestion to transformation, quality checks, and loading into the DWH.
  • Automated ETL/ELT with PySpark, Python, DBT, Spark SQL, Trino, AWS Glue, and Airflow.
  • Migrated petabyte-scale S3 data to Delta Lake for schema enforcement, time travel, and faster analytics.
  • Built low-latency PySpark Structured Streaming pipelines with Kafka on Delta Lake and S3, driving operational insights (a minimal sketch follows this list).
  • Developed DBT models for Redshift and Delta Lake on S3, implementing incremental logic and optimizing SQL queries.
  • Built scalable Python data pipelines to process and transform JSON, XML, relational, and other semi-structured datasets.
  • Optimized Oracle, PostgreSQL, and Teradata workloads with advanced SQL, including CTEs, window functions, and joins.
  • Built CI/CD pipelines using GitHub Actions to automate testing and deployment of data workflows.
  • Created AI-driven code generators using LLMs served through Ollama to automate writing DBT models and data modeling tasks.
  • Monitored logs and metrics via CloudWatch, Datadog, and Lakehouse Monitoring to ensure pipeline reliability.
  • Enforced data quality via automated checks (AWS Deequ and Databricks DQX) to catch anomalies early.
  • Implemented real-time CDC ingestion pipelines using Debezium and Kafka to sync changes from OLTP to data lake with minimal latency.
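
A minimal sketch of the Kafka-to-Delta streaming pattern described above; the broker, topic, event schema, and S3 paths are illustrative placeholders, and a Delta-enabled Spark runtime (e.g. Databricks) is assumed:

```python
# Minimal sketch of a Kafka -> Delta Lake streaming job; broker, topic,
# schema, and S3 paths are placeholders, not the production configuration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("ops-event-stream").getOrCreate()

# Hypothetical event payload; the real schema comes from the source topic.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("status", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", "ops-events")                 # placeholder topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers bytes; decode the value and unpack the JSON payload.
events = (raw
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Append to a Delta table on S3; the checkpoint gives restartable,
# exactly-once sink semantics.
(events.writeStream
 .format("delta")
 .option("checkpointLocation", "s3://bucket/checkpoints/ops-events")  # placeholder
 .outputMode("append")
 .start("s3://bucket/delta/ops_events"))                              # placeholder
```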

Airbnb

Data Engineer
02.2024 - 08.2024

Job overview

  • Automated the execution of 4,000+ lines of Snowflake/Redshift SQL containing 50+ CTEs, using Python to dynamically extract and execute each CTE to streamline data validation and analysis (see the sketch after this list).
  • Ingested API, Google Analytics, and external data into Snowflake/Redshift using PySpark and Airflow.
  • Orchestrated the end-to-end DWH migration from Snowflake to Amazon Redshift, ensuring seamless data transfer, schema alignment, and performance optimization.
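
A simplified sketch of the CTE-by-CTE extraction idea above, assuming a single top-level WITH clause and no parentheses or commas inside string literals; a production version would use a proper SQL parser and run each probe through a Snowflake/Redshift DB-API cursor instead of printing it:

```python
import re

def cte_prefixes(sql: str):
    """Yield (name, runnable_sql) pairs: each runnable_sql keeps every CTE up
    to and including the named one, then selects a sample from it."""
    body = sql[re.search(r"\bWITH\b", sql, re.IGNORECASE).end():]
    pos = 0
    while True:
        # Each CTE starts with `name AS (`, optionally after a comma.
        header = re.match(r"\s*,?\s*(\w+)\s+AS\s*\(", body[pos:], re.IGNORECASE)
        if not header:
            break
        name = header.group(1)
        i, depth = pos + header.end(), 1
        while depth:                 # scan to the matching closing paren
            depth += body[i] == "("
            depth -= body[i] == ")"
            i += 1
        yield name, f"WITH {body[:i]} SELECT * FROM {name} LIMIT 10"
        pos = i

# Self-contained demo; in the real job each probe ran via a warehouse cursor.
demo_sql = "WITH a AS (SELECT 1 AS x), b AS (SELECT x + 1 AS y FROM a) SELECT * FROM b"
for name, probe in cte_prefixes(demo_sql):
    print(name, "->", probe)
```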

Pinterest

Data Engineer
02.2023 - 02.2024

Job overview

  • Engineered scalable, efficient ETL data pipelines using Python, Spark, dbt, Hadoop, and Big Data tools.
  • Integrated Trino with Apache Iceberg for faster queries and scalable data management (illustrated after this list).
  • Containerized PySpark and Python ETL pipelines using Docker, reducing deployment time by 70%.
  • Successfully migrated legacy databases to AWS using DMS, cutting costs by 20% with zero data loss.
  • Designed Databricks pipelines on Delta Lake for a 10TB lake, boosting query performance by 40%.
  • Configured Datadog alerts to detect anomalies in data pipelines, reducing downtime by 30%.
  • Optimized AWS Redshift to handle complex analytics on 50B+ records, improving dashboard performance by 45%.
  • Architected data warehousing solutions on Redshift, Snowflake, and Hive for historical and real-time analytics.
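
An illustrative sketch of querying an Iceberg table through Trino with the `trino` Python client; the host, user, catalog, schema, and table names are assumptions rather than the actual cluster configuration:

```python
import trino

conn = trino.dbapi.connect(
    host="trino.internal",   # placeholder coordinator host
    port=8080,
    user="etl_service",      # placeholder service user
    catalog="iceberg",       # Trino's Iceberg connector catalog
    schema="analytics",
)
cur = conn.cursor()

# Iceberg's partition and file statistics let Trino prune unneeded data files.
cur.execute("""
    SELECT event_date, count(*) AS events
    FROM daily_activity            -- placeholder table
    WHERE event_date >= DATE '2024-01-01'
    GROUP BY event_date
    ORDER BY event_date
""")
for row in cur.fetchall():
    print(row)
```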

PNC Bank

Data Engineer
09.2019 - 02.2023

Job overview

  • Created a data pipeline to migrate data from Oracle to Redshift, saving $750,000 with a 23% performance increase.
  • Developed and maintained PySpark scripts to automate ETL workflows between AWS S3, Glue, Hive, Redshift, and other data sources.
  • Developed and implemented a Python-based automation script for data quality checks, enhancing data integrity across 1M+ records (a sketch follows this list).
  • Developed batch data pipelines, designing ETL/ELT processes to ingest large, complex, and diverse datasets into Hive tables.
  • Designed and implemented a data pipeline to process a large dataset, integrating 150 million raw records from 10+ data sources.
  • Utilized AWS Athena for seamless querying of large-scale datasets stored in S3, reducing data retrieval times by 30% and enabling cost-effective ad-hoc analysis.
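
A minimal sketch of the rule-based quality checks mentioned above; the table and column names are hypothetical stand-ins for the actual sources:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks").enableHiveSupport().getOrCreate()
df = spark.table("staging.transactions")  # placeholder Hive table

# Each rule counts violating rows; zero means the rule passes.
checks = {
    "null_account_ids": df.filter(F.col("account_id").isNull()).count(),
    "negative_amounts": df.filter(F.col("amount") < 0).count(),
    "duplicate_txn_ids": df.count() - df.dropDuplicates(["txn_id"]).count(),
}

failed = {rule: n for rule, n in checks.items() if n > 0}
if failed:
    # In a real pipeline this would fail the task and page on-call.
    raise ValueError(f"Data quality checks failed: {failed}")
```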

RBC Bank

Data Engineer
01.2018 - 09.2018

Job overview

  • Constructed a data pipeline using PySpark and Python to process semi-structured data, incorporating 100 million raw records from 15 data sources.
  • Created a Python library to parse and reformat data from external vendors, reducing errors in the data pipeline by 12%.
  • Automated ETL processes across billions of rows of data, reducing monthly manual workload by 70%.
  • Developed and optimized ETL processes using Talend & Informatica for data cleansing, transformation, and aggregation.
  • Performed data manipulation in Python for loading and extraction, using libraries such as NumPy and Pandas for data analysis and numerical computation.
  • Optimized big data workflows using Spark and MapReduce tuning techniques such as partitioning, bucketing, repartitioning, coalescing, caching, and persisting (see the sketch after this list).
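
A small sketch of the tuning techniques named in the last bullet; the paths, column names, and partition counts are illustrative:

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tuning-demo").getOrCreate()
df = spark.read.parquet("s3://bucket/raw/events")  # placeholder path

# Repartition on the hot key so downstream shuffles are evenly distributed.
df = df.repartition(200, "customer_id")

# Persist the branch that several aggregations reuse, instead of re-reading S3.
df.persist(StorageLevel.MEMORY_AND_DISK)

daily = df.groupBy("event_date").count()
by_customer = df.groupBy("customer_id").count()
by_customer.show(5)  # second action hits the cache, not the source files

# Coalesce before writing to avoid thousands of tiny files, and lay the
# output out by date partition for cheap downstream pruning.
(daily.coalesce(16)
 .write.mode("overwrite")
 .partitionBy("event_date")
 .parquet("s3://bucket/curated/daily_counts"))     # placeholder path
```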

Morgan Stanley

Data Engineer
06.2017 - 12.2017

Job overview

  • Developed PySpark and Python scripts to reconcile credit card and debit card transaction data using Data Vault modeling (a reconciliation sketch follows this list).
  • Collaborated with business users to define data requirements for transaction analytics and Data Vault modeling.
  • Engineered scalable ETL pipelines using PySpark to efficiently process large volumes of transaction data, enabling timely and accurate financial reporting for key stakeholders.
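
A hedged sketch of the reconciliation join pattern; the Data Vault table names and keys are assumptions, not the original schema:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("txn-reconciliation").getOrCreate()

cards = spark.table("vault.sat_card_txn")    # placeholder satellite table
ledger = spark.table("vault.sat_ledger_txn") # placeholder satellite table

# Full-outer join surfaces records missing from either side as well as
# records present in both but disagreeing on amount.
recon = (cards.alias("c")
         .join(ledger.alias("l"), on="txn_id", how="full_outer")
         .withColumn(
             "status",
             F.when(F.col("c.amount").isNull(), "missing_in_cards")
              .when(F.col("l.amount").isNull(), "missing_in_ledger")
              .when(F.col("c.amount") != F.col("l.amount"), "amount_mismatch")
              .otherwise("matched")))

# Summary counts feed the reconciliation report; mismatches get investigated.
recon.groupBy("status").count().show()
```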

Education

GNDU

Bachelor of Science in Electrical, Electronics and Communications Engineering
07.2014

IIT Roorkee

Big Data Engineer certification in Data Engineering

University Overview

Certified as a Big Data Engineer by IIT Roorkee.

Griffith University

Big Data Analytics certification (Data Analyst)

University Overview

Certified as a Big Data Analyst by Griffith University.

Metropolitan School of Business and Management, UK

Professional Certificate in Business Intelligence & Data Warehousing

University Overview

Professional Certificate in Business Intelligence & Data Warehousing from Metropolitan School of Business and Management, UK.

Certification

  • Big Data Engineer, IIT Roorkee
  • Big Data Analytics, Griffith University
  • Business Intelligence and Data Warehousing, Metropolitan School of Business and Management, UK
  • Certified Business Analysis Professional, Simplilearn (in progress)
  • Big Data and Hadoop Training, Eduonix

Timeline

Data Engineer

American Airlines
08.2024 - Current

Data Engineer

Airbnb
02.2024 - 08.2024

Data Engineer

Pinterest
02.2023 - 02.2024

Data Engineer

PNC Bank
09.2019 - 02.2023

Data Engineer

RBC Bank
01.2018 - 09.2018

Data Engineer

Morgan Stanley
06.2017 - 12.2017

GNDU

Bachelor of Science in Electrical, Electronics and Communications Engineering

IIT Roorkee

Big Data Engineer certification in Data Engineering

Griffith University

Big Data Analytics certification (Data Analyst)

Metropolitan School of Business and Management, UK

Professional Certificate in Business Intelligence & Data Warehousing