Ankur Chopra

Senior Data Engineer
Toronto, ON

Summary

Data Engineer with 7+ years of experience in PySpark, SQL, Python, Hive, Airflow, DBT, Databricks, AWS Redshift, S3, Glue, Snowflake, Trino, and Iceberg, transforming complex data into clear, strategic business insights.

Overview

8 years of professional experience
5 certifications

Skills

  • Python / PySpark
  • Spark Streaming / Kafka
  • AWS: S3, Glue, Redshift, Lambda, Deequ, Step Functions, Athena, EMR
  • Databricks: Delta Tables (Delta Lake), DQX, Unity Catalog, Lakehouse Monitoring
  • Hive / Trino
  • Snowflake
  • RDBMS: Oracle / Postgres
  • Orchestration: Airflow / Databricks Jobs / Step Functions
  • Monitoring: Datadog / CloudWatch / Lakehouse Monitoring
  • Lakehouse: Iceberg / Delta Lake
  • DBT
  • CRM Tools: Salesforce, HubSpot
  • ETL & Reverse ETL: Hightouch / Fivetran

Work History

American Airlines

Data Engineer
08.2024 - Current

Job overview

  • Designed and built end-to-end data pipelines, from ingestion to transformation, quality checks, and loading into the DWH.
  • Automated ETL/ELT with PySpark, Python, DBT, Spark SQL, Trino, AWS Glue, and Airflow.
  • Migrated petabyte-scale S3 data to Delta Lake for schema enforcement, time travel, and faster analytics.
  • Built low-latency PySpark Structured Streaming pipelines with Kafka on Delta Lake and S3, driving operational insights (a minimal sketch follows this list).
  • Developed DBT models for Redshift and Delta Lake on S3, implementing incremental logic and optimizing SQL queries.
  • Built scalable Python data pipelines to process and transform JSON, XML, relational, and other semi-structured datasets.
  • Optimized Oracle, PostgreSQL, and Teradata workloads with advanced SQL, including CTEs, window functions, and joins.
  • Built CI/CD pipelines using GitHub Actions to automate testing and deployment of data workflows.
  • Created AI-driven code generators using LLMs served through Ollama to automate writing DBT models and data modeling tasks.
  • Monitored logs and metrics via CloudWatch, Datadog, and Lakehouse Monitoring to ensure pipeline reliability.
  • Enforced data quality via automated checks (AWS Deequ and Databricks DQX) to catch anomalies early.
  • Implemented real-time CDC ingestion pipelines using Debezium and Kafka to sync changes from OLTP to data lake with minimal latency.
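
A minimal sketch of the Kafka-to-Delta streaming pattern described above; the broker, topic, event schema, and S3 paths are illustrative placeholders, and a Delta-enabled Spark runtime (e.g. Databricks) is assumed:

```python
# Minimal sketch of a Kafka -> Delta Lake streaming job; broker, topic,
# schema, and S3 paths are placeholders, not the production configuration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("ops-event-stream").getOrCreate()

# Hypothetical event payload; the real schema comes from the source topic.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("status", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", "ops-events")                 # placeholder topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers bytes; decode the value and unpack the JSON payload.
events = (raw
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Append to a Delta table on S3; the checkpoint gives restartable,
# exactly-once sink semantics.
(events.writeStream
 .format("delta")
 .option("checkpointLocation", "s3://bucket/checkpoints/ops-events")  # placeholder
 .outputMode("append")
 .start("s3://bucket/delta/ops_events"))                              # placeholder
```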

Airbnb

Data Engineer
02.2024 - 08.2024

Job overview

  • Automated the execution of 4,000+ lines of Snowflake/Redshift SQL containing 50+ CTEs, using Python to dynamically extract and execute each CTE to streamline data validation and analysis (see the sketch after this list).
  • Ingested API, Google Analytics, and external data into Snowflake/Redshift using PySpark and Airflow.
  • Orchestrated the end-to-end DWH migration from Snowflake to Amazon Redshift, ensuring seamless data transfer, schema alignment, and performance optimization.
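
A simplified sketch of the CTE-by-CTE extraction idea above, assuming a single top-level WITH clause and no parentheses or commas inside string literals; a production version would use a proper SQL parser and run each probe through a Snowflake/Redshift DB-API cursor instead of printing it:

```python
import re

def cte_prefixes(sql: str):
    """Yield (name, runnable_sql) pairs: each runnable_sql keeps every CTE up
    to and including the named one, then selects a sample from it."""
    body = sql[re.search(r"\bWITH\b", sql, re.IGNORECASE).end():]
    pos = 0
    while True:
        # Each CTE starts with `name AS (`, optionally after a comma.
        header = re.match(r"\s*,?\s*(\w+)\s+AS\s*\(", body[pos:], re.IGNORECASE)
        if not header:
            break
        name = header.group(1)
        i, depth = pos + header.end(), 1
        while depth:                 # scan to the matching closing paren
            depth += body[i] == "("
            depth -= body[i] == ")"
            i += 1
        yield name, f"WITH {body[:i]} SELECT * FROM {name} LIMIT 10"
        pos = i

# Self-contained demo; in the real job each probe ran via a warehouse cursor.
demo_sql = "WITH a AS (SELECT 1 AS x), b AS (SELECT x + 1 AS y FROM a) SELECT * FROM b"
for name, probe in cte_prefixes(demo_sql):
    print(name, "->", probe)
```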

Pinterest

Data Engineer
02.2023 - 02.2024

Job overview

  • Engineered scalable, efficient ETL data pipelines using Python, Spark, dbt, Hadoop, and Big Data tools.
  • Integrated Trino with Apache Iceberg for faster queries and scalable data management (illustrated after this list).
  • Containerized PySpark and Python ETL pipelines using Docker, reducing deployment time by 70%.
  • Successfully migrated legacy databases to AWS using DMS, cutting costs by 20% with zero data loss.
  • Designed Databricks pipelines on Delta Lake for a 10TB lake, boosting query performance by 40%.
  • Configured Datadog alerts to detect anomalies in data pipelines, reducing downtime by 30%.
  • Optimized AWS Redshift to handle complex analytics on 50B+ records, improving dashboard performance by 45%.
  • Architected data warehousing solutions on Redshift, Snowflake, and Hive for historical and real-time analytics.
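
An illustrative sketch of querying an Iceberg table through Trino with the `trino` Python client; the host, user, catalog, schema, and table names are assumptions rather than the actual cluster configuration:

```python
import trino

conn = trino.dbapi.connect(
    host="trino.internal",   # placeholder coordinator host
    port=8080,
    user="etl_service",      # placeholder service user
    catalog="iceberg",       # Trino's Iceberg connector catalog
    schema="analytics",
)
cur = conn.cursor()

# Iceberg's partition and file statistics let Trino prune unneeded data files.
cur.execute("""
    SELECT event_date, count(*) AS events
    FROM daily_activity            -- placeholder table
    WHERE event_date >= DATE '2024-01-01'
    GROUP BY event_date
    ORDER BY event_date
""")
for row in cur.fetchall():
    print(row)
```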

PNC Bank

Data Engineer
09.2019 - 02.2023

Job overview

  • Created a data pipeline to migrate data from Oracle to Redshift, saving $750,000 with a 23% performance increase.
  • Developed and maintained PySpark scripts to automate ETL workflows between AWS S3, Glue, Hive, Redshift, and other data sources.
  • Developed and implemented a Python-based automation script for data quality checks, enhancing data integrity across 1M+ records (a sketch follows this list).
  • Developed batch data pipelines, designing ETL/ELT processes to ingest large, complex, and diverse datasets into Hive tables.
  • Designed and implemented a data pipeline to process a large dataset, integrating 150 million raw records from 10+ data sources.
  • Utilized AWS Athena for seamless querying of large-scale datasets stored in S3, reducing data retrieval times by 30% and enabling cost-effective ad-hoc analysis.
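
A minimal sketch of the rule-based quality checks mentioned above; the table and column names are hypothetical stand-ins for the actual sources:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks").enableHiveSupport().getOrCreate()
df = spark.table("staging.transactions")  # placeholder Hive table

# Each rule counts violating rows; zero means the rule passes.
checks = {
    "null_account_ids": df.filter(F.col("account_id").isNull()).count(),
    "negative_amounts": df.filter(F.col("amount") < 0).count(),
    "duplicate_txn_ids": df.count() - df.dropDuplicates(["txn_id"]).count(),
}

failed = {rule: n for rule, n in checks.items() if n > 0}
if failed:
    # In a real pipeline this would fail the task and page on-call.
    raise ValueError(f"Data quality checks failed: {failed}")
```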

RBC Bank

Data Engineer
01.2018 - 09.2018

Job overview

  • Constructed a data pipeline using PySpark and Python to process semi-structured data, incorporating 100 million raw records from 15 data sources.
  • Created a Python library to parse and reformat data from external vendors, reducing errors in the data pipeline by 12%.
  • Automated ETL processes across billions of rows of data, reducing monthly manual workload by 70%.
  • Developed and optimized ETL processes using Talend & Informatica for data cleansing, transformation, and aggregation.
  • Performed data manipulation in Python for loading and extraction, using libraries such as NumPy and Pandas for data analysis and numerical computation.
  • Optimized big data workflows using Spark and MapReduce tuning techniques such as partitioning, bucketing, repartitioning, coalescing, caching, and persisting (see the sketch after this list).
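
A small sketch of the tuning techniques named in the last bullet; the paths, column names, and partition counts are illustrative:

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tuning-demo").getOrCreate()
df = spark.read.parquet("s3://bucket/raw/events")  # placeholder path

# Repartition on the hot key so downstream shuffles are evenly distributed.
df = df.repartition(200, "customer_id")

# Persist the branch that several aggregations reuse, instead of re-reading S3.
df.persist(StorageLevel.MEMORY_AND_DISK)

daily = df.groupBy("event_date").count()
by_customer = df.groupBy("customer_id").count()
by_customer.show(5)  # second action hits the cache, not the source files

# Coalesce before writing to avoid thousands of tiny files, and lay the
# output out by date partition for cheap downstream pruning.
(daily.coalesce(16)
 .write.mode("overwrite")
 .partitionBy("event_date")
 .parquet("s3://bucket/curated/daily_counts"))     # placeholder path
```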

Morgan Stanley

Data Engineer
06.2017 - 12.2017

Job overview

  • Developed PySpark and Python scripts to reconcile credit card and debit card transaction data using Data Vault modeling (a reconciliation sketch follows this list).
  • Collaborated with business users to define data requirements for transaction analytics and Data Vault modeling.
  • Engineered scalable ETL pipelines using PySpark to efficiently process large volumes of transaction data, enabling timely and accurate financial reporting for key stakeholders.
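
A hedged sketch of the reconciliation join pattern; the Data Vault table names and keys are assumptions, not the original schema:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("txn-reconciliation").getOrCreate()

cards = spark.table("vault.sat_card_txn")    # placeholder satellite table
ledger = spark.table("vault.sat_ledger_txn") # placeholder satellite table

# Full-outer join surfaces records missing from either side as well as
# records present in both but disagreeing on amount.
recon = (cards.alias("c")
         .join(ledger.alias("l"), on="txn_id", how="full_outer")
         .withColumn(
             "status",
             F.when(F.col("c.amount").isNull(), "missing_in_cards")
              .when(F.col("l.amount").isNull(), "missing_in_ledger")
              .when(F.col("c.amount") != F.col("l.amount"), "amount_mismatch")
              .otherwise("matched")))

# Summary counts feed the reconciliation report; mismatches get investigated.
recon.groupBy("status").count().show()
```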

Education

GNDU

Bachelor of Science in Electrical, Electronics and Communications Engineering
07.2014

IIT Roorkee

Big Data Engineer certification in Data Engineering

University Overview

Certified as a Big Data Engineer by IIT Roorkee.

Griffith University

Big Data Analytics certification (Data Analyst)

University Overview

Certified as a Big Data Analyst by Griffith University.

Metropolitan School of Business and Management, UK

Professional Certificate in Business Intelligence & Data Warehousing

University Overview

Professional Certificate in Business Intelligence & Data Warehousing from Metropolitan School of Business and Management, UK.

Certification

  • Big Data Engineer, IIT Roorkee
  • Big Data Analytics, Griffith University
  • Business Intelligence and Data Warehousing, Metropolitan School of Business and Management, UK
  • Certified Business Analysis Professional, Simplilearn (in progress)
  • Big Data and Hadoop Training, Eduonix

Timeline

Data Engineer

American Airlines
08.2024 - Current

Data Engineer

Airbnb
02.2024 - 08.2024

Data Engineer

Pinterest
02.2023 - 02.2024

Data Engineer

PNC Bank
09.2019 - 02.2023

Data Engineer

RBC Bank
01.2018 - 09.2018

Data Engineer

Morgan Stanley
06.2017 - 12.2017

GNDU

Bachelor of Science in Electrical, Electronics and Communications Engineering

IIT Roorkee

Big Data Engineer certification in Data Engineering

Griffith University

Big Data Analytics certification (Data Analyst)

Metropolitan School of Business and Management, UK

Professional Certificate in Business Intelligence & Data Warehousing