Shweta Dwivedi

Toronto, Canada

Summary

Data Engineer with 6+ years of experience in PySpark, SQL, Python, Hive, Airflow, Databricks, DBT, AWS Redshift, S3, Glue, and Snowflake, transforming complex data into clear, strategic business insights.

Overview

7
years of professional experience

Work History

Data Engineer

Walmart
08.2024 - Current
  • Pioneered ETL/ELT automation and analytics using PySpark, Python, DBT, Spark SQL, Trino, and AWS Glue, with Airflow orchestrating the workflows, streamlining data operations.
  • Migrated petabyte-scale S3 data to Delta Lake for schema enforcement, time travel, and faster analytics.
  • Built low-latency PySpark streaming pipelines with Kafka on Delta Lake and S3 to support operational insights.
  • Managed transactional data using Oracle, PostgreSQL, and Teradata, enhancing data storage and manipulation.
  • Enforced data quality via automated checks (AWS Deequ and Databricks DQX) to catch anomalies early.
  • Ingested API, Google Analytics, and external data into Snowflake/Redshift using PySpark and Airflow.
  • Optimized Spark jobs and SQL queries to minimize runtime and resource consumption.
  • Built CI/CD pipelines using GitHub Actions to automate testing and deployment of data workflows.
  • Created AI-driven code generation tools using Ollama LLMs for automated data modeling and version control.

Data Engineer

Shopify
08.2022 - 07.2024
  • Led the development of scalable Databricks pipelines, leveraging Delta Lake for improved data reliability in a 10TB lake, enhancing query performance by 40%.
  • Automated ETL/ELT data extraction, cleaning, and preparation using PySpark, Python, Spark SQL, AWS Glue, and Airflow.
  • Automated data ingestion into Redshift using AWS Glue Crawlers, improving data availability for downstream users.
  • Built and managed secure data lakes on Amazon S3, enabling advanced analytics and machine learning workloads.
  • Designed and implemented a data pipeline that integrates 150 million raw records from 10+ data sources.
  • Successfully migrated legacy database systems to AWS cloud environments using AWS DMS, reducing infrastructure costs by 20% while ensuring zero data loss during the transition.
  • Implemented Hightouch reverse ETL to sync Redshift and Iceberg data into Salesforce and HubSpot.

Data Engineer

Delta Airlines
11.2020 - 07.2022
  • Created a data pipeline to migrate data from Oracle to Redshift, saving $750,000 and improving performance by 23%.
  • Developed and maintained PySpark scripts to automate ETL workflows between AWS S3, Glue, Hive, Redshift, and other data sources.
  • Developed and implemented a Python-based automation script for data quality checks, enhancing data integrity across 1M+ records.
  • Created a Python library to parse and reformat data from external vendors, reducing errors in the data pipeline across 1M+ records.

Data Engineer

JP Morgan
03.2018 - 11.2020
  • Developed PySpark and Python scripts to reconcile credit card and debit card transaction data using Data Vault.
  • Collaborated with business users to define data requirements for transaction analytics and Data Vault modeling.
  • Engineered scalable ETL pipelines using PySpark to efficiently process large volumes of transaction data, enabling timely and accurate financial reporting for key stakeholders.

Education

Master of Technology

VIT University
Pune
01.2021

Skills

  • Python
  • PySpark
  • AWS: S3, Lambda, Redshift, SageMaker, Deequ, Glue, Step Functions, Athena
  • Databricks
  • Airflow
  • Lakehouse: Delta Lake / Iceberg
  • DBT
  • Snowflake
  • Hive
  • Oracle / PostgreSQL
  • Spark Streaming & Kafka
