Data Engineer with 7+ years of experience in PySpark, SQL, Python, Hive, Airflow, Databricks, DBT, AWS Redshift, S3, Glue, and Snowflake, transforming complex data into clear, strategic business insights.
Overview
7 years of professional experience
Work History
Data Engineer
Walmart
08.2024 - Current
Pioneered ETL/ELT automation & analytics using PySpark, Python, DBT, Spark SQL, Trino, and AWS Glue, coupled with Airflow for workflow management, streamlining data operations.
Migrated petabyte-scale S3 data to Delta Lake for schema enforcement, time travel, and faster analytics.
Built low-latency PySpark streaming with Kafka on Delta Lake & S3, driving operational insights.
Managed transactional data using Oracle, PostgreSQL, and Teradata, enhancing data storage and manipulation.
Enforced data quality via automated checks (AWS Deequ & Databricks DQx) to catch anomalies early.
Ingested API, Google Analytics, and external data into Snowflake/Redshift using PySpark and Airflow.
Optimized Spark jobs and SQL queries to minimize runtime and resource consumption.
Built CI/CD pipelines using GitHub Actions to automate testing and deployment of data workflows.
Created AI-driven code generation tools using Ollama-hosted LLMs for automated data modeling and version control.
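The automated quality checks above used AWS Deequ and Databricks DQx; as a framework-free illustration of the same rule-based idea, a minimal sketch (rule thresholds and field names are hypothetical, not from the actual pipeline) might look like:

```python
# Minimal rule-based data quality checker, a framework-free sketch of
# Deequ/DQx-style checks. Rules and field names are hypothetical examples.

def check_completeness(rows, field, min_ratio):
    """Fraction of rows with a non-null value for `field` must meet min_ratio."""
    non_null = sum(1 for r in rows if r.get(field) is not None)
    ratio = non_null / len(rows) if rows else 0.0
    return ratio >= min_ratio

def check_range(rows, field, lo, hi):
    """Every non-null value of `field` must fall within [lo, hi]."""
    return all(lo <= r[field] <= hi for r in rows if r.get(field) is not None)

def run_checks(rows):
    """Run all checks and return a dict of check name -> pass/fail."""
    return {
        "order_id_complete": check_completeness(rows, "order_id", 0.99),
        "amount_in_range": check_range(rows, "amount", 0, 10_000),
    }
```

In Deequ or DQx, the same rules would be declared as constraints and evaluated inside the Spark job, so anomalies are caught before data lands in downstream tables.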
Data Engineer
Shopify
08.2022 - 07.2024
Led the development of scalable Databricks pipelines, leveraging Delta Lake for improved data reliability in a 10TB lake, enhancing query performance by 40%.
Automated ETL/ELT workflows for data extraction, cleaning, and preparation using PySpark, Python, Spark SQL, AWS Glue, and Airflow.
Automated data ingestion into Redshift using AWS Glue Crawlers, improving data availability for downstream users.
Built and managed secure data lakes on Amazon S3, enabling advanced analytics and machine learning workloads.
Designed and implemented a data pipeline that integrated 150 million raw records from 10+ data sources.
Successfully migrated legacy database systems to AWS cloud environments using AWS DMS, reducing infrastructure costs by 20% while ensuring zero data loss during the transition.
Implemented Hightouch reverse ETL to sync Redshift and Iceberg data into Salesforce and HubSpot.
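Integrating 150M raw records from 10+ sources, as described above, hinges on a merge-and-dedupe step; a toy plain-Python sketch of that logic (record shape and the "latest update wins" policy are illustrative assumptions, the real job ran in Databricks/PySpark) could be:

```python
# Toy merge-and-dedupe step for multi-source ingestion, keyed on record id.
# Record shape and "newest updated_at wins" policy are illustrative assumptions.

def merge_sources(*sources):
    """Merge record lists from several sources, keeping the newest copy per id."""
    merged = {}
    for source in sources:
        for record in source:
            key = record["id"]
            current = merged.get(key)
            if current is None or record["updated_at"] > current["updated_at"]:
                merged[key] = record
    return list(merged.values())
```

At Delta Lake scale, the same policy is typically expressed as a `MERGE INTO` with a dedupe window over the update timestamp rather than an in-memory dict.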
Data Engineer
Delta Airlines
11.2020 - 07.2022
Created a data pipeline to migrate data from Oracle to Redshift, saving $750,000 and improving performance by 23%.
Developed and maintained PySpark scripts to automate ETL workflows between AWS S3, Glue, Hive, Redshift, and other data sources.
Developed and implemented a Python-based automation script for data quality checks, enhancing data integrity across 1M+ records.
Created a Python library to parse and reformat data from external vendors, reducing pipeline errors across 1M+ records.
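A simplified sketch of the kind of vendor-feed parsing library described above (the pipe-delimited layout and field names are hypothetical stand-ins for the actual vendor formats):

```python
import csv
import io

# Simplified vendor-feed parser: normalizes delimited vendor rows into dicts
# with consistent field names and types, collecting malformed line numbers.
# The pipe-delimited sku|qty|price layout is a hypothetical example.

FIELDS = ["sku", "qty", "price"]

def parse_vendor_feed(raw_text):
    """Parse pipe-delimited vendor rows; return (records, error_line_numbers)."""
    records, errors = [], []
    reader = csv.reader(io.StringIO(raw_text), delimiter="|")
    for lineno, row in enumerate(reader, start=1):
        if len(row) != len(FIELDS):
            errors.append(lineno)
            continue
        try:
            records.append({
                "sku": row[0].strip().upper(),
                "qty": int(row[1]),
                "price": float(row[2]),
            })
        except ValueError:
            errors.append(lineno)
    return records, errors
```

Tracking rejected line numbers instead of failing the whole batch is what lets a pipeline quantify and reduce error rates across millions of records.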
Data Engineer
JP Morgan
03.2018 - 11.2020
Developed PySpark and Python scripts to reconcile credit card and debit card transaction data using Data Vault.
Collaborated with business users to define data requirements for transaction analytics and Data Vault modeling.
Engineered scalable ETL pipelines using PySpark to efficiently process large volumes of transaction data, enabling timely and accurate financial reporting for key stakeholders.
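The reconciliation work above ran in PySpark against a Data Vault model; a plain-Python sketch of the core matching step (transaction shape is a hypothetical stand-in) is:

```python
# Plain-Python sketch of transaction reconciliation: match two feeds by
# transaction id and flag amount mismatches and missing records.
# Record shape is a hypothetical stand-in for the PySpark/Data Vault job.

def reconcile(card_feed, ledger_feed):
    """Return ids that mismatch on amount or appear in only one feed."""
    cards = {t["txn_id"]: t["amount"] for t in card_feed}
    ledger = {t["txn_id"]: t["amount"] for t in ledger_feed}
    return {
        "amount_mismatch": sorted(
            k for k in cards.keys() & ledger.keys() if cards[k] != ledger[k]
        ),
        "missing_in_ledger": sorted(cards.keys() - ledger.keys()),
        "missing_in_cards": sorted(ledger.keys() - cards.keys()),
    }
```

At scale the same comparison is a full outer join on the transaction key, with the three buckets derived from null-side and value-difference predicates.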