Abhinesh Dasari

Toronto, ON

Summary

Detail-oriented Senior Data Engineer with 7 years of experience architecting and maintaining scalable data solutions across cloud and big data ecosystems. Expert in PySpark, Databricks, SQL, Apache NiFi, and Hive, with a strong background in data migration, workflow orchestration, and ETL pipelines. Proven success in the banking and finance domain, with hands-on experience in AWS, Azure, IBM Cloud, and regulatory reporting systems. Recognized for delivering clean, compliant, and high-performance data frameworks in Agile environments.

Overview

9 years of professional experience

Work History

Senior Big Data Engineer

Scotiabank
07.2024 - Current
  • Led the data engineering effort for a liquidity risk initiative focused on FR 2052A regulatory reporting.
  • Developed scalable pipelines in the AWS ecosystem, migrating transformations from legacy applications (Apache NiFi, Talend jobs) to PySpark and Spark SQL for processing sensitive financial data.
  • Orchestrated data flows with Airflow and Apache NiFi, triggering PySpark Glue jobs downstream.
  • Provisioned infrastructure with the AWS CDK in Python for reproducible deployment of S3 buckets, Glue jobs, and IAM roles (illustrative sketch after this list).
  • Designed and triggered ETL pipelines using AWS Step Functions for sequential orchestration.
  • Used AWS Lambda functions for lightweight data enrichment and post-processing tasks.
  • Queried large-scale datasets using Amazon Athena for ad-hoc compliance insights and validation checks.
  • Leveraged Amazon EMR for distributed PySpark processing of high-volume batch workloads.
  • Stored raw and curated datasets in Amazon S3 using structured partitioning and optimized formats like Parquet.
  • Loaded transformed data into Amazon Redshift for downstream analytics and compliance reporting.
  • Built serverless ETL workflows in AWS Glue for daily ingestion and transformation of liquidity data from S3 sources.
  • Used Talend for pre-processing and transformation of incoming datasets.
  • Built CI/CD pipelines using Jenkins and Docker; managed releases and disaster recovery testing.
  • Implemented Spark optimizations and validated data reliability.
  • Read data from Kafka topics and Elasticsearch endpoints as part of ingestion and reconciliation.
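
A minimal sketch of the CDK provisioning pattern noted above, assuming CDK v2 in Python; the stack, bucket, role, and job names and the script location are hypothetical placeholders, not artifacts of the actual project:

```python
# Hypothetical AWS CDK v2 stack (Python) illustrating reproducible
# provisioning of an S3 bucket, an IAM role, and a PySpark Glue job.
# All resource names and the script path are placeholders.
from aws_cdk import App, Stack, aws_s3 as s3, aws_iam as iam, aws_glue as glue
from constructs import Construct

class LiquidityEtlStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Raw and curated datasets land here, partitioned and stored as Parquet.
        bucket = s3.Bucket(self, "CuratedBucket", versioned=True)

        # Execution role the Glue job assumes, scoped to the bucket.
        role = iam.Role(
            self, "GlueJobRole",
            assumed_by=iam.ServicePrincipal("glue.amazonaws.com"),
        )
        bucket.grant_read_write(role)

        # PySpark Glue job; the ETL script itself is staged in S3.
        glue.CfnJob(
            self, "DailyLiquidityJob",
            role=role.role_arn,
            command=glue.CfnJob.CommandProperty(
                name="glueetl",
                python_version="3",
                script_location=f"s3://{bucket.bucket_name}/scripts/ingest.py",
            ),
            glue_version="4.0",
        )

app = App()
LiquidityEtlStack(app, "LiquidityEtlStack")
app.synth()
```

Deploying such a stack with `cdk deploy` reproduces the same bucket, role, and job definition in every environment.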

Senior Data Engineer

IRIS Software
11.2022 - 06.2024
  • Key contributor to a large-scale migration project moving AML data from on-premises Hadoop to S3-compatible object storage on IBM Cloud.
  • Developed PySpark jobs that ran on cluster and internally triggered DistCp for bulk data transfer.
  • Performed transformations using PySpark and built a data quality (DQ) framework (see the sketch after this list).
  • Logged operational metadata in SQL and built reconciliation and retrieval modules.
  • Built robust notification systems for job tracking and status updates.
  • Scheduled and managed workflows with AutoSys.
  • Integrated data pipelines that fetched information from Kafka and Elasticsearch for cross-platform audits.
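
A minimal sketch of the kind of PySpark data-quality check such a DQ framework runs; the column names, rules, paths, and metrics table are assumptions, not the project's actual schema:

```python
# Illustrative PySpark DQ check: count rule violations, persist the
# operational metadata, and fail the job so the scheduler can alert.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("aml-dq-check").getOrCreate()

# Hypothetical landing path for migrated AML data.
df = spark.read.parquet("s3a://aml-landing/transactions/")

total = df.count()
# Rule 1: the primary key must be non-null and unique.
null_keys = df.filter(F.col("txn_id").isNull()).count()
dupe_keys = total - df.select("txn_id").distinct().count()
# Rule 2: transaction amounts must be non-negative.
bad_amounts = df.filter(F.col("amount") < 0).count()

results = [
    ("null_txn_id", null_keys),
    ("duplicate_txn_id", dupe_keys),
    ("negative_amount", bad_amounts),
]

# Log operational metadata for the reconciliation and retrieval modules
# (hypothetical metrics table).
metrics = spark.createDataFrame(results, ["rule", "violations"]) \
    .withColumn("run_ts", F.current_timestamp())
metrics.write.mode("append").saveAsTable("dq.run_metrics")

if any(v > 0 for _, v in results):
    raise RuntimeError(f"DQ violations detected: {results}")
```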

Big Data Engineer

RBC
09.2021 - 09.2022
  • Developed ingestion pipelines using Python and Sqoop.
  • Managed SSIS-based ETL components and oversaw server upgrades.
  • Integrated Azure DevOps into CI/CD processes for automated deployment and testing of data solutions.
  • Managed data movement and storage using Azure Data Lake and monitored workflows using Azure Monitor.
  • Utilized Azure Synapse Analytics for querying, analyzing, and integrating structured data into reporting models.
  • Developed and maintained data pipelines using Azure Data Factory (ADF) for orchestrating ETL workflows.
  • Used Databricks for ETL development (illustrative sketch after this list).
  • Handled release management for deployments and performed data validation.
  • Maintained metadata-driven frameworks and documented system architecture.
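
An illustrative Databricks-style PySpark job of the kind described above, reading raw files from Azure Data Lake and writing a curated Delta table for downstream reporting; the storage account, containers, and columns are hypothetical:

```python
# Sketch of a Databricks ETL step: raw CSVs in ADLS -> curated Delta table.
# Assumes Delta Lake is available (as on Databricks) and that storage
# credentials are configured in the cluster.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("adls-curation").getOrCreate()

raw = (
    spark.read.option("header", "true")
    .csv("abfss://raw@examplelake.dfs.core.windows.net/accounts/")
)

curated = (
    raw.withColumn("ingest_date", F.current_date())
       .withColumn("balance", F.col("balance").cast("decimal(18,2)"))
       .dropDuplicates(["account_id"])
)

# Delta format keeps the table transactional for downstream consumers.
(curated.write.format("delta")
    .mode("overwrite")
    .partitionBy("ingest_date")
    .save("abfss://curated@examplelake.dfs.core.windows.net/accounts/"))
```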

Big Data Engineer

TransUnion
08.2018 - 03.2019
  • Built and deployed a robust credit scoring engine using PySpark, Spark SQL, and Hive.
  • Designed ingestion pipelines with Sqoop and Hive for data consolidation.
  • Managed metadata in the Hive Metastore and exported scoring results to Cassandra (see the sketch after this list).
  • Responsible for data ingestion from multiple internal and external sources.
  • Managed multiple internal and external Hive tables.
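
A minimal sketch of that scoring flow: read consolidated features from Hive, derive a score with Spark SQL functions, and export results to Cassandra through the DataStax spark-cassandra-connector. The keyspace, tables, and the toy scoring rule are placeholders for the real model:

```python
# Illustrative credit-scoring step; requires the spark-cassandra-connector
# package on the classpath and a configured Cassandra contact point.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("credit-scoring")
    .enableHiveSupport()  # read feature tables via the Hive Metastore
    .getOrCreate()
)

# Hypothetical consolidated feature table built by the Sqoop/Hive pipelines.
features = spark.table("credit.features")

# Toy linear rule standing in for the actual scoring logic.
scored = features.withColumn(
    "score",
    (F.lit(300) + F.col("on_time_ratio") * 550).cast("int"),
)

# Export to Cassandra (hypothetical keyspace/table).
(scored.select("customer_id", "score")
    .write.format("org.apache.spark.sql.cassandra")
    .options(keyspace="scores", table="credit_score")
    .mode("append")
    .save())
```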

Hadoop Developer

Capgemini Technologies
10.2016 - 08.2018
  • Developed scalable pipelines using Hadoop, Hive, Spark, and Sqoop.
  • Created Python-based ingestion scripts to load data for analytics.
  • Performed HiveQL tuning with partitioning and bucketing (see the sketch after this list).
  • Performed extensive data manipulation in Python using the pandas library for custom applications.
  • Worked with the Ab Initio tool and built custom applications.
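
A sketch of the partitioning and bucketing tuning above, expressed through Spark SQL to keep these examples in one language (the equivalent HiveQL runs in Hive directly); the databases, tables, and columns are hypothetical:

```python
# Illustrative table layout tuning: partition pruning plus bucketed joins.
# Assumes the analytics and staging databases already exist.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-tuning-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Partitioning by ingest_date lets queries prune whole directories;
# bucketing by customer_id pre-organizes data for joins on that key.
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.events (
        event_id    STRING,
        customer_id STRING,
        payload     STRING,
        ingest_date DATE
    )
    USING ORC
    PARTITIONED BY (ingest_date)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
""")

# Overwrite only the partitions present in the incoming batch.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark.sql("""
    INSERT OVERWRITE TABLE analytics.events PARTITION (ingest_date)
    SELECT event_id, customer_id, payload, ingest_date
    FROM staging.events_raw
""")
```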

Education

Post-Graduate Diploma - Computer Networking

Fleming College
Canada

Bachelor of Technology - Information Technology

JNTU University

Skills

  • Workflow Orchestration: Apache Airflow, NiFi, AutoSys
  • Big Data & ETL: PySpark, Hive, Hadoop, Talend, Sqoop, Spark SQL, SSIS
  • Cloud Platforms: AWS (Glue, Redshift, S3, EMR, Athena, Lambda, Step Functions, CDK), Azure (Synapse, ADF, Databricks, Data Lake), IBM Cloud
  • Languages: Python, Scala, SQL, Shell Scripting
  • Version Control & CI/CD: Git, Bitbucket, Jenkins, Docker, AWS CodeCommit, Azure DevOps
  • Databases & Storage: Amazon Redshift, Azure Synapse Analytics, MySQL, Oracle, IBM Cloud Object Storage (S3-compatible)
