Data Engineer with 5+ years of experience designing data-centric solutions.
Developed pipelines for data engineering, data validation, and feature engineering on structured and unstructured datasets, both financial and non-financial
Used deep business domain knowledge to independently lead the analytics process to identify valuable and innovative insights.
Established and maintained collaboration with research and business teams to converge on the best solutions in industries such as Finance, Telecommunications, and Insurance
Extended prototypes into fully functional, scalable and polished solutions ready for internal and/or external use
Overview
6 years of professional experience
Work History
Technical Lead for Data products for Capital Markets
OMERS
02.2022 - Current
Project Description: Technical Lead for delivering data services to OMERS Capital Markets
Worked with the business to deliver Investment and Risk data from the book of records directly to the Capital Markets division. Worked closely with the data owners and integrated multiple data sources to provide an API and a front end. The service currently provides an outlook on daily investment decisions dating back to 2020 and directly supports portfolio managers, analysts, traders, and developers.
Developed a generative AI retrieval-augmented generation (RAG) pipeline in Databricks to deliver better and more customized results than Copilot. Integrated the solution with tools such as Azure Document Intelligence.
Led the team that set up a vector database on Azure Kubernetes Service to store T4 filings. Currently processes data amounting to 50M vectors in milliseconds
Delivered a self-serve platform to the Portfolio Analytics team that automated manually run Python jobs. This saved 2 FTEs in 1 year and enabled automation within the team
Support Python libraries that let technical users gather data from various sources and run quantitative functions in one place.
Lead Data Engineer at a Retail Company
EY
08.2021 - 02.2022
Project Description: Delivering a cloud migration solution for a major Canadian retail company
Led source data analysis to develop data lineage, establishing visibility between project owners and source owners
Coordinated with Data stewards to develop proper triage processes for metadata capture, CDE identification, Data Quality Rule development and Ingestion pattern development
Bridged the gap between business users and the technical team by implementing a custom data quality (DQ) framework for daily DQ dashboards.
Lead Data Engineer at a Crown Corporation
EY
03.2020 - 08.2021
Project Description: Solution design and implementation of a no-fault insurance model for a crown corporation in British Columbia
Owned the delivery of state-of-the-art pipelines in Scala and Apache Spark, running on a MapR Hadoop environment and processing more than 5 million transaction rows every day
Worked with key business stakeholders to gather development requirements in an Agile environment
Took ownership of the design and implementation of elements that help the end users (actuaries) analyze premium rates more effectively
Reduced pipeline execution time by almost 3 hours (60%)
Triaged defects in an Agile environment using continuous integration and continuous deployment (CI/CD)
Presented the development outcomes to the business; the solution will help reduce insurance premiums in British Columbia by an average of $250 per driver per year
Senior Data Engineer at a Leading Canadian Pension Plan
EY
06.2019 - 03.2020
Project Description: Enabling Natural Language Processing capabilities for a Canadian Pension Plan
Developed a pipeline in Azure Databricks for data engineering of unstructured text data, including cleaning steps such as stop-word removal and lemmatization.
Created features such as n-grams, word2vec embeddings, and sparse matrices of important keywords to support machine learning tasks.
Improved the performance of the data engineering and feature generation process by 60% using the parallel processing of Apache Spark.
Developed a novel topic modeling approach that combines business input with Guided LDA to extract the topic distribution of an article. Achieved 80% accuracy in predicting the top 2 topics.
Created a sentiment analysis model for text articles that uses retrofitting methods to adjust word embeddings. Achieved 75% accuracy when used with a logistic regression model to classify the sentiment
Used an XGBoost classifier to prioritize emails for the FinOps department. These emails consisted of daily trade calls from banks and required manual effort to process. Achieved 95% accuracy and saved approximately 3 FTEs
Data Engineer at a Canadian Telecom company
EY
09.2018 - 05.2019
Project Description: Reporting engine for IFRS calculations for a Canadian Telecommunication Company
Used SQL on Teradata to design and implement the solution
Created key deliverables, including the metrics report and the rolling asset, a month-to-date summary of all the data received. This report was shared with the client in knowledge-sharing meetings and discussions
Education
Master of Engineering - Electrical and Computer Engineering
Toronto Metropolitan University
05.2018
Bachelor of Engineering -
PEC University of Technology
05.2016
Skills
Programming languages: Python, SQL, R, Scala, Java