KARTHIK M

Data Engineer
Toronto, ON

Summary

Data engineer with 5 years of experience in designing, building, and optimizing data pipelines using Azure, AWS, and Apache technologies, and holding a master's degree in computer science. Seeking to contribute to a forward-thinking organization by leveraging expertise in ETL development, data transformation, and workflow automation using tools such as Azure Data Factory, AWS Glue, Apache Spark, Apache Airflow, and Terraform. Committed to delivering scalable, efficient, and reliable data solutions in both collaborative and remote environments.

Overview

6 years of professional experience
5 years of post-secondary education
1 Certification

Work History

Azure Data Engineer

CGI
Toronto, ON
01.2024 - Current
  • Designed and deployed Azure Synapse pipelines to ingest structured data from Azure Blob Storage into SQL pools, incorporating event-driven triggers via Azure Event Grid.
  • Contributed to the migration of legacy Microsoft SQL Server data warehouse solutions to Snowflake Data Cloud on Microsoft Azure, improving scalability, performance, and data availability.
  • Built scalable ETL workflows using PySpark in Azure Databricks, orchestrated through Azure Data Factory, reducing processing time by 60%.
  • Developed and maintained fact and dimension tables in Snowflake to support enterprise-level reporting and analytics requirements.
  • Achieved up to 70% improvement in query performance and over 50% reduction in support costs through optimized data architecture and cloud-based pipeline design.
  • Developed and optimized stored procedures in T-SQL for data transformation, cleansing, standardization, and enrichment processes.
  • Developed data quality checks, error handling, and audit logging in Azure Synapse, and optimized SQL queries for high concurrency and large-scale data loads in Synapse Dedicated SQL Pools.
  • Implemented Medallion Architecture for data warehousing solutions, optimizing data processing and analytics workflows.
  • Collaborated with data science teams to deliver insights by building data pipelines that ingest and transform data into formats usable for machine learning models.
  • Worked extensively with Delta files to ensure efficient data storage, real-time data processing, and seamless data updates, leading to improved data reliability and performance.

AWS Data Engineer

Cognizant
Toronto, Ontario
08.2021 - 08.2022
  • Designed and built an end-to-end AWS-based data pipeline to ingest data from multiple external vendors into Amazon S3.
  • Developed scalable ETL workflows using AWS Glue and PySpark to clean, standardize, and transform multi-source datasets, reducing processing time by 60%.
  • Implemented event-driven ingestion using Lambda, Kinesis, and S3 triggers, enabling real-time updates with minimal manual intervention.
  • Integrated Redshift and Athena for analytics, reducing query latency by 50% and improving report generation speed for business users.
  • Automated job orchestration using AWS Step Functions, improving operational efficiency and reducing pipeline maintenance by 40%.
  • Designed client-specific data delivery models supporting dynamic schema changes, enabling flexible access for downstream systems.
  • Deployed proactive alerting using CloudWatch and SNS, cutting downtime by 30% through early failure detection and resolution.

Junior Data Engineer

Cognizant
Toronto, Ontario
05.2020 - 08.2021
  • Contributed to the end-to-end migration of a legacy data processing system from on-premises SQL Server to Microsoft Azure, focused on processing insurance-related files such as payments, rejections, and enrollments.
  • Refactored existing T-SQL stored procedures to be compatible with Azure Synapse Analytics (dedicated SQL pool) for post-migration data transformation and processing.
  • Designed and implemented Azure Data Factory (ADF) pipelines to automate the ingestion of raw data files into Azure Data Lake Storage Gen2 (ADLS).
  • Built dynamic, parameterized pipelines in ADF to support the processing of multiple file types with configurable scheduling and triggers.
  • Developed workflows in Azure Synapse to extract data from ADLS and execute stored procedures for data validation, cleansing, and loading into target tables.
  • Replicated on-premises ETL logic in the Azure ecosystem using ADF for orchestration, ADLS for storage, and Synapse SQL for processing and reporting.
  • Conducted data validation and reconciliation to ensure data integrity and accuracy during and after the migration process.
  • Implemented a Power BI dashboard to visualize key metrics, enabling real-time insights and data-driven decision-making, and improving performance tracking and reporting accuracy.

Program Analyst

Cognizant
Toronto, Ontario
05.2019 - 04.2020
  • Developed complex database objects, including Stored Procedures, Functions, Triggers, Indexes, and Constraints, to streamline data manipulation and ensure efficient data access.
  • Designed and optimized SQL queries to handle large datasets, reducing query execution time by up to 40% and improving overall system performance.
  • Utilized SQL Server to create and maintain schemas, manage data integrity, and perform advanced troubleshooting and performance tuning on queries.
  • Created comprehensive reports and dashboards using SQL for data analysis, providing insights that aided strategic decision-making and enhanced business intelligence capabilities.
  • Effectively resolved Jira tickets by troubleshooting and addressing issues related to data pipelines and infrastructure, ensuring timely resolution and maintaining project timelines.

Education

Masters - Applied Computer Science

St Francis Xavier University
Antigonish, NS
09.2022 - 04.2024

Bachelor of Technology - Computer Science & Engineering

Sree Vidyanikethan Engineering College
Antigonish, NS
06.2015 - 04.2019

Skills

Operating Systems: Windows, Linux, Mac, Unix

Programming Languages: Python, PySpark, Pandas, Scala, Java, C, C++, R, C#

Cloud Platforms: Azure, AWS

Databases: Oracle, MySQL, SQL Server, MongoDB, Cassandra, DynamoDB, PostgreSQL

Data Warehousing: Redshift, Snowflake, Azure Synapse Analytics

Big Data Technologies: Apache Spark, Databricks, Hadoop, MapReduce, HDFS, Pig, Hive, Kafka, ZooKeeper

Machine Learning: Scikit-Learn, PyTorch, XGBoost, Azure Machine Learning

Streaming Technologies: Apache Kafka, Amazon Kinesis, Apache Flink, Azure Event Hubs

Monitoring Tools: Apache Airflow, Amazon CloudWatch, Azure Monitor

Visualization/Reporting: Tableau, SSRS, Amazon QuickSight, Power BI

Certification

DP-203: Data Engineering on Microsoft Azure, Microsoft, 07/01/24, 1E9BE3-849D7G

Timeline

Azure Data Engineer

CGI
01.2024 - Current

Masters - Applied Computer Science

St Francis Xavier University
09.2022 - 04.2024

AWS Data Engineer

Cognizant
08.2021 - 08.2022

Junior Data Engineer

Cognizant
05.2020 - 08.2021

Program Analyst

Cognizant
05.2019 - 04.2020

Bachelor of Technology - Computer Science & Engineering

Sree Vidyanikethan Engineering College
06.2015 - 04.2019