Ruchit Thakkar

Toronto

Summary

Data Engineer/ETL Developer with 7+ years of experience in the IT industry. Specialized in cloud platforms including AWS and Azure. Expertise in Data Analysis, Statistical Analysis, Machine Learning, Deep Learning, and Data Mining. Skilled in handling large data sets from structured and unstructured data sources, including Big Data. Proficient in Python, SQL, and Tableau for end-to-end data science solutions. Experienced in using Spark with Scala for advanced analytics on Hadoop clusters and PostgreSQL for robust data engineering tasks. Domain expertise in Investment Management, using Informatica PowerCenter for complex data extractions. Well-versed in ETL processes, Dimensional Data Modeling, SCD, Performance Tuning, and Data Warehousing. Familiar with big data technologies such as Hadoop, Spark, and Hive. Strong communication and interpersonal abilities. Hands-on experience in AWS and Azure cloud platform operations.

Overview

8 years of professional experience

Work History

Data Engineer / ETL Developer

TMX Group
11.2021 - Current
  • Conducting preliminary data analysis using descriptive statistics and rectifying anomalies by removing duplicates and imputing missing values
  • Developing monitoring and notification tools using Python
  • Using AWS Glue as the ETL (Extract, Transform, Load) service for transforming data and moving it between storage systems, including S3, Redshift, and RDS
  • Executing MySQL database queries from Python using the MySQL connector and database packages
  • Automated ETL tasks and data processing pipelines using Python, reducing manual intervention and increasing overall system efficiency
  • Triggering ETL jobs with AWS Lambda when new data is uploaded to S3
  • Using Amazon Redshift for large-scale data warehousing, writing SQL to query and manage large datasets
  • Storing large datasets across a cluster with HDFS (Hadoop Distributed File System)
  • Running batch processing tasks with MapReduce in Hadoop and using Apache Spark for faster data processing and easier development with its rich API
  • Designed, developed, and optimized DBT models for transforming and organizing data within Amazon Redshift/Snowflake/BigQuery
  • Built modular, reusable, and efficient SQL transformations using DBT to support data analysts and business intelligence teams
  • Integrated DBT with AWS services such as S3, Glue, Athena, Lambda, and Step Functions to support scalable ETL pipelines
  • Developed and optimized complex SQL queries for data extraction, transformation, and reporting, ensuring data accuracy and consistency across multiple platforms
  • Monitoring the status of pipeline DAGs and tasks through the Apache Airflow UI
  • Designed and implemented serverless AWS Lambda functions to automate data processing tasks, integrating seamlessly with other AWS services like S3, SNS, and DynamoDB
  • Deployed, configured, and maintained Kubernetes clusters (EKS, AKS, GKE)
  • Proficient in writing SQL queries and implementing stored procedures, functions, packages, tables, views, cursors, and triggers
  • Developing data pipelines and integrating them with services like AWS Glue, Lambda, and Redshift
  • Verified and validated data accuracy, completeness, and consistency by using ETL tools and writing complex SQL queries across various data sources
  • Designed, developed, and implemented scalable ETL pipelines using AWS Glue, Lambda, and DataSync to automate data ingestion and processing workflows
  • Applied deep domain knowledge to understand business logic, data flows, and processes for efficient test case creation and execution
  • Integrated AWS Glue with other AWS services like S3, Redshift, and RDS (Aurora/PostgreSQL) to build end-to-end data pipelines for business intelligence and analytics
  • Managed data storage solutions on AWS S3, Aurora (PostgreSQL), and DynamoDB, ensuring high availability, scalability, and security for critical business data
  • Designed, developed, and maintained scalable ETL (Extract, Transform, Load) pipelines using Databricks and Apache Spark
  • Used the Databricks collaborative workspace for data engineers, data scientists, and analysts to work with big data
  • Built CI/CD pipelines to automate the deployment of ETL jobs and Lambda functions using services like AWS CodePipeline, GitLab CI, and Jenkins
  • Collaborated closely with developers, data analysts, and business stakeholders to gather and understand technical and business requirements for ETL testing

Cloud Data Engineer

Synechron
09.2018 - 10.2021
  • Conducted analysis, design, and construction of contemporary data solutions using Azure PaaS services to facilitate data visualization, assessing their impact on existing business processes
  • Extracted, transformed, and loaded data from source systems into Azure Data Storage services, employing a blend of Azure Data Factory, T-SQL, Spark SQL, and U-SQL in Azure Data Lake Analytics
  • Managed data ingestion to various Azure services such as Azure Data Lake, Azure Storage, Azure SQL, and Azure SQL Data Warehouse, processing data within Azure Databricks
  • Configured pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from diverse sources, including Azure SQL, Blob storage, and Azure SQL Data Warehouse
  • Designed and implemented efficient data models to optimize performance in Power BI, ensuring data integrity and accuracy
  • Created interactive and visually appealing reports and dashboards using Power BI Desktop and Power BI Service, tailored to business requirements
  • Identified and resolved issues related to data connectivity, report performance, and user access in Power BI
  • Developed and deployed data pipelines using Azure Databricks (ADB) for distributed data processing and machine learning workflows, enabling real-time analytics and advanced data processing capabilities
  • Designed, developed, and orchestrated ETL/ELT pipelines using Azure Data Factory (ADF) to automate the extraction, transformation, and loading of data from various data sources into data lakes and warehouses
  • Architected and implemented data storage solutions using Azure Data Lake Storage (ADLS), ensuring efficient storage and retrieval of large datasets in a secure, scalable, and cost-effective manner
  • Utilized Apache Spark (on Azure Databricks) for distributed data processing and analytics, ensuring scalability, performance, and efficient resource management
  • Worked extensively with Azure platform services for end-to-end data engineering workflows, including data ingestion, storage, transformation, and orchestration
  • Designed, developed, and maintained data solutions using Azure Data Lake Storage (ADLS), Azure Databricks (ADB), and Azure Data Factory (ADF) to support scalable data ingestion, processing, and storage
  • Built data transformation workflows in ADF using built-in activities (like copy, lookup, and data flow activities) to transform raw data and load it into target systems for reporting and analytics
  • Integrated data pipelines with Azure Data Lake and other cloud-based data services for efficient data storage and retrieval
  • Identified and evaluated various data sources, including databases, Excel files, and APIs, to ensure comprehensive data integration, and established secure, efficient connections to those sources for data retrieval in Power BI
  • Collaborated within an agile framework, utilizing JIRA for managing project stories from requirements gathering to design, development, and testing

ETL Developer

Axtria
07.2017 - 07.2018
  • Implemented Agile methodology in the SDLC using JIRA, overseeing daily scrums, sprint reviews, backlog refinement, sprint planning, sprint demos, and sprint retrospectives
  • Executed a Proof of Concept for integrating Oracle and flat files to Salesforce using Informatica Cloud
  • Utilized HTTP transformations to retrieve XML data from websites
  • Created mappings, built workflows, and monitored processes using Informatica
  • Developed a Python script to convert .csv to .xlsx files, installing necessary modules
  • Managed multiple time-sensitive reporting projects within proposed budgets
  • Designed hundreds of mappings in Informatica PowerCenter, including SCD Type 1 and Type 2
  • Automated Informatica jobs in production using Maestro
  • Employed SQL repository queries in PowerCenter to identify modified objects and capture them for migration
  • Facilitated workflow deployment using UDeploy
  • Leveraged Python scripting for file movement and FTP processes
  • Implemented CDC methodology in mappings to ensure the latest data for reporting

Education

Bachelor of Engineering -

HGCE College of Engineering And Technology
Ahmedabad, India
06-2017

Data Engineering

IBM

AWS Cloud

Amazon

Azure SQL

Microsoft

Python, Data Science and AI Development

IBM

Big Data Spark & Hadoop

IBM

Skills

SQL

MySQL

PostgreSQL

Big Data Processing Frameworks: Apache Spark

Hadoop

HDFS

Hive

JIRA

Cloud Platform: AWS

AWS EC2

AWS S3

Amazon Redshift

AWS Glue

AWS Kinesis

AWS Lambda

AWS EMR

Languages: Python

Scala

PowerShell

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio)

Azure Data Factory

Azure Data Lake Storage

Azure Synapse Analytics

Azure Databricks

Tableau

Power BI

Data warehousing

Data modeling

ETL pipeline design

Real-time processing

Data migration

Data cleansing

Big data processing

Data validation

Data profiling

Real-time analytics

API development

Timeline

Data Engineer / ETL Developer

TMX Group
11.2021 - Current

Cloud Data Engineer

Synechron
09.2018 - 10.2021

ETL Developer

Axtria
07.2017 - 07.2018

AWS Cloud

Amazon

Azure SQL

Microsoft

Python, Data Science and AI Development

IBM

Big Data Spark & Hadoop

IBM

Bachelor of Engineering -

HGCE College of Engineering And Technology

Data Engineering

IBM