Ruchit Thakkar

Toronto

Summary

Data Engineer/ETL Developer with 7+ years of experience in the IT industry. Specialized in cloud platforms including AWS and Azure. Expertise in Data Analysis, Statistical Analysis, Machine Learning, Deep Learning, and Data Mining. Skilled in handling large data sets from structured and unstructured data sources, including Big Data. Proficient in Python, SQL, and Tableau for end-to-end data science solutions. Experienced in using Spark with Scala for advanced analytics on Hadoop clusters and PostgreSQL for robust data engineering tasks. Domain expertise in Investment Management, using Informatica PowerCenter for complex data extractions. Well-versed in ETL processes, Dimensional Data Modeling, SCD, Performance Tuning, and Data Warehousing. Familiar with big data technologies such as Hadoop, Spark, and Hive. Strong communication and interpersonal abilities. Hands-on experience in AWS and Azure cloud platform operations.

Overview

8 years of professional experience

Work History

Data Engineer / ETL Developer

TMX Group
11.2021 - Current
  • Conducting preliminary data analysis using descriptive statistics and rectifying anomalies by removing duplicates and imputing missing values
  • Developing monitoring and notification tools using Python
  • Using AWS Glue as the ETL (Extract, Transform, Load) service for transforming data and moving it between storage systems, including S3, Redshift, and RDS
  • Executing MySQL database queries from Python using the MySQL connector and database packages
  • Automated ETL tasks and data processing pipelines using Python, reducing manual intervention and increasing overall system efficiency
  • Triggering ETL jobs with AWS Lambda when new data is uploaded to S3
  • Using Amazon Redshift for large-scale data warehousing, writing SQL to query and manage large datasets
  • Storing large datasets across a cluster with HDFS (Hadoop Distributed File System)
  • Running batch processing tasks with MapReduce in Hadoop and using Apache Spark for faster data processing and easier development with its rich API
  • Designed, developed, and optimized DBT models for transforming and organizing data within Amazon Redshift/Snowflake/BigQuery
  • Built modular, reusable, and efficient SQL transformations using DBT to support data analysts and business intelligence teams
  • Integrated DBT with AWS services such as S3, Glue, Athena, Lambda, and Step Functions to support scalable ETL pipelines
  • Developed and optimized complex SQL queries for data extraction, transformation, and reporting, ensuring data accuracy and consistency across multiple platforms
  • Monitoring the status of pipeline DAGs and tasks through the Apache Airflow UI
  • Designed and implemented serverless AWS Lambda functions to automate data processing tasks, integrating seamlessly with other AWS services like S3, SNS, and DynamoDB
  • Deployed, configured, and maintained Kubernetes clusters (EKS, AKS, GKE)
  • Proficient in writing SQL queries and implementing stored procedures, functions, packages, tables, views, cursors, and triggers
  • Developing data pipelines and integrating them with services like AWS Glue, Lambda, and Redshift
  • Verified and validated data accuracy, completeness, and consistency by using ETL tools and writing complex SQL queries across various data sources
  • Designed, developed, and implemented scalable ETL pipelines using AWS Glue, Lambda, and DataSync to automate data ingestion and processing workflows
  • Applied deep domain knowledge to understand business logic, data flows, and processes for efficient test case creation and execution
  • Integrated AWS Glue with other AWS services like S3, Redshift, and RDS (Aurora/PostgreSQL) to build end-to-end data pipelines for business intelligence and analytics
  • Managed data storage solutions on AWS S3, Aurora (PostgreSQL), and DynamoDB, ensuring high availability, scalability, and security for critical business data
  • Designed, developed, and maintained scalable ETL (Extract, Transform, Load) pipelines using Databricks and Apache Spark
  • Used the Databricks collaborative workspace for data engineers, data scientists, and analysts to work with big data
  • Built CI/CD pipelines to automate the deployment of ETL jobs and Lambda functions using services like AWS CodePipeline, GitLab CI, and Jenkins
  • Collaborated closely with developers, data analysts, and business stakeholders to gather and understand technical and business requirements for ETL testing

Cloud Data Engineer

Synechron
09.2018 - 10.2021
  • Conducted analysis, design, and construction of contemporary data solutions using Azure PaaS services to facilitate data visualization, assessing their impact on existing business processes
  • Extracted, transformed, and loaded data from source systems into Azure Data Storage services, employing a blend of Azure Data Factory, T-SQL, Spark SQL, and U-SQL in Azure Data Lake Analytics
  • Managed data ingestion to various Azure services such as Azure Data Lake, Azure Storage, Azure SQL, and Azure SQL Data Warehouse, processing data within Azure Databricks
  • Configured pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from diverse sources, including Azure SQL, Blob storage, and Azure SQL Data Warehouse
  • Designed and implemented efficient data models to optimize performance in Power BI, ensuring data integrity and accuracy
  • Created interactive and visually appealing reports and dashboards using Power BI Desktop and Power BI Service, tailored to business requirements
  • Identified and resolved issues related to data connectivity, report performance, and user access in Power BI
  • Developed and deployed data pipelines using Azure Databricks (ADB) for distributed data processing and machine learning workflows, enabling real-time analytics and advanced data processing capabilities
  • Designed, developed, and orchestrated ETL/ELT pipelines using Azure Data Factory (ADF) to automate the extraction, transformation, and loading of data from various data sources into data lakes and warehouses
  • Architected and implemented data storage solutions using Azure Data Lake Storage (ADLS), ensuring efficient storage and retrieval of large datasets in a secure, scalable, and cost-effective manner
  • Utilized Apache Spark (on Azure Databricks) for distributed data processing and analytics, ensuring scalability, performance, and efficient resource management
  • Worked extensively with Azure platform services for end-to-end data engineering workflows, including data ingestion, storage, transformation, and orchestration
  • Designed, developed, and maintained data solutions using Azure Data Lake Storage (ADLS), Azure Databricks (ADB), and Azure Data Factory (ADF) to support scalable data ingestion, processing, and storage
  • Built data transformation workflows in ADF using built-in activities (like copy, lookup, and data flow activities) to transform raw data and load it into target systems for reporting and analytics
  • Integrated data pipelines with Azure Data Lake and other cloud-based data services for efficient data storage and retrieval
  • Identified and evaluated various data sources, including databases, Excel files, and APIs, to ensure comprehensive data integration, and established secure, efficient connections to those sources for data retrieval in Power BI
  • Collaborated within an agile framework, utilizing JIRA for managing project stories from requirements gathering to design, development, and testing

ETL Developer

Axtria
07.2017 - 07.2018
  • Implemented Agile methodology in the SDLC using JIRA, overseeing daily scrums, sprint reviews, backlog refinement, sprint planning, sprint demos, and sprint retrospectives
  • Executed a Proof of Concept for integrating Oracle and flat files to Salesforce using Informatica Cloud
  • Utilized HTTP transformations to retrieve XML data from websites
  • Created mappings, built workflows, and monitored processes using Informatica
  • Developed a Python script to convert .csv to .xlsx files, installing necessary modules
  • Managed multiple time-sensitive reporting projects within proposed budgets
  • Designed hundreds of mappings in Informatica PowerCenter, including SCD Type 1 and Type 2
  • Automated Informatica jobs in production using Maestro
  • Employed SQL repository queries in PowerCenter to identify modified objects and capture them for migration
  • Facilitated workflow deployment using UDeploy
  • Leveraged Python scripting for file movement and FTP processes
  • Implemented CDC methodology in mappings to ensure the latest data for reporting

Education

Bachelor of Engineering -

HGCE College of Engineering And Technology
Ahmedabad, India
06-2017

Data Engineering

IBM

AWS Cloud

Amazon

Azure SQL

Microsoft

Python, Data Science and AI Development

IBM

Big Data Spark & Hadoop

IBM

Skills

SQL

MySQL

PostgreSQL

Big Data Processing Frameworks: Apache Spark

Hadoop

HDFS

Hive

JIRA

Cloud Platform: AWS

AWS EC2

AWS S3

Amazon Redshift

AWS Glue

AWS Kinesis

AWS Lambda

AWS EMR

Languages: Python

Scala

PowerShell

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio)

Azure Data Factory

Azure Data Lake Storage

Azure Synapse Analytics

Azure Databricks

Tableau

Power BI

Data warehousing

Data modeling

ETL pipeline design

Real-time processing

Data migration

Data cleansing

Big data processing

Data validation

Data profiling

Real-time analytics

API development

Timeline

Data Engineer / ETL Developer

TMX Group
11.2021 - Current

Cloud Data Engineer

Synechron
09.2018 - 10.2021

ETL Developer

Axtria
07.2017 - 07.2018

AWS Cloud

Amazon

Azure SQL

Microsoft

Python, Data Science and AI Development

IBM

Big Data Spark & Hadoop

IBM

Bachelor of Engineering -

HGCE College of Engineering And Technology

Data Engineering

IBM