Raju Gaekwad

Santa Clara, CA

Summary

Results-driven data engineering professional with a solid foundation in designing and maintaining scalable data systems. Expertise in developing efficient ETL processes and ensuring data accuracy, contributing to impactful business insights. Known for strong collaborative skills and the ability to adapt to dynamic project requirements, delivering reliable and timely solutions.

Knowledgeable data engineer with a robust background in data architecture and pipeline development. Proven ability to streamline data processes and enhance data integrity through innovative solutions. Demonstrates advanced proficiency in SQL and Python, leveraging these skills to support cross-functional teams and drive data-driven decision-making.

Overview

11 years of professional experience

Work History

Azure Data Engineer

State Street
02.2023 - Current
  • Managed end-to-end operations of ETL data pipelines, ensuring scalability and smooth functioning
  • Implemented optimized query techniques and indexing strategies to enhance data fetching efficiency
  • Utilized SQL queries, including DDL, DML, and various database objects (indexes, triggers, views, stored procedures, functions, and packages) for data manipulation and retrieval
  • Integrated on-premises (MySQL, Cassandra) and cloud-based (Blob storage, Azure SQL DB) data using Azure Data Factory, applying transformations and loading data into Snowflake
  • Orchestrated seamless data movement into SQL databases using Data Factory's data pipelines
  • Developed data warehousing techniques, data cleansing, Slowly Changing Dimension (SCD) handling, surrogate key assignment, and change data capture for Snowflake modelling
  • Designed and implemented scalable data ingestion pipelines using tools such as Apache Kafka, Apache Flume, and Apache Nifi to collect and process large volumes of data from various sources
  • Developed and maintained ETL/ELT workflows using technologies like Apache Spark, Apache Beam, or Apache Airflow, enabling efficient data extraction, transformation, and loading processes
  • Implemented data quality checks and data cleansing techniques to ensure the accuracy and integrity of the data throughout the pipeline
  • Built and optimized data models and schemas using technologies like Apache Hive, Apache HBase, or Snowflake to support efficient data storage and retrieval for analytics and reporting purposes
  • Developed ELT/ETL pipelines using Python and Snowflake SnowSQL to facilitate data movement to and from the Snowflake data store
  • Created ETL transformations and validations using Spark-SQL/Spark Data Frames with Azure Databricks and Azure Data Factory
  • Collaborated with Azure Logic Apps administrators to monitor and resolve issues related to process automation and data processing pipelines
  • Optimized code for Azure Functions to extract, transform, and load data from diverse sources, including databases, APIs, and file systems
  • Designed, built, and maintained data integration programs within Hadoop and RDBMS environments
  • Implemented a CI/CD framework for data pipelines using the Jenkins tool, enabling efficient automation and deployment
  • Collaborated with DevOps engineers to establish automated CI/CD and test-driven development pipelines using Azure, aligning with client requirements
  • Demonstrated proficiency in scripting languages like Python and Scala for efficient data processing
  • Executed Hive scripts through Hive on Spark and SparkSQL to address diverse data processing needs
  • Collaborated on ETL tasks, ensuring data integrity and maintaining stable data pipelines
  • Utilized Kafka, Spark Streaming, and Hive to process streaming data, developing a robust data pipeline for ingestion, transformation, and analysis
  • Utilized Spark Core and Spark SQL scripts using Scala to accelerate data processing capabilities
  • Utilized JIRA for project reporting, creating subtasks for development, QA, and partner validation
  • Actively participated in Agile ceremonies, including daily stand-ups and internationally coordinated PI Planning, ensuring efficient project management and execution
  • Environment: Azure Databricks, Data Factory, Logic Apps, Function App, Snowflake, MS SQL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Spark Performance, data integration, data modeling, data pipelines, production support, Shell scripting, GIT, JIRA, Jenkins, Kafka, ADF Pipeline, Power BI
  • Gathered, defined, and refined requirements, led project design, and oversaw implementation
  • Designed data models for complex analysis needs
  • Developed and delivered business information solutions
  • Reviewed project requests describing database user needs to estimate time and cost required to accomplish projects

Azure Data Engineer

Kroger Technologies Inc
10.2021 - 01.2023
  • Enhanced Spark performance by optimizing data processing algorithms, leveraging techniques such as partitioning, caching, and broadcast variables
  • Implemented efficient data integration solutions to seamlessly ingest and integrate data from diverse sources, including databases, APIs, and file systems, using tools like Apache Kafka, Apache NiFi, and Azure Data Factory
  • Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks
  • Worked with Microsoft Azure services such as HDInsight clusters, Blob storage, Data Factory, and Logic Apps, and completed a POC on Azure Databricks
  • Performed ETL using Azure Databricks and migrated on-premises Oracle ETL processes to Azure Synapse Analytics
  • Migrated SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse, controlling and granting database access, and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory
  • Transferred data using Azure Synapse and PolyBase
  • Deployed and optimized Python web applications through Azure DevOps CI/CD pipelines to streamline development
  • Developed enterprise-level solutions using batch processing and streaming frameworks (Spark Streaming, Apache Kafka)
  • Designed and implemented robust data models and schemas to support efficient data storage, retrieval, and analysis using technologies like Apache Hive, Apache Parquet, or Snowflake
  • Developed and maintained end-to-end data pipelines using Apache Spark, Apache Airflow, or Azure Data Factory, ensuring reliable and timely data processing and delivery
  • Collaborated with cross-functional teams to gather requirements, design data integration workflows, and implement scalable data solutions
  • Provided production support and troubleshooting for data pipelines, identifying and resolving performance bottlenecks, data quality issues, and system failures
  • Processed schema-oriented and non-schema-oriented data using Scala and Spark
  • Created partitions and buckets based on state to enable further processing with bucket-based Hive joins
  • Created Hive generic UDFs to process business logic that varies by policy
  • Worked with Data Lakes and big data ecosystems (Hadoop, Spark, Hortonworks, Cloudera)
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data
  • Wrote Hive queries for data analysis to meet business requirements, creating Hive tables and working with them using HiveQL to simulate MapReduce functionality
  • Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data
  • Worked on RDDs and DataFrames (Spark SQL) using PySpark for analyzing and processing data
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing
  • Implemented CI/CD pipelines to build and deploy projects in the Hadoop environment
  • Used JIRA to manage issues and project workflow
  • Worked on Spark using Python (PySpark) and Spark SQL for faster testing and processing of data
  • Used Git as a version control tool to maintain the code repository
  • Environment: Azure Databricks, Data Factory, Logic Apps, Function App, Snowflake, MS SQL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Spark Performance, data integration, data modeling, data pipelines, production support, Shell scripting, GIT, JIRA, Jenkins, Kafka, ADF Pipeline, Power BI

Data Engineer

Rockwell Collins
07.2020 - 09.2021
  • Designed and set up an enterprise data lake to support use cases including analytics, processing, storage, and reporting of voluminous, rapidly changing data
  • Maintained quality reference data in source systems by performing operations such as cleaning and transformation and ensuring integrity in a relational environment, working closely with stakeholders and the solution architect
  • Created tabular models on Azure Analysis Services to meet business reporting requirements
  • Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks as part of cloud migration
  • Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks
  • Worked with Azure Blob and Data Lake storage, loading data into Azure Synapse Analytics (SQL DW)
  • Developed Python, PySpark, and Bash scripts to transform and load data across on-premises and cloud platforms
  • Worked on Apache Spark, utilizing the Spark Core, Spark SQL, and Spark Streaming components to support intraday and real-time data processing
  • Set up and worked with Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig, and MapReduce access for new users
  • Used the Spark SQL Scala and Python interfaces, which automatically convert RDDs of case classes to schema RDDs
  • Imported data from sources such as HDFS and HBase into Spark RDDs and performed computations using PySpark to generate the output response
  • Implemented performance optimization techniques such as using a distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins
  • Applied knowledge of Spark platform parameters such as memory, cores, and executors
  • Developed reusable framework to be leveraged for future migrations that automates ETL from RDBMS systems to the Data Lake utilizing Spark Data Sources and Hive data objects
  • Importing & exporting database using SQL Server Integrations Services (SSIS) and Data Transformation Services (DTS Packages)
  • Environment: Azure, Azure Data Factory, Databricks, PySpark, Python, Apache Spark, HBase, HIVE, SQOOP, Snowflake, Python, SSRS, Tableau

Big Data Developer

Broadridge
09.2017 - 07.2020
  • Designed and developed applications on the data lake to transform data according to business users' requirements for analytics
  • In-depth understanding of Hadoop architecture and its components, including HDFS, ResourceManager, NodeManager, ApplicationMaster, NameNode, DataNode, and MapReduce concepts
  • Involved in developing a MapReduce framework that filters out bad and unnecessary records
  • Involved heavily in setting up the CI/CD pipeline using Jenkins, Maven, Nexus, GitHub, and AWS
  • Developed a data pipeline using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis
  • Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, and handled structured data using Spark SQL
  • Used Hive to perform transformations, event joins, and pre-aggregations before storing the data in HDFS
  • Created internal and external Hive tables as required, defined with appropriate static and dynamic partitions for efficiency
  • Implemented workflows using the Apache Oozie framework to automate tasks
  • Developed design documents evaluating all possible approaches and identifying the best one
  • Wrote MapReduce code that takes log files as input, parses them, and structures them in tabular format to facilitate effective querying of log data
  • Developed scripts to automate end-to-end data management and synchronization between all clusters
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs
  • Environment: Cloudera CDH 3/4, Hadoop, HDFS, MapReduce, Hive, Oozie, Pig, Shell Scripting, MySQL

Data Warehouse Developer

Accenture Inc
06.2014 - 09.2017
  • Created and maintained databases for server inventory and performance inventory
  • Worked in Agile Scrum methodology with daily stand-up meetings, used Visual SourceSafe with Visual Studio 2010, and tracked projects using Trello
  • Generated drill-through and drill-down reports with drop-down menu options, data sorting, and subtotals in Power BI
  • Used the data warehouse to develop data marts that feed downstream reports, and developed a user access tool with which users can create ad-hoc reports and run queries to analyze data in the proposed cube
  • Deployed SSIS packages and created jobs to run the packages efficiently
  • Created ETL packages using SSIS to extract data from heterogeneous databases, then transform and load it into the data mart
  • Created SSIS jobs to automate report generation and cube refresh packages
  • Deployed SSIS packages to production and used different package configurations to export package properties, making packages environment-independent
  • Used SQL Server Reporting Services (SSRS) to author, manage, and deliver both paper-based and interactive web-based reports
  • Developed stored procedures and triggers to facilitate consistent data entry into the database
  • Shared data externally using Snowflake data sharing, enabling quick setup without transferring data or developing pipelines
  • Environment: Windows server, MS SQL Server 2014, SSIS, SSAS, SSRS, SQL Profiler, Power BI, C#, Performance Point Server, MS Office, SharePoint

Skills

  • ETL development
  • Data warehousing
  • Data modeling
  • Data pipeline design
  • Big data processing

Timeline

Azure Data Engineer

State Street
02.2023 - Current

Azure Data Engineer

Kroger Technologies Inc
10.2021 - 01.2023

Data Engineer

Rockwell Collins
07.2020 - 09.2021

Big Data Developer

Broadridge
09.2017 - 07.2020

Data Warehouse Developer

Accenture Inc
06.2014 - 09.2017