Sandeep Nutakki

Montreal, QC

Summary

  • Experienced Senior Data Engineer with 7+ years in Data Architecture, Modeling, ETL, and Database Management, proficient in statistical analysis and scalable solutions for large datasets
  • Experience across the entire Software Development Life Cycle (SDLC), including Requirements Analysis, Design, Development, Testing, Deployment, and Support, with proficiency in Agile methodologies
  • Proficient in implementing data warehouse solutions, with demonstrated expertise in migrating data from on-premises databases to Amazon Redshift, RDS, S3, and Azure Data Lake
  • Experience in Data Integration and Data Warehousing using ETL tools such as Informatica PowerCenter, AWS Glue, SQL Server Integration Services (SSIS), and Talend
  • Strong knowledge of designing and developing dashboards by extracting data from sources such as SQL Server, Oracle, SAP, flat files, Excel files, and XML files
  • Hands-on experience with the Snowflake cloud data warehouse on AWS and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables
  • Expertise in installation, configuration, migration, troubleshooting, and maintenance of Splunk
  • Expertise with AWS Lambda functions and API Gateway, submitting data via API Gateway that is processed by a Lambda function (a minimal sketch follows this summary)
  • Developed a Python automation script for consuming data subject requests from AWS Snowflake tables and posting the data to the Adobe Analytics Privacy API
  • Extensive experience in designing DataStage server and parallel jobs, data profiling, UNIX shell scripting, and SQL/PL/SQL development
  • Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats
  • Created applications on Splunk to analyze big data
  • Proficient in AWS services such as Lambda, Kinesis, DynamoDB, S3, and CloudWatch
  • Acted as build and release engineer, deploying services via VSTS (Azure DevOps) pipelines; created and maintained pipelines to manage the Infrastructure as Code (IaC) for all applications
  • Extensive experience in DataStage/QualityStage development projects for data cleansing and data standardization (name and address standardization, US postal address verification, geocoding, etc.)
  • Exposure to implementation and operations of data governance, data strategy, data management, and related solutions
  • Expertise in Cloudera, Hortonworks Hadoop, and Azure systems handling massive data volumes in distributed environments, with a strong understanding of data technologies for big data processing
  • Created, monitored, and restored an Azure SQL database; migrated Microsoft SQL Server to Azure SQL Database
  • Experience with Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, big data technologies (Apache Spark), and Databricks
  • Used Lambda functions and Step Functions to trigger Glue jobs and orchestrate data pipelines
  • Extensive experience developing and implementing cloud architecture on Microsoft Azure
  • Hands-on experience with MuleSoft components, Mule Expression Language (MEL) workflows, Anypoint Studio, Enterprise Service Bus (ESB), API Manager, RAML, REST, and SOAP
  • Designed and developed solutions using C#, ASP.NET Core, Web API, and Microsoft Azure techniques
  • Excellent understanding of connecting Azure Data Factory V2 to a range of data sources and processing the data using pipelines, pipeline parameters, activities, activity parameters, and manual/window-based/event-based task scheduling
  • Experience developing enterprise-level solutions using batch processing (Apache Pig) and streaming frameworks (Spark Streaming, Apache Kafka, and Apache Flink)
  • Developed automated migration scripts using UNIX shell scripting, Python, Oracle/Teradata SQL, and Teradata macros and procedures
  • Worked on ETL migration services by creating and deploying AWS Lambda functions to provide a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena
  • Understand the latest features introduced by Microsoft Azure (Azure DevOps, OMS, NSG rules, etc.) and apply them to existing business applications
  • Developed ETL pipelines in and out of the data warehouse using a mix of Python and Snowflake's SnowSQL, and wrote SQL queries against Snowflake
  • Experience in Python programming with packages such as NumPy, Matplotlib, SciPy, and Pandas
  • Extensive experience creating web services in Python, including implementation of JSON-based RESTful and XML-based SOAP web services
  • Experience in writing complex Python scripts with object-oriented principles such as class creation, constructors, overloading, and modules
  • Proficient with BI tools such as Tableau and Power BI, data interpretation, modeling, data analysis, and reporting, with the ability to assist in directing planning based on insights
  • Designed and implemented end-to-end data pipelines, integrating diverse data sources seamlessly into Business Intelligence tools such as Tableau and Power BI, facilitating real-time analytics and reporting
  • Proficient with Azure Data Lake Services (ADLS), Databricks and Python notebook formats, Databricks Delta Lakes, and Amazon Web Services (AWS)
  • Utilized Azure Functions and event-driven architectures built on Azure Event Grid, Azure Event Hubs, and Azure Service Bus for scalable, event-driven data processing workflows
  • Worked on data migration from Teradata to an AWS Snowflake environment using Python and BI tools
  • Proficient in Spark architecture, Spark Core, Spark SQL, and Spark Streaming; skilled in PySpark for interactive analysis, batch processing, and stream processing applications
  • Developed shell scripts for job automation that generate a log file for every job
  • Extensive Spark architecture experience in performance tuning, Spark Core, Spark SQL, DataFrames, Spark Streaming, deployment modes, fault tolerance, and execution hierarchy for enhanced efficiency
  • Expertise in using Kafka for log aggregation with low-latency processing and distributed data consumption, and in widely used Enterprise Integration Patterns (EIPs)
  • Designed and developed Flink pipelines to consume streaming data from Kafka, applying business logic to massage, transform, and serialize raw data
  • Translated Java code to Scala as part of the Infosum pipeline build
  • Used Spark Streaming APIs for real-time transformations and actions on the common learner data model, ingesting data from Kinesis in near real time
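A minimal sketch of the API Gateway-to-Lambda submission pattern described above, assuming a Lambda proxy integration; the DynamoDB table name and payload fields are hypothetical placeholders, not the actual implementation.

    import json
    import boto3

    # Hypothetical table name used only for illustration.
    TABLE_NAME = "submitted_records"

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table(TABLE_NAME)

    def lambda_handler(event, context):
        """Handle a POST from API Gateway (proxy integration) and persist the payload."""
        body = json.loads(event.get("body") or "{}")
        record_id = body.get("id")
        if not record_id:
            return {"statusCode": 400, "body": json.dumps({"error": "missing id"})}

        # Store the submitted item as a JSON string to avoid float/Decimal conversion issues;
        # assumes the table's partition key is "id".
        table.put_item(Item={"id": record_id, "payload": json.dumps(body)})
        return {"statusCode": 200, "body": json.dumps({"status": "stored", "id": record_id})}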

Overview

7+ years of professional experience

Work History

BI and Cloud Engineer

VIA Rail
05.2023 - Current
  • Deployed event-driven architectures with Apache Kafka as message brokers, empowering real-time data streaming and event processing capabilities
  • Designed real-time data streaming and large-scale distributed computing apps using Apache Spark, Spark Streaming, Kafka, and Flume for seamless data processing (a minimal streaming sketch follows this list)
  • Proficient in data architecture, including pipeline design, Hadoop information, data modeling, mining, machine learning, and advanced data processing
  • Designed and implemented end-to-end data pipelines, integrating diverse data sources seamlessly into Business Intelligence tools such as Tableau and Power BI, facilitating real-time analytics and reporting
  • Collaborated in the evaluation and selection of BI tools, including Power BI, Tableau, and SSRS, based on business requirements and technology trends, ensuring the adoption of tools that align with organizational objectives
  • Implemented and managed end-to-end BI solutions, utilizing Power BI and Tableau, to empower business users with self-service analytics capabilities, reducing dependency on IT for routine reporting tasks
  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift
  • Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB
  • Enforced data Integrity and Business Rules during design of DataStage jobs
  • Traced and catalogue data processes, transformation logic and manual adjustments to identify data governance issues
  • Monitored produced and consumed datasets in Azure Data Factory (ADF)
  • Performed impact analysis on DataStage jobs, Oracle Stored Procedures and Packages
  • Used different AWS Data Migration Services and Schema Conversion Tool along with Matillion ETL tool
  • Integrating Splunk with a wide variety of legacy data sources
  • Integrated lambda with SQS and DynamoDB with step functions to iterate through list of messages and updated the status into DynamoDB table
  • Experience in QlikView scripting, Set Analysis, and Section Access
  • Wrote automation scripts for creating resources in the OpenStack cloud using Python and Terraform modules
  • Experienced in using Amazon S3, Lambda, Kinesis, CloudWatch, DynamoDB, and application services in the AWS cloud infrastructure
  • Cleaned and processed third-party spending data into maneuverable deliverables within specific formats using Excel macros and Python libraries
  • Developed Proof of Concept POC for DataStage to SSIS migration
  • Worked on Data Extraction, aggregations, and consolidation of Adobe data within AWS Glue using PySpark
  • Experience on creating Data Pipelines using Azure Data Factory, Azure Synapse Analytics
  • Developed Spark streaming applications to pull data from cloud to Hive table and used Spark SQL to process structured data, while also using Talend for Big data Integration with Spark and Hadoop
  • Implemented ETL using Azure Data Factory, T-SQL, Spark SQL, and U-SQL in Azure Data Lake Analytics and ingested and processed data in Azure Data bricks
  • Create external tables with partitions using Hive, AWS Athena, and Redshift
  • Experienced with cloud platforms like Amazon Web Services, Azure, and Databricks (on both Azure and AWS)
  • Worked on Azure Synapse Analytics for implementing PySpark notebooks
  • Worked with Finance, Risk, and Investment Accounting teams to create a Data Governance glossary, Data Governance framework, and process flow diagrams
  • Developed standalone tools using C# and ASP.NET in a Visual Studio environment
  • Created new UNIX scripts to automate and handle different file processing, editing, and execution sequences with shell scripting, using basic UNIX commands and the ‘awk’ and ‘sed’ editing languages
  • Developed a solid understanding of business needs and requirements for MuleSoft, creating CRs and raising cases with MuleSoft
  • Developed PySpark code for AWS Glue jobs and for EMR
  • Automated and scheduled daily data loads of QVW documents using QlikView Publisher
  • Created Azure PowerShell scripts to transfer data between the local file system and HDFS Blob storage for efficient data movement
  • Created AWS Lambda functions and API Gateways to submit data via API Gateway that is accessible via a Lambda function
  • Transforming data in Azure Data Factory with ADF Transformations
  • Experience with Splunk Architecture and extensive experience in Python
  • Extensively used DataStage Change Data Capture for DB2 and Oracle files and employed change capture stage in parallel jobs
  • Experience in designing and architecting serverless computing and implementation using AWS Lambda, and event tracking during mission operations in C# and ASP.NET
  • Worked extensively on SQL, PL/SQL, and UNIX shell scripting
  • Participated in the Data Governance working group sessions to create Data Governance Policies
  • Worked on EDS transformation using Azure Data Factory and Azure Databricks
  • Good hands-on experience on Splunk KV store
  • Analyzed Azure Data Factory and Azure Data Bricks to build new ETL process in Azure
  • Built Azure Data Warehouse reporting using Microsoft Power BI and identified target systems with an overview of their impacts, including Data Landing, Staging, Core, Data Sharing, and Data Visualization
  • Consumed the Adobe Analytics web API and wrote the Python script to load Adobe consumer information for digital marketing into Snowflake
  • Worked on Adobe Analytics ETL jobs
  • Designing and overseeing data models for NoSQL DBs (Azure Cosmos, MongoDB, Cassandra) and Relational DBs (MS SQL Server, PostgreSQL, Oracle) with proficiency
  • Used Google Cloud Functions with Python to load data into BigQuery for CSV files on arrival in a GCS bucket
  • Hands-on experience on working with AWS services like Lambda function, Athena, DynamoDB, Step functions, SNS, SQS, S3, IAM etc
  • Utilized Matillion and AWS Redshift for data warehouse implementation of the Epic Data Warehousing concept in a Snowflake database
  • Orchestrated data transformations via Azure Data Factory (ADF), scheduled through Azure Automation accounts and triggered using Tidal Scheduler
  • Created Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics to capture and process streaming data and output it into S3, DynamoDB, and Redshift for storage and analysis
  • Developing HDFS workloads on Kubernetes clusters to replicate production scenarios for development and testing purposes
  • Developed ETL python scripts for ingestion pipelines which run on AWS infrastructure setup of EMR, S3, Redshift and Lambda
  • Established Data Governance processes, procedures, and controls for the Data Platform employing NIFI for effective data management and compliance
  • Executed data visualization solutions in Tableau and Power BI, delivering valuable insights and analytics to business stakeholders for informed decision-making
  • Used Python to build pipelines to scrape data from dynamic web pages
  • Monitoring data pipelines and infrastructure components to identify and resolve issues, performing routine maintenance tasks, and ensuring high availability
  • Built high-performance, scalable ETL processes for loading, cleansing, and validating data.
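As referenced in the Spark Streaming bullet above, a minimal PySpark Structured Streaming sketch of the Kafka ingestion pattern; the broker address and topic name are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    # Hypothetical broker and topic names used only for illustration.
    KAFKA_BROKERS = "broker-1:9092"
    TOPIC = "events"

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    # Read the raw Kafka stream; key and value arrive as binary and are cast to strings.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", KAFKA_BROKERS)
           .option("subscribe", TOPIC)
           .load())

    events = raw.select(col("key").cast("string").alias("key"),
                        col("value").cast("string").alias("value"))

    # Write micro-batches to the console for inspection; a production job would target S3 or Delta.
    query = (events.writeStream
             .format("console")
             .outputMode("append")
             .start())

    query.awaitTermination()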

BI and Database Developer

Nelvana
09.2021 - 05.2023
  • Setting up and proficiently managing release and deployment dashboards and reports in Azure DevOps to gain insights into deployment progress and performance
  • Skillfully designing and executing SSIS package configurations using various sources like XML files, SQL Server tables, and environment variables to optimize package functionality
  • Efficiently scheduling SSIS packages to run at specific intervals using SQL Server Agent or other scheduling tools
  • Optimizing Tableau dashboards for performance and efficiency
  • Set up Git's post-rewrite hook to automatically perform actions after rewriting history in the database code
  • Implemented Python scripts for data profiling and data quality assessment
  • Conducting in-depth data profiling and analysis to identify data quality issues in OLAP and OLTP databases, ensuring data accuracy and reliability (a profiling sketch follows this list)
  • Utilizing Azure Purview's data lineage documentation for disaster recovery planning and data restoration
  • Developing and maintaining Extract, Transform, Load (ETL) pipelines using AWS Glue to extract data from various sources, transform it into a suitable format, and load it into target data stores, ensuring data integrity and reliability throughout the process
  • Utilized Blob Storage lifecycle management policies for efficient data tiering and expiry
  • Implemented custom data aggregation tools in Plotly, summarizing and analyzing large datasets to extract meaningful insights
  • Developed and maintained data governance frameworks to ensure their sustainability and scalability
  • Developing and maintaining Power BI reports with drill-through actions and tooltips to provide additional context and detail
  • Implemented error handling and data validation routines within ETL workflows to ensure data integrity
  • Kept up to date with the latest advancements in Azure data technologies and frameworks, recommending improvements to existing systems on Azure Databricks
  • Configured Snowflake data disaster recovery to ensure data availability and business continuity in case of disasters
  • Implement and manage Redshift security policies and access controls
  • Implemented PostgreSQL advanced features, such as Common Table Expressions (CTEs) and window functions, to perform complex data analysis and manipulation
  • Implemented procedures for data masking and obfuscation in databases, safeguarding sensitive data
  • Creating and managing data governance frameworks and supporting governance structures
  • Methodically implementing and managing package configurations for dynamic package behaviour in SSIS, granting flexibility to package behaviour based on configuration inputs
  • Expertly navigating data exploration and visualization, uncovering valuable insights from the vast expanse of Azure Data Lake
  • Designing and developing Power BI dashboards with live data connections for continuous data updates and monitoring
  • Conducted churn analysis to identify factors influencing customer attrition, guiding retention strategies and customer relationship management efforts through data analysis
  • Creating and managing data flows to transform and cleanse data within Azure Synapse Analytics
  • Developed and maintained SQL scripts for database schema migration and version control in agile development environments
  • Designed MongoDB data archival and retrieval processes for compliance and legal requirements
  • Implementing advanced analytics features in Power BI, such as forecasting, clustering, and outlier detection, to provide deeper insights into data
  • Collaborating with data architects and stakeholders to understand data integration requirements, ensuring a harmonious alignment with business needs in Azure Data Factory
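A minimal pandas sketch of the data profiling and quality checks referenced above; the input file and column names are hypothetical placeholders.

    import pandas as pd

    # Hypothetical input file used only for illustration.
    df = pd.read_csv("customers.csv")

    # Basic profile: row count, null counts per column, and duplicate rows.
    profile = {
        "rows": len(df),
        "nulls_per_column": df.isnull().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }
    print(profile)
    print(df.describe(include="all"))

    # Simple rule-based quality check; assumes an "amount" column exists in the source data.
    if "amount" in df.columns:
        bad_rows = df[df["amount"].isnull() | (df["amount"] <= 0)]
        print(f"{len(bad_rows)} rows failed the amount check")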

Data Engineer

Reliance industries
05.2019 - 08.2021
  • Conducted data extraction, transformation, loading, and integration across data warehouse, operational data stores, and master data management systems with expertise
  • Created a Real-Time Stream Processing Application using Kafka, Spark, Hive, and Scala for performing ETL and implementing Machine Learning models
  • Created data governance templates and standards for the data governance organization
  • Responsible for developing web applications using VS.NET, C#.NET, ASP.NET, LINQ, Lambda expressions, and WCF functions
  • Experienced with event-driven and scheduled AWS Lambda functions to trigger various AWS resources
  • Conducted data analytics on Data Lake through Pyspark on Databricks platform
  • Proficient with Azure Data Lake Services (ADLS), Databricks and iPython Notebook formats, Databricks Delta Lakes, and Amazon Web Services (AWS)
  • Carried out various mathematical operations for calculation purposes using Python libraries
  • Built a Hadoop cluster for Splunk and ELK data archiving
  • Managed schedules and reloading of QlikView data model QVDs and QVWs through QlikView Management Console (QMC)
  • Configured ADF jobs and SnowSQL job triggering in Matillion using Python
  • Implemented data transformation and cleansing logic using SQL, Python, and Spark, ensuring data quality and compatibility with downstream systems
  • Performed review and analysis of the detailed system specifications related to the DataStage ETL and related applications to ensure they appropriately address the business requirements
  • Built and maintained ETL processes to facilitate data integration and synchronization between on-premises and cloud-based systems
  • Led the development of interactive dashboards in Power BI and Tableau, providing stakeholders with real-time insights and facilitating data-driven decision-making across departments
  • Successfully implemented row-level security and other data governance measures in Power BI and Tableau to uphold data integrity and compliance with regulatory standards
  • Analyzed existing databases, tables, and other objects to prepare to migrate to Azure Synapse
  • Participated in the Portfolio Governance and Data Governance working group sessions to create policies
  • Develop and modify ADF Task flows and ADF UI screens as per changing client requirements
  • Automated and scheduled daily load of QVW documents using Qlikview Publisher and notified load status by email
  • Developed and optimized SQL queries and data manipulation scripts for efficient data extraction, transformation, and loading
  • Worked on ETL Migration services by developing and deploying AWS Lambda functions for generating a serverless data pipeline which can be written to Glue Catalog and can be queried from Athena
  • Developed Python scripts to take backups of EBS volumes using AWS Lambda and CloudWatch (a minimal sketch follows this list)
  • Maintained program libraries, users' manuals and technical documentation
  • Handled the audit table loading, email configuration using python on Matillion
  • Developed Business Intelligence reports using Microsoft Power BI, Tableau
  • Utilized Unix for data manipulation, scripting, and automation tasks
  • Designed and developed ETL processes to extract data from various sources, transform it according to business rules, and load it into target systems
  • Responsible for setting up and maintaining several Azure services, including Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, and Azure Data Factory
  • Utilized ETL tools, such as Informatica, Talend, or SSIS, to build and automate data integration workflows
  • Implemented Data Governance processes, procedures, and controls on the Data Platform using NIFI to ensure effective data management and compliance.
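As referenced in the EBS backup bullet above, a minimal boto3 sketch of the snapshot automation; the region and the Backup=true tag filter are hypothetical assumptions.

    from datetime import datetime, timezone
    import boto3

    # Hypothetical region and tag filter used only for illustration.
    ec2 = boto3.client("ec2", region_name="us-east-1")

    def lambda_handler(event, context):
        """Snapshot every EBS volume tagged Backup=true (invoked on a CloudWatch Events schedule)."""
        volumes = ec2.describe_volumes(
            Filters=[{"Name": "tag:Backup", "Values": ["true"]}]
        )["Volumes"]

        for volume in volumes:
            snapshot = ec2.create_snapshot(
                VolumeId=volume["VolumeId"],
                Description=f"Automated backup {datetime.now(timezone.utc).isoformat()}",
            )
            print(f"Started snapshot {snapshot['SnapshotId']} for {volume['VolumeId']}")

        return {"snapshots_started": len(volumes)}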

Data Analyst

Forbes & Company Limited
04.2017 - 05.2019
  • Participated in user meetings, gathered Business requirements & specifications for the Data-warehouse design and translated the user inputs into ETL design docs
  • Utilized Informatica to ETL data from SQL Server to Oracle databases, ensuring seamless and efficient data migration
  • Performed data mapping, data cleansing, and program development for data loads, along with verifying the converted data against legacy records
  • Designed and implemented real-time data pipelines using Apache Kafka and Apache Flink for ingesting and processing high-velocity data streams from various sources
  • Implemented data security measures to ensure compliance with industry regulations and data protection standards
  • Developed Spark scripts using Python on AWS EMR for Data Aggregation, Validation and Adhoc querying
  • Developed and maintained batch processing pipelines using Apache Spark for data transformation and analysis
  • Assisted in the design and development of data models and schemas for data warehousing solutions
  • Conducted data profiling and data quality checks to identify and resolve data anomalies and issues
  • Conducted performance tuning and optimization of streaming data pipelines to ensure optimal throughput and low latency in data processing
  • Developed ETL processes to extract, transform, and load data from various data sources into the data warehouse using technologies such as Apache Airflow and SQL (a minimal DAG sketch follows this list)
  • Assisted in the development of data pipelines and ETL processes using Python and SQL to process and store large volumes of data
  • Conducted data quality checks and data validation to ensure the accuracy and integrity of data in the data warehouse
  • Assisted in the maintenance and optimization of existing data processing pipelines.
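As referenced in the Airflow bullet above, a minimal Airflow 2.x DAG sketch of such an ETL process; the DAG id, schedule, and task bodies are hypothetical placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**context):
        # Placeholder: pull rows from a source system.
        return [{"id": 1, "amount": 10.0}]

    def transform(**context):
        # Placeholder: apply business rules to the extracted rows.
        rows = context["ti"].xcom_pull(task_ids="extract")
        return [r for r in rows if r["amount"] > 0]

    def load(**context):
        # Placeholder: write the transformed rows to the warehouse.
        rows = context["ti"].xcom_pull(task_ids="transform")
        print(f"loading {len(rows)} rows")

    with DAG(
        dag_id="etl_sketch",
        start_date=datetime(2019, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        load_task = PythonOperator(task_id="load", python_callable=load)

        extract_task >> transform_task >> load_task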

Education

Bachelor of Technology -

Rashtrasant Tukadoji Maharaj Nagpur University
Nagpur, Maharashtra

Skills

  • Python
  • Java
  • Scala
  • SQL
  • Windows 98/2000/XP/7/10
  • Mac OS
  • Unix
  • Linux
  • Shell scripting
  • PL/SQL
  • PySpark
  • HiveQL
  • Regular Expressions
  • HTML
  • JavaScript
  • RESTful
  • SOAP
  • Tableau
  • Power BI
  • SSRS
  • Docker
  • Jenkins
  • Hadoop
  • HDFS
  • Hive
  • MapReduce
  • Pig
  • HBase
  • Sqoop
  • Flume
  • Oozie
  • NoSQL databases
  • Microsoft Word
  • PowerPoint
  • MS Visio
  • MS Project
  • Data Modeler
  • Git
  • Jira
  • SQL Sentry
  • SSIS
  • DataStage
  • Oracle Data Integrator
  • Apache NiFi
  • Talend

Timeline

BI and Cloud Engineer

VIA Rail
05.2023 - Current

BI and Database Developer

Nelvana
09.2021 - 05.2023

Data Engineer

Reliance industries
05.2019 - 08.2021

Data Analyst

Forbes & Company Limited
04.2017 - 05.2019

Bachelor of Technology -

Rashtrasant Tukadoji Maharaj Nagpur University