Sandeep Nutakki

Montreal, QC

Summary

  • Experienced Senior Data Engineer with 7+ years in Data Architecture, Modeling, ETL, and Database Management, proficient in statistical analysis and scalable solutions for large datasets
  • Experience across the entire Software Development Life Cycle (SDLC), including Requirements Analysis, Design, Development, Testing, Deployment, and Support, with proficiency in Agile methodologies
  • Proficient in implementing data warehouse solutions, with demonstrated expertise in migrating data from on-premises databases to Amazon Redshift, RDS, S3, and Azure Data Lake
  • Experience in Data Integration and Data Warehousing using ETL tools such as Informatica PowerCenter, AWS Glue, SQL Server Integration Services (SSIS), and Talend
  • Strong knowledge of designing and developing dashboards by extracting data from sources such as SQL Server, Oracle, SAP, flat files, Excel files, and XML files
  • Hands-on experience with the Snowflake cloud data warehouse on AWS and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables
  • Expertise in installation, configuration, migration, troubleshooting, and maintenance of Splunk
  • Expertise with AWS Lambda functions and API Gateway, submitting data via API Gateway that is processed by a Lambda function (a minimal sketch follows this summary)
  • Developed a Python automation script for consuming data subject requests from AWS Snowflake tables and posting the data to the Adobe Analytics Privacy API
  • Extensive experience in designing DataStage server and parallel jobs, data profiling, UNIX shell scripting, and SQL/PL/SQL development
  • Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats
  • Created applications on Splunk to analyze big data
  • Proficient in AWS services such as Lambda, Kinesis, DynamoDB, S3, and CloudWatch
  • Acted as build and release engineer, deploying services via VSTS (Azure DevOps) pipelines; created and maintained pipelines to manage the Infrastructure as Code (IaC) for all applications
  • Extensive experience in DataStage/QualityStage development projects for data cleansing and data standardization (name and address standardization, US postal address verification, geocoding, etc.)
  • Exposure to implementation and operations of data governance, data strategy, data management, and related solutions
  • Expertise in Cloudera, Hortonworks Hadoop, and Azure systems handling massive data volumes in distributed environments, with a strong understanding of data technologies for big data processing
  • Created, monitored, and restored an Azure SQL database; migrated Microsoft SQL Server to Azure SQL Database
  • Experience with Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, big data technologies (Apache Spark), and Databricks
  • Used Lambda functions and Step Functions to trigger Glue jobs and orchestrate data pipelines
  • Extensive experience developing and implementing cloud architecture on Microsoft Azure
  • Hands-on experience with MuleSoft components, Mule Expression Language (MEL) workflows, Anypoint Studio, Enterprise Service Bus (ESB), API Manager, RAML, REST, and SOAP
  • Designed and developed solutions using C#, ASP.NET Core, Web API, and Microsoft Azure techniques
  • Excellent understanding of connecting Azure Data Factory V2 to a range of data sources and processing the data using pipelines, pipeline parameters, activities, activity parameters, and manual/window-based/event-based task scheduling
  • Experience developing enterprise-level solutions using batch processing (Apache Pig) and streaming frameworks (Spark Streaming, Apache Kafka, and Apache Flink)
  • Developed automated migration scripts using UNIX shell scripting, Python, Oracle/Teradata SQL, and Teradata macros and procedures
  • Worked on ETL migration services by creating and deploying AWS Lambda functions to provide a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena
  • Understand the latest features introduced by Microsoft Azure (Azure DevOps, OMS, NSG rules, etc.) and apply them to existing business applications
  • Developed ETL pipelines in and out of the data warehouse using a mix of Python and Snowflake's SnowSQL, and wrote SQL queries against Snowflake
  • Experience in Python programming with packages such as NumPy, Matplotlib, SciPy, and Pandas
  • Extensive experience creating web services in Python, including implementation of JSON-based RESTful and XML-based SOAP web services
  • Experience in writing complex Python scripts with object-oriented principles such as class creation, constructors, overloading, and modules
  • Proficient with BI tools such as Tableau and Power BI, data interpretation, modeling, data analysis, and reporting, with the ability to assist in directing planning based on insights
  • Designed and implemented end-to-end data pipelines, integrating diverse data sources seamlessly into Business Intelligence tools such as Tableau and Power BI, facilitating real-time analytics and reporting
  • Proficient with Azure Data Lake Services (ADLS), Databricks and Python notebook formats, Databricks Delta Lakes, and Amazon Web Services (AWS)
  • Utilized Azure Functions and event-driven architectures built on Azure Event Grid, Azure Event Hubs, and Azure Service Bus for scalable, event-driven data processing workflows
  • Worked on data migration from Teradata to an AWS Snowflake environment using Python and BI tools
  • Proficient in Spark architecture, Spark Core, Spark SQL, and Spark Streaming; skilled in PySpark for interactive analysis, batch processing, and stream processing applications
  • Developed shell scripts for job automation that generate a log file for every job
  • Extensive Spark architecture experience in performance tuning, Spark Core, Spark SQL, DataFrames, Spark Streaming, deployment modes, fault tolerance, and execution hierarchy for enhanced efficiency
  • Expertise in using Kafka for log aggregation with low-latency processing and distributed data consumption, and in widely used Enterprise Integration Patterns (EIPs)
  • Designed and developed Flink pipelines to consume streaming data from Kafka, applying business logic to massage, transform, and serialize raw data
  • Translated Java code to Scala as part of the Infosum pipeline build
  • Used Spark Streaming APIs for real-time transformations and actions on the common learner data model, ingesting data from Kinesis in near real time
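A minimal sketch of the API Gateway-to-Lambda submission pattern described above, assuming a Lambda proxy integration; the DynamoDB table name and payload fields are hypothetical placeholders, not the actual implementation.

    import json
    import boto3

    # Hypothetical table name used only for illustration.
    TABLE_NAME = "submitted_records"

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table(TABLE_NAME)

    def lambda_handler(event, context):
        """Handle a POST from API Gateway (proxy integration) and persist the payload."""
        body = json.loads(event.get("body") or "{}")
        record_id = body.get("id")
        if not record_id:
            return {"statusCode": 400, "body": json.dumps({"error": "missing id"})}

        # Store the submitted item as a JSON string to avoid float/Decimal conversion issues;
        # assumes the table's partition key is "id".
        table.put_item(Item={"id": record_id, "payload": json.dumps(body)})
        return {"statusCode": 200, "body": json.dumps({"status": "stored", "id": record_id})}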

Overview

7+ years of professional experience

Work History

BI and Cloud Engineer

VIA Rail
05.2023 - Current
  • Deployed event-driven architectures with Apache Kafka as message brokers, empowering real-time data streaming and event processing capabilities
  • Designed real-time data streaming and large-scale distributed computing apps using Apache Spark, Spark Streaming, Kafka, and Flume for seamless data processing (a minimal streaming sketch follows this list)
  • Proficient in data architecture, including pipeline design, Hadoop information, data modeling, mining, machine learning, and advanced data processing
  • Designed and implemented end-to-end data pipelines, integrating diverse data sources seamlessly into Business Intelligence tools such as Tableau and Power BI, facilitating real-time analytics and reporting
  • Collaborated in the evaluation and selection of BI tools, including Power BI, Tableau, and SSRS, based on business requirements and technology trends, ensuring the adoption of tools that align with organizational objectives
  • Implemented and managed end-to-end BI solutions, utilizing Power BI and Tableau, to empower business users with self-service analytics capabilities, reducing dependency on IT for routine reporting tasks
  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift
  • Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB
  • Enforced data Integrity and Business Rules during design of DataStage jobs
  • Traced and catalogue data processes, transformation logic and manual adjustments to identify data governance issues
  • Monitored produced and consumed datasets in Azure Data Factory (ADF)
  • Performed impact analysis on DataStage jobs, Oracle Stored Procedures and Packages
  • Used different AWS Data Migration Services and Schema Conversion Tool along with Matillion ETL tool
  • Integrating Splunk with a wide variety of legacy data sources
  • Integrated lambda with SQS and DynamoDB with step functions to iterate through list of messages and updated the status into DynamoDB table
  • Experience in QlikView scripting, Set Analysis, and Section Access
  • Wrote automation scripts for creating resources in the OpenStack cloud using Python and Terraform modules
  • Experienced in using Amazon S3, Lambda, Kinesis, CloudWatch, DynamoDB, and application services in the AWS cloud infrastructure
  • Cleaned and processed third-party spending data into maneuverable deliverables within specific formats using Excel macros and Python libraries
  • Developed Proof of Concept POC for DataStage to SSIS migration
  • Worked on Data Extraction, aggregations, and consolidation of Adobe data within AWS Glue using PySpark
  • Experience on creating Data Pipelines using Azure Data Factory, Azure Synapse Analytics
  • Developed Spark streaming applications to pull data from cloud to Hive table and used Spark SQL to process structured data, while also using Talend for Big data Integration with Spark and Hadoop
  • Implemented ETL using Azure Data Factory, T-SQL, Spark SQL, and U-SQL in Azure Data Lake Analytics and ingested and processed data in Azure Data bricks
  • Create external tables with partitions using Hive, AWS Athena, and Redshift
  • Experienced with cloud platforms like Amazon Web Services, Azure, and Databricks (on both Azure and AWS)
  • Worked on Azure Synapse Analytics for implementing PySpark notebooks
  • Worked with Finance, Risk, and Investment Accounting teams to create a Data Governance glossary, Data Governance framework, and process flow diagrams
  • Developed standalone tools using C# and ASP.NET in a Visual Studio environment
  • Created new UNIX scripts to automate and handle different file processing, editing, and execution sequences with shell scripting, using basic UNIX commands and the ‘awk’ and ‘sed’ editing languages
  • Developed a solid understanding of business needs and requirements for MuleSoft, creating CRs and raising cases with MuleSoft
  • Developed PySpark code for AWS Glue jobs and for EMR
  • Automated and scheduled daily data loads of QVW documents using QlikView Publisher
  • Created Azure PowerShell scripts to transfer data between the local file system and HDFS Blob storage for efficient data movement
  • Created AWS Lambda functions and API Gateways to submit data via API Gateway that is accessible via a Lambda function
  • Transforming data in Azure Data Factory with ADF Transformations
  • Experience with Splunk Architecture and extensive experience in Python
  • Extensively used DataStage Change Data Capture for DB2 and Oracle files and employed change capture stage in parallel jobs
  • Experience in designing and architecting serverless computing and implementation using AWS Lambda, and event tracking during mission operations in C# and ASP.NET
  • Worked extensively on SQL, PL/SQL, and UNIX shell scripting
  • Participated in the Data Governance working group sessions to create Data Governance Policies
  • Worked on EDS transformation using Azure Data Factory and Azure Databricks
  • Good hands-on experience on Splunk KV store
  • Analyzed Azure Data Factory and Azure Data Bricks to build new ETL process in Azure
  • Built Azure Data Warehouse reporting using Microsoft Power BI and identified target systems with an overview of their impacts, including Data Landing, Staging, Core, Data Sharing, and Data Visualization
  • Consumed the Adobe Analytics web API and wrote the Python script to load Adobe consumer information for digital marketing into Snowflake
  • Worked on Adobe Analytics ETL jobs
  • Designing and overseeing data models for NoSQL DBs (Azure Cosmos, MongoDB, Cassandra) and Relational DBs (MS SQL Server, PostgreSQL, Oracle) with proficiency
  • Used Google Cloud Functions with Python to load data into BigQuery for CSV files on arrival in a GCS bucket
  • Hands-on experience on working with AWS services like Lambda function, Athena, DynamoDB, Step functions, SNS, SQS, S3, IAM etc
  • Utilized Matillion and AWS Redshift for data warehouse implementation of the Epic Data Warehousing concept in a Snowflake database
  • Orchestrated data transformations via Azure Data Factory (ADF), scheduled through Azure Automation accounts and triggered using Tidal Scheduler
  • Created Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics to capture and process streaming data and output it into S3, DynamoDB, and Redshift for storage and analysis
  • Developing HDFS workloads on Kubernetes clusters to replicate production scenarios for development and testing purposes
  • Developed ETL python scripts for ingestion pipelines which run on AWS infrastructure setup of EMR, S3, Redshift and Lambda
  • Established Data Governance processes, procedures, and controls for the Data Platform employing NIFI for effective data management and compliance
  • Executed data visualization solutions in Tableau and Power BI, delivering valuable insights and analytics to business stakeholders for informed decision-making
  • Used Python to build pipelines to scrape data from dynamic web pages
  • Monitoring data pipelines and infrastructure components to identify and resolve issues, performing routine maintenance tasks, and ensuring high availability
  • Built high-performance, scalable ETL processes for loading, cleansing, and validating data.
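As referenced in the Spark Streaming bullet above, a minimal PySpark Structured Streaming sketch of the Kafka ingestion pattern; the broker address and topic name are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    # Hypothetical broker and topic names used only for illustration.
    KAFKA_BROKERS = "broker-1:9092"
    TOPIC = "events"

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    # Read the raw Kafka stream; key and value arrive as binary and are cast to strings.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", KAFKA_BROKERS)
           .option("subscribe", TOPIC)
           .load())

    events = raw.select(col("key").cast("string").alias("key"),
                        col("value").cast("string").alias("value"))

    # Write micro-batches to the console for inspection; a production job would target S3 or Delta.
    query = (events.writeStream
             .format("console")
             .outputMode("append")
             .start())

    query.awaitTermination()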

BI and Database Developer

Nelvana
09.2021 - 05.2023
  • Setting up and proficiently managing release and deployment dashboards and reports in Azure DevOps to gain insights into deployment progress and performance
  • Skillfully designing and executing SSIS package configurations using various sources like XML files, SQL Server tables, and environment variables to optimize package functionality
  • Efficiently scheduling SSIS packages to run at specific intervals using SQL Server Agent or other scheduling tools
  • Optimizing Tableau dashboards for performance and efficiency
  • Set up Git's post-rewrite hook to automatically perform actions after rewriting history in the database code
  • Implemented Python scripts for data profiling and data quality assessment
  • Conducting in-depth data profiling and analysis to identify data quality issues in OLAP and OLTP databases, ensuring data accuracy and reliability (a profiling sketch follows this list)
  • Utilizing Azure Purview's data lineage documentation for disaster recovery planning and data restoration
  • Developing and maintaining Extract, Transform, Load (ETL) pipelines using AWS Glue to extract data from various sources, transform it into a suitable format, and load it into target data stores, ensuring data integrity and reliability throughout the process
  • Utilized Blob Storage lifecycle management policies for efficient data tiering and expiry
  • Implemented custom data aggregation tools in Plotly, summarizing and analyzing large datasets to extract meaningful insights
  • Developed and maintained data governance frameworks to ensure their sustainability and scalability
  • Developing and maintaining Power BI reports with drill-through actions and tooltips to provide additional context and detail
  • Implemented error handling and data validation routines within ETL workflows to ensure data integrity
  • Kept up to date with the latest advancements in Azure data technologies and frameworks, recommending improvements to existing systems on Azure Databricks
  • Configured Snowflake data disaster recovery to ensure data availability and business continuity in case of disasters
  • Implement and manage Redshift security policies and access controls
  • Implemented PostgreSQL advanced features, such as Common Table Expressions (CTEs) and window functions, to perform complex data analysis and manipulation
  • Implemented procedures for data masking and obfuscation in databases, safeguarding sensitive data
  • Creating and managing data governance frameworks and supporting governance structures
  • Methodically implementing and managing package configurations for dynamic package behaviour in SSIS, granting flexibility to package behaviour based on configuration inputs
  • Expertly navigating data exploration and visualization, uncovering valuable insights from the vast expanse of Azure Data Lake
  • Designing and developing Power BI dashboards with live data connections for continuous data updates and monitoring
  • Conducted churn analysis to identify factors influencing customer attrition, guiding retention strategies and customer relationship management efforts through data analysis
  • Creating and managing data flows to transform and cleanse data within Azure Synapse Analytics
  • Developed and maintained SQL scripts for database schema migration and version control in agile development environments
  • Designed MongoDB data archival and retrieval processes for compliance and legal requirements
  • Implementing advanced analytics features in Power BI, such as forecasting, clustering, and outlier detection, to provide deeper insights into data
  • Collaborating with data architects and stakeholders to understand data integration requirements, ensuring a harmonious alignment with business needs in Azure Data Factory
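A minimal pandas sketch of the data profiling and quality checks referenced above; the input file and column names are hypothetical placeholders.

    import pandas as pd

    # Hypothetical input file used only for illustration.
    df = pd.read_csv("customers.csv")

    # Basic profile: row count, null counts per column, and duplicate rows.
    profile = {
        "rows": len(df),
        "nulls_per_column": df.isnull().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }
    print(profile)
    print(df.describe(include="all"))

    # Simple rule-based quality check; assumes an "amount" column exists in the source data.
    if "amount" in df.columns:
        bad_rows = df[df["amount"].isnull() | (df["amount"] <= 0)]
        print(f"{len(bad_rows)} rows failed the amount check")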

Data Engineer

Reliance industries
05.2019 - 08.2021
  • Conducted data extraction, transformation, loading, and integration across data warehouse, operational data stores, and master data management systems with expertise
  • Created a Real-Time Stream Processing Application using Kafka, Spark, Hive, and Scala for performing ETL and implementing Machine Learning models
  • Created data governance templates and standards for the data governance organization
  • Responsible for developing web applications using VS.NET, C#.NET, ASP.NET, LINQ, Lambda expressions, and WCF functions
  • Experienced with event-driven and scheduled AWS Lambda functions to trigger various AWS resources
  • Conducted data analytics on Data Lake through Pyspark on Databricks platform
  • Proficient with Azure Data Lake Services (ADLS), Databricks and iPython Notebook formats, Databricks Delta Lakes, and Amazon Web Services (AWS)
  • Carried out various mathematical operations for calculation purposes using Python libraries
  • Built a Hadoop cluster for Splunk and ELK data archiving
  • Managed schedules and reloading of QlikView data model QVDs and QVWs through QlikView Management Console (QMC)
  • Configured ADF jobs and SnowSQL job triggering in Matillion using Python
  • Implemented data transformation and cleansing logic using SQL, Python, and Spark, ensuring data quality and compatibility with downstream systems
  • Performed review and analysis of the detailed system specifications related to the DataStage ETL and related applications to ensure they appropriately address the business requirements
  • Built and maintained ETL processes to facilitate data integration and synchronization between on-premises and cloud-based systems
  • Led the development of interactive dashboards in Power BI and Tableau, providing stakeholders with real-time insights and facilitating data-driven decision-making across departments
  • Successfully implemented row-level security and other data governance measures in Power BI and Tableau to uphold data integrity and compliance with regulatory standards
  • Analyzed existing databases, tables, and other objects to prepare to migrate to Azure Synapse
  • Participated in the Portfolio Governance and Data Governance working group sessions to create policies
  • Develop and modify ADF Task flows and ADF UI screens as per changing client requirements
  • Automated and scheduled daily load of QVW documents using Qlikview Publisher and notified load status by email
  • Developed and optimized SQL queries and data manipulation scripts for efficient data extraction, transformation, and loading
  • Worked on ETL Migration services by developing and deploying AWS Lambda functions for generating a serverless data pipeline which can be written to Glue Catalog and can be queried from Athena
  • Developed Python scripts to take backups of EBS volumes using AWS Lambda and CloudWatch (a minimal sketch follows this list)
  • Maintained program libraries, users' manuals and technical documentation
  • Handled the audit table loading, email configuration using python on Matillion
  • Developed Business Intelligence reports using Microsoft Power BI, Tableau
  • Utilized Unix for data manipulation, scripting, and automation tasks
  • Designed and developed ETL processes to extract data from various sources, transform it according to business rules, and load it into target systems
  • Responsible for setting up and maintaining several Azure services, including Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, and Azure Data Factory
  • Utilized ETL tools, such as Informatica, Talend, or SSIS, to build and automate data integration workflows
  • Implemented Data Governance processes, procedures, and controls on the Data Platform using NIFI to ensure effective data management and compliance.
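As referenced in the EBS backup bullet above, a minimal boto3 sketch of the snapshot automation; the region and the Backup=true tag filter are hypothetical assumptions.

    from datetime import datetime, timezone
    import boto3

    # Hypothetical region and tag filter used only for illustration.
    ec2 = boto3.client("ec2", region_name="us-east-1")

    def lambda_handler(event, context):
        """Snapshot every EBS volume tagged Backup=true (invoked on a CloudWatch Events schedule)."""
        volumes = ec2.describe_volumes(
            Filters=[{"Name": "tag:Backup", "Values": ["true"]}]
        )["Volumes"]

        for volume in volumes:
            snapshot = ec2.create_snapshot(
                VolumeId=volume["VolumeId"],
                Description=f"Automated backup {datetime.now(timezone.utc).isoformat()}",
            )
            print(f"Started snapshot {snapshot['SnapshotId']} for {volume['VolumeId']}")

        return {"snapshots_started": len(volumes)}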

Data Analyst

Forbes & Company Limited
04.2017 - 05.2019
  • Participated in user meetings, gathered Business requirements & specifications for the Data-warehouse design and translated the user inputs into ETL design docs
  • Utilized Informatica to ETL data from SQL Server to Oracle databases, ensuring seamless and efficient data migration
  • Performed data mapping, data cleansing, and program development for data loads, along with verifying the converted data against legacy records
  • Designed and implemented real-time data pipelines using Apache Kafka and Apache Flink for ingesting and processing high-velocity data streams from various sources
  • Implemented data security measures to ensure compliance with industry regulations and data protection standards
  • Developed Spark scripts using Python on AWS EMR for Data Aggregation, Validation and Adhoc querying
  • Developed and maintained batch processing pipelines using Apache Spark for data transformation and analysis
  • Assisted in the design and development of data models and schemas for data warehousing solutions
  • Conducted data profiling and data quality checks to identify and resolve data anomalies and issues
  • Conducted performance tuning and optimization of streaming data pipelines to ensure optimal throughput and low latency in data processing
  • Developed ETL processes to extract, transform, and load data from various data sources into the data warehouse using technologies such as Apache Airflow and SQL (a minimal DAG sketch follows this list)
  • Assisted in the development of data pipelines and ETL processes using Python and SQL to process and store large volumes of data
  • Conducted data quality checks and data validation to ensure the accuracy and integrity of data in the data warehouse
  • Assisted in the maintenance and optimization of existing data processing pipelines.
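As referenced in the Airflow bullet above, a minimal Airflow 2.x DAG sketch of such an ETL process; the DAG id, schedule, and task bodies are hypothetical placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**context):
        # Placeholder: pull rows from a source system.
        return [{"id": 1, "amount": 10.0}]

    def transform(**context):
        # Placeholder: apply business rules to the extracted rows.
        rows = context["ti"].xcom_pull(task_ids="extract")
        return [r for r in rows if r["amount"] > 0]

    def load(**context):
        # Placeholder: write the transformed rows to the warehouse.
        rows = context["ti"].xcom_pull(task_ids="transform")
        print(f"loading {len(rows)} rows")

    with DAG(
        dag_id="etl_sketch",
        start_date=datetime(2019, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        load_task = PythonOperator(task_id="load", python_callable=load)

        extract_task >> transform_task >> load_task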

Education

Bachelor of Technology -

Rashtrasant Tukadoji Maharaj Nagpur University
Nagpur, Maharashtra

Skills

  • Python
  • Java
  • Scala
  • SQL
  • Windows 98/2000/XP/7/10
  • Mac OS
  • Unix
  • Linux
  • Shell scripting
  • PL/SQL
  • PySpark
  • HiveQL
  • Regular Expressions
  • HTML
  • JavaScript
  • RESTful
  • SOAP
  • Tableau
  • Power BI
  • SSRS
  • Docker
  • Jenkins
  • Hadoop
  • HDFS
  • Hive
  • MapReduce
  • Pig
  • HBase
  • Sqoop
  • Flume
  • Oozie
  • NoSQL databases
  • Microsoft Word
  • PowerPoint
  • MS Visio
  • MS Project
  • Data Modeler
  • Git
  • Jira
  • SQL Sentry
  • SSIS
  • DataStage
  • Oracle Data Integrator
  • Apache NiFi
  • Talend

Timeline

BI and Cloud Engineer

VIA Rail
05.2023 - Current

BI and Database Developer

Nelvana
09.2021 - 05.2023

Data Engineer

Reliance industries
05.2019 - 08.2021

Data Analyst

Forbes & Company Limited
04.2017 - 05.2019

Bachelor of Technology -

Rashtrasant Tukadoji Maharaj Nagpur University