
Naresh Mogilipalem

Senior data engineer

Summary

Senior Data Engineer with 14+ years of experience designing and optimizing scalable data pipelines using Big Data (CDP), Spark, Spark Streaming, PySpark, Hadoop, Java/J2EE, Scala, Python, Kafka, Apache NiFi, AWS, GCP, Akka, and ZIO. Led successful projects at top financial institutions to improve data processing efficiency and achieve significant performance gains. Skilled in data modeling, ETL processes, and cross-functional collaboration to deliver impactful data solutions using agile methodologies. Proficient in applying AI/ML algorithms, statistical methods, and data visualization techniques to uncover insights and optimize processes. Adept at working with large datasets using Python and SQL and tools such as PySpark, Hadoop, and Tableau. Strong analytical and problem-solving skills with a track record of delivering actionable insights. Committed to continuous learning and to staying current with the latest trends in data science and artificial intelligence to support data-driven decision-making.
  • Expertise in building real-time data streaming solutions using Spark Streaming, Kafka Streams, Akka Streams, Apache Flink, Apache NiFi, and Flume.
  • Designed and implemented high-performance, scalable solutions using Hadoop ecosystem tools such as Pig, Hive, Sqoop, Spark, Zookeeper, Solr, and Kafka.
  • Designed, configured, and deployed Amazon Web Services (AWS) for multiple applications using the AWS stack (EMR, EC2, S3, RDS, Redshift, CloudFormation, Glue, CloudWatch, SQS, and IAM), focusing on high availability, fault tolerance, and auto-scaling.
  • Experienced in application design and implementation using the GCP stack (Virtual Machines, Cloud Functions, Cloud Run, Cloud Prod, Cloud SQL, BigQuery, Airflow, STS, Apigee, Databricks, Google Cloud Storage, and Cloud Logging).
  • Experienced in implementing modern architecture patterns such as Lakehouse, event streaming, microservices, and domain-driven design.

Overview

14 years of professional experience
2 Certifications
1 Language

Work History

Senior Java/Scala Data Engineer

Citi Bank
06.2024 - Current
  • Collaborated with Citi Bank on the Capital Market - Financial Risk Trade Book project
  • Designed, documented, developed, and maintained robust trade data pipelines to efficiently collect, process, and store data from multiple sources, including user interactions, listing details, and external feeds
  • The implementation utilized Spark, Scala, Java, AKKA, AKKA Streams, Unix, JSON, Jenkins, and a range of DevOps tools
  • Led the development of scalable, distributed data solutions leveraging Akka, Akka Streams, Spring Boot, batch processing, and caching logic backed by an enterprise Redis cache
  • Leveraged Scala with Apache Spark to process large-scale data efficiently, ensuring optimal performance and scalability
  • Developed a custom source by extending GraphStage to stream raw files from the input location, effectively managing stream backpressure (see the sketch at the end of this entry)
  • Optimized Scala code for performance and efficiency, including resource management, code refactoring, and algorithm optimization for handling large data volumes
  • Utilized Spark Datasets extensively in the development of frameworks using PySpark
  • Engaged in handling complex data structures and transformations, such as nested JSON, XML, Avro, or Parquet, and converting them into structured formats suitable for analysis and managing extensive datasets
  • Accessed and processed HDFS data using Apache Spark within a Cloudera Distribution Hadoop (CDH) framework for high-performance data analytics
  • Utilized Hive to access legacy data within a Hadoop-Cloudera environment, ensuring efficient data retrieval and manipulation
  • Analyzed and migrated existing Spark code to the Databricks environment, enhancing performance and scalability
  • Contributed to the development of a Spark-based framework using Java, optimizing data processing tasks and workflows
  • Converted PL/SQL queries into Spark transformations using Spark DataFrames, Datasets, and Scala for enhanced data processing capabilities
  • Responsible for building, deploying, and integrating applications in application servers with Maven and Gradle; also worked with CI/CD tools such as Jenkins, Terraform, and Git to integrate the applications
  • Designed and developed REST APIs using Spring Boot and Java as per project requirements
  • Implemented Data Access Layer (DAL) using Spring data and JPA
  • Built scalable data pipelines using Spark, Scala, and Python within an Airflow scheduling framework
  • Created a POC using Snowflake and ingested data into the database using Spark and APIs
  • Consumed data from Kafka topics, applied the required transformations, and persisted the results in the database
  • Developed a Spring Boot data API with KafkaTemplate to consume data from Kafka topics using Java
  • Used ZIO streams to consume data from Kafka topics, ensuring efficient data processing and integration
  • Developed high-availability, low-latency applications using ZIO fiber, optimizing performance and reliability
  • Utilized various ZIO libraries such as ZIO Config, ZIO JDBC, ZIO Kafka, ZIO JSON, and ZIO Logging for comprehensive application development
  • Collaborated with Data Scientists, Product Managers, Operations, Finance, and Software Engineers to define data requirements and develop impactful data solutions supporting reporting objectives
  • Analyzed extensive datasets to identify gaps and inconsistencies, delivering actionable insights to drive informed decision-making
  • Participated in code reviews to ensure adherence to quality, performance, and security standards
  • Engaged with business teams to collaborate on the development, implementation, and ongoing enhancement of a scalable data platform capable of managing multiple datasets, from data pipelines and platforms to warehouses, and presenting data to both internal and customer-facing applications
  • Fostered close collaboration across different teams, including product, engineering, data science, and external partners, to address data modeling, oversee data life cycle management, ensure data governance, and establish processes for legal compliance
  • Engaged in a story-driven agile development approach, actively contributing to daily scrum meetings
  • Environment: Hadoop, CDH, Akka, Akka Streams, Spark, PySpark, Java, Scala, Hive, HDFS, Oracle, Unix, JSON, Spring Boot, Jenkins.
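
The custom backpressure-aware source mentioned above can be illustrated with a minimal Akka Streams sketch. The class, outlet, and directory names (RawFileSource, inputDir) are hypothetical placeholders rather than the actual project code; the point is that a GraphStage-based source emits a file only when the downstream stage pulls, which is how the stream's backpressure is honored.

    // Hypothetical sketch: a custom Akka Streams source that emits raw files from an
    // input directory one at a time; names and paths are illustrative only.
    import java.nio.file.{Files, Path, Paths}
    import scala.jdk.CollectionConverters._
    import akka.stream.{Attributes, Outlet, SourceShape}
    import akka.stream.stage.{GraphStage, GraphStageLogic, OutHandler}

    class RawFileSource(inputDir: Path) extends GraphStage[SourceShape[Path]] {
      val out: Outlet[Path] = Outlet("RawFileSource.out")
      override val shape: SourceShape[Path] = SourceShape(out)

      override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
        new GraphStageLogic(shape) {
          // Nothing is emitted until the downstream stage pulls, so the stage
          // naturally respects backpressure.
          private val files = Files.list(inputDir).iterator().asScala

          setHandler(out, new OutHandler {
            override def onPull(): Unit =
              if (files.hasNext) push(out, files.next()) else completeStage()
          })
        }
    }

    // Usage (assuming an implicit ActorSystem / materializer is in scope):
    // Source.fromGraph(new RawFileSource(Paths.get("/data/in"))).runForeach(println)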

Senior Data Engineer

Curinos Inc
10.2022 - 05.2024
  • Processed lender benchmark data using Spark, Scala, AWS (Redshift), PySpark, Spark SQL, Unix, JSON, Python, Jenkins, DevOps tools, Tikitapu, Metric-scape, Databricks, Airflow, and Kafka
  • Responsible for building scalable distributed data solutions using AWS EMR, Streaming, Batch, and NoSQL databases
  • Leveraged Scala and Python (PySpark) with Apache Spark to process large-scale data efficiently, ensuring optimal performance and scalability
  • Migrated Hive queries into Spark transformations using Spark DataFrames, Datasets, Scala, and Python (see the sketch at the end of this entry)
  • Used Spark SQL API over EMR cluster using Scala to perform analytics on data in S3 and Redshift
  • Developed several Proof of Concepts using Scala and Python, then successfully deployed on both EMR and Databricks clusters
  • Optimized Scala code for performance and efficiency, including resource management, code refactoring, and algorithm optimization for handling large data volumes
  • Developed multi-modular projects for framework advancement with Spark and Scala
  • Used the Spark framework, scripting languages (e.g., Python, Bash), and programming languages (e.g., SQL, Java, Scala) extensively to create, develop, and manage intricate data processing and ETL (Extract, Transform, Load) tasks and to automate AWS systems
  • Engaged in handling complex data structures and transformations, such as nested JSON, XML, Avro, or Parquet, and converting them into structured formats suitable for analysis and managing extensive datasets
  • Engaged with business teams to collaborate on the development, implementation, and ongoing enhancement of a scalable data platform capable of managing multiple datasets, from data pipelines and platforms to warehouses, and presenting data to both internal and customer-facing applications
  • Planned, built, and maintained ETL/ELT data pipelines to collect and combine the data needed for diverse analyses and reporting objectives, while also enhancing the functionality, monitoring, and performance of the data warehouse
  • Architected, coded, and managed scalable data pipelines using AWS Glue, AWS Lambda, and Amazon Kinesis for efficient data ingestion and transformation
  • Developed and optimized ETL processes using AWS Glue and Apache Spark, facilitating seamless data migration and transformation
  • Implemented and managed data storage solutions with Amazon S3, Amazon Redshift, and Amazon RDS, ensuring optimized data storage and retrieval
  • Implemented robust monitoring and logging mechanisms using AWS CloudWatch, including setting up alerts for proactive issue resolution
  • Developed and maintained Java-based microservices using AWS Lambda for seamless and scalable API solutions
  • Successfully migrated legacy SQL Server jobs to the AWS Databricks environment, modernizing the data processing infrastructure
  • Utilized Databricks Delta Lake tables for efficient and reliable data processing
  • Built and managed streaming data pipelines using Databricks, ensuring real-time data processing and analytics
  • Ingested data into Amazon Redshift using Spark and Databricks notebooks, facilitating efficient data storage and access
  • Created RDDs, applied data filters in Spark, and generated Redshift tables for user access, optimizing data workflows
  • Imported data into Apache Pinot NoSQL database using Spark for real-time access by APIs and business users
  • Developed a distributed copy framework to transfer data from HDFS to S3 Storage, ensuring seamless data migration
  • Created and implemented complex SQL queries, stored procedures, functions, packages, and triggers in SQL Server for comprehensive data management
  • Converted PL/SQL queries into Spark transformations using Spark Data Frames, Spark Datasets, Scala, and Python for enhanced data processing capabilities
  • Experienced with Databricks and PySpark for advanced data processing and analysis, driving efficient data solutions
  • Created Databricks workflows to automate and schedule the jobs
  • Used Cloud Build for continuous integration and deployment
  • Responsible for building, deploying, and integrating applications in application servers with Maven and Gradle; also worked with CI/CD tools such as Jenkins, Terraform, and Git to integrate the applications
  • Environment: Hadoop, Spark, Scala, Hive, HDFS, MSSQL, PySpark, Spark SQL, Unix, JSON, Python, Spring Boot, Spring, Jenkins, Airflow, DevOps, AWS (Redshift, EMR, EC2, S3, Lambda, and Databricks)
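
As a rough illustration of the Hive/PL-SQL-to-Spark migration work described above, the sketch below converts a simple aggregation query into Spark transformations. The table, column, and S3 path names (lender_rates, rate, s3://bucket/...) are placeholder assumptions, not the actual Curinos schema.

    // Hedged sketch: a SQL-style aggregation expressed as Spark transformations.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object LenderBenchmarkJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("lender-benchmark").getOrCreate()
        import spark.implicits._

        // Equivalent of:
        //   SELECT product, AVG(rate) AS avg_rate
        //   FROM lender_rates WHERE as_of_date = '2024-01-31' GROUP BY product
        val rates = spark.read.parquet("s3://bucket/lender_rates/") // placeholder path
        val benchmark = rates
          .filter($"as_of_date" === "2024-01-31")
          .groupBy($"product")
          .agg(avg($"rate").as("avg_rate"))

        benchmark.write.mode("overwrite").parquet("s3://bucket/benchmarks/") // placeholder
        spark.stop()
      }
    }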

Senior Bigdata Engineer

HSBC Bank
07.2021 - 10.2022
  • Worked with the Wholesale Global Operations department; processed customer KYC data using Spark, Scala, Elasticsearch (ELK), Hive, HDFS, Zookeeper, PySpark, Spark SQL, Unix, JSON, Python, Spring Boot, Spring, Jenkins, DevOps, GCP, Cloud Run, Cloud Functions, and Databricks
  • Developed the Spark codebase using Spark and Scala
  • Converted Hive/SQL queries into Spark transformations using Spark DataFrames, Datasets, Scala, and Python
  • Used Spark Core and Spark SQL APIs over Cloudera Hadoop YARN to perform analytics on data in Hive
  • Consumed and processed different types of files such as Avro, JSON, Parquet, and fixed-width files
  • Used Ansible for continuous integration and deployment
  • Scheduled jobs using Control-M
  • Developed a Spring Boot API to serve data to end clients from Elasticsearch
  • Created a distributed copy framework to copy data from HDFS to Google Cloud Storage (see the sketch at the end of this entry)
  • Designed and developed Python/Scala/Java applications following GCP standards
  • Deployed Scala components in GCP as containers and integrated them with Elasticsearch for log analysis
  • Used the GCP Databricks environment to process data and apply the required transformations per business requirements
  • Created Cloud Functions to handle internal events, such as storing data into BigQuery
  • Created serverless APIs and deployed them on Cloud Run
  • Accessed Cloud Spanner to serve the Cloud Run APIs
  • Used Cloud Monitoring to analyze job logs
  • Used Cloud Composer (Airflow) to schedule jobs
  • Managed and provisioned GCP infrastructure using Infrastructure as Code tools such as Terraform or Google Cloud Deployment Manager
  • Automated GCP operations using scripting languages like Python or Bash and GCP SDKs or APIs
  • Configured and managed virtual machines in Compute Engine, including setting up auto-scaling and load balancing
  • Developed and maintained CI/CD pipelines for automated code deployment using Google Cloud Build, Source Repositories, and Container Registry
  • Optimized GCP costs and resource usage, providing cost estimates and reports, and implementing cost-saving strategies
  • Implemented monitoring and logging with Stackdriver
  • Implemented continuous integration/continuous deployment (CI/CD) with Google Cloud Build
  • Responsible for building, deploying, and integrating applications in application servers with Ant, Maven, and Gradle; also worked with CI/CD tools such as Jenkins, Ansible, Git, and Maven to integrate the applications
  • Environment: Hadoop, Spark, NiFi, Elasticsearch, Java 8, Scala, ZIO, Akka, Cassandra, Hive, HDFS, MySQL, Sqoop, Oozie, Pig, MapReduce, HBase, Zookeeper, PySpark, Spark Streaming, Spark SQL, Unix, Confluent Kafka, JSON, Python, Spring Boot, Spring, Jenkins, DevOps, GCP (Cloud Storage, Cloud Run, Cloud Functions, BigQuery, and Databricks).
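
The HDFS-to-Cloud-Storage copy framework referenced above can be sketched roughly as follows, assuming the GCS connector (gs:// scheme) is configured on the cluster; source and destination paths are illustrative only.

    // Minimal sketch of a Spark-based HDFS-to-GCS copy; paths are placeholders.
    import org.apache.spark.sql.SparkSession

    object HdfsToGcsCopy {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("hdfs-to-gcs-copy").getOrCreate()
        // Read the source Parquet data and write it back out to Cloud Storage,
        // letting Spark distribute the copy across executors.
        spark.read.parquet("hdfs:///data/kyc/")   // placeholder source path
          .write.mode("overwrite")
          .parquet("gs://my-bucket/kyc/")         // placeholder destination path
        spark.stop()
      }
    }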

Senior Big Data Engineer

RBC
06.2018 - 07.2021
  • Worked with the Fraud group at RBC to identify fraud as early as possible and minimize fraud losses, using Hadoop, Spark, NiFi, Elasticsearch (ELK), Java, Scala, ZIO, Akka, Cassandra, Hive, HDFS, MySQL, Sqoop, Oozie, Pig, MapReduce, HBase, Zookeeper, PySpark, Spark Streaming, Spark SQL, Unix, Confluent Kafka, JSON, Python, Spring Boot, Spring, Jenkins, and DevOps tools
  • Responsible for building scalable distributed data solutions using Hadoop Cloudera, Kafka, Streaming, Batch, and NoSQL databases
  • Converted Hive/SQL queries into Spark transformations using Spark DataFrames, Datasets, Scala, and Python
  • Developed Hive scripts, Pig scripts, Unix shell scripts, and Spark batch and streaming programs for all ETL loading processes, converting files into Parquet in HDFS
  • Defined real-time big data streaming solutions across the cluster using Spark Streaming, Apache Storm, Kafka Streams, NiFi, and Flume (see the sketch at the end of this entry)
  • Used many Apache NiFi processors to connect to Kafka, MQ, Elasticsearch (ELK), HDFS, MySQL, Cassandra, and Oracle
  • Created custom Apache NiFi processors using Java 8 to meet specific requirements
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios
  • Implemented the Feedzai platform for data model execution and fraud-score generation
  • Responsible for building, deploying, and integrating applications in application servers with Ant, Maven, and Gradle; also worked with CI/CD tools such as Jenkins, Git, and Maven to integrate the applications
  • Created applications using Confluent Kafka, which monitors consumer lag within Apache Kafka clusters
  • Used the Confluent Control Center to monitor Kafka topic lag and topic offsets
  • Used the Confluent Kafka Schema Registry to register Kafka topic schemas for messages
  • Consumed and produced a wide variety of data formats, such as Avro, JSON, text, and binary, to and from Kafka topics
  • Developed interactive microservices using Spring Boot, REST APIs, Spring Data JPA, the Toggle API, and Spring Data repositories
  • Developed Java Spring Boot REST APIs and hosted them on the Apigee platform
  • Implemented the MVC design pattern using the Spring framework, ORM using Hibernate, and factory and singleton design patterns for object creation and for maintaining single object instances in the JVM
  • Environment: Hadoop, Spark, Elasticsearch, Java 8, Scala, Hive, HDFS, MySQL, Sqoop, Zookeeper, PySpark, Spark SQL, Unix, JSON, Python, Spring Boot, Spring, Ansible, Jenkins, DevOps.
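
A hedged sketch of the real-time streaming pattern referenced above: Spark Structured Streaming reads JSON events from a Kafka topic and lands them in HDFS as Parquet. The broker, topic, schema, and paths are assumptions for illustration, not the actual fraud-platform configuration.

    // Illustrative Kafka-to-HDFS streaming job; all names are placeholders.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    object FraudEventStream {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("fraud-event-stream").getOrCreate()
        import spark.implicits._

        val schema = new StructType()
          .add("txnId", StringType).add("amount", DoubleType).add("ts", TimestampType)

        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092") // placeholder broker
          .option("subscribe", "card-transactions")          // placeholder topic
          .load()
          .select(from_json($"value".cast("string"), schema).as("e"))
          .select("e.*")

        events.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/fraud/events")       // placeholder output path
          .option("checkpointLocation", "hdfs:///chk/fraud") // placeholder checkpoint
          .start()
          .awaitTermination()
      }
    }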

Senior Java Scala developer

Development Bank of Singapore
02.2016 - 08.2018
  • Spark/Scala/Big Data: Migrated the existing Oracle Exa-DB data warehouse system to the Hadoop ecosystem and created data marts based on data type, using Hadoop HDFS, Flume, Pig, Hive, Oozie, Zookeeper, HBase, Spark, Storm, Spark SQL, Java, Scala, Spring Boot, Spring, Kafka, MongoDB, Linux, Sqoop, and AWS
  • Developed Spark jobs written in Scala to perform operations such as data aggregation, processing, and analysis
  • Converted the existing PL/SQL code base and jobs to Spark jobs
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response (see the sketch at the end of this entry)
  • Involved in creating Hive tables, and then applied HiveQL on those tables for data validation
  • Used Spark for a series of dependent jobs and iterative algorithms
  • Developed a data pipeline using Kafka and Spark Streaming to store data in HDFS
  • Tuned Hive and Pig job parameters along with native MapReduce parameters to avoid excessive disk spills, and enabled temp-file compression between jobs in the data pipeline to handle production-size data in a multi-tenant cluster environment
  • Environment: Hadoop HDFS, Flume, Pig, Hive, Oozie, Zookeeper, HBase, Spark, Storm, Spark SQL, Java, Scala, Spring Boot, Spring, Kafka, MongoDB, Linux, Sqoop, AWS.
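
A small sketch of the RDD-style in-memory computation noted above, assuming a simple comma-separated layout of customer transactions; field positions and HDFS paths are illustrative, not the actual data model.

    // Hedged sketch: aggregate spend per customer with Spark RDDs.
    import org.apache.spark.sql.SparkSession

    object CustomerSpendRdd {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("customer-spend").getOrCreate()
        val sc = spark.sparkContext

        // Assumed line layout: customerId,txnAmount
        val totals = sc.textFile("hdfs:///data/behaviour/transactions/") // placeholder path
          .map(_.split(","))
          .filter(_.length >= 2)
          .map(fields => (fields(0), fields(1).toDouble))
          .reduceByKey(_ + _) // in-memory aggregation per customer
          .cache()

        totals.take(10).foreach(println)
        spark.stop()
      }
    }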

Java Scala Developer

Morgan Stanley, Technologies Private Ltd
06.2015 - 02.2016
  • Designed and implemented MapReduce jobs, Hive queries, and PIG scripts to support a data warehouse migration project
  • Utilized a robust tech stack including Hadoop, Spark, Scala, MapReduce, HDFS, Kafka, Pig, Hive, Java (JDK 1.7), Spring Boot, Oracle 11g/10g, PL/SQL, SQL*Plus, Linux, and Sqoop
  • Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing
  • Developed interactive Microservice using SpringBoot, Rest API, Spring Data JPA, Toggle API and Spring Data Repository
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS
  • Developed job workflows in Oozie to automate the tasks of loading the data into HDFS
  • Created Hive tables, loaded data, and wrote Hive queries optimized for execution through MapReduce
  • Developed simple to complex MapReduce jobs using Hive and Pig
  • Developed the Pig UDF to pre-process the data for analysis
  • Environment: Hadoop, Spark, Scala, MapReduce, HDFS, Kafka, Pig, Hive, Java (JDK 1.7), Oracle 11g/10g, PL/SQL, SQL*Plus, Linux, Sqoop.

Java Developer

10.2010 - 05.2015
  • Created enterprise deployment strategy and designed the enterprise deployment process to deploy Web Services, and J2EE programs on more than 7 different SOA/WebLogic instances across development, test and Linux production environments
  • Designed user interfaces using HTML, Apache Flex, CSS, XML, JavaScript, and JSP
  • Updated user-interactive web pages from JSP and CSS to Apache Flex, CSS, and JavaScript for the best user experience
  • Developed Servlets, Session and Entity Beans handling business logic and data
  • Wrote PL/SQL Packages and Stored procedures to implement business rules and validations
  • Collected data from distributed sources into Avro models, applied transformations and standardizations, and loaded the results into HBase for further data processing
  • Involved in developing, testing and implementing the system using Struts, JSF, and IBatis
  • Applied technologies to solve big data problems and developed innovative Hadoop and Hive-based big data solutions
  • Automated Sqoop scripts to pull the data from Oracle
  • Apache Flex, JavaScript, Hadoop, Hive, Oracle, Linux, Servlets, JDBC, PL/SQL, ODBC, custom framework, XML, CSS, HTML, and MySQL
  • Used the Rally tool to implement agile methodology for inspection, adaptation, and rapid delivery of high-quality software
  • Created User Interface using MXML, ActionScript and CSS
  • Implemented many advanced graphs, charts and dashboard layout designs using FLEX
  • Used the Cairngorm framework with Flex to interact with backend J2EE via RPC/gRPC calls
  • Interacted with the existing JSP pages using remote procedure calls of LCDS Data Services
  • Used SVN for version control of the code to maintain the revisions and versions
  • Environment: JavaServer Pages (JSP), Apache Flex, JavaScript, LCDS, JBoss, Oracle, Servlets, JDBC, PL/SQL, ODBC, Struts Framework, XML, CSS, HTML, DHTML, XSL, XSLT, and MySQL.

Education

Bachelor of Technology -

JNTU

Skills

Java 8

Certification

Cloudera Certified Developer for Apache Hadoop (CCDH)

Accomplishments

  • Received the DARE TO TRY award | created several challenging POCs and executed them successfully while architecting the project.
  • Received the DEVELOPER OF THE YEAR award for 2014 | took a leadership role without the title and applied new, innovative methods to increase the team's efficiency.
