Naresh Mogilipalem

Senior Data Engineer

Summary

Senior Data Engineer with 14+ years of experience designing and optimizing scalable data pipelines using Big Data (CDP), Spark, Spark Streaming, PySpark, Hadoop, Java/J2EE, Scala, Python, Kafka, Apache NiFi, AWS, GCP, AKKA, and ZIO technologies. Led successful projects at top financial institutions to enhance data processing efficiency and achieve significant performance improvements. Skilled in data modeling, ETL processes, and cross-functional collaboration to deliver impactful data solutions using agile methodologies.

Proficient in applying AI/ML algorithms, statistical methods, and data visualization techniques to uncover insights and optimize processes. Adept at working with large datasets using Python and SQL and tools such as PySpark, Hadoop, and Tableau. Strong analytical and problem-solving skills with a track record of delivering actionable insights. Committed to continuous learning and staying current with trends in data science and artificial intelligence to support data-driven decision-making.

Expertise in building real-time data streaming solutions using Spark Streaming, Kafka Streams, AKKA Streams, Apache Flink, Apache NiFi, and Flume. Designed and implemented high-performance, scalable solutions using Hadoop ecosystem tools such as Pig, Hive, Sqoop, Spark, Zookeeper, Solr, and Kafka. Designed, configured, and deployed applications on Amazon Web Services (AWS) using the AWS stack (EMR, EC2, S3, RDS, Redshift, CloudFormation, Glue, CloudWatch, SQS, and IAM), focusing on high availability, fault tolerance, and auto-scaling. Experience in application design and implementation using the GCP stack (virtual machines, Cloud Functions, Cloud Run, Cloud Prod, Cloud SQL, BigQuery, Airflow, STS, APIGEE, Databricks, Google Cloud Storage, and Cloud Logger). Experience in implementing modern architecture solutions such as Lakehouse, event streaming, microservices, and domain-driven design patterns.

Work History

Senior Java/Scala Data Engineer

Citi Bank

06.2024 - Current

Collaborated with Citi Bank on the Capital Market - Financial Risk Trade Book project
Designed, documented, developed, and maintained robust trade data pipelines to efficiently collect, process, and store data from multiple sources, including user interactions, listing details, and external feeds
The implementation utilized Spark, Scala, Java, AKKA, AKKA Streams, Unix, JSON, Jenkins, and a range of DevOps tools
Led the development of scalable, distributed data solutions leveraging AKKA, AKKA Streams, Spring Boot, batch processing, caching logic, and an enterprise Redis cache
Leveraged Scala with Apache Spark to process large-scale data efficiently, ensuring optimal performance and scalability
Developed a custom source by extending the graph stage to stream raw files from the input location, effectively managing stream backpressure
Optimized Scala code for performance and efficiency, including resource management, code refactoring, and algorithm optimization for handling large data volumes
Used Spark Datasets extensively in developing frameworks with PySpark
Engaged in handling complex data structures and transformations, such as nested JSON, XML, Avro, or Parquet, and converting them into structured formats suitable for analysis and managing extensive datasets
Accessed and processed HDFS data using Apache Spark within a Cloudera Distribution Hadoop (CDH) framework for high-performance data analytics
Utilized Hive to access legacy data within a Hadoop-Cloudera environment, ensuring efficient data retrieval and manipulation
Analyzed and migrated existing Spark code to the Databricks environment, enhancing performance and scalability
Contributed to the development of a Spark-based framework using Java, optimizing data processing tasks and workflows
Converted PL/SQL queries into Spark transformations using Spark DataFrames, Spark Datasets, and Scala for enhanced data processing capabilities
Responsible for building, deploying, and integrating applications in application servers with Maven and Gradle; also worked with CI/CD tools such as Jenkins, Terraform, Git, and Maven to integrate the applications
Designed and developed REST APIs using Spring Boot and Java as per project requirements
Implemented the Data Access Layer (DAL) using Spring Data and JPA
Built scalable data pipelines using Spark, Scala, and Python within an Airflow scheduling framework
Created a POC using Snowflake and ingested data into the database using Spark and APIs
Consumed data from Kafka topics, applied all required transformations, and persisted the results in the database
Developed a Spring Boot data API in Java that uses KafkaTemplate to consume data from Kafka topics
Used ZIO streams to consume data from Kafka topics, ensuring efficient data processing and integration
Developed high-availability, low-latency applications using ZIO fiber, optimizing performance and reliability
Utilized various ZIO libraries such as ZIO Config, ZIO JDBC, ZIO Kafka, ZIO JSON, and ZIO Logging for comprehensive application development
Collaborated with Data Scientists, Product Managers, Operations, Finance, and Software Engineers to define data requirements and develop impactful data solutions supporting reporting objectives
Analyzed extensive datasets to identify gaps and inconsistencies, delivering actionable insights to drive informed decision-making
Participated in code reviews to ensure adherence to quality, performance, and security standards
Engaged with business teams to collaborate on the development, implementation, and ongoing enhancement of a scalable data platform capable of managing multiple datasets, from data pipelines and platforms to warehouses, and presenting data to both internal and customer-facing applications
Fostered close collaboration across different teams, including product, engineering, data science, and external partners, to address data modeling, oversee data life cycle management, ensure data governance, and establish processes for legal compliance
Engaged in a narrative-driven agile development approach, actively contributing to daily scrum meetings
Environment: Hadoop, CDH, AKKA, AKKA Streams, Spark, PySpark, Java, Scala, Hive, HDFS, Oracle, Unix, JSON, Spring Boot, Jenkins.

Senior Data Engineer

Curinos Inc

10.2022 - 05.2024

Processed lender benchmark data using Spark, Scala, AWS (Redshift), PySpark, Spark SQL, Unix, JSON, Python, Jenkins, DevOps, Tikitapu, Metric-scape, Databricks, Airflow, and Kafka
Responsible for building scalable distributed data solutions using AWS EMR, Streaming, Batch, and NoSQL databases
Leveraged Scala and Python (PySpark) with Apache Spark to process large-scale data efficiently, ensuring optimal performance and scalability
Migrated Hive queries into Spark transformations using Spark DataFrames, Spark Datasets, Scala, and Python
Used the Spark SQL API on an EMR cluster with Scala to perform analytics on data in S3 and Redshift
Developed several proofs of concept using Scala and Python and deployed them successfully on both EMR and Databricks clusters
Optimized Scala code for performance and efficiency, including resource management, code refactoring, and algorithm optimization for handling large data volumes
Developed multi-modular projects for framework advancement with Spark and Scala
Used Spark framework, scripting languages (e.g., Python, Bash), and programming languages (e.g., SQL, Java, Scala) extensively to create, develop, and manage intricate data processing, ETL (Extract, Transform, Load) tasks, and automate AWS systems
Engaged in handling complex data structures and transformations, such as nested JSON, XML, Avro, or Parquet, and converting them into structured formats suitable for analysis and managing extensive datasets
Engaged with business teams to collaborate on the development, implementation, and ongoing enhancement of a scalable data platform capable of managing multiple datasets, from data pipelines and platforms to warehouses, and presenting data to both internal and customer-facing applications
Planned, built, and maintained data ETL/ELT pipelines to collect and combine the data needed for diverse analyses and reporting objectives, while also enhancing the functionality, monitoring, and performance of the data warehouse
Architected, coded, and managed scalable data pipelines using AWS Glue, AWS Lambda, and Amazon Kinesis for efficient data ingestion and transformation
Developed and optimized ETL processes using AWS Glue and Apache Spark, facilitating seamless data migration and transformation
Implemented and managed data storage solutions with Amazon S3, Amazon Redshift, and Amazon RDS, ensuring optimized data storage and retrieval
Implemented robust monitoring and logging mechanisms using AWS CloudWatch, including setting up alerts for proactive issue resolution
Developed and maintained Java-based microservices using AWS Lambda for seamless and scalable API solutions
Successfully migrated legacy SQL Server jobs to the AWS Databricks environment, modernizing the data processing infrastructure
Utilized Databricks Delta Lake tables for efficient and reliable data processing
Built and managed streaming data pipelines using Databricks, ensuring real-time data processing and analytics
Ingested data into Amazon Redshift using Spark and Databricks notebooks, facilitating efficient data storage and access
Created RDDs, applied data filters in Spark, and generated Redshift tables for user access, optimizing data workflows
Imported data into Apache Pinot NoSQL database using Spark for real-time access by APIs and business users
Developed a distributed copy framework to transfer data from HDFS to S3 Storage, ensuring seamless data migration
Created and implemented complex SQL queries, stored procedures, functions, packages, and triggers in SQL Server for comprehensive data management
Converted PL/SQL queries into Spark transformations using Spark Data Frames, Spark Datasets, Scala, and Python for enhanced data processing capabilities
Experienced with Databricks and PySpark for advanced data processing and analysis, driving efficient data solutions
Created Databricks workflows to automate and schedule the jobs
Used Cloud Build for continuous integration and deployment
Responsible for building, deploying, and integrating applications in application servers with Maven and Gradle; also worked with CI/CD tools such as Jenkins, Terraform, Git, and Maven to integrate the applications
Environment: Hadoop, Spark, Scala, Hive, HDFS, MSSQL, PySpark, Spark SQL, Unix, JSON, Python, Spring Boot, Spring, Jenkins, Airflow, DevOps, AWS (Redshift, EMR, EC2, S3, Lambda, and Databricks)

Senior Bigdata Engineer

HSBC Bank

07.2021 - 10.2022

Worked with the wholesale department's global operations to process customer KYC data using Spark, Scala, Elasticsearch (ELK), Hive, HDFS, Zookeeper, PySpark, Spark SQL, Unix, JSON, Python, Spring Boot, Spring, Jenkins, DevOps, GCP, Cloud Run, Cloud Functions, and Databricks
Developed the Spark codebase in Scala
Converted Hive/SQL queries into Spark transformations using Spark DataFrames, Spark Datasets, Scala, and Python
Used the Spark Core and Spark SQL APIs over Cloudera Hadoop YARN to perform analytics on data in Hive
Consumed and processed different file types, including Avro, JSON, Parquet, and fixed-width files
Used Ansible for continuous integration and deployment
Scheduled jobs using Control-M
Developed a Spring Boot API to serve data from Elasticsearch to end clients
Created a distributed copy framework to copy data from HDFS to Google Cloud Storage
Designed and developed Python/Scala/Java applications using GCP standards
Deployed Scala components in GCP as containers and integrated them with Elasticsearch for log analysis
Used Databricks on GCP to process data and apply the transformations required by the business
Created several Cloud Functions to handle internal events, such as storing data in BigQuery
Created a serverless API and deployed it on Cloud Run
Accessed Cloud Spanner to serve the Cloud Run APIs
Used Cloud Monitoring to analyze job logs
Used Cloud Composer (Airflow) to schedule jobs
Managed and provisioned GCP infrastructure using Infrastructure as Code tools such as Terraform or Google Cloud Deployment Manager
Automated GCP operations using scripting languages like Python or Bash and GCP SDKs or APIs
Configured and managed virtual machines in Compute Engine, including setting up auto-scaling and load balancing
Developed and maintained CI/CD pipelines for automated code deployment using Google Cloud Build, Source Repositories, and Container Registry
Optimized GCP costs and resource usage, providing cost estimates and reports, and implementing cost-saving strategies
Set up monitoring and logging with Stackdriver
Implemented Continuous Integration/Continuous Deployment (CI/CD) with Google Cloud Build
Responsible for building, deploying, and integrating applications in application servers with Ant, Maven, and Gradle; also worked with CI/CD tools such as Jenkins, Ansible, Git, and Maven to integrate the applications
Environment: Hadoop, Spark, NiFi, Elasticsearch, Java 8, Scala, Scala ZIO, AKKA, Cassandra, Hive, HDFS, MySQL, Sqoop, Oozie, Pig, MapReduce, HBase, Zookeeper, PySpark, Spark Streaming, Spark SQL, Unix, Confluent Kafka, JSON, Python, Spring Boot, Spring, Jenkins, DevOps, GCP (Cloud Storage, Cloud Run, Cloud Functions, BigQuery, and Databricks).

Senior Big Data Engineer

RBC

06.2018 - 07.2021

Worked with the Fraud group at RBC to identify fraud as early as possible and minimize fraud losses using Hadoop, Spark, NiFi, Elasticsearch (ELK), Java, Scala, Scala ZIO, AKKA, Cassandra, Hive, HDFS, MySQL, Sqoop, Oozie, Pig, MapReduce, HBase, Zookeeper, PySpark, Spark Streaming, Spark SQL, Unix, Confluent Kafka, JSON, Python, Spring Boot, Spring, Jenkins, and DevOps
Responsible for building scalable distributed data solutions using Hadoop Cloudera, Kafka, Streaming, Batch, and NoSQL databases
Converted Hive/SQL queries into Spark transformations using Spark DataFrames, Spark Datasets, Scala, and Python
Developed Hive scripts, Pig scripts, Unix shell scripts, and Spark batch and Spark Streaming programs for all ETL loading processes, converting the files into Parquet in the Hadoop file system
Defined real-time big data streaming solutions across the cluster using Spark Streaming, Apache Storm, Kafka Streams, NiFi, and Flume
Used many Apache NiFi processors to connect to Kafka, MQ, Elasticsearch (ELK), HDFS, MySQL, Cassandra, and Oracle
Created custom Apache NiFi processors in Java 8 to meet specific requirements
Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios
Implemented the Feedzai platform for data model execution and fraud score generation
Responsible for building, deploying, and integrating applications in application servers with Ant, Maven, and Gradle; also worked with CI/CD tools such as Jenkins, Git, and Maven to integrate the applications
Created applications using Confluent Kafka to monitor consumer lag within Apache Kafka clusters
Used Confluent Control Center to monitor Kafka topic lag and topic offsets
Used the Confluent Kafka schema registry to register Kafka topic schema for messages
Consumed and produced a wide variety of data formats, such as Avro, JSON, text, and binary, from and to Kafka topics
Developed interactive microservices using Spring Boot, REST APIs, Spring Data JPA, Toggle API, and Spring Data Repository
Developed a Java Spring Boot REST API and hosted it on the APIGEE platform
Implemented the MVC design pattern using the Spring Framework, ORM using Hibernate, and the factory and singleton design patterns for object creation and for maintaining single instances of objects in the JVM
Environment: Hadoop, Spark, Elasticsearch, Java 8, Scala, Hive, HDFS, MySQL, Sqoop, Zookeeper, PySpark, Spark SQL, Unix, JSON, Python, Spring Boot, Spring, Ansible, Jenkins, DevOps.

Senior Java Scala developer

Development Bank of Singapore

02.2016 - 08.2018

Spark/Scala/Big Data
Migrated the existing Oracle Exa-DB data warehouse system to the Hadoop ecosystem and created data marts based on data type using Hadoop HDFS, Flume, Pig, Hive, Oozie, Zookeeper, HBase, Spark, Storm, Spark SQL, Java, Scala, Spring Boot, Spring, Kafka, MongoDB, Linux, Sqoop, and AWS
Developed Spark jobs in Scala to perform data aggregation, data processing, and data analysis
Converted existing PL/SQL code base and jobs to spark jobs
Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
Loaded data into Spark RDDs and performed in-memory computation to generate the output response
Created Hive tables and applied HiveQL queries to them for data validation
Used Spark for a series of dependent jobs and iterative algorithms
Developed a data pipeline using Kafka and Spark Streaming to store data in HDFS
Tuned Hive and Pig job parameters along with native MapReduce parameters to avoid excessive disk spills, and enabled temp-file compression between jobs in the data pipeline to handle production-size data in a multi-tenant cluster environment
Environment: Hadoop HDFS, Flume, Pig, Hive, Oozie, Zookeeper, HBase, Spark, Storm, Spark SQL, Java, Scala, Spring Boot, Spring, Kafka, MongoDB, Linux, Sqoop, AWS.

Java Scala Developer

Morgan Stanley, Technologies Private Ltd

06.2015 - 02.2016

Designed and implemented MapReduce jobs, Hive queries, and PIG scripts to support a data warehouse migration project
Utilized a robust tech stack, including Hadoop, Spark, Scala, MapReduce, HDFS, Kafka, Pig, Hive, Java (JDK 1.7), Spring Boot, Oracle 11g/10g, PL/SQL, SQL*Plus, Linux, and Sqoop
Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing
Developed interactive microservices using Spring Boot, REST APIs, Spring Data JPA, Toggle API, and Spring Data Repository
Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data from various sources
Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS
Developed job workflows in Oozie to automate the tasks of loading the data into HDFS
Created Hive tables, loaded data, and wrote Hive queries optimized for execution through MapReduce
Developed simple to complex MapReduce jobs using Hive and Pig
Developed Pig UDFs to pre-process the data for analysis
Environment: Hadoop, Spark, Scala, MapReduce, HDFS, Kafka, Pig, Hive, Java (JDK 1.7), Oracle 11g/10g, PL/SQL, SQL*Plus, Linux, Sqoop.

Java Developer

In2m

10.2010 - 05.2015

Created enterprise deployment strategy and designed the enterprise deployment process to deploy Web Services, and J2EE programs on more than 7 different SOA/WebLogic instances across development, test and Linux production environments
Designed the user interface using HTML, Apache Flex, CSS, XML, JavaScript, and JSP
Updated user-interactive web pages from JSP and CSS to Apache Flex, CSS, and JavaScript for the best user experience
Developed Servlets, Session and Entity Beans handling business logic and data
Wrote PL/SQL Packages and Stored procedures to implement business rules and validations
Collected data from distributed sources into Avro models, applied transformations and standardizations, and loaded the results into HBase for further processing
Developed, tested, and implemented the system using Struts, JSF, and iBATIS
Developed an understanding of how to apply technologies to solve big data problems and build innovative Hadoop- and Hive-based solutions
Automated Sqoop scripts to pull the data from Oracle
Environment: Apache Flex, JavaScript, Hadoop, Hive, Oracle, Linux, Servlets, JDBC, PL/SQL, ODBC, Custom Framework, XML, CSS, HTML, and MySQL.
Used the Rally tool to implement agile methodology for inspection, adaptation, and rapid delivery of high-quality software
Created User Interface using MXML, ActionScript and CSS
Implemented many advanced graphs, charts and dashboard layout designs using FLEX
Used Cairngorm Framework with FLEX to interact with backend J2EE and RPC/gRPC calls
Interacted with the existing JSP pages using remote procedure calls through LCDS Data Services
Used SVN for version control of the code to maintain the revisions and versions
Environment: JavaServer Pages (JSP), Apache Flex, JavaScript, LCDS, JBoss, Oracle, Servlets, JDBC, PL/SQL, ODBC, Struts Framework, XML, CSS, HTML, DHTML, XSL, XSLT, and MySQL.

Education

Bachelor of Technology

JNTU

Skills

Java 8

Certification

Cloudera Certified Developer for Apache Hadoop (CCDH)

Accomplishments

Received the DARE TO TRY award for creating several challenging POCs and executing them successfully while architecting the project.
Received the DEVELOPER OF THE YEAR 2014 award for taking on a leadership role without the title and applying innovative methods to increase the team's efficiency.
