
Senior Data Engineer with 14+ years of experience designing and optimizing scalable data pipelines using Big Data (CDP), Spark, Spark Streaming, PySpark, Hadoop, Java/J2EE, Scala, Python, Kafka, Apache NiFi, AWS, GCP, Akka, and ZIO. Led successful projects at top financial institutions that improved data processing efficiency and delivered significant performance gains. Skilled in data modeling, ETL processes, and cross-functional collaboration, delivering data solutions using agile methodologies. Proficient in applying AI/ML algorithms, statistical methods, and data visualization techniques to uncover insights and optimize processes. Adept at working with large datasets using Python and SQL, along with tools such as PySpark, Hadoop, and Tableau. Strong analytical and problem-solving skills with a track record of delivering actionable insights. Committed to continuous learning and to staying current with trends in data science and artificial intelligence to support data-driven decision-making.

Expertise in building real-time data streaming solutions using Spark Streaming, Kafka Streams, Akka Streams, Apache Flink, Apache NiFi, and Flume. Designed and implemented high-performance, scalable solutions using Hadoop ecosystem tools such as Pig, Hive, Sqoop, Spark, ZooKeeper, Solr, and Kafka.

Designed, configured, and deployed multiple applications on Amazon Web Services (AWS) using EMR, EC2, S3, RDS, Redshift, CloudFormation, Glue, CloudWatch, SQS, and IAM, with a focus on high availability, fault tolerance, and auto-scaling. Experience in application design and implementation on the GCP stack (virtual machines, Cloud Functions, Cloud Run, Dataproc, Cloud SQL, BigQuery, Airflow, STS, Apigee, Databricks, Google Cloud Storage, and Cloud Logging). Experience implementing modern architectures such as Lakehouse, event streaming, microservices, and domain-driven design.
SQL programming
Data integration
Real-time analytics
Hadoop ecosystem
Data quality assurance
NoSQL databases
Spark framework
API development
Scripting languages
Performance tuning
Big data processing
Machine learning
Data migration
Data warehousing
Data modeling
ETL development
Data pipeline design
OpenShift
Kubernetes
Jira
Artifactory
Jenkins
Linux
Windows
VS Studio
NetBeans
PyCharm
Eclipse
IntelliJ IDEA
Bitbucket
CVS
SVN
Git
Apache Pinot
Cassandra
MongoDB
HBase
Redis
Elasticsearch
Dataproc
Bigtable
BigQuery
Cloud SQL
Cloud Run
Cloud Functions
DynamoDB
RDS
CloudWatch
S3
Lambda
EC2
Confluent Kafka
Oozie
Hive
MapReduce
Hadoop
PySpark
GCP Dataproc
AWS EMR
CDH
GCP
AWS
Cloudera
Databricks
ZIO
Spark
Akka
Spring Boot
HiveQL
Pig Latin
PL/SQL
SQL
Python
Scala
Java 8