Lavanya Padamati

Nashua, New Hampshire

Summary

  • Google Cloud Certified and Talend Certified Data Engineer with over 7 years of experience in delivering cloud-native data solutions. Expertise in building and optimizing data pipelines using Spark, Kafka, Hive, and Hadoop frameworks. Proficient in AWS services for data orchestration and skilled in deploying solutions on Google Cloud with BigQuery and Dataflow. Strong background in ETL/ELT processes, data warehousing, and advanced data science techniques, complemented by hands-on experience in CI/CD automation and cross-functional collaboration.

Overview

8 years of professional experience
1 Certification

Work History

Data Engineer

Capgemini America Inc.
Nashua, New Hampshire
04.2023 - Current
  • Collaborated with data scientists and BI engineers to construct scalable, production-grade data pipelines using Spark (Scala/PySpark), Hive, and AWS services such as S3, Glue, Redshift, and EMR.
  • Simplified accessibility of complex credit card transaction datasets by developing advanced dashboards and data models for business intelligence reporting.
  • Designed, implemented, and maintained KPIs, reports, and dashboards to support strategic decision-making across business units.
  • Translated ambiguous business problems into data-driven solutions by integrating analytics insights with domain knowledge.
  • Migrated large datasets from HDFS to AWS S3, ensuring performance optimization, cost-efficiency, and data integrity during transition.
  • Automated end-to-end ETL workflows utilizing CloudFormation and Terraform, including CI/CD deployment pipelines and monitoring mechanisms.
  • Integrated Hive, MySQL, and other sources into data lakes through Sqoop and Spark transformations.
  • Enhanced pipeline resilience by introducing data quality validations, retry mechanisms, and performance tuning of Spark jobs.
  • Developed large-scale data processing systems using Hadoop and Spark frameworks.
  • Designed and implemented data pipelines for efficient data ingestion and transformation.
  • Developed and deployed Big Data applications using Hadoop, MapReduce, HDFS, Hive, Pig and Spark.
  • Designed and developed distributed systems leveraging Apache Kafka and other technologies.
  • Tuned query performance by applying indexing strategies across multiple databases.
  • Developed NoSQL solutions using MongoDB, Cassandra, and HBase to meet scalability requirements.
  • Maintained high-performance production clusters scheduled with YARN or Mesos.
  • Created custom scripts to automate the process of extracting, transforming and loading data into databases.
  • Built real-time streaming architectures utilizing Apache Storm or Apache Spark Streaming.
  • Provided technical guidance to junior team members on best practices of designing efficient ETL pipelines.
  • Created graphs and charts detailing data analysis results.
  • Developed unit test cases covering all modules and integrated them with CI/CD pipeline tools such as Jenkins and GitHub.
  • Generated daily status reports summarizing quality assurance activities such as defect tracking and code coverage.
  • Assisted in developing strategies for improving overall code coverage by increasing unit tests.
  • Compiled metrics reports on code coverage and bug trends, presented at weekly meetings.
  • Created test classes covering all scenarios to reach the required code coverage percentage.
  • Evaluated code coverage metrics generated by automated test execution tooling.
  • Monitored quality metrics such as defect density and code coverage over time to assess project health.

AWS Data Engineer

Schneider Electric
Nashua, New Hampshire
08.2022 - 01.2023

  • Architected and deployed scalable data pipelines using AWS and GCP services, including BigQuery, S3, and Snowflake.
  • Developed batch and real-time pipelines with Spark and Kafka to manage large volumes of structured and semi-structured data.
  • Built and scheduled DAGs in Airflow for automated ingestion from S3 to Snowflake, facilitating real-time insights.
  • Managed data modeling and schema design in Druid and Snowflake for efficient multi-dimensional analysis.
  • Provisioned infrastructure and deployed distributed data services in cloud environments using Terraform and Kubernetes.
  • Optimized Spark applications and Databricks workflows to enhance processing speed and reduce costs.
  • Implemented monitoring and alerting for ingestion pipelines utilizing Prometheus and Grafana.
  • Enabled end-to-end data processing with Python and Spark SQL, and integrated CI/CD pipelines for production readiness.
  • Developed data pipelines using AWS services to support data processing tasks.
  • Implemented ETL processes for data extraction, transformation, and loading tasks.
  • Developed and maintained data pipelines to ingest, store, process and analyze large datasets in AWS S3 buckets.
  • Integrated existing systems with new platforms such as AWS S3 or Azure Blob Storage.
  • Managed cloud infrastructure for analytics applications, ensuring optimal performance.
  • Built fault tolerant applications that leveraged multiple Availability Zones within an AWS region.
  • Performed maintenance tasks such as backups, restores, patching, capacity planning and performance tuning of databases on AWS EC2 instances.
  • Implemented automated monitoring of data flows using Cloudwatch and Lambda functions.
  • Enforced security policies through encryption at rest and in transit using KMS, alongside IAM roles and policies.
  • Managed Data Lake architecture based on Apache Parquet files stored in S3 buckets and queried via Athena.
  • Executed migration strategies across HiveQL query versions running on HDFS clusters hosted on EMR.
  • Optimized query performance by creating indexes and materialized views in Amazon Redshift clusters.
  • Created ETL processes using Python scripts to move data from various sources into the target databases on AWS Redshift or RDS.
  • Selected appropriate AWS services based on compute, data, and security requirements.
  • Developed and maintained CI/CD pipelines for seamless code deployment to cloud platforms.
  • Communicated with clients to understand system requirements.

Data Engineer

Data Economy
Hyderabad, India
01.2021 - 07.2021
  • Collaborated with cross-functional teams to gather data requirements.
  • Designed and implemented data pipelines for large-scale data processing.
  • Developed and maintained ETL processes using modern tools and frameworks.
  • Managed data storage solutions to ensure efficient access and retrieval.
  • Streamlined workflows by automating repetitive tasks within the data pipeline.
  • Analyzed user requirements, designed and developed ETL processes to load enterprise data into the Data Warehouse.
  • Developed and implemented data models, database designs, data access and table maintenance codes.
  • Created stored procedures for automating periodic tasks in SQL Server.
  • Developed Python scripts to extract data from web service APIs and load it into databases.
  • Participated in agile development processes, contributing to sprint planning, stand-ups, and reviews to ensure timely delivery of data projects.
  • Implemented and optimized big data storage solutions, including Hadoop and NoSQL databases, to improve data accessibility and efficiency.
  • Developed data pipelines using AWS services to support data processing tasks.
  • Automated deployment of infrastructure components including EC2 instances, VPCs and EBS volumes with CloudFormation templates.
  • Monitored resource utilization metrics such as CPU utilization and network throughput with CloudWatch alarms and dashboards.
  • Implemented automated monitoring of data flows using Cloudwatch and Lambda functions.
  • Developed Spark applications on top of Hadoop clusters running on EMR for performing complex analytics operations.
  • Managed Data Lake architecture based on Apache Parquet files stored in S3 buckets and queried via Athena.
  • Developed and maintained data pipelines to ingest, store, process and analyze large datasets in AWS S3 buckets.
  • Created ETL processes using Python scripts to move data from various sources into the target databases on AWS Redshift or RDS.

Data Analyst

Virtusa Systems Private Limited
Hyderabad, India
08.2017 - 12.2020

  • Designed and developed data integration pipelines using Talend and Spark for seamless data ingestion from Oracle and MySQL into Hadoop HDFS.
  • Performed data cleansing, normalization, and deduplication to enhance analytics accuracy within pipelines.
  • Created batch processing workflows in Hive and Sqoop to facilitate reporting for enterprise applications.
  • Contributed to foundational data lake layers, enabling effective downstream reporting through Tableau and Power BI.
  • Collaborated with the DevOps team to automate ETL job deployments using Jenkins and Git.
  • Developed interactive dashboards using data visualization tools for stakeholder presentations.
  • Created detailed reports summarizing findings and recommendations for management review.
  • Conducted data quality assessments to ensure accuracy and reliability of information.
  • Analyzed large datasets to identify trends, patterns and correlations for business insights.
  • Translated raw data into meaningful information using statistical techniques.
  • Provided data-driven solutions to support decision making.
  • Maintained documentation of all the processes related to Data Analysis.
  • Generated reports and obtained data to develop analytics on key performance and operational metrics.
  • Worked with internal teams to understand business needs and changing strategies.
  • Developed dashboards with Tableau to monitor key performance indicators.
  • Developed and maintained databases and data systems, reorganizing data into readable formats.

Education

Master of Science - Computer Information Systems & IT

University of Central Missouri
Warrensburg, MO
12.2022

Bachelor of Science - Computer Science

Chirala Engineering College
Chirala, India
05.2017

Skills

  • Cloud security
  • Data transformation
  • CI/CD pipelines
  • Monitoring and alerting
  • AWS Redshift expertise
  • Amazon S3 proficiency
  • Data lake management
  • Python programming
  • Data migration strategies
  • Real-time data streaming
  • AWS Glue ETL management
  • Data pipeline development
  • Machine learning integration
  • Hadoop ecosystem
  • Lambda functions
  • SQL querying
  • NoSQL databases
  • ETL design and implementation
  • Power BI reporting
  • Data migration
  • AWS expertise
  • Monitoring tools
  • Azure proficiency
  • Data analysis
  • Big data architecture
  • Apache Spark
  • SQL databases
  • Google Cloud Platform
  • Data storage
  • Microsoft Azure
  • Machine learning
  • Big data technologies

Personal Information

Title: Senior Data Engineer

Certification

  • Google Cloud Certified Professional Data Engineer
  • Trained in Large Language Models (LLMs) – ongoing professional development.

Languages

English
Full Professional

Affiliations

  • Mentored junior data engineers on Spark optimizations and ETL best practices during internal training sessions.
  • Participated in internal knowledge-sharing sessions on Spark and AWS.
  • Recognized for consistent on-time delivery of data pipeline projects.
  • Received appreciation from leadership for improving data process efficiency.

Timeline

Data Engineer

Capgemini America Inc.
04.2023 - Current

AWS Data Engineer

Schneider Electric
08.2022 - 01.2023

Data Engineer

Data Economy
01.2021 - 07.2021

Data Analyst

Virtusa Systems Private Limited
08.2017 - 12.2020

Master of Science - Computer Information Systems & IT

University of Central Missouri

Bachelor of Science - Computer Science

Chirala Engineering College
Chirala Engineering College