Data Engineer with over 4 years of experience developing Java-based microservices and building scalable ETL pipelines in cloud environments. Proficient in using AWS services, Apache Kafka, PySpark, and Elasticsearch to deliver real-time and batch data solutions. Skilled in data modeling, API development, and workflow orchestration using tools like Glue, Lambda, and Airflow. Adept at managing structured and unstructured data across distributed systems. Strong collaborator with a focus on building secure, efficient, and high-performance data platforms.
· Built and maintained ETL pipelines, facilitating efficient data extraction, transformation, and loading across multiple data sources.
· Applied data modeling techniques to design and maintain databases, improving performance and preserving data integrity.
· Managed data warehousing in AWS Redshift and S3, ensuring data accessibility and security through proper IAM role configuration and management.
· Automated data pipelines and real-time data processing with AWS Glue and Lambda.
· Wrote and ran SQL, HiveQL, and Spark SQL queries for data analysis, reporting, and troubleshooting.
· Collaborated with cross-functional teams to define data requirements and optimize business intelligence strategies.
· Streamlined an ETL process, reducing data processing time by 30%.
· Migrated large-scale data workloads to AWS Redshift, reducing cloud storage costs by 25%.
● Designed and implemented a cloud-based data ingestion solution using Python, SQL, and Spark, improving data processing speed by 60%.
● Created ETL pipelines to extract, transform, and load data from multiple sources into a centralized data warehouse using AWS technologies such as S3, EC2, RDS, Glue, and EMR.
● Collaborated with cross-functional teams to define data requirements and optimize data ingestion workflows, enhancing data quality, accuracy, and consistency.
● Worked closely with clients to analyze business requirements and translate them into actionable data models and interactive Power BI reports.
● Processed and analyzed data from over 500 raw CSV and JSON files, ensuring the data was clean, accurate, and ready for use.
● Assisted senior analysts by crafting and executing SQL queries to extract critical insights from relational databases.
● Used advanced Excel functions, including VLOOKUP and Pivot Tables, to manage and analyze large datasets in support of data-driven decision-making.
● Contributed to all phases of the Data Warehouse development lifecycle, from requirements gathering and testing to implementation, data migration, and ongoing project support.
● Designed and delivered interactive Power BI dashboards that allowed non-technical stakeholders to easily interpret complex datasets and make informed decisions.
PROGRAMMING LANGUAGES
DATABASES
ETL/BI TOOLS
BIG DATA ECOSYSTEM
CLOUD TECHNOLOGIES
VERSION CONTROL
OPERATING SYSTEMS
Application Development Associate