Architected and deployed scalable data pipelines on AWS and GCP using S3, BigQuery, and Snowflake.
Developed batch and real-time pipelines with Spark and Kafka to manage large volumes of structured and semi-structured data.
Built and scheduled Airflow DAGs to automate ingestion from S3 into Snowflake, enabling timely insights.
Led data modeling and schema design in Druid and Snowflake to support efficient multi-dimensional analysis.
Provisioned cloud infrastructure with Terraform and deployed distributed data services on Kubernetes.
Optimized Spark applications and Databricks workflows to improve processing speed and reduce compute costs.
Implemented monitoring and alerting for ingestion pipelines using Prometheus and Grafana.
Built end-to-end data processing in Python and Spark SQL, integrated with CI/CD pipelines for production deployment.
Designed and developed data integration pipelines using Talend and Spark to ingest data from Oracle and MySQL into Hadoop HDFS.
Performed data cleansing, normalization, and deduplication within pipelines to improve the accuracy of downstream analytics.
Created batch processing workflows using Hive and Sqoop to support reporting for enterprise applications.
Helped build the foundational layers of the data lake, enabling downstream reporting in Tableau and Power BI.
Collaborated with the DevOps team to automate ETL job deployments using Jenkins and Git.
Title: Senior Data Engineer