Developed new tasks and maintained existing Airflow DAGs in Python to launch nightly AWS EMR clusters for data ingestion pipelines (Spark, Hadoop, and other jobs).
Implemented and optimized Hadoop jobs in Java, cutting data processing time by up to 40%.
Maintained and enhanced an Airflow service built on Apache Airflow, reducing failure rates by 20% and increasing task execution speed by 15%.
Software Engineer Intern (Big Data)
Genesys
05.2023 - 06.2024
Developed Python-based Airflow DAGs to build data pipelines, reducing pipeline execution time by 50%.
Decreased data ingestion time by 60% by using Hadoop, AWS S3, and AWS EMR for batch ingestion in the Druid ingestion pipeline.
Contributed to optimizing data processing and integration using Python, Java, Hadoop, and AWS S3, resulting in a 25% improvement in overall data pipeline performance.
Software Developer Intern (Big Data Platform)
Huawei Technologies Canada
05.2022 - 05.2023
Contributed to openLooKeng (OLK), an open-source big data platform built on Presto and Google Guice, working with Hive and Hadoop for distributed, large-scale data processing.
Integrated the OLK database connector with OmniTable in Java, improving data-interaction performance by 6x.
Developed an Event Listener feature in Java that enables custom logging, debugging, and performance-analysis plugins.
Implemented a caching system in Java, reducing query execution time by 50% for frequently executed queries.
Improved OLK benchmarking and performance through in-depth analysis and Python-based automation, increasing query productivity by 30% and efficiency by 80%.
Software Engineer Intern
BMW Group
05.2021 - 08.2021
Developed automated test software using Java, Selenium, and Cucumber.
Reduced manual testing time by 80% by automating testing processes.
Streamlined automated data input by implementing XML objects, improving system performance.
McGill Course and TA Evaluation System (https://www.cs.mcgill.ca/~ytang76/project/index.php)
Led a team of 3 in designing and developing a web app for course and TA evaluation at McGill University, using React.js, CSS, and Bootstrap for the frontend and Firebase and PHP for the backend.
Implemented features including User Registration, Authentication, Course/TA Management, and Rating.