Analyzed large-scale 777 operator fleet datasets to evaluate reliability performance and deliver data-driven insights informing operational decisions.
Built predictive regression and time-series models in Python to forecast climate-related impacts and support strategic planning.
Automated data extraction, transformation, and analysis workflows using Python, reducing processing time and improving analytical efficiency.
Developed automated data quality checks in Python to validate operator datasets, improve data integrity, and enable high-quality downstream analysis.
Collaborated with customers and cross-functional stakeholders to ingest operational data, compare fleet performance against Boeing reliability recommendations, and translate insights into actionable recommendations.
Capstone: Transparent ICD-9 Coding Assistant
UC Berkeley
Berkeley
05.2024 - 12.2025
Designed and built an end-to-end ICD-9 code recommendation system that automatically assigns top-k diagnosis codes from clinical discharge summaries (MIMIC-III), reducing manual medical coding effort while improving auditability.
Implemented a hybrid retrieval–reranking pipeline that combines fine-tuned medical text embeddings (MedCPT) with large language model rerankers (Gemini Flash) to improve ICD-9 code relevance and ranking quality.
Developed evidence span extraction to surface concise, human-readable clinical justifications for each predicted code, enabling transparency for medical coders and patients.
Evaluated system performance using Precision@k, Recall@k, Micro-F1, and Macro-F1, beginning with a focused diabetes ICD case study and extending evaluation to the full ICD code set.
Built an interactive Streamlit web application supporting coder workflows (review, accept/reject codes) and a patient-facing view for code transparency and trust.
System and Data Analyst - Flight Operations Support
The Boeing Company
11.2022 - 05.2024
Created a Python tool to automate BCS Service Request analysis across operators
Constructed an interactive operational report in a Jira dashboard, using MDX to query and visualize data on Flight Operations Support product quality escapes
Identified and analyzed quality metrics (M3s) to improve Flight Operations Support products by up to 80%
Integrated application components and databases across computing platforms using Teradata to mine real-time Flight Operations data
Communicated business requirements and test results with offshore developers to manage projects
Developed and executed tests to validate system functionality against specifications
Coordinated with Flight Operations teams to develop Python tools that automate workflow tracking through the Jira REST API
Researched processes, applications, systems, and data to identify functional requirements for application and system design
Interpreted and translated application operational requirements into functional specifications
Education
Master of Science - Information and Data Science
University of California, Berkeley
Berkeley
12.2025
Bachelor of Science - Bioengineering, Bioinformatics
University of California, San Diego
San Diego
01.2021
Skills
Programming: Python, SQL
Machine learning: regression, classification, CNNs, time series
NLP & LLM systems: embeddings, retrieval and reranking, seq-to-seq models (T5, BART), LLM APIs (Gemini) for relevance scoring
ICD-9 / ICD-10 Recommendation System: Built an end-to-end medical coding system using hybrid retrieval and LLM-based reranking to recommend relevant ICD codes from clinical notes, scaling from a diabetes-focused prototype to all ICD-9 codes and evaluating performance with Precision@k, Recall@k, and F1.
Data Visualization & Policy Analytics: Developed interactive dashboards and visual narratives using Congressional bill data to analyze legislative activity across sessions, regimes, and policy areas, translating complex political data into accessible insights for non-technical audiences.
Korean NLP & Cross-Lingual Processing: Researched and implemented Korean NLP models for grammatical error correction and cross-lingual retrieval, leveraging large-scale Korean text datasets and sequence-to-sequence architectures to address gaps in informal Korean language tooling.