Data analyst with a strong interest in data science, statistical modeling, and machine learning. Advanced understanding of analytical techniques and emerging computational methods. Highly organized, motivated, and diligent, with a significant background in statistics.
1. Graph Neural Network Recommender
- Investigated how graph neural networks (GNNs) could improve a recommendation pipeline.
- Implemented from scratch a state-of-the-art GNN proposed by Microsoft Research India in 2021, which enables inductive learning with spectral GNNs and outperformed the baseline models.
- Scaled experimental training to 13 million user-item interactions by using Horovod for distributed training and optimizing the source code.
2. Drift Monitoring Workflow
- Implemented a drift-monitoring workflow on New York taxi data using MLflow on Databricks.
- The workflow automatically trains regression models that predict taxi fares, deploys the selected model to production, monitors data quality and potential drift (via offline batch inference), and retrains the model when its performance degrades past a set threshold.
- Over the test period, the retrained model performed better and more stably than a baseline model that was never retrained.
3. Approximated Hierarchical Model-Based Clustering on HBE Data
- With colleagues, implemented from scratch a Gibbs sampler for an approximate hierarchical Gaussian mixture model (GMM) and packaged it as an R package with documentation and unit tests.
- Applied model-based clustering to HBE mucosal data and compared the resulting clusters with the provided class labels, obtaining promising results.