
Rui Chen

Toronto, ON

Summary

Data analyst with strong enthusiasm for data science, statistical modeling, and machine learning. Advanced understanding of analytical techniques and emerging computational methods. Highly organized, motivated, and diligent, with a significant background in statistics.

Overview

4 years of professional experience
1 Certification

Work History

Analyst

Tredence Inc Canada
Toronto, ON
09.2021 - 10.2022
  • Participated in various experimental projects proposed by managers
  • Replicated GNN recommender models from scratch and built a demo drift monitoring workflow in Databricks
  • Currently working with clients to maintain their inventory optimization workflows, gaining exposure to production environments and some data engineering basics

Associate Data Analyst

Bayesian Intelligence
Xi'an, Shaanxi, China
05.2018 - 07.2018
  • Helped handle a large volume of data (approx. 5,650,000 records) collected throughout China using PostgreSQL and Python. Collaborated with different teams to understand their needs and requirements, and gained valuable practical experience in data processing.
  • Participated in cleaning and sorting highly repetitive, incomplete, and unordered records, and reshaping them into a format suitable for modeling and visualization.
  • Uploaded formatted data into data warehouses for further use by other teams. Provided full documentation of all code and methods applied for future reference, and received good feedback on the quality of the work.

Education

Bachelor of Mathematics - Statistics

University of Waterloo
Waterloo, ON
05.2021

Skills

  • Hands-on working experience with SQL and Python for data processing; exposure to PySpark and Databricks workflows and to the MLOps life cycle in Databricks MLflow
  • Knowledge of machine learning; experience in computational and Bayesian statistics
  • Familiarity with PyTorch, TensorFlow, and Deep Graph Library
  • Hands-on experience with graph neural networks for link prediction
  • Exposure to distributed training using Hyperopt and Horovod in Databricks
  • Knowledge of predictive modeling and statistical analysis; advanced inference skills across a wide range of topics from previous coursework
  • Proficiency in R programming, including applied GLM, statistical computing, spatial modeling, and Bayesian modeling with Stan
  • Primary interest lies in computational statistics, especially better inference and interpretation of models
  • Good at handling multiple tasks and ad hoc assignments
  • Good problem solving and analytical skills
  • Good writing and communication skills


Certification

  • Machine Learning Foundations & Techniques
  • Deep Learning Specialization
  • Graph Neural Networks
  • Azure Databricks & Spark Core for Data Engineering (Python/SQL)
  • Bayesian Machine Learning (in progress)


Projects

1. Graph Neural Network Recommender

- Explored the advantages of graph neural networks in a recommendation pipeline.

- In the process, implemented from scratch a SOTA GNN proposed by Microsoft Research India in 2021, which enables inductive learning on spectral GNNs and outperforms baselines.

- Scaled up experimental training to 13,000,000 user-item interactions by leveraging Horovod and optimizing the source code.


2. Drift Monitoring Workflow

- Implemented a drift monitoring workflow on New York taxi data using Databricks MLflow functionality.

- The workflow automatically trains regression models that predict taxi fares, deploys the selected model to production, monitors the data for potential drift (offline inference on batch data), and retrains automatically if performance degrades beyond a threshold.

- The retrained model showed better and more stable performance over the test period than the baseline model that was not retrained.


3. Approximated Hierarchical Model-Based Clustering on HBE Data

- Implemented a Gibbs sampler for an approximated hierarchical GMM from scratch with colleagues and packaged it as an R package, including proper documentation and unit tests.

- Applied model-based clustering to HBE mucosal-based data, compared the clustering results with the given classes, and obtained promising results.



Timeline

Analyst

Tredence Inc Canada
09.2021 - 10.2022

Associate Data Analyst

Bayesian Intelligence
05.2018 - 07.2018

Bachelor of Mathematics - Statistics

University of Waterloo

