MINGYUE XUE

Toronto, ON

Summary

  • RWE Scientist with 10+ years’ expertise transforming multi-modal real-world data (EMR, registries, multi-omics) into actionable evidence for public health and clinical research.
  • Global Cohort Leadership: Led data science for the 108-hospital GEM Project, uncovering Crohn’s disease biomarkers with translational potential for therapeutic development (20+ publications, e.g., Gut, IF=23.1; Gastroenterology, IF=26.3).
  • Policy Impact: Analyzed 10M+ national health records to identify region-specific disease mechanisms, directly informing provincial health policies impacting 3.5M residents.
  • Technical Excellence: Spearheaded end-to-end RWD analytics, including observational study design, statistical and machine learning pipelines, and SAP development, reducing data integration time by 30% through HPC optimization.
  • Core Expertise: Observational study design | Statistical analysis (CoxPH/RCS) | Data standardization (ICD-10/OMOP) | Stakeholder collaboration.

Overview

  • 9 years of professional experience
  • 1 certification

Work History

Senior Postdoctoral Fellow - Real-World Evidence

University of Toronto, Mount Sinai Hospital
01.2021 - Current

Core Project: Global GEM Project: https://crohnsandcolitis.ca/Research/Funded-research/The-gem-projec

World’s largest prospective Crohn’s disease cohort (107 international sites in 7 countries, $22 million CAD invested)

RWE Study Leadership:

  • Directed end-to-end RWE generation for 5K+ participants: SAP design → data acquisition → HPC execution.
  • Managed 80% of the team’s workflow, covering EMR, multi-omics (proteomics/metabolomics), and registry data.

Advanced Analytics & Integration:

  • Statistical Modeling Pipeline:
      • Applied CoxPH with restricted cubic splines to characterize non-linear exposure-disease relationships (e.g., diet-CD study).
      • Identified key predictors via LASSO, Elastic Net, and causal inference techniques.
  • Machine Learning Pipeline:
      • Developed supervised models (XGBoost, SVM, DNN) for disease prediction; optimized via grid search hyperparameter tuning.
      • Achieved AUC=0.89 for preclinical CD prediction – highest reported performance.
  • Multi-Omics Integration:
      • Standardized variables using ICD-10 and medical ontologies; harmonized EMR, registry, and multi-omics data.
      • Processed shotgun metagenomics (HUMAnN3, MaAsLin3) from raw FASTQ to taxonomic profiles.
  • Translational Impact:
      • Built interactive R Shiny/Power BI dashboards for cohort monitoring, reducing reporting time by 40%.
      • Collaborated with medical affairs and public health agencies to translate analytics into strategic insights.
      • Delivered 4 oral presentations at DDW/ECCO (top 5% selection); mentored 10+ junior researchers.

Clinical Data Scientist

Xinjiang Hospital
01.2020 - 01.2021
  • Predictive Analytics: Developed a logistic regression model (AUC=0.85) on 50K+ EMR records, reducing heart failure (HF) readmissions by 12%.
  • Process Innovation: Automated sepsis detection via Python/SQL, cutting analysis time by 80%.

Ph.D. Researcher | Epidemiological Modeling for Health Policy

Xinjiang Medical University
01.2016 - 01.2020
  • Flagship Project: Regression-Driven Risk Stratification Initiative (10M+ Records | Provincial Policy Integration)
  • Advanced Statistical Modeling:
      • Led multivariable logistic regression (AUC=0.92) and CoxPH with splines to model diabetes/NAFLD risk, identifying critical thresholds (HbA1c≥6.5%, HR=4.2).
  • Causal Inference & Policy Translation:
      • Built propensity score matching pipelines (STROBE-compliant) to establish evidence for screening guidelines covering 3.5M high-risk residents.
  • Technical Implementation:
      • Developed Shiny dashboards for 3 county health commissions; optimized feature selection for skewed clinical data.

Education

Ph.D. - Public Health (Machine Learning and Biostatistics focus)

Xinjiang Medical University
01.2020

M.Sc. - Epidemiology and Biostatistics

Xinjiang Medical University
01.2014

B.S. - Health Information Management and Information Systems

Xinjiang Medical University
01.2010

Skills

  • RWE Methodologies: SAP Development | Observational Study Design | Multi-Country Data Harmonization (ICD-10/OMOP) | HEOR Principles
  • Advanced Analytics: Survival Analysis (CoxPH w/ RCS) | Causal Inference (PSM) | Machine Learning (XGBoost, LASSO/Elastic Net)
  • Data Integration & Engineering: Multi-Omics Harmonization | ETL Pipelines (SQL) | HPC/Cloud (Niagara)
  • Stakeholder Enablement: Self-service Tools (R Shiny/Power BI) | Cross-functional Collaboration
  • Languages: R, Python, SAS, SQL
  • Visualization: ggplot2, Power BI, R Shiny

Certification

  • Senior Big Data Analyst (T190200701802135), Ministry of Industry and Information Technology
  • Python Software Engineer (8C27-uSHL), Microsoft

Awards

  • Newsworthy Abstract, presented at the plenary session (2022)
  • Best Oral Presentation, T-CAIREM AI in Medicine (2023)
  • Distinguished Abstract Award, Digestive Disease Week (2024)
  • Second Prize, National Mathematical Modeling Contest for Graduate Students

Publications

  • First-author Publications
  • 1. Gut (Under Review, 2025) | Metabolic Biomarkers Depict Distinct Pathways for Crohn’s Disease Risk
      • Identified preclinical signatures for therapeutic targeting.
  • 2. Clinical Gastroenterology and Hepatology (2024, IF=13.6) | Environmental Risk Factors in the CCC-GEM Cohort (N=4,500)
      • Informs regulatory strategy for environmental health interventions.
  • 3. Diabetes Care (2021, IF=16.2) | Non-Invasive NAFLD Prediction Model in Type 2 Diabetes (AUC=0.91)
  • 4. Scientific Reports (2020, IF=4.6) | Diabetes Risk Nomogram from 345K Chinese Participants
      • Methodology: Scalable framework for population-level risk stratification.
  • 5. Frontiers in Public Health (2022, IF=6.3) | Machine Learning Framework for NAFLD Classification
      • Technical Innovation: Handled high-dimensional EHR data with feature skewness.
