Primary prevention cardiovascular disease risk prediction model for contemporary Chinese (1°P-CARDIAC): Model derivation and validation using a hybrid statistical and machine-learning approach

Yekai Zhou, Celia Jiaxi Lin, Qiuyan Yu, Joseph Edgar Blais, Eric Yuk Fai Wan, Emmanuel Wong, Kathryn Tan, David Chung-Wah Siu, Kai Hang Yiu, Esther Wai Yin Chan, Doris Yu, William Wong, Tak-Wah Lam, Ian Chi Kei Wong, Ruibang Luo*, Celine S. L. Chui*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Background: Cardiovascular disease (CVD) is the leading cause of mortality and morbidity in China and worldwide, yet validated primary prevention models developed specifically for Chinese populations are lacking. To identify individuals at high CVD risk for early intervention, we created and validated a primary prevention risk prediction model, the Personalized CARdiovascular DIsease risk Assessment for Chinese (1°P-CARDIAC), in contemporary Chinese cohorts in Hong Kong.

Methods: Patients without any history of CVD were assigned to derivation and validation cohorts according to their geographical region of residence in Hong Kong. The outcome was the first diagnosis of a composite of coronary heart disease, ischemic or hemorrhagic stroke, peripheral artery disease, and revascularization. The full model incorporated all available variables in the dataset, including clinical laboratory tests, disease and medication history, family history of disease, demographic factors, and healthcare utilization. We used an XGBoost Cox model for derivation and multivariate imputation by chained equations (MICE) to replace missing data. A basic model was developed from a subset of statistically significant and important risk variables selected by least absolute shrinkage and selection operator (LASSO) regression. Validation was performed with 1000 bootstrap replicates, and performance was compared with four existing models: PREDICT, the pooled cohort equations (PCE), China-PAR, and Framingham (Asian).

Results: The study included 179,953 patients in the derivation cohort and 1,083,924 patients across two independent validation cohorts. The full model included 103 covariates, whereas the basic model included 8. The full model demonstrated good performance, with a C-statistic of 0.87 (95% CI: 0.87, 0.87) and a calibration slope of 0.94. The basic model had a C-statistic of 0.75 (95% CI: 0.75, 0.75) and a calibration slope of 0.91. The comparison risk models had lower C-statistics, ranging from 0.68 to 0.72.

Conclusion: We developed and validated 1°P-CARDIAC, a CVD risk prediction model for primary prevention that applies a novel hybrid statistical and machine-learning approach. Validation results suggest that it may offer improved performance compared with commonly used risk models. The basic and full models of 1°P-CARDIAC yielded similar levels of accuracy and performance. The approach demonstrated both effectiveness and versatility in harnessing big data and has the potential to serve as a promising method for CVD primary prevention and for improving public health outcomes.
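The abstract reports discrimination as a C-statistic with a 95% CI obtained from 1000 bootstrap replicates. The sketch below illustrates one common way to compute such figures for time-to-event outcomes: Harrell's concordance index with a percentile bootstrap interval. The abstract does not state which C-statistic variant or interval method the authors used, so both choices are assumptions, and the function names `harrell_c_index` and `bootstrap_c_index` and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data (O(n^2) sketch).

    Pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) strictly before subject j's follow-up time. It is
    concordant when the model gives i the higher risk score; tied scores
    count as half. Pairs with tied follow-up times are skipped here.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier event in a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def bootstrap_c_index(times: Sequence[float], events: Sequence[int],
                      risk: Sequence[float], n_boot: int = 1000,
                      seed: int = 0) -> Tuple[float, Tuple[float, float]]:
    """Return the C-index and a percentile 95% CI from bootstrap resamples.

    Assumption: a simple percentile interval; the paper's method may differ.
    """
    rng = random.Random(seed)
    n = len(times)
    estimates: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        try:
            estimates.append(harrell_c_index([times[k] for k in idx],
                                             [events[k] for k in idx],
                                             [risk[k] for k in idx]))
        except ValueError:
            continue  # skip resamples that contain no comparable pairs
    if not estimates:
        raise ValueError("no valid bootstrap resamples")
    estimates.sort()
    lower = estimates[int(0.025 * (len(estimates) - 1))]
    upper = estimates[int(0.975 * (len(estimates) - 1))]
    return harrell_c_index(times, events, risk), (lower, upper)


if __name__ == "__main__":
    # Hypothetical toy data: follow-up years, event indicator, model risk score.
    times = [2.0, 5.0, 3.0, 8.0, 6.0]
    events = [1, 0, 1, 1, 0]
    risk = [0.9, 0.8, 0.7, 0.2, 0.4]
    c, (lower, upper) = bootstrap_c_index(times, events, risk, n_boot=1000)
    print(round(c, 3))  # 0.857
    print(lower, upper)
```

In the toy data, seven pairs are comparable and only one (the event at year 3 scored below the censored subject followed to year 5) is discordant, giving 6/7. The double loop is quadratic in the number of patients, so at the scale of this study (over a million validation records) a library routine such as lifelines' `concordance_index`, which expects higher scores to mean longer survival so risk scores are negated, would be the practical choice.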
Original language: English
Article number: e0322419
Number of pages: 15
Journal: PLoS ONE
Volume: 20
Issue number: 7
Publication status: Published - 28 Jul 2025

Bibliographical note

Copyright © 2025 Zhou et al. This is an open access article distributed under the terms of
the Creative Commons Attribution License, which permits unrestricted use, distribution,
and reproduction in any medium, provided the original author and source are credited.

Funding

This study is funded by the Innovation and Technology Fund, Innovation and Technology Commission in Hong Kong and Amgen Asia Holdings Limited. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Keywords

  • Aged
  • Cardiovascular Diseases/prevention & control
  • East Asian People
  • Female
  • Hong Kong/epidemiology
  • Humans
  • Machine Learning
  • Male
  • Middle Aged
  • Models, Statistical
  • Primary Prevention/methods
  • Risk Assessment/methods
  • Risk Factors
