Submitted:
01 December 2025
Posted:
08 December 2025
You are already at the latest version
Abstract
Background: The rapid expansion of real-world data in medicine is driving the adoption of advanced methods like Artificial Intelligence (AI) and Propensity Score Matching (PSM). AI is widely applied across diagnostics, prediction, and treatment planning, while PSM is a crucial statistical technique used in quasi-experimental studies to mitigate confounding bias and approximate the reliability of randomized controlled trials. There is a growing research interest in combining these two methods to leverage their symbiotic strengths, but this association has not been holistically explored. Methodology: This study employed Synthetic Thematic Analysis (STA), derived from synthetic knowledge synthesis, to systematically review the existing literature on AI and PSM in medicine. Publications were harvested from the Scopus database using a comprehensive search string limited to the Medical subject area. The resulting corpus (N=433 documents) was analyzed using bibliometric tools (Bibliometrix and VOSViewer) to map the research landscape, identify thematic clusters based on author keywords, analyze collaboration patterns, and synthesize findings from highly prolific publications. Results: The field is young and rapidly accelerating, showing an exponential increase from 2020 to 2024. China and the USA dominate research production and citation impact. The symbiotic relationship is published in high-impact medical journals and health informatics journals. STA identified four main thematic clusters: Prediction, Cancer Management, Diagnosing, and Deep Learning. AI and PSM are combined in two primary ways: AI used in PSM and PSM used in AI. Conclusion: The symbiotic association between AI and PSM is a global and rapidly developing trend in medical research, driven by major international contributors. This convergence is enhancing methodological rigor in observational studies, primarily by improving prediction models and refining causal inference in complex areas like cardiovascular disease, cancer, and diagnostics.
Keywords:
1. Introduction
- What are the dynamic and spatial features of the research literature production of the AI and PSM use in medicine?
- How is the symbiosis association between AI and PSM reflected in most prolific source titles and most productive countries?
- What research themes emerge in studies combining AI and PSM for medical data analysis?
- What are the more prolific AI methods, medical applications, and diagnoses in combined AI and PSM analyses?
- What are the dominant research trends in the combined use of PSM and AI?
2. Methodology

3. Results

| AI algorithms and services | n | Medical applications | n | Diagnosis | N |
| Machine learning | 66 | Prognosis | 11 | heart diseases | 12 |
| Artificial intelligence | 23 | Mortality | 10 | atrial fibrillation | 6 |
| Deep learning | 11 | Nomogram | 5 | covid-19 | 6 |
| Decision tree | 5 | Prediction model | 11 | gastric cancer | 6 |
| Random forest | 6 | Diagnosis |
4 | hepatocellular carcinoma | 6 |
| Natural language processing |
5 | Decision support | 6 | Stroke | 3 |
| Big data | 3 | Risk assessment | 14 | Kidney diseases | 11 |
| SHAP/Explainable AI | 2 | epidemiology |
5 | Diabetes | 9 |
| Missing data imputation | 1 | Survival analysis |
11 | Cardiovascular diseases | 18 |
| Feature selection | 2 | Health services | 6 | Coronary diseases | 7 |

4. Discussion
5. Conclusion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Public Involvement Statement
Use of Artificial Intelligence
References
- Dang, A. Real-World Evidence: A Primer. Pharm. Med. 2023, 37, 25–36. [Google Scholar] [CrossRef] [PubMed]
- Li, Q.; Lin, J.; Chi, A.; Davies, S. Practical considerations of utilizing propensity score methods in clinical development using real-world and historical data. Contemp. Clin. Trials 2020, 97, 106123. [Google Scholar] [CrossRef] [PubMed]
- Rivas, J.G.; Kraft, P.; Evans-Axelsson, S.; Hijazy, A.; Beyer, K.; De Meulder, B.; Liu, A.Q.; Golozar, A.; Harbachou, A.; Feng, Q.; et al. Real-world Evidence on Baseline Characteristics and Treatment in Metastatic Hormone-sensitive Prostate Cancer: Findings from the PIONEER 2.0 Big Data Investigation Group. Eur. Urol. Open Sci. 2025, 81, 82–91. [Google Scholar] [CrossRef] [PubMed]
- Al-Antari, M.A. Artificial Intelligence for Medical Diagnostics—Existing and Future AI Technology! Diagnostics 2023, 13, 688. [Google Scholar] [CrossRef]
- Artificial intelligence meets medical robotics. Science. n.d. Available online: https://www.science.org/doi/full/10.1126/science.adj3312?casa_token=HoLADs-riL4AAAAA%3AlU3aQJbwQEQy0iPYzPU33NHeoF8CLJxIq8kJonOrHDAyKUZ1yYmEgCiA1wbPSyJFsiEKks2hnpeys2U (accessed on 13 December 2024).
- Bonkhoff, A.K.; Grefkes, C. Precision medicine in stroke: towards personalized outcome predictions using artificial intelligence. Brain 2021, 145, 457–475. [Google Scholar] [CrossRef]
- Briganti, G.; Le Moine, O. Artificial Intelligence in Medicine: Today and Tomorrow. Front. Med. 2020, 7, 27. [Google Scholar] [CrossRef]
- Liao, J.; Li, X.; Gan, Y.; Han, S.; Rong, P.; Wang, W.; Li, W.; Zhou, L. Artificial intelligence assists precision medicine in cancer treatment. Front. Oncol. 2023, 12, 998222. [Google Scholar] [CrossRef]
- Muehlematter, U.J.; Daniore, P.; Vokinger, K.N. Approval of artificial intelligence and machine learning-based medical devices in the USA and Europe (2015–20): A comparative analysis. Lancet Digit. Health 2021, 3, e195–e203. [Google Scholar] [CrossRef]
- Shick, A.A.; Webber, C.M.; Kiarashi, N.; Weinberg, J.P.; Deoras, A.; Petrick, N.; Saha, A.; Diamond, M.C. Transparency of artificial intelligence/machine learning-enabled medical devices. npj Digit. Med. 2024, 7, 1–4. [Google Scholar] [CrossRef]
- Tian, M.; Shen, Z.; Wu, X.; Wei, K.; Liu, Y. The Application of Artificial Intelligence in Medical Diagnostics: A New Frontier. Acad. J. Sci. Technol. 2023, 8, 57–61. [Google Scholar] [CrossRef]
- van de Sande, D.; E Van Genderen, M.; Smit, J.M.; Huiskens, J.; Visser, J.J.; Veen, R.E.R.; van Unen, E.; Ba, O.H.; Gommers, D.; van Bommel, J. Developing, implementing and governing artificial intelligence in medicine: a step-by-step approach to prevent an artificial intelligence winter. BMJ Heal. Care Informatics 2022, 29, e100495. [Google Scholar] [CrossRef]
- Lu, Y.; Jin, J.; Zhang, H.; Lu, Q.; Zhang, Y.; Liu, C.; Liang, Y.; Tian, S.; Zhao, Y.; Fan, H. Traumatic brain injury: Bridging pathophysiological insights and precision treatment strategies. Neural Regen. Res. 2025, 21, 887–907. [Google Scholar] [CrossRef]
- Xiong, X.; Zheng, L.-W.; Ding, Y.; Chen, Y.-F.; Cai, Y.-W.; Wang, L.-P.; Huang, L.; Liu, C.-C.; Shao, Z.-M.; Yu, K.-D. Breast cancer: pathogenesis and treatments. Signal Transduct. Target. Ther. 2025, 10, 1–33. [Google Scholar] [CrossRef] [PubMed]
- Katip, W.; Rayanakorn, A.; Oberdorfer, P.; Taruangsri, P.; Nampuan, T. Short versus long course of colistin treatment for carbapenem-resistant A. baumannii in critically ill patients: A propensity score matching study. J. Infect. Public Heal. 2023, 16, 1249–1255. [Google Scholar] [CrossRef]
- Krenzien, F.; Schmelzle, M.; Pratschke, J.; Feldbrügge, L.; Liu, R.; Liu, Q.; Zhang, W.; Zhao, J.J.; Tan, H.-L.; Cipriani, F.; et al. Propensity Score-Matching Analysis Comparing Robotic Versus Laparoscopic Limited Liver Resections of the Posterosuperior Segments. Ann. Surg. 2023, 279, 297–305. [Google Scholar] [CrossRef]
- Langworthy, B.; Wu, Y.; Wang, M. An overview of propensity score matching methods for clustered data. Stat. Methods Med Res. 2022, 32, 641–655. [Google Scholar] [CrossRef] [PubMed]
- Meneguzzo, P.; Antoniades, A.; Garolla, A.; Tozzi, F.; Todisco, P. Predictors of psychopathology response in atypical anorexia nervosa following inpatient treatment: A propensity score matching study of weight suppression and weight loss speed. Int. J. Eat. Disord. 2024, 57, 1002–1007. [Google Scholar] [CrossRef] [PubMed]
- Wang, S.V.; Schneeweiss, S.; Initiative, R.-D.; Franklin, J.M.; Desai, R.J.; Feldman, W.; Garry, E.M.; Glynn, R.J.; Lin, K.J.; Paik, J.; et al. Emulation of Randomized Clinical Trials With Nonrandomized Database Analyses. JAMA 2023, 329, 1376–1385. [Google Scholar] [CrossRef]
- Zhu, P.; Liao, W.; Zhang, W.-G.; Chen, L.; Shu, C.; Zhang, Z.-W.; Huang, Z.-Y.; Chen, Y.-F.; Lau, W.Y.; Zhang, B.-X.; et al. A Prospective Study Using Propensity Score Matching to Compare Long-term Survival Outcomes After Robotic-assisted, Laparoscopic, or Open Liver Resection for Patients With BCLC Stage 0-A Hepatocellular Carcinoma. Ann. Surg. 2022, 277, e103–e111. [Google Scholar] [CrossRef]
- Jochum, F.; Dumas, É.; Gougis, P.; Hamy, A.-S.; Querleu, D.; Lecointre, L.; Gaillard, T.; Reyal, F.; Lecuru, F.; Laas, E.; et al. Survival outcomes of primary vs interval cytoreductive surgery for International Federation of Gynecology and Obstetrics stage IV ovarian cancer: a nationwide population-based target trial emulation. Am. J. Obstet. Gynecol. 2024, 232, 194.e1–194.e11. [Google Scholar] [CrossRef]
- Yang, S.; Hussain, M.; Zahid, R.A.; Maqsood, U.S. The role of artificial intelligence in corporate digital strategies: evidence from China. Kybernetes 2024, 54, 3062–3082. [Google Scholar] [CrossRef]
- Park, J.-B.; Bae, J.H. Effectiveness of a novel artificial intelligence-assisted colonoscopy system for adenoma detection: a prospective, propensity score-matched, non-randomized controlled study in Korea. Clin. Endosc. 2025, 58, 112–120. [Google Scholar] [CrossRef]
- Benedetto, U.; Head, S.J.; Angelini, G.D.; Blackstone, E.H. Statistical primer: propensity score matching and its alternatives†. Eur. J. Cardio-Thoracic Surg. 2018, 53, 1112–1117. [Google Scholar] [CrossRef]
- Kim, D.W. Statistical Methods for Baseline Adjustment and Cohort Analysis in Korean National Health Insurance Claims Data: A Review of PSM, IPTW, and Survival Analysis With Future Directions. J. Korean Med Sci. 2025, 40, e110. [Google Scholar] [CrossRef]
- Ghimire, L.; Waller, E. The Future of Health Physics: Trends, Challenges, and Innovation. Heal. Phys. 2024, 128, 167–189. [Google Scholar] [CrossRef]
- Xiao, X.; Alharbi, K.; Zhang, P.; Qin, H.; Yue, X. Bayesian federated causal inference and its application in manufacturing. J. Intell. Manuf. 2025, 1–25. [Google Scholar] [CrossRef]
- Hennecken, J. Predicting Subclinical Atrial Fibrillation using Artificial Intelligence and validate using propensity-score matching and Explainable AI. Master Thesis, 2024. [Google Scholar]
- Ishiyama, M.; Kudo, S.-E.; Misawa, M.; Mori, Y.; Maeda, Y.; Ichimasa, K.; Kudo, T.; Hayashi, T.; Wakamura, K.; Miyachi, H.; et al. Impact of the clinical use of artificial intelligence–assisted neoplasia detection for colonoscopy: a large-scale prospective, propensity score–matched study (with video). Gastrointest. Endosc. 2022, 95, 155–163. [Google Scholar] [CrossRef] [PubMed]
- Kim, H.; Choi, J.S.; Kim, K.; Ko, E.S.; Ko, E.Y.; Han, B.-K. Effect of artificial intelligence–based computer-aided diagnosis on the screening outcomes of digital mammography: a matched cohort study. Eur. Radiol. 2023, 33, 7186–7198. [Google Scholar] [CrossRef] [PubMed]
- Prosperi, M.; Ghosh, S.; Chen, Z.; Salemi, M.; Lyu, T.; Zhao, J.; Bian, J. Causal AI with Real World Data: Do Statins Protect from Alzheimer's Disease Onset? ICMHI 2021: 2021 5th International Conference on Medical and Health Informatics, LOCATION OF CONFERENCE, JapanDATE OF CONFERENCE; pp. 296–303.
- Karim, M.E. Can supervised deep learning architecture outperform autoencoders in building propensity score models for matching? BMC Med Res. Methodol. 2024, 24, 1–10. [Google Scholar] [CrossRef]
- Lourenço, L.; Weber, L.; Garcia, L.; Ramos, V.; Souza, J. Machine Learning Algorithms to Estimate Propensity Scores in Health Policy Evaluation: A Scoping Review. Int. J. Environ. Res. Public Heal. 2024, 21, 1484. [Google Scholar] [CrossRef]
- Whata, A.; Chimedza, C. Evaluating Uses of Deep Learning Methods for Causal Inference. IEEE Access 2022, 10, 2813–2827. [Google Scholar] [CrossRef]
- Kokol, P.; Kokol, M.; Zagoranski, S. Machine learning on small size samples: A synthetic knowledge synthesis. Sci. Prog. 2022, 105. [Google Scholar] [CrossRef]
- Kokol, P. Synthetic Knowledge Synthesis in Hospital Libraries. J. Hosp. Libr. 2023, 24, 10–17. [Google Scholar] [CrossRef]
- Van Eck, N.J.; Waltman, L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 2010, 84, 523–538. [Google Scholar] [CrossRef]
- Aria, M.; Cuccurullo, C. bibliometrix: An R-tool for comprehensive science mapping analysis. J. Informetr. 2017, 11, 959–975. [Google Scholar] [CrossRef]
- Van Eck, N.J.; Waltman, L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 2010, 84, 523–538. [Google Scholar] [CrossRef]
- Austin, P.C. Comparing paired vs non-paired statistical methods of analyses when making inferences about absolute risk reductions in propensity-score matched samples. Stat. Med. 2011, 30, 1292–1301. [Google Scholar] [CrossRef] [PubMed]
- Austin, P.C.; Small, D.S. The use of bootstrapping when using propensity-score matching without replacement: a simulation study. Stat. Med. 2014, 33, 4306–4319. [Google Scholar] [CrossRef]
- Benedetto, U.; Head, S.J.; Angelini, G.D.; Blackstone, E.H. Statistical primer: propensity score matching and its alternatives†. Eur. J. Cardio-Thoracic Surg. 2018, 53, 1112–1117. [Google Scholar] [CrossRef] [PubMed]
- Scimago Journal & Country Rank Home Page. Available online: https://www.scimagojr.com/.
- Islam, N.; Islam, S.; Roy, P.B. A bibliometric technique for analyzing trends in public health research. Data Sci. Inf. 2024, 4, 89–103. [Google Scholar] [CrossRef]
- Colaneri, M.; Fama, F.; Fassio, F.; Holmes, D.; Scaglione, G.; Mariani, C.; Galli, L.; Lai, A.; Antinori, S.; Gori, A.; et al. Impact of early antiviral therapy on SARS-CoV-2 clearance time in high-risk COVID-19 subjects: A propensity score matching study. Int. J. Infect. Dis. 2024, 149, 107265. [Google Scholar] [CrossRef]
- Xie, Y.; Shen, H.; Xu, Q.; Tu, C.; Yang, R.; Liu, T.; Tang, H.; Miao, Z.; Zhang, J. Evaluating coronary arteries and predicting MACEs using CCTA in lung cancer patients receiving chemotherapy or chemoradiotherapy. Radiother. Oncol. 2024, 200, 110498. [Google Scholar] [CrossRef]
- Lim, J.; Choi, Y.-J.; Kim, B.S.; Rhee, T.-M.; Lee, H.-J.; Han, K.-D.; Park, J.-B.; Na, J.O.; Kim, Y.-J.; Lee, H.; et al. Comparative cardiovascular outcomes in type 2 diabetes patients taking dapagliflozin versus empagliflozin: a nationwide population-based cohort study. Cardiovasc. Diabetol. 2023, 22, 1–10. [Google Scholar] [CrossRef]
- Squiccimarro, E.; Lorusso, R.; Consiglio, A.; Labriola, C.; Haumann, R.G.; Piancone, F.; Speziale, G.; Whitlock, R.P.; Paparella, D. Impact of Inflammation After Cardiac Surgery on 30-Day Mortality and Machine Learning Risk Prediction. J. Cardiothorac. Vasc. Anesthesia 2024, 39, 683–691. [Google Scholar] [CrossRef]
- Ngufor, C.; Zhang, N.; Van Houten, H.K.; Holmes, D.R.; Graff-Radford, J.; Alkhouli, M.; Friedman, P.A.; Noseworthy, P.A.; Yao, X. Causal Machine Learning for Left Atrial Appendage Occlusion in Patients With Atrial Fibrillation. JACC: Clin. Electrophysiol. 2025, 11, 977–986. [Google Scholar] [CrossRef]
- Pettus, J.; Roussel, R.; Zhou, F.L.; Bosnyak, Z.; Westerbacka, J.; Berria, R.; Jimenez, J.; Eliasson, B.; Hramiak, I.; Bailey, T.; et al. Rates of Hypoglycemia Predicted in Patients with Type 2 Diabetes on Insulin Glargine 300 U/ml Versus First- and Second-Generation Basal Insulin Analogs: The Real-World LIGHTNING Study. Diabetes Ther. 2019, 10, 617–633. [Google Scholar] [CrossRef] [PubMed]
- Kumar, S; Gupta, P; Dekker, ALAJ; Bermejo, I; Kar, S. Development and Validation of Multicenter Study on Novel Artificial Intelligence Based Cardiovascular Risk Score (AICVD). 2021. [Google Scholar] [CrossRef]
- Wang, Z.; Zhang, L.; Chao, Y.; Xu, M.; Geng, X.; Hu, X. DEVELOPMENT OF A MACHINE LEARNING MODEL FOR PREDICTING 28-DAY MORTALITY OF SEPTIC PATIENTS WITH ATRIAL FIBRILLATION. Shock 2023, 59, 400–408. [Google Scholar] [CrossRef]
- Ruan, H.; Ran, X.; Li, S.-S.; Zhang, Q. Dyslipidemia versus obesity as predictors of ischemic stroke prognosis: a multi-center study in China. Lipids Heal. Dis. 2024, 23, 1–14. [Google Scholar] [CrossRef] [PubMed]
- Liang, H.; Pan, K.; Wang, J.; Lin, J. Association between neutrophil percentage-to-albumin ratio and breast cancer in adult women in the US: findings from the NHANES. Front. Nutr. 2025, 12, 1533636. [Google Scholar] [CrossRef] [PubMed]
- Gao, Z.; Winhusen, T.J.; Gorenflo, M.; Ghitza, U.E.; Davis, P.B.; Kaelber, D.C.; Xu, R. Repurposing ketamine to treat cocaine use disorder: integration of artificial intelligence-based prediction, expert evaluation, clinical corroboration and mechanism of action analyses. Addiction 2023, 118, 1307–1319. [Google Scholar] [CrossRef]
- Pundi, K.; Fan, J.; Kabadi, S.; Din, N.; Blomström-Lundqvist, C.; Camm, A.J.; Kowey, P.; Singh, J.P.; Rashkin, J.; Wieloch, M.; et al. Dronedarone Versus Sotalol in Antiarrhythmic Drug–Naive Veterans With Atrial Fibrillation. Circ. Arrhythmia Electrophysiol. 2023, 16, 456–467. [Google Scholar] [CrossRef]
- Qu, J.; Li, C.; Liu, M.; Wang, Y.; Feng, Z.; Li, J.; Wang, W.; Wu, F.; Zhang, S.; Zhao, X. Prognostic Models Using Machine Learning Algorithms and Treatment Outcomes of Occult Breast Cancer Patients. J. Clin. Med. 2023, 12, 3097. [Google Scholar] [CrossRef]
- Park, S.W.; Park, Y.-L.; Lee, E.-G.; Chae, H.; Park, P.; Choi, D.-W.; Choi, Y.H.; Hwang, J.; Ahn, S.; Kim, K.; et al. Mortality Prediction Modeling for Patients with Breast Cancer Based on Explainable Machine Learning. Cancers 2024, 16, 3799. [Google Scholar] [CrossRef]
- Hu, J.; Gong, N.; Li, D.; Deng, Y.; Chen, J.; Luo, D.; Zhou, W.; Xu, K. Identifying hepatocellular carcinoma patients with survival benefits from surgery combined with chemotherapy: based on machine learning model. World J. Surg. Oncol. 2022, 20, 1–10. [Google Scholar] [CrossRef]
- Huang, C.; Liu, Z.; Xiao, L.; Xia, Y.; Huang, J.; Luo, H.; Zong, Z.; Zhu, Z. Clinical Significance of Serum CA125, CA19-9, CA72-4, and Fibrinogen-to-Lymphocyte Ratio in Gastric Cancer With Peritoneal Dissemination. Front. Oncol. 2019, 9, 1159. [Google Scholar] [CrossRef] [PubMed]
- Xu, S.; Xiang, C.; Wu, J.; Teng, Y.; Wu, Z.; Wang, R.; Lu, B.; Zhan, Z.; Wu, H.; Zhang, J. Tongue Coating Bacteria as a Potential Stable Biomarker for Gastric Cancer Independent of Lifestyle. Dig. Dis. Sci. 2020, 66, 2964–2980. [Google Scholar] [CrossRef] [PubMed]
- Makhnevich, A.; Perrin, A.; Talukder, D.; Liu, Y.; Izard, S.; Chiuzan, C.; D’aNgelo, S.; Affoo, R.; Rogus-Pulia, N.; Sinvani, L. Thick Liquids and Clinical Outcomes in Hospitalized Patients With Alzheimer Disease and Related Dementias and Dysphagia. JAMA Intern. Med. 2024, 184, 778–785. [Google Scholar] [CrossRef] [PubMed]
- Digumarthi, V.; Amin, T.; Kanu, S.; Mathew, J.; Edwards, B.; A Peterson, L.; E Lundy, M.; E Hegarty, K. Preoperative prediction model for risk of readmission after total joint replacement surgery: a random forest approach leveraging NLP and unfairness mitigation for improved patient care and cost-effectiveness. J. Orthop. Surg. Res. 2024, 19, 1–17. [Google Scholar] [CrossRef]
- Pimentel, SD; Yu, R. Re-evaluating the impact of hormone replacement therapy on heart disease using match-adaptive randomization inference 2024. [CrossRef]
- Feller, D.J.B.; Zucker, J.; Yin, M.T.; Gordon, P.; Elhadad, N. Using Clinical Notes and Natural Language Processing for Automated HIV Risk Assessment. Am. J. Ther. 2018, 77, 160–166. [Google Scholar] [CrossRef]
- Zoccali, C.; Tripepi, G. Clinical trial emulation in nephrology. J. Nephrol. 2024, 38, 11–23. [Google Scholar] [CrossRef]
- Patel, S.S.; Raman, V.K.; Zhang, S.; Sheriff, H.M.; Fonarow, G.C.; Heidenreich, P.A.; Faselis, C.; Lam, P.H.; Morgan, C.J.; Moore, H.; et al. Renin Angiotensin Inhibition and Lower Risk of Kidney Failure in Patients with Heart Failure. Am. J. Med. 2025, 138, 1384–1393.e5. [Google Scholar] [CrossRef]
- Inoue, K.; Seeman, T.E.; Horwich, T.; Budoff, M.J.; Watson, K.E. Heterogeneity in the Association Between the Presence of Coronary Artery Calcium and Cardiovascular Events: A Machine-Learning Approach in the MESA Study. Circulation 2023, 147, 132–141. [Google Scholar] [CrossRef]
- Pietropaoli, D.; Monaco, A.; D’aIuto, F.; Aguilera, E.M.; Ortu, E.; Giannoni, M.; Czesnikiewicz-Guzik, M.; Guzik, T.J.; Ferri, C.; Del Pinto, R. Active gingival inflammation is linked to hypertension. J. Hypertens. 2020, 38, 2018–2027. [Google Scholar] [CrossRef]
- Fu, S.; Chen, L.; Lin, H.; Jiang, X.; Zhang, S.; Zhong, F.; Liu, D. Prediction Model for Delayed Behavior of Early Ambulation After Surgery for Varicose Veins of the Lower Extremity: A Prospective Case-Control Study. Arch. Phys. Med. Rehabilitation 2024, 105, 1908–1920. [Google Scholar] [CrossRef] [PubMed]
- Krishnamurthy, S.; Ks, K.; Dovgan, E.; Luštrek, M.; Piletič, B.G.; Srinivasan, K.; Li, Y.-C. (.; Gradišek, A.; Syed-Abdul, S. Machine Learning Prediction Models for Chronic Kidney Disease Using National Health Insurance Claim Data in Taiwan. Healthcare 2021, 9, 546. [Google Scholar] [CrossRef] [PubMed]
- Wang, X.; Guo, J.; Liu, H.; Zhao, T.; Li, H.; Wang, T. Impact of Social Participation Types on Depression in the Elderly in China: An Analysis Based on Counterfactual Causal Inference. Front. Public Heal. 2022, 10, 792765. [Google Scholar] [CrossRef] [PubMed]
- Ghosh, S.; Bian, J.; Guo, Y.; Prosperi, M. Deep propensity network using a sparse autoencoder for estimation of treatment effects. J. Am. Med Informatics Assoc. 2021, 28, 1197–1206. [Google Scholar] [CrossRef]
- Luo, Q.; Zheng, Z.; Luo, W.; Zhu, J. Development and external validation of interpretable machine learning models for personalized multiple treatment recommendations in non-small cell lung cancer. Int. J. Med Informatics 2025, 206, 106160. [Google Scholar] [CrossRef]
- Weymann, D.; Chan, B.; Regier, D.A. Genetic matching for time-dependent treatments: a longitudinal extension and simulation study. BMC Med Res. Methodol. 2023, 23, 1–13. [Google Scholar] [CrossRef]
- Cui, X.; Shi, Y.; He, X.; Zhang, M.; Zhang, H.; Yang, J.; Leng, Y. Abdominal physical examinations in early stages benefit critically ill patients without primary gastrointestinal diseases: a retrospective cohort study. Front. Med. 2024, 11, 1338061. [Google Scholar] [CrossRef]
- Chen, M.; Yang, J.; Lu, J.; Zhou, Z.; Huang, K.; Zhang, S.; Yuan, G.; Zhang, Q.; Li, Z. Ureteral calculi lithotripsy for single ureteral calculi: can DNN-assisted model help preoperatively predict risk factors for sepsis? Eur. Radiol. 2022, 32, 8540–8549. [Google Scholar] [CrossRef] [PubMed]
- Squiccimarro, E.; Lorusso, R.; Consiglio, A.; Labriola, C.; Haumann, R.G.; Piancone, F.; Speziale, G.; Whitlock, R.P.; Paparella, D. Impact of Inflammation After Cardiac Surgery on 30-Day Mortality and Machine Learning Risk Prediction. J. Cardiothorac. Vasc. Anesthesia 2024, 39, 683–691. [Google Scholar] [CrossRef] [PubMed]
- Digumarthi, V.; Amin, T.; Kanu, S.; Mathew, J.; Edwards, B.; A Peterson, L.; E Lundy, M.; E Hegarty, K. Preoperative prediction model for risk of readmission after total joint replacement surgery: a random forest approach leveraging NLP and unfairness mitigation for improved patient care and cost-effectiveness. J. Orthop. Surg. Res. 2024, 19, 1–17. [Google Scholar] [CrossRef] [PubMed]
- Chen, M.; Yang, J.; Lu, J.; Zhou, Z.; Huang, K.; Zhang, S.; Yuan, G.; Zhang, Q.; Li, Z. Ureteral calculi lithotripsy for single ureteral calculi: can DNN-assisted model help preoperatively predict risk factors for sepsis? Eur. Radiol. 2022, 32, 8540–8549. [Google Scholar] [CrossRef]







| Country | Rank all disciplines | Rank in medicine | Rank in artificial intelligence | Rank in Statistics and probability |
| China | 2 | 2 | 1 | 2 |
| United states | 1 | 1 | 2 | 1 |
| South Korea | 13 | 14 | 12 | 16 |
| Japan | 5 | 5 | 4 | 10 |
| Germany | 4 | 4 | 6 | 4 |
| France | 7 | 7 | 7 | 5 |
| Canada | 9 | 8 | 9 | 8 |
| Italy | 8 | 6 | 8 | 5 |
| India | 6 | 9 | 3 | 7 |
| Journal name | SNIP | Quarter | Research area |
| Frontiers in Oncology | 0.831 | 2. | Cancer Research, Oncology |
| Frontiers in Cardiovascular Medicine | 0.742 | 2. | Cardiology and Cardiovascular Medicine |
| BMC Infectious Diseases | 1.106 | 1. | Infectious Diseases |
| Frontiers in Public Health | 0.938 | 2. | Public Health, Environmental and Occupational Health |
| Journal of Clinical Medicine | 1.022 | 1. | Medicine (all) |
| JMIR Medical Informatics | 1.035 | 2. | Health Information Management Health Informatics |
| Cancers | 1.030 | 2. | Cancer Research Oncology |
| Frontiers in Pharmacology | 0.999 | 1. | Pharmacology Pharmacology (medical) |
| BMC Public Health | 1.386 | 1. | Public Health, Environmental and Occupational Health |
| European Radiology | 1.775 | 1. | Radiology, Nuclear Medicine and Imaging |
| Frontiers in Endocrinology | 1.122 | 2. | Endocrinology, Diabetes and Metabolism |
| Frontiers in Medicine | 0.879 | 1. | Medicine (all) |
| Theme | Prolific author's keywords association sub-networks | Publications describing AI use in combination with PSM | Publications describing PSM use in AI |
| Prediction Blue (14 author keywords) |
Cardiovascular diseases – Diabetes mellitus Atrial fibrillation – sepsis-prediction |
[45], [46,47,48] [49] |
[50,51] [52,53] |
| Cancer management Red (15 author keywords) |
Breast cancer – SEER Hepatocellular carcinoma – SEER – chemotherapy-survival Gastric cancer – random forest Natural language model, prediction modelling |
[54] [55,56] |
[57,58] [59] [60,61] [62,63] |
| Diagnosing Green (14 author keywords) |
Coronary heart diseases – diagnosis Diagnosis – Intensive care unit – Public health Chronic kidney disease – Electronic health record |
[64] [65,66] [67] |
[68,69] [70] [71] |
| Deep learning Yellow (7 author keywords) |
Casual inference – Big data – deep learning Monte Carlo simulation Computer tomography – deep learning |
[72,73,74] [75] |
[76] [77] |
| Theme | Association sub-networks | Synthesis of publications | ||||
| Prediction | Cardiovascular diseases – Diabetes mellitus- | Binomial regression models and random forest regression was performed on a dataset of high risk COVID-19 subjects (inclusion criteria: age over 65 years old, presence of solid or haematological cancer, chronic kidney disease, chronic liver disease, chronic lung disease, uncontrolled diabetes, neurological disease, cardiovascular disease, obesity, cerebrovascular disease or being immunocompromised (AIDS, solid organ or blood stem cell transplantation, and all conditions requiring use of corticosteroids or other immunosuppressive medications)) after performing PSM based on being early treated or not [45]. Lim et al. published a study in 2023 conducted a nationwide population-based cohort study comparing cardiovascular outcomes between new and existing users of dapagliflozin and empagliflozin in type 2 diabetes patients. Using a Korean cohort dataset, the authors employed a nearest-neighbours machine learning approach for propensity score matching prior to statistical analysis [47]. The LIGHTNING study modelled, predicted, and compared hypoglycaemia rates of people with type 2 diabetes, comparing patients using first or second-generation insulin preparations. During analysis, authors first used conventional (PSM) and then advanced machine learning [50]. A large-scale Indian patient database was analysed using the Spearman correlation coefficient method and Deep Learning to build a hazard model, which was used to predict CVD events and their time of occurrence that reportedly had a good performance rate. PSM was used first to match patients with and without CVD [51]. Xie et al. investigated the utility of coronary computed tomography angiography (CCTA) in detecting cancer treatment-related coronary artery impairments and predicting major adverse cardiovascular events (MACE) in lung cancer patients undergoing chemotherapy or chemo-radiotherapy. Their methodology involved: (1) AI-driven image recognition for initial assessment, (2) propensity score (PS) matching to compare patients with and without carcinoma, and (3) Cox regression modelling to evaluate differences in MACE-free survival rates. [46]. Squiccimarro et al [78] performed a retrospective cohort study (N=1,908) examining systemic inflammatory response syndrome (SIRS) impact on 30-day mortality post-cardiac surgery and developed predictive machine learning models. PSM was used to balance the training set. SIRS significantly increased mortality risk; models achieved AUC up to 0.82. |
||||
| Atrial fibrillation – sepsis-prediction |
In a study by Wang et al. [52], authors developed a model to predict the risk of mortality in septic patients with atrial fibrillation using different ML algorithms. They used PSM to reduce the imbalance between the external validation and internal validation data sets. In another study, Ruan et al. [53] used five different ML algorithms to determine whether dyslipidaemia or obesity contributes more towards unfavourable clinical outcomes in patients suffering a first-ever ischemic stroke. PMS was employed to ascertain associations between indicators and prognosis. The study applied propensity score matching and a causal machine learning framework to predict heterogeneous treatment effects of LAAO versus DOAC in atrial fibrillation patients, enabling AI-driven individualized benefit estimation for improved patient selection and clinical decision-making [49]. |
|||||
| Cancer management | Breast cancer – SEER (Surveillance, Epidemiology, and End Results) database | A study used the SEER database to identify the prognostic variables for patients with occult breast cancer, which is an uncommon malignant tumour for which the prognosis and treatment remain a controversial topic. Cox regression analysis was performed to construct prognostic models with the help of six machine-learning algorithms to predict overall survival. The authors further examined the impact of chemotherapy and surgery on survival outcomes in occult breast cancer patients stratified by molecular subtype, utilizing Kaplan-Meier survival analysis and propensity score matching. These findings were subsequently validated through subgroup Cox regression analysis [57]. In a similar study, South Korean investigators used machine learning-based risk factor detection and breast cancer mortality prediction with the Shappley Additive Explanation (SHAP), which is an explainable artificial intelligence technique, to identify and interpret key features that have a significant impact on breast cancer mortality. To enhance the robustness and generalizability of their primary findings and balance the baseline covariates, they employed an exposure-driven 1:3 propensity score matching (PSM) analysis while minimizing a logistic regression model with the implications of potential confounders [58]. Liang et al [54] analyzed 18,726 NHANES participants to examine breast cancer prevalence and neutrophil-percentage-to-albumin ratio (NPAR), revealing a significant positive association, potentially mediated by sex hormone levels, validated through advance multivariate, subgroup, and propensity score analyses. |
||||
| Hepatocellular carcinoma – SEER (Surveillance, Epidemiology, and End Results) database – chemotherapy-survival |
Patients diagnosed with hepatocellular carcinoma between January 2010 and December 2015 were identified through the SEER (Surveillance, Epidemiology, and End Results) database. The researchers first conducted univariate and multivariate logistic regression analyses to assess prognostic factors, then developed a 5-year survival risk prediction model using classical decision tree methodology. To address potential confounding variables related to chemotherapy administration, propensity score matching was implemented for both high-risk and low-risk patient cohorts [59]. | |||||
| Gastric cancer – random forests |
According to Huang C. et al., clinical data from 391 gastric cancer patients (including 86 peritoneal dissemination cases) were analyzed using a 1:3 propensity score matching approach. The researchers subsequently performed both univariate and multivariate conditional logistic regression analyses. Their methodology further incorporated classification tree analysis to establish decision rules, followed by random forest algorithm implementation to extract significant risk factors for peritoneal dissemination in gastric cancer [47]. Another study aimed to explore the association of the tongue coating microbiota with the serum metabolic features and inflammatory cytokines in GC patients to seek a potential non-invasive biomarker for diagnosing GC. The tongue coating microbiota was profiled by 16S rRNA and 18S rRNA genes sequencing technology in the original population with 181 GC patients and 112 healthy controls (HCs). The propensity score matching method was used to eliminate potential confounders, including age, gender, and six lifestyle factors, and a matching population was created. For the diagnosis of GC a random forest model was constructed. |
|||||
| Natural language model prediction modelling |
As reported by Gao Z. et al, an innovative integrated strategy was developed to identify FDA-approved drugs for repurposing in cocaine use disorder (CUD) treatment. The study combined AI-driven drug prediction with clinical validation through the National Drug Abuse Treatment Clinical Trials Network (CTN), incorporating expert panel review and mechanistic action analysis. Based on combined AI prioritization and clinical expertise, ketamine emerged as the top candidate for further evaluation. The team conducted electronic health record (EHR) analysis comparing CUD outcomes in patients prescribed ketamine (for anesthesia/depression) against propensity-matched controls receiving alternative treatments. Complementary genetic and pathway enrichment analyses were performed to elucidate ketamine’s potential mechanisms of action in CUD [55]. In another study, PSM balanced the covariates across two groups of Alzheimer's disease and related dementia patients with oropharyngeal during hospitalization, whether at least 75% of their hospital diet consisted of a thick liquid diet or a thin liquid diet. Machine learning was used to predict hospital outcomes such as mortality, length of stay, and complications [62]. Varun et al. analyzed data from 38,581 shoulder and hip replacement patients (2015-2021) to develop a random forest model predicting 30-day post-discharge outcomes (emergency department visits, unplanned readmissions, or discharge to skilled nursing facilities). The study incorporated 98 features spanning laboratory results, diagnoses, vital signs, medications, and utilization history. Notably, the researchers employed a Clinical BERT-finetuned NLP model to generate risk scores from clinical notes. To address potential biases, the methodology combined propensity score matching with comprehensive feature bias analysis, implementing Fairlearn toolkit’s threshold optimization to mitigate gender and payer-related prediction disparities [79]. Krishna et al. [56] performed natural language processing on clinical records of patients from the Veterans Health Administration database, which were antiarrhythmic drug-naive, to identify and compare baseline left ventricular ejection fraction between treatments with different drugs. They used 1:1 propensity score matching based on patient demographics, comorbidities, and medications, as well as Cox regression to compare strategies. A falsification analysis with non-plausible outcomes was performed to evaluate residual confounding. |
|||||
| Diagnosing | Coronary heart diseases-diagnosis |
While coronary artery calcium (CAC) is an established predictor of cardiovascular disease (CVD), optimal screening strategies require identification of populations deriving maximal benefit from CAC detection. Kosuke et al. examined whether CAC’s predictive value varies across demographic subgroups in the Multi-Ethnic Study of Atherosclerosis (MESA) cohort (ages ≥45, CVD-free at baseline). After 1:1 propensity score matching, the team employed machine learning causal forest modelling to: (1) quantify heterogeneity in CAC-CVD associations, and (2) predict individualized 10-year CVD risk increases when CAC>0 versus CAC=0. These machine learning estimates were subsequently benchmarked against absolute 10-year ASCVD risks calculated via 2013 ACC/AHA pooled cohort equations [68]. A recent study explored whether gingival bleeding - a simple clinical indicator of periodontal disease - might serve as a marker for hypertension. Given the established link between cardiovascular diseases and systemic inflammation, with periodontitis potentially exacerbating this inflammatory burden, researchers analyzed NHANES III data from 5,396 adults aged ≥30 years who completed both blood pressure assessments and periodontal exams. Using survey-based propensity score matching that accounted for key confounders shared by hypertension and periodontal disease, they created matched cohorts with and without gingival bleeding. The analysis employed generalized additive models adjusted for inflammatory markers to evaluate associations between bleeding gums and both systolic blood pressure (mmHg) and uncontrolled hypertension. Further stratification by periodontal status (healthy, gingivitis, stable periodontitis, unstable periodontitis) provided additional insights, while machine learning techniques helped determine variable importance in these relationships [69]. Samuel et al. developed a computationally efficient algorithm that properly characterizes and samples from the conditional distribution of treatment following optimal propensity score matching, while accounting for Z-dependence. This innovation addresses a fundamental methodological challenge: unlike traditional matched-pair designs where pairs are fixed beforehand, propensity score matches are constructed post-treatment based on observed status. Consequently, standard permutation-based inference methods become invalid since treatment permutations could yield entirely different matched sets—a dependency (Z-dependence) that conventional approaches fail to consider [64]. |
||||
| Diagnosing | Diagnosis – Intensive care unit- Public health | Feller et al. [65] evaluated the added value of natural language processing (NLP) for enhancing HIV diagnosis prediction models. Their study included 181 HIV-positive patients receiving care at New York Presbyterian Hospital prior to confirmed diagnosis, along with 543 propensity-matched HIV-negative controls. Researchers extracted structured EHR data (demographics, laboratory results, diagnosis codes) and unstructured clinical notes from the pre-diagnosis period. They then developed three machine learning models: (1) a baseline model using only structured EHR data, (2) baseline plus NLP-derived topics, and (3) baseline plus NLP-extracted clinical keywords. Results demonstrated that incorporating NLP features significantly improved predictive accuracy for HIV risk assessment In a review paper, Zaccali and Tripepi [66] claim that trial emulation with PMS use in observational studies represents a significant advancement in epidemiology and can support improving public health outcomes. However, traditional PSM techniques face challenges like data quality, unmeasured confounding, and implementation complexity that could be overcome with machine learning techniques and developing methods to address unmeasured confounding. In a recent study, Fu et al. [70] collected information from selected participants before surgery and followed up until the day after surgery, then divided them into a normal and delayed ambulation group. Propensity score matching was applied to all participants by type of surgery and anaesthesia. All the characteristics in the two groups were compared using logistic regression, back propagation neural network (BPNN), and decision tree models. The accuracy, sensitivity, specificity, and area under the curve (AUC) values of the three models were compared to determine the optimal model. |
||||
| Chronic kidney disease – Electronic health record |
Krishnamurthy et al. [52] developed a machine-learning model to predict incidents of chronic kidney disease (CKD) 6-12 months before clinical onset using Taiwan’s National Health Insurance claims data. The study employed propensity score matching to select 18,000 CKD cases and 72,000 matched controls, analysing two years of demographic, medication, and comorbidity history for each subject. Among various algorithms tested, convolutional neural networks (CNNs) demonstrated superior predictive performance. Tree-based feature importance analysis identified diabetes mellitus, advanced age, gout, and specific medication use (particularly sulphonamides and RAAS inhibitors) as the strongest predictors of CKD development. Using AI phenotyping and propensity score matching on 168,860 Veterans with heart failure, balanced on 77 covariates, high-dose RAS inhibitors showed lower kidney failure risk versus low-dose [67]. |
|||||
| Deep Learning | Casual inference – deep learning |
Cui et al. [76] conducted a large-scale analysis of ICU patients without primary gastrointestinal diseases using the MIMIC-IV database to evaluate the prognostic value of abdominal physical examinations (palpation and auscultation). Patients were stratified based on examination status, with 28-day mortality as the primary endpoint. The researchers employed multiple analytical approaches: Cox proportional hazards models, propensity score matching, and inverse probability treatment weighting. Following initial analysis, the examined cohort was randomly split into training (80%) and testing (20%) sets, while patients with primary GI conditions served as an external validation group. Six machine learning algorithms—Random Forest, Gradient Boosting Decision Trees, AdaBoost, Extra Trees, Bagging, and Multilayer Perceptron—were subsequently implemented to develop predictive models for in-hospital mortality. To optimally evaluate the relationship between participation in different types of social activities and depression in the elderly, Wang et al. [72] used propensity score matching (PSM) for analysis based on the counterfactual framework. The specific matching methods used were the k-nearest neighbours matching method, kernel matching method, and radius matching method. To reduce the underlying bias in observational studies, Ghosh et al. [75]developed a new deep learning architecture for propensity score matching and counterfactual prediction. Machine learning enhanced propensity score estimation by improving covariate balance, reducing bias in observational studies, and enabling robust causal inference, thereby advancing methodological rigor in treatment effect in patients with non-small lung cancer, by analysis across complex, high-dimensional healthcare and social science datasets [74] |
||||
| Monte Carlo simulation |
Weyman et al. [75] addressed limitations of manual longitudinal propensity score matching by developing a machine learning-enhanced genetic matching approach that automatically optimizes covariate history balancing. Through Monte Carlo simulation studies, the authors demonstrated superior performance of their automated method compared to traditional manual matching techniques. |
|||||
| Computer tomography – deep learning | Chen et al. [80] investigated radiomics and deep learning approaches for predicting sepsis risk following stone removal procedures (FURL/PCNL) in ureteral calculus patients. After propensity score matching, they developed: (1) a radiomics model for sepsis prediction, and (2) an enhanced deep learning model to boost predictive accuracy. LASSO regression identified 26 key predictive variables. The deep neural network (DNN) implementation showed improved AUC in internal validation, with subsequent external validation confirming model generalizability by addressing overfitting concerns. |
|||||
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
