Guadalupe Gutiérrez-Esparza 1,2,*,†, Mireya Martínez-García 3,†, Manlio F. Márquez-Murillo 2, Malinalli Brianza-Padilla 3, Enrique Hernández-Lemus 4,5,* and Luis M. Amezcua-Guerra 3,*
1 “Researcher for Mexico” Program under SECIHTI, Secretariat of Sciences, Humanities, Technology, and Innovation, Mexico City 08400, Mexico
2 Division of Diagnostic and Treatment Services, National Institute of Cardiology Ignacio Chávez, Mexico City 04510, Mexico
3 Department of Immunology, National Institute of Cardiology Ignacio Chávez, Mexico City 04510, Mexico
4 Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico
5 Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
*Correspondence: G.G.-E.; E.H.-L.; L.M.A.-G.
† These authors contributed equally to this work.
Academic Editor: Motoyuki Iemitsu
Received: 11 February 2025
Revised: 3 March 2025
Accepted: 6 March 2025
Published: 17 March 2025
Citation: Gutiérrez-Esparza, G.; Martínez-García, M.; Márquez-Murillo, M.F.; Brianza-Padilla, M.; Hernández-Lemus, E.; Amezcua-Guerra, L.M. Tlalpan 2020 Case Study: Enhancing Uric Acid Level Prediction with Machine Learning Regression and Cross-Feature Selection. Nutrients 2025, 17, 1052. https://doi.org/10.3390/nu17061052
Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract: Background/Objectives: Uric acid is a key metabolic byproduct of purine degradation and plays a dual role in human health. At physiological levels, it acts as an antioxidant, protecting against oxidative stress. However, excessive uric acid can lead to hyperuricemia, contributing to conditions such as gout, kidney stones, and cardiovascular diseases. Emerging evidence also links elevated uric acid levels with metabolic disorders, including hypertension and insulin resistance. Understanding its regulation is crucial for preventing associated health complications. Methods: This study, part of the Tlalpan 2020 project, aimed to predict uric acid levels using advanced machine learning algorithms. The dataset included clinical, anthropometric, lifestyle, and nutritional characteristics from a cohort in Mexico City. We applied Boosted Decision Trees (Boosted DTR), eXtreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost) regression models, together with Shapley Additive Explanations (SHAP), to identify the most relevant variables associated with hyperuricemia. Feature engineering techniques improved model performance, which was evaluated using Mean Squared Error (MSE), Root-Mean-Square Error (RMSE), and the coefficient of determination (R²). Results: Our study showed that XGBoost had the highest accuracy for anthropometric and clinical predictors, while CatBoost was the most effective at identifying nutritional risk factors. Distinct predictive profiles were observed between men and women. In men, uric acid levels were primarily influenced by renal function markers, lipid profiles, and hereditary predisposition to hyperuricemia, particularly paternal gout and diabetes. Diets rich in processed meats, high-fructose foods, and sugary drinks showed stronger associations with elevated uric acid levels. In women, metabolic and cardiovascular markers, family history of metabolic disorders, and lifestyle factors such as passive smoking and sleep quality were the main contributors. Additionally, while carbohydrate intake was more strongly associated with uric acid levels in women, fructose and sugary beverages had a greater impact in men. To enhance model robustness, a cross-feature selection approach was applied, integrating top features from multiple models, which further improved predictive accuracy, particularly in gender-specific analyses. Conclusions: These findings provide insights into the metabolic, nutritional, and lifestyle determinants of uric acid levels, supporting targeted public health strategies for hyperuricemia prevention.
Keywords: uric acid; regression-based machine learning; feature selection; feature engineering; Mexico City; Tlalpan 2020 cohort
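As a rough illustration of two ideas summarized in the abstract, the Python sketch below computes the three reported evaluation metrics (MSE, RMSE, R²) and shows one possible way to integrate top features from multiple models. The abstract does not specify how the cross-feature selection was carried out, so the union-of-top-k rule used here is an assumption. The function names, feature names, importance scores, and uric acid values are likewise hypothetical and are not taken from the study.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: the average squared residual."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error: square root of the MSE, in the target's units."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_feature_selection(importances_by_model, k):
    """Assumed rule: take each model's top-k features and return their union.

    Features are ordered by how many models selected them, then by their
    summed importance, then alphabetically for a deterministic result.
    """
    votes = {}
    for scores in importances_by_model.values():
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        for name in top_k:
            count, total = votes.get(name, (0, 0.0))
            votes[name] = (count + 1, total + scores[name])
    return sorted(votes, key=lambda n: (-votes[n][0], -votes[n][1], n))


# Hypothetical importance scores, standing in for what fitted models would report.
importances = {
    "xgboost": {"creatinine": 0.40, "bmi": 0.25, "triglycerides": 0.20, "age": 0.15},
    "catboost": {"creatinine": 0.35, "fructose_intake": 0.30, "bmi": 0.20, "age": 0.15},
}
selected = cross_feature_selection(importances, k=2)

# Hypothetical observed and predicted uric acid values (mg/dL).
y_true = [5.0, 6.0, 7.0]
y_pred = [5.5, 6.0, 6.5]

print("selected features:", selected)
print("MSE:", mse(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("R2:", r2(y_true, y_pred))
```

In practice, the importance dictionaries would come from the fitted regressors or from SHAP values rather than being typed in by hand, and the selected feature set would be used to retrain and re-evaluate each model.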