TY  - JOUR
AU  - Quang, Minh Trong
AU  - Nguyen, Minh Nam
PY  - 2026
DA  - 2026/07/01
TI  - A Machine Learning-Based Diagnostic Model for Prostate Cancer Using Circulating MicroRNA Expression Profiles
JO  - OBM Genetics
SP  - 346
VL  - 10
IS  - 03
AB  - Prostate cancer (PCa) is one of the most common malignancies among men worldwide, and early detection is critical for improving clinical outcomes. Circulating microRNAs (miRNAs) have emerged as promising non-invasive biomarkers for cancer diagnosis due to their stability in blood and association with tumor-related molecular alterations. In this study, machine learning (ML) methods were applied to large-scale circulating miRNA expression data to develop a diagnostic model for PCa detection. Serum miRNA expression profiles were obtained from the Gene Expression Omnibus dataset GSE211692, which included 6,920 samples comprising 1,027 PCa cases and 5,893 non-cancer controls. To reduce the risk of overfitting and information leakage, preprocessing, normalization, feature selection, and hyperparameter optimization were performed within the training and cross-validation framework, with the held-out testing set used only for final internal evaluation. Four ML algorithms, namely Logistic Regression, K-Nearest Neighbors, Random Forest, and CatBoost, were implemented. Principal component analysis (PCA) was additionally performed on both the training and held-out test datasets to visualize the sample distribution by case-control status. Although PCA showed clear separation between PCa and non-cancer samples, complete batch-related metadata were unavailable; therefore, potential technical batch effects could not be fully excluded. Among the evaluated algorithms, the Random Forest classifier showed the strongest internal diagnostic performance. A three-miRNA panel comprising miR-1290, miR-1307-3p, and miR-4783-3p demonstrated strong discriminatory capability for PCa classification. However, because the model was developed and evaluated on a single public dataset without external validation, the reported performance should be interpreted with caution. Further validation in independent clinical cohorts, comparison with established biomarkers such as PSA, and integration of clinicopathological variables are required before clinical translation can be considered.
SN  - 2577-5790
UR  - https://doi.org/10.21926/obm.genet.2603346
DO  - 10.21926/obm.genet.2603346
ID  - Quang2026
ER  - 