Comparison and Data Visualization in Thyroid Cancer Disease Prediction Using Machine Learning Algorithms
DOI:
https://doi.org/10.57152/malcom.v6i1.2249
Keywords:
Gradient Boosting, Machine Learning, Random Forest, SMOTE, Thyroid Cancer
Abstract
Thyroid cancer is a common endocrine malignancy for which accurate early prediction is needed to improve patient outcomes. Comprehensive comparative studies of machine learning algorithms, accompanied by systematic visualization, remain limited. This study compares two tree-based algorithms (Decision Tree, Random Forest) and two boosting algorithms (Gradient Boosting, XGBoost) for thyroid cancer prediction and develops visualization strategies for clinical interpretation. The four algorithms were evaluated using accuracy (proportion of correct predictions), precision (positive predictive value), recall (true positive rate), F1-score (harmonic mean of precision and recall), and AUC-ROC (area under the ROC curve). Visualization techniques, including confusion matrices, ROC curves, and feature importance plots, supported interpretation of the models. XGBoost performed best, with accuracy 95.2%, precision 94.8%, recall 95.6%, F1-score 95.2%, and AUC-ROC 0.978, followed by Random Forest (93.5%, 92.7%, 94.1%, 93.4%, 0.965), Gradient Boosting (91.8%, 90.9%, 92.4%, 91.6%, 0.952), and Decision Tree (87.3%, 86.5%, 88.2%, 87.3%, 0.913), with values listed in the same metric order. Feature importance analysis identified key predictors. XGBoost outperformed the other three algorithms on every metric; Random Forest ranked second, ahead of Gradient Boosting. Integrated visualization enhances clinical interpretability, and the results provide empirical guidance for implementing machine learning-based diagnostic support systems.
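Because the abstract defines the F1-score as the harmonic mean of precision and recall, the reported figures can be checked against each other. The following is a minimal sketch, not the authors' code: it takes only the precision, recall, and F1 percentages quoted in the abstract and recomputes F1 for each model. The dictionary name `reported` and the helper `f1_score` are illustrative names introduced here.

```python
# Precision, recall, and F1 (all in percent) as quoted in the abstract.
# The variable and function names are illustrative, not from the paper.
reported = {
    "XGBoost": (94.8, 95.6, 95.2),
    "Random Forest": (92.7, 94.1, 93.4),
    "Gradient Boosting": (90.9, 92.4, 91.6),
    "Decision Tree": (86.5, 88.2, 87.3),
}


def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 from precision and recall, rounded to one decimal place
# to match the precision of the reported values.
recomputed = {
    name: round(f1_score(p, r), 1) for name, (p, r, _) in reported.items()
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: reported F1 = {f1}, recomputed F1 = {recomputed[name]}")
```

Under this check, the recomputed F1 values agree with the reported ones to one decimal place for all four models, which suggests the precision, recall, and F1 figures in the abstract are internally consistent.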
Copyright (c) 2025 M. Zahran Yudha, Jasmir Jasmir, Fachruddin Fachruddin

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Copyright © by Author; Published by Institut Riset dan Publikasi Indonesia (IRPI)
This Indonesian Journal of Machine Learning and Computer Science is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

















