Comparison of Sentiment Analysis Algorithms with SMOTE Oversampling and TF-IDF Implementation on Google Reviews for Public Health Centers

I Gede Bintang Arya Budaya; I Ketut Putu Suniantara

doi:10.57152/malcom.v4i3.1459

Authors

I Gede Bintang Arya Budaya, Institute of Technology and Business STIKOM Bali
I Ketut Putu Suniantara, Institute of Technology and Business STIKOM Bali


Keywords:

Binary Classification, Imbalanced Classes, Sentiment Classification, Supervised Learning, User Feedback

Abstract

Sentiment analysis, or opinion mining, is a key area of natural language processing that identifies sentiment in free text. As digital business services grow and user-generated content increases, analyzing sentiment in online reviews is vital for improving business operations and customer satisfaction. This study applies sentiment analysis to Google Reviews of Public Health Centers (PHCs) in Bali, Indonesia, using five machine learning models: Logistic Regression, Support Vector Machine (SVM), XGBoost, Naive Bayes, and Random Forest. The models classified reviews as positive or negative, using a dataset balanced with SMOTE to improve accuracy. A total of 1,834 reviews were split into 80% for training and 20% for testing to ensure a thorough evaluation under real-world conditions. Logistic Regression and Naive Bayes performed best, both achieving an accuracy of 0.89, with Logistic Regression providing balanced precision and recall. The study enhances academic understanding of sentiment analysis in healthcare and offers insights for business administrators on handling online customer feedback. The findings stress the importance of choosing machine learning techniques suited to the data characteristics and project requirements in order to optimize both technological and business outcomes.
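The two preprocessing steps named in the title, TF-IDF weighting and SMOTE oversampling, can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the toy English reviews, the helper names `tfidf` and `smote`, the smoothed IDF formula, and the neighbour count `k` are all assumptions made for illustration. In practice one would more likely use scikit-learn's `TfidfVectorizer` and imbalanced-learn's `SMOTE`.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) of TF-IDF weights for tokenised documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant): ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append([tf[t] * idf[t] for t in vocab])
    return vocab, rows


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Hypothetical toy reviews: label 1 = positive, 0 = negative (minority class).
reviews = [
    ("friendly staff fast service", 1),
    ("clean room friendly doctor", 1),
    ("fast registration clean clinic", 1),
    ("doctor helpful service good", 1),
    ("long queue rude staff", 0),
    ("slow service long wait", 0),
]
docs = [text.split() for text, _ in reviews]
labels = [label for _, label in reviews]

vocab, X = tfidf(docs)
minority = [row for row, label in zip(X, labels) if label == 0]
n_needed = labels.count(1) - labels.count(0)
synthetic = smote(minority, n_needed, k=1)

# Balanced feature matrix and labels, ready for any of the five classifiers.
X_bal = X + synthetic
y_bal = labels + [0] * len(synthetic)
print(Counter(y_bal))
```

The abstract does not say whether oversampling was applied before or after the 80/20 split. A common precaution is to oversample only the training portion, so that no synthetic point derived from a test review influences the model being evaluated.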

