Comparison of Machine Learning Algorithms for Indonesian-Language Hoax Classification on the Komdigi Dataset
DOI: https://doi.org/10.62712/juktisi.v5i1.1255
Keywords: Hoax Classification, Machine Learning, TF-IDF, SMOTE, Komdigi Dataset
Abstract
The spread of Indonesian-language hoaxes continues to grow alongside digital platforms, creating a need for an automatic classification system that can categorize hoax types accurately and efficiently. This study compares the performance of five machine learning algorithms, namely Support Vector Machine (SVM), Random Forest, Logistic Regression, Decision Tree, and Naive Bayes, in classifying Indonesian hoax categories using the Komdigi dataset of 16,308 articles across six categories. Features were represented using TF-IDF over unigrams and bigrams (n-gram range (1,2)), enriched with statistical text features, while the extreme class imbalance was handled with SMOTE applied inside the Stratified K-Fold Cross-Validation pipeline to prevent data leakage. Evaluation results show that SVM (LinearSVC) achieved the highest accuracy of 95.9% and a cross-validation score of 0.960, while Logistic Regression led on macro AUC at 0.952 and macro F1-score at 0.460, reflecting the best ability to recognize all categories in a balanced manner. Decision Tree showed the lowest performance, with a macro AUC of 0.635. These findings indicate that the best algorithm depends on which evaluation metric the application prioritizes. This study contributes a recommendation of effective algorithms for Indonesian hoax classification and a valid methodological framework free of data leakage.
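The abstract's point about applying SMOTE inside the cross-validation folds can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the toy 2-D features stand in for the TF-IDF vectors, and the helper names (`stratified_folds`, `smote`, `resampled_training_folds`) and the parameters `k` and `seed` are hypothetical. What it shows is the ordering that avoids leakage: split first, then oversample only the training part of each fold, leaving the held-out part untouched.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits, rng):
    """Assign sample indices to folds so each fold keeps the class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)
    return folds


def smote(X, y, k, rng):
    """Simplified SMOTE: grow each minority class to the majority count by
    interpolating between a sample and one of its k nearest same-class neighbours."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        if len(members) < 2:
            continue  # nothing to interpolate with
        for _ in range(target - count):
            base = rng.choice(members)
            neighbours = sorted(
                (m for m in members if m is not base),
                key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> other
            X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


def resampled_training_folds(X, y, n_splits=5, k=5, seed=0):
    """Yield (oversampled training data, untouched test data) for each fold."""
    rng = random.Random(seed)
    for test_idx in stratified_folds(y, n_splits, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # SMOTE sees only the training part of this fold.
        X_res, y_res = smote([X[i] for i in train_idx],
                             [y[i] for i in train_idx], k, rng)
        yield (X_res, y_res), ([X[i] for i in test_idx], [y[i] for i in test_idx])


# Toy imbalanced data (assumption): 40 majority and 10 minority points.
data_rng = random.Random(42)
X_toy = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
X_toy += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
y_toy = [0] * 40 + [1] * 10

for (X_res, y_res), (X_test, y_test) in resampled_training_folds(X_toy, y_toy, k=3):
    print("train:", dict(Counter(y_res)), "test:", dict(Counter(y_test)))
```

Each training fold comes out balanced while every test fold keeps the original class ratio, so the evaluation metrics are computed on real samples only. In practice the same ordering is usually obtained by placing the sampler inside a pipeline object that resamples only during fitting, such as the `Pipeline` class from imbalanced-learn, rather than by hand-written loops like this one.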
License
Copyright (c) 2026 Haris Setyo Pratomo, Panny Agustia Rahayuningsih, Muhammad Rezki

This work is licensed under a Creative Commons Attribution 4.0 International License.