Comparison of Machine Learning Algorithms for Indonesian-Language Hoax Classification on the Komdigi Dataset

Haris Setyo Pratomo; Panny Agustia Rahayuningsih; Muhammad Rezki

doi:10.62712/juktisi.v5i1.1255

Authors

Haris Setyo Pratomo Universitas Bina Sarana Informatika
Panny Agustia Rahayuningsih Universitas Bina Sarana Informatika
Muhammad Rezki Universitas Bina Sarana Informatika

Keywords:

Hoax Classification, Machine Learning, TF-IDF, SMOTE, Komdigi Dataset

Abstract

The spread of Indonesian-language hoaxes continues to grow with the expansion of digital platforms, creating a need for automatic classification systems that categorize hoaxes accurately and efficiently. This study compares five machine learning algorithms, namely Support Vector Machine (SVM), Random Forest, Logistic Regression, Decision Tree, and Naive Bayes, on the task of classifying Indonesian hoaxes by category, using the Komdigi dataset of 16,308 articles across six categories. Features were extracted with TF-IDF over unigrams and bigrams (n-gram range (1, 2)) and enriched with statistical text features. The extreme class imbalance was handled with SMOTE, applied inside the Stratified K-Fold Cross-Validation pipeline to prevent data leakage. SVM (LinearSVC) achieved the highest accuracy (95.9%) and a cross-validation score of 0.960, while Logistic Regression led on macro AUC (0.952) and macro F1-score (0.460), indicating the most balanced recognition across all categories. Decision Tree performed worst, with a macro AUC of 0.635. These findings indicate that the best algorithm depends on which evaluation metric is prioritized for the intended use. The study contributes a recommendation of effective algorithms for Indonesian hoax classification and a methodological framework free of data leakage.
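The gap between an accuracy of 95.9% and a best macro F1-score of 0.460 is easier to see on a toy example. The sketch below is not the authors' evaluation code: the labels, class counts, and the helper names `accuracy` and `macro_f1` are hypothetical, chosen only to illustrate how, under extreme class imbalance, a classifier can score high accuracy while macro-averaged F1 stays low.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    f1_scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical imbalanced data: 95 majority-class items, 5 minority-class items.
y_true = ["majority"] * 95 + ["minority"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["majority"] * 100

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro F1 = {mf1:.3f}")
```

Here the minority class contributes an F1 of zero, which halves the macro average even though only 5 of 100 predictions are wrong. This is why the choice between accuracy and macro-averaged metrics matters when categories are highly imbalanced.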
