Structural Classification of Indonesian Arithmetic Word Problems Using Hierarchical Agglomerative Clustering
DOI:
https://doi.org/10.62712/juktisi.v5i1.1026
Keywords:
Hierarchical Agglomerative Clustering, Arithmetic Word Problems, Structural Classification, Natural Language Processing, Educational Data Mining
Abstract
Arithmetic word problems, a class of math word problems (MWPs), are a fundamental component of elementary mathematics education that integrates linguistic comprehension with quantitative reasoning. In practice, collections of MWPs are commonly organized by teacher intuition or broad curriculum categories, which are subjective and often fail to reflect the mathematical similarity between problems. This study classifies Indonesian arithmetic word problems by their underlying relational structures using Hierarchical Agglomerative Clustering (HAC). The dataset consists of 897 elementary-level arithmetic word problems, each represented by 143 binary features encoding five relational dimensions: combine, change, compare, equal groups, and fair division. Hamming distance is used as the dissimilarity metric, and clustering is performed with the complete linkage method. Candidate numbers of clusters are evaluated using three internal validity indices: the Calinski–Harabasz Index, the Silhouette Score, and the Davies–Bouldin Index. Although the statistical indices favor fewer clusters, four clusters are selected on the basis of domain-specific interpretability, because they align with established theoretical categories of arithmetic relational structures. The approach identifies latent structural patterns within the dataset and demonstrates the potential of a feature-based binary representation combined with HAC for systematic MWP classification. The findings offer practical support for adaptive problem bank development, automated curriculum analysis, and intelligent tutoring system design.
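The pipeline summarized in the abstract (binary features, Hamming distance, complete linkage, three validity indices) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the study's 897 × 143 feature matrix is not available here, so the code generates hypothetical random binary data around four made-up prototypes. The choice of SciPy and scikit-learn, the variable names, and the way the indices are computed are also assumptions. In particular, scikit-learn's Calinski–Harabasz and Davies–Bouldin scores are Euclidean-based and are applied to the 0/1 matrix directly, which may differ from what the study did.

```python
import numpy as np
from scipy.cluster.hierarchy import cut_tree, linkage
from scipy.spatial.distance import pdist, squareform
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Hypothetical stand-in data (NOT the study's dataset): noisy copies of
# four random binary prototypes, each with 143 features.
rng = np.random.default_rng(0)
n_problems, n_features, n_groups = 200, 143, 4
prototypes = rng.integers(0, 2, size=(n_groups, n_features))
membership = rng.integers(0, n_groups, size=n_problems)
flip = rng.random((n_problems, n_features)) < 0.05  # flip about 5% of bits
X = np.where(flip, 1 - prototypes[membership], prototypes[membership]).astype(bool)

# Hamming distance: fraction of features on which two problems differ.
condensed = pdist(X, metric="hamming")
D = squareform(condensed)  # full symmetric matrix for the silhouette score

# Complete linkage: the distance between two clusters is the largest
# pairwise distance between their members.
Z = linkage(condensed, method="complete")


def evaluate(k):
    """Cut the dendrogram into k clusters and compute three validity indices."""
    labels = cut_tree(Z, n_clusters=k).ravel()
    X_float = X.astype(float)  # assumption: Euclidean-based indices on 0/1 data
    return {
        "k": k,
        "n_clusters": len(np.unique(labels)),
        "silhouette": silhouette_score(D, labels, metric="precomputed"),
        "calinski_harabasz": calinski_harabasz_score(X_float, labels),
        "davies_bouldin": davies_bouldin_score(X_float, labels),
    }


results = [evaluate(k) for k in range(2, 7)]
for row in results:
    print(row)
```

Higher Silhouette and Calinski–Harabasz values and lower Davies–Bouldin values indicate more compact, better separated clusters. As the abstract notes, the configuration these indices favor need not coincide with the most interpretable one, so a table like `results` is best read alongside domain knowledge rather than used as the sole selection criterion.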
Copyright (c) 2026 Mochammad Dhani Aprianto, Agung Prasetya

This work is licensed under a Creative Commons Attribution 4.0 International License.