Structural Classification of Indonesian Arithmetic Word Problems Using Hierarchical Agglomerative Clustering

Authors

  • Mochammad Dhani Aprianto, Universitas Bhinneka PGRI Tulungagung
  • Agung Prasetya, Universitas Bhinneka PGRI

DOI:

https://doi.org/10.62712/juktisi.v5i1.1026

Keywords:

Hierarchical Agglomerative Clustering, Arithmetic Word Problems, Structural Classification, Natural Language Processing, Educational Data Mining

Abstract

Arithmetic word problems, a class of math word problems (MWPs), are a fundamental component of elementary mathematics education, integrating linguistic comprehension with quantitative reasoning. In practice, collections of MWPs are commonly organized by teacher intuition or broad curriculum categories, which are inherently subjective and often fail to reflect the mathematical similarity between problems. This study classifies Indonesian arithmetic word problems by their underlying relational structures using Hierarchical Agglomerative Clustering (HAC). The dataset consists of 897 elementary-level arithmetic word problems, each represented by 143 binary features encoding five relational dimensions: combine, change, compare, equal groups, and fair division. Hamming distance serves as the dissimilarity metric, and clustering is performed with the complete linkage method. Candidate numbers of clusters are evaluated with three internal validity indices: the Calinski–Harabasz Index, the Silhouette Score, and the Davies–Bouldin Index. Although these statistical indices favor smaller cluster configurations, four clusters are selected on the basis of domain-specific interpretability, because they align with established theoretical categories of arithmetic relational structures. The approach identifies latent structural patterns within the dataset and demonstrates the potential of feature-based binary representation combined with HAC for systematic MWP classification. The findings offer practical support for adaptive problem bank development, automated curriculum analysis, and intelligent tutoring system design.
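
The pipeline summarized in the abstract (binary feature vectors, Hamming distance, complete linkage, and an internal validity index) can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything below is an assumption made for illustration: the six-feature toy vectors, the function names, and the choice of two clusters are hypothetical, and only the Silhouette Score is computed (the Calinski–Harabasz and Davies–Bouldin indices are omitted for brevity). It uses only the Python standard library.

```python
from itertools import combinations


def hamming(a, b):
    """Fraction of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_link(vectors, c1, c2):
    """Complete linkage: the largest pairwise distance between two clusters."""
    return max(hamming(vectors[i], vectors[j]) for i in c1 for j in c2)


def complete_linkage_hac(vectors, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of row indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest complete-linkage distance.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: complete_link(vectors, clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return clusters


def silhouette(vectors, clusters):
    """Mean silhouette coefficient over all points (singletons score 0)."""
    scores = []
    for c_idx, members in enumerate(clusters):
        for i in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            # a: mean distance to the other members of the point's own cluster.
            a = sum(hamming(vectors[i], vectors[j]) for j in members if j != i)
            a /= len(members) - 1
            # b: smallest mean distance to the members of any other cluster.
            b = min(
                sum(hamming(vectors[i], vectors[j]) for j in other) / len(other)
                for o_idx, other in enumerate(clusters)
                if o_idx != c_idx
            )
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 6 problems x 6 binary features (not the paper's 143).
toy = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
]

clusters = complete_linkage_hac(toy, n_clusters=2)
score = silhouette(toy, clusters)
print(clusters, round(score, 3))
```

On this toy data the first three rows and the last three rows end up in separate clusters. For a dataset of the size reported in the abstract, a library implementation such as SciPy's hierarchical clustering routines would be a more practical choice than this quadratic-memory-free but slow pure-Python loop.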

Author Biography

Agung Prasetya, Universitas Bhinneka PGRI

Lecturer at Informatics Department, Universitas Bhinneka PGRI.

Published

2026-05-16

How to Cite

Aprianto, M. D., & Prasetya, A. (2026). Structural Classification of Indonesian Arithmetic Word Problems Using Hierarchical Agglomerative Clustering. Jurnal Komputer Teknologi Informasi Sistem Komputer (JUKTISI), 5(1), 393–400. https://doi.org/10.62712/juktisi.v5i1.1026

Section

Articles