Structural Classification of Indonesian Arithmetic Word Problems Using Hierarchical Agglomerative Clustering

Mochammad Dhani Aprianto; Agung Prasetya

doi:10.62712/juktisi.v5i1.1026

Authors

Mochammad Dhani Aprianto, Universitas Bhinneka PGRI Tulungagung
Agung Prasetya, Universitas Bhinneka PGRI

Keywords:

Hierarchical Agglomerative Clustering; Arithmetic Word Problems; Structural Classification; Natural Language Processing; Educational Data Mining

Abstract

Arithmetic word problems (commonly abbreviated MWPs, for math word problems) are a fundamental component of elementary mathematics education, integrating linguistic comprehension with quantitative reasoning. In practice, collections of MWPs are commonly organized by teacher intuition or broad curriculum categories, which are inherently subjective and often fail to reflect the true mathematical similarity between problems. This study classifies Indonesian arithmetic word problems by their underlying relational structures using Hierarchical Agglomerative Clustering (HAC). The dataset consists of 897 elementary-level arithmetic word problems, each represented by 143 binary features encoding five relational dimensions: combine, change, compare, equal groups, and fair division. Hamming distance serves as the dissimilarity metric, and clustering is performed with the complete linkage method. Candidate numbers of clusters are evaluated using three internal validity indices: the Calinski–Harabasz Index, the Silhouette Score, and the Davies–Bouldin Index. Although the statistical indices favor configurations with fewer clusters, four clusters are selected on the basis of domain-specific interpretability, because they align with established theoretical categories of arithmetic relational structures. The approach identifies latent structural patterns within the dataset and demonstrates the potential of combining a feature-based binary representation with HAC for systematic MWP classification. The findings offer practical support for adaptive problem bank development, automated curriculum analysis, and intelligent tutoring system design.
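
The pipeline summarized in the abstract (binary feature vectors, Hamming distance, complete linkage, an internal validity index) can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names `hamming`, `complete_linkage_hac`, and `silhouette` are hypothetical, the 6 x 6 `toy` matrix is invented for illustration in place of the 897 x 143 dataset, the merge loop is a naive implementation chosen for readability, and only the Silhouette Score is shown, not the Calinski–Harabasz or Davies–Bouldin indices.

```python
from itertools import combinations


def hamming(a, b):
    """Fraction of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_linkage_hac(vectors, n_clusters):
    """Naive agglomerative clustering with complete linkage.

    Returns (clusters, dist): clusters is a list of lists of row indices,
    dist is the full pairwise Hamming distance matrix.
    """
    n = len(vectors)
    dist = [[hamming(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with every item in its own cluster
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: distance between clusters is the largest
            # distance between any pair of their members.
            d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters, dist


def silhouette(clusters, dist):
    """Mean silhouette coefficient from a precomputed distance matrix."""
    scores = []
    for ci, members in enumerate(clusters):
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(dist[i][j] for j in members if j != i) / (len(members) - 1)
            # b: mean distance to the nearest other cluster
            b = min(
                sum(dist[i][j] for j in other) / len(other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 6 problems x 6 binary relational features.
toy = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]

clusters, dist = complete_linkage_hac(toy, n_clusters=2)
score = silhouette(clusters, dist)
print(clusters, round(score, 3))
```

In a setting like the one the abstract describes, the same loop would be run for several candidate cluster counts and the index values compared alongside interpretability. For a dataset of the study's size, a library implementation would likely be preferable to this cubic-time sketch.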

Author Biography

Agung Prasetya, Universitas Bhinneka PGRI

Lecturer in the Informatics Department, Universitas Bhinneka PGRI.
