5V Big Data Analysis of the Internet Archive for Mapping Web Topic Evolution (1996-2026)
DOI:
https://doi.org/10.62712/juktisi.v5i1.951
Keywords:
Big data, Internet Archive, K-Means Clustering, TF-IDF, Association Rules
Abstract
The massive collection of digital artifacts in the Internet Archive and its Wayback Machine amounts to a historical encyclopedia of modern civilization. However, the sheer volume of unstructured data makes it difficult to extract meaningful information and calls for advanced computational analysis. This study demonstrates an architectural evaluation of digital heritage stacks using a comprehensive Big Data 5V framework (Volume, Velocity, Variety, Veracity, Value), designed to map trends in web topic evolution over three decades (1996–2026). The methodology applies K-Means clustering (K=10) to a corpus of 3,000 metadata records, using Term Frequency-Inverse Document Frequency (TF-IDF) matrix weighting for text grouping, followed by Apriori association rules
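The clustering step named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the four toy documents, the whitespace tokenizer, the smoothed IDF formula, the fixed seed indices, and the helper names `tfidf`, `sqdist`, and `kmeans` are all assumptions made for illustration, and it uses K=2 on four records rather than the K=10 on 3,000 records reported in the study.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, vectors) where each vector is an L2-normalised TF-IDF row."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        vectors.append([x / norm for x in row])
    return vocab, vectors


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, seeds, max_iters=20):
    """Lloyd's algorithm; `seeds` lists the row indices used as initial centroids."""
    centroids = [vectors[i][:] for i in seeds]
    labels = None
    for _ in range(max_iters):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: sqdist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical metadata snippets, not taken from the study's corpus.
docs = [
    "geocities personal homepage guestbook",
    "personal homepage webring guestbook",
    "social media video streaming",
    "video streaming social platform",
]
vocab, vectors = tfidf(docs)
labels = kmeans(vectors, seeds=[0, 2])
print(labels)
```

With fixed seeds the run is deterministic, which keeps the toy example reproducible; a real pipeline would more likely use a library implementation with randomized or k-means++ initialization and choose K with a criterion such as the silhouette score.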
Copyright (c) 2026 Khairida Octavia Ramadhani Octavia, Micael, Syuhada Simbolon, Dwi Nina Putri Anakampun

This work is licensed under a Creative Commons Attribution 4.0 International License.