5V Big Data Analysis of the Internet Archive for Mapping Web Topic Evolution (1996-2026)

Authors

  • Khairida Octavia Ramadhani Octavia, University of Medan
  • Micael, University of Medan
  • Syuhada Simbolon, University of Medan
  • Dwi Nina Putri Anakampun, University of Medan

DOI:

https://doi.org/10.62712/juktisi.v5i1.951

Keywords:

Big data, Internet Archive, K-Means Clustering, TF-IDF, Association Rules

Abstract

The massive collection of digital artifacts in the Internet Archive and its Wayback Machine represents a historical encyclopedia of modern civilization. However, the sheer volume of unstructured data makes it difficult to extract meaningful information and calls for advanced computational analysis. This study evaluates the architecture of digital heritage stacks using the comprehensive Big Data 5V framework (Volume, Velocity, Variety, Veracity, Value) in order to map trends in web topic evolution over three decades (1996–2026). The methodology is applied to a corpus of 3,000 metadata records: text is grouped with K-Means clustering (K=10) over a Term Frequency-Inverse Document Frequency (TF-IDF) weighted matrix, followed by Apriori association rules
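The text-grouping step named in the abstract (TF-IDF weighting followed by K-Means) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the four toy descriptions, the whitespace tokenizer, the smoothed IDF formula, the farthest-first choice of starting centroids, and the function names `tfidf` and `kmeans` are all assumptions made for illustration, and the toy run uses k=2 rather than the K=10 reported for the 3,000-record corpus.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows); each row is an L2-normalised TF-IDF vector."""
    tokenised = [d.lower().split() for d in docs]  # assumption: whitespace tokens
    vocab = sorted({t for doc in tokenised for t in doc})
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenised for t in set(doc))
    # Smoothed IDF (an assumed variant), so ubiquitous terms keep a non-zero weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenised:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return vocab, rows


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=20):
    """Lloyd's algorithm with deterministic farthest-first initial centroids."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        # Next centroid: the row farthest from all centroids chosen so far.
        far = max(rows, key=lambda r: min(sqdist(r, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: each row joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sqdist(r, centroids[c])) for r in rows]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical metadata descriptions covering two obvious topics.
docs = [
    "web archive crawl snapshot",
    "archive snapshot wayback crawl",
    "music audio concert recording",
    "audio recording live concert",
]
vocab, rows = tfidf(docs)
labels = kmeans(rows, k=2)
print(labels)
```

On this toy input the two web-archive descriptions end up in one cluster and the two audio descriptions in the other. On a real corpus, a library implementation with multiple restarts would usually be preferable to this hand-rolled loop.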

Published

2026-04-01

How to Cite

Octavia, K. O. R., Micael Zecsen Saragih, Syuhada Simbolon, & Dwi Nina Putri Anakampun. (2026). 5V Big Data Analysis of the Internet Archive for Mapping Web Topic Evolution (1996-2026). Jurnal Komputer Teknologi Informasi Sistem Komputer (JUKTISI), 5(1), 178–190. https://doi.org/10.62712/juktisi.v5i1.951

Section

Articles