Pengenalan Ekspresi Wajah Peserta Didik di Ruang Kelas Menggunakan Vision Transformer (ViT)

Authors

  • Muhammad Fakhri Fadhlurrahman, Universitas Pendidikan Indonesia
  • Munir, Universitas Pendidikan Indonesia
  • Yaya Wihardi, Universitas Pendidikan Indonesia

DOI:

https://doi.org/10.62712/juktisi.v4i2.531

Abstract

Abstrak

Facial expressions are an important form of non-verbal communication for understanding the emotional state of students in the classroom. This understanding can help educators adapt their teaching methods to students' emotional conditions, making the teaching and learning process more effective. This study aims to develop and implement a real-time facial expression recognition system for the classroom using the Vision Transformer (ViT) architecture. Two system approaches were developed: a dual-stage system combining a YOLOv11s face detection model with a HybridViT (ResNet-50) facial expression recognition model, and a single-stage system using a YOLOv11s model to detect emotions directly from facial images. The datasets used were the Real-world Affective Faces Database (RAF-DB), the Face Detection Dataset, and the Facial Expression in Classroom dataset, employed for initial training and fine-tuning of the models. Test results show that the dual-stage system achieved better classification performance, with a mean Average Precision (mAP) of 0.2846, compared to 0.1603 for the single-stage system. Conversely, in terms of inference efficiency the single-stage system was superior, with an average per-face latency of 0.290 ms (6,539 FPS) on GPU and 1.862 ms (545 FPS) on CPU, against the higher latency of the dual-stage system. The evaluation also revealed an imbalance in performance across emotion classes, caused by the uneven distribution of the data. Overall, both approaches show promising potential for implementing facial expression recognition in the classroom. Both can still be improved in terms of accuracy, generalization across emotions, and inference efficiency through better dataset quality and the exploration of advanced training techniques.

Keywords: Facial Expression Recognition, Vision Transformer, YOLOv11s, Real-Time, Classroom, Dual-Stage, Single-Stage

Abstract

Facial expressions serve as an essential form of non-verbal communication for understanding students' emotional states in the classroom. This understanding enables educators to adjust their teaching methods according to students' emotions, thus improving the effectiveness of the learning process. This study aims to develop and implement a real-time facial expression recognition system in classroom settings by utilizing the Vision Transformer (ViT) architecture. Two system approaches were developed: a dual-stage system combining a YOLOv11s face detection model with a HybridViT (ResNet-50) facial expression recognition model, and a single-stage system using a YOLOv11s model to directly detect emotions from facial images. The datasets used include the Real-world Affective Faces Database (RAF-DB) and the Facial Expression in Classroom Dataset, which were employed for model training and fine-tuning, respectively. Evaluation results demonstrate that the dual-stage system achieves superior classification performance, with a mean Average Precision (mAP) of 0.2846, compared to the single-stage system's mAP of 0.1603. However, in terms of inference efficiency, the single-stage system outperforms the dual-stage system, achieving a lower average latency per face of 0.290 ms (6,539 FPS) on GPU and 1.862 ms (545 FPS) on CPU. The evaluation also highlights an imbalance in classification performance across emotion classes, primarily due to the uneven distribution of training and fine-tuning data. Overall, both approaches exhibit promising potential for facial expression recognition applications in classroom environments. Further improvements in accuracy, emotional generalization, and computational efficiency can be achieved through enhanced dataset quality, balanced emotion representation, and exploration of advanced training techniques.
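As a rough illustration of the two approaches described above, the sketch below wires a face detector and an expression classifier together. The `detector`, `classifier`, and `emotion_detector` callables stand in for the paper's YOLOv11s and HybridViT (ResNet-50) models, which are not reproduced here; the function names and the seven-class emotion list (taken from RAF-DB's basic categories) are illustrative assumptions, not the authors' code.

```python
import numpy as np

# The seven basic expression classes of RAF-DB (assumed label set).
EMOTIONS = ["surprise", "fear", "disgust", "happiness", "sadness", "anger", "neutral"]

def crop_faces(frame, boxes):
    """Cut each detected face region (x1, y1, x2, y2 in pixels) out of the frame."""
    crops = []
    for x1, y1, x2, y2 in boxes:
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        crops.append(frame[y1:int(y2), x1:int(x2)])
    return crops

def dual_stage(frame, detector, classifier):
    """Stage 1: detect face boxes; stage 2: classify the expression of each crop."""
    boxes = detector(frame)                      # e.g. a YOLOv11s face model
    return [classifier(face) for face in crop_faces(frame, boxes)]

def single_stage(frame, emotion_detector):
    """One pass: a detector trained to emit (box, emotion) pairs directly."""
    return emotion_detector(frame)
```

In the dual-stage variant the classifier runs once per detected face, so its per-frame cost grows with the number of students in view, while the single-stage variant pays a single detector pass per frame; this is consistent with the efficiency gap reported in the abstract.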

Keywords: Facial Expression Recognition, Vision Transformer, YOLOv11s, Real-Time, Classroom, Dual-Stage, Single-Stage
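Latency comparisons like the one in the abstract can be made with a simple timing harness such as the sketch below. Wall-clock timing via `time.perf_counter` and the warm-up loop are common practice but assumptions here, as the paper's measurement procedure is not shown on this page.

```python
import time

def avg_latency_ms(infer, frames, warmup=5):
    """Average wall-clock inference time per input, in milliseconds.

    A few warm-up calls are discarded so one-time costs (cache warming,
    GPU kernel compilation, lazy initialization) do not skew the average.
    """
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / len(frames)

def to_fps(latency_ms):
    """Throughput implied by a per-input latency under sequential processing."""
    return 1000.0 / latency_ms
```

Note that per-face latency and frame-level FPS need not satisfy FPS = 1000 / latency when a frame contains several faces or when inference is batched, so reported figures of each kind should be read on their own terms.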


References

C. Frith, “Role of facial expressions in social interactions,” Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 364, no. 1535, pp. 3453–3458, 2009, Accessed: Jul. 31, 2025. [Online]. Available: https://pmc.ncbi.nlm.nih.gov/articles/PMC2781887/pdf/rstb20090142.pdf

M. Batty and M. J. Taylor, “Early processing of the six basic facial emotional expressions,” Cognitive brain research, vol. 17, no. 3, pp. 613–620, 2003, Accessed: Jul. 31, 2025. [Online]. Available: https://www.ece.uvic.ca/~bctill/papers/facerec/Batty_Taylor_2003.pdf

S. Julika and D. Setiyawati, “Kecerdasan emosional, stres akademik, dan kesejahteraan subjektif pada mahasiswa,” Gadjah Mada Journal of Psychology (GamaJoP), vol. 5, no. 1, pp. 50–59, 2019, Accessed: Jul. 31, 2025. [Online]. Available: https://journal.ugm.ac.id/gamajop/article/download/47966/24933

Y. Tian, T. Kanade, and J. F. Cohn, “Facial expression recognition,” in Handbook of face recognition, Springer, 2011, pp. 487–519. Accessed: Jul. 31, 2025. [Online]. Available: http://www.cs.usfca.edu/~byuksel/affectivecomputing/readings/facial_expression/tian2011.pdf

Y. Huang, F. Chen, S. Lv, and X. Wang, “Facial expression recognition: A survey,” Symmetry (Basel), vol. 11, no. 10, p. 1189, 2019, Accessed: Jul. 31, 2025. [Online]. Available: https://www.mdpi.com/2073-8994/11/10/1189

J. Grafsgaard, J. B. Wiggins, K. E. Boyer, E. N. Wiebe, and J. Lester, “Automatically recognizing facial expression: Predicting engagement and frustration,” in Educational data mining 2013, 2013. Accessed: Jul. 31, 2025. [Online]. Available: https://cise.ufl.edu/research/learndialogue/pdf/LearnDialogue-Grafsgaard-EDM-2013.pdf

H. Hikmatiar, N. Sya’bania, and B. Hamsa, “Relation of Facial Expressions and Student Learning Outcomes in Face Recognition-Based Online Learning Article Info,” Jurnal Kajian Teknologi Pendidikan, vol. 9, no. 1, pp. 1–13, Apr. 2024, doi: 10.17977/um039v9i12024p1.

D. Canedo and A. J. R. Neves, “Facial expression recognition using computer vision: A systematic review,” Applied Sciences, vol. 9, no. 21, p. 4678, 2019, Accessed: Jul. 31, 2025. [Online]. Available: https://www.mdpi.com/2076-3417/9/21/4678

D. Bhatt et al., “CNN variants for computer vision: History, architecture, application, challenges and future scope,” Electronics (Basel), vol. 10, no. 20, p. 2470, 2021, Accessed: Jul. 31, 2025. [Online]. Available: https://www.mdpi.com/2079-9292/10/20/2470

S. Sunardi, A. Fadlil, and D. Prayogi, “Sistem Pengenalan Wajah pada Keamanan Ruangan Berbasis Convolutional Neural Network,” J-SAKTI (Jurnal Sains Komputer dan Informatika), vol. 6, no. 2, pp. 636–647, 2022, Accessed: Jul. 31, 2025. [Online]. Available: https://tunasbangsa.ac.id/ejurnal/index.php/jsakti/article/viewFile/480/453

R. Nurhawanti, “Sistem Pendeteksi Sepeda Motor Pelanggar Marka Jalan Menggunakan Metode Convolutional Neural Networks (CNNs),” Universitas Pendidikan Indonesia, Bandung, 2019. Accessed: Aug. 17, 2025. [Online]. Available: http://siad.cs.upi.edu//assets/files/5cd8ea27df49828119.pdf

M. D. L. Yudha, “Deteksi Sepeda Motor di Jalan Raya Menggunakan Faster R-CNN Berbasis VGG16,” Universitas Pendidikan Indonesia, Bandung, 2020. Accessed: Aug. 17, 2025. [Online]. Available: http://siad.cs.upi.edu//assets/files/5f4bca8cdf1a121971.pdf

F. A. Febriyanti, “Image Processing Dengan Metode Convolutional Neural Network (Cnn) Untuk Deteksi Penyakit Kulit Pada Manusia,” Kohesi J. Sains dan Teknol, vol. 3, no. 10, pp. 21–30, 2024, Accessed: Jul. 31, 2025. [Online]. Available: https://ejournal.warunayama.org/index.php/kohesi/article/view/4088/3803

M. R. M. A., “Pemetaan dan Identifikasi Kesiapan Petik Tanaman Teh Berdasarkan Citra Drone Menggunakan Mask Region-Based Convolutional Neural Network (Mask R-CNN) dan Green Leaf Index (GLI),” Universitas Pendidikan Indonesia, Bandung, 2023. Accessed: Aug. 17, 2025. [Online]. Available: http://siad.cs.upi.edu//assets/files/65a4b203c369329936.pdf

S. K. Wulandari and J. Jasmir, “Penggunaan Resnet-50 Untuk Deteksi Penyakit Ikan Air Tawar di Akuakultur Studi Kasus pada Akuakultur Asia Selatan,” in Prosiding Seminar Nasional Bisnis, Teknologi Dan Kesehatan (SENABISTEKES), 2024, pp. 17–24. Accessed: Jul. 31, 2025. [Online]. Available: https://www.ejournal.ummuba.ac.id/index.php/SENABISTEKES/article/download/2205/1113

A. Anggraini and H. Zakaria, “Penerapan Metode Deep Learning Pada Aplikasi Pembelajaran Menggunakan Sistem Isyarat Bahasa Indonesia Menggunakan Convolutional Neural Network (Studi Kasus: SLB-BC Mahardika Depok),” JURIHUM: Jurnal Inovasi dan Humaniora, vol. 1, no. 4, pp. 452–464, 2023, Accessed: Jul. 31, 2025. [Online]. Available: https://jurnalmahasiswa.com/index.php/Jurihum/article/view/723/432

A. J. Moshayedi, A. S. Roy, A. Kolahdooz, and Y. Shuxin, “Deep learning application pros and cons over algorithm,” EAI Endorsed Transactions on AI and Robotics, vol. 1, no. 1, p. e7, 2022, Accessed: Jul. 31, 2025. [Online]. Available: https://www.academia.edu/download/115942971/2022_JR_Deep_Learning_Application_Pros_and_Cons_Over.pdf

S. Fakhar et al., “Smart classroom monitoring using novel real-time facial expression recognition system,” Applied Sciences, vol. 12, no. 23, p. 12134, 2022, Accessed: Jul. 31, 2025. [Online]. Available: https://www.mdpi.com/2076-3417/12/23/12134

A. Dosovitskiy et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020, Accessed: Jul. 31, 2025. [Online]. Available: https://arxiv.org/pdf/2010.11929/1000

K. Han et al., “A survey on visual transformer,” arXiv preprint arXiv:2012.12556, 2020, Accessed: Jul. 31, 2025. [Online]. Available: https://arxiv.org/pdf/2012.12556

R. J. Gunawan, B. Irawan, and C. Setianingsih, “Pengenalan Ekspresi Wajah Berbasis Convolutional Neural Network Dengan Model Arsitektur VGG16,” eProceedings of Engineering, vol. 8, no. 5, 2021, Accessed: Jul. 31, 2025. [Online]. Available: https://openlibrarypublications.telkomuniversity.ac.id/index.php/engineering/article/view/16400/16113

A. L. S. Guntoro, E. Julianto, and D. Budiyanto, “Pengenalan Ekspresi Wajah Menggunakan Convolutional Neural Network,” Jurnal Informatika Atma Jogja, vol. 3, no. 2, pp. 155–160, 2022, Accessed: Jul. 31, 2025. [Online]. Available: https://ojs.uajy.ac.id/index.php/jiaj/article/download/6790/2839

D. V. Sang, N. Van Dat, et al., “Facial expression recognition using deep convolutional neural networks,” in 2017 9th International Conference on Knowledge and Systems Engineering (KSE), 2017, pp. 130–135. Accessed: Jul. 31, 2025. [Online]. Available: https://www.researchgate.net/profile/Dinh-Sang/publication/321257241_Facial_expression_recognition_using_deep_convolutional_neural_networks/links/5b12a7824585150a0a619d6c/Facial-expression-recognition-using-deep-convolutional-neural-networks.pdf

S. Minaee, M. Minaei, and A. Abdolrashidi, “Deep-emotion: Facial expression recognition using attentional convolutional network,” Sensors, vol. 21, no. 9, p. 3046, 2021, Accessed: Jul. 31, 2025. [Online]. Available: https://www.mdpi.com/1424-8220/21/9/3046

N. R. Faikar, “Pengenalan Emosi Manusia Menggunakan Log-Gabor Convolutional Networks Melalui Pendekatan Facial Region Segmentation,” Universitas Pendidikan Indonesia, Bandung, 2020. Accessed: Aug. 17, 2025. [Online]. Available: http://siad.cs.upi.edu//assets/files/5f49e8fe6fec438740.pdf

S. M. Wahyono, “Evaluasi Kepuasan Pelanggan Berdasarkan Ekspresi Wajah Menggunakan Real Time Detection Transformer (RT-DETR),” Universitas Pendidikan Indonesia, Bandung, 2025. Accessed: Aug. 17, 2025. [Online]. Available: http://siad.cs.upi.edu//assets/files/67a34fb4799d446045.pdf

A. Chaudhari, C. Bhatt, A. Krishna, and P. L. Mazzeo, “ViTFER: facial emotion recognition with vision transformers,” Applied System Innovation, vol. 5, no. 4, p. 80, 2022, Accessed: Jul. 31, 2025. [Online]. Available: https://www.mdpi.com/2571-5577/5/4/80

Published

2025-08-28

How to Cite

Muhammad Fakhri Fadhlurrahman, Munir, & Yaya Wihardi. (2025). Pengenalan Ekspresi Wajah Peserta Didik di Ruang Kelas Menggunakan Vision Transformer (ViT). Jurnal Komputer Teknologi Informasi Sistem Informasi (JUKTISI), 4(2), 1047–1058. https://doi.org/10.62712/juktisi.v4i2.531