STUDENT DEPRESSION SCREENING BASED ON THE OPTIMUM DATA BALANCING AND RANDOM FOREST

  • M. Sayyidul Adnan, UIN Maulana Malik Ibrahim Malang
  • Irwan Budi Santoso, UIN Maulana Malik Ibrahim Malang
  • Cahyo Crysdian, UIN Maulana Malik Ibrahim Malang
Keywords: data mining, depression, imbalanced data, random forest, SMOTE, threshold tuning

Abstract

Depression among young adult university students is often detected late because of stigma and reluctance to seek medical consultation. This study develops an early screening model using the random forest algorithm on a dataset of 268 students (aged 17-29 years; 98 males and 170 females) in a multicultural educational setting. The principal challenges in this dataset are class imbalance and potential data leakage from clinical scores. To address them, the study removes depression score features during feature selection and applies the Synthetic Minority Over-sampling Technique (SMOTE) to balance the training data distribution. A threshold tuning strategy is then used to prioritize detection sensitivity (recall). Lowering the decision threshold to an optimal value of 0.25 raised recall from 36% (baseline) to 77%. A feature importance analysis identified Total Social Connectedness (ToSC) as the most dominant predictor. The study thus supports the view that optimizing sensitivity through threshold tuning is important for medical screening, and its results suggest that social isolation factors are stronger indicators of depression risk than demographic attributes.
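The effect of threshold tuning described in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library and a made-up set of labels and predicted probabilities; none of the numbers come from the study's dataset, and the function names (`predict_at_threshold`, `recall`, `precision`) are hypothetical helpers introduced for illustration. In a pipeline like the one described, the probabilities would presumably come from a trained random forest (for example scikit-learn's `RandomForestClassifier.predict_proba`) fitted on SMOTE-balanced training data, but the abstract does not name the libraries used, so that is an assumption.

```python
# Illustrative sketch only: the labels and probabilities below are made up,
# not taken from the study's 268-student dataset.
from typing import List, Sequence


def predict_at_threshold(probs: Sequence[float], threshold: float) -> List[int]:
    """Label a case positive (1) when its predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of truly positive cases that the classifier flags (sensitivity)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of flagged cases that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


# Hypothetical ground truth (1 = depressed) and hypothetical model probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.80, 0.40, 0.30, 0.10, 0.45, 0.35, 0.30, 0.15, 0.10, 0.05]

if __name__ == "__main__":
    # Compare the default 0.50 cut-off with the lower 0.25 cut-off.
    for threshold in (0.50, 0.25):
        y_pred = predict_at_threshold(probs, threshold)
        print(
            f"threshold={threshold:.2f} "
            f"recall={recall(y_true, y_pred):.2f} "
            f"precision={precision(y_true, y_pred):.2f}"
        )
```

On this toy data, lowering the threshold from 0.50 to 0.25 raises recall from 0.25 to 0.75 while precision falls from 1.0 to 0.5. This is the trade-off a screening model accepts when missed cases are costlier than false alarms.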

           

 

 


Published
2026-03-30
How to Cite
Adnan, M. S., Budi Santoso, I., & Crysdian, C. (2026). STUDENT DEPRESSION SCREENING BASED ON THE OPTIMUM DATA BALANCING AND RANDOM FOREST. JURTEKSI (Jurnal Teknologi dan Sistem Informasi), 12(2), 309-316. https://doi.org/10.33330/jurteksi.v12i2.4458
Section
Articles