dc.description.abstract | Depression is a prevalent mental illness worldwide that affects the way people think, feel, talk and conduct their daily activities. Because of the stigma associated with it, depression is often misdiagnosed in its mild or moderate stages, and at its worst it can lead to disability or suicide. Numerous studies have examined the prevalence of depression using different machine learning techniques, with results varying across study groups. However, machine learning models have not been used to model the prevalence of depression among Kenyan university students. This study therefore employed machine learning techniques to model the prevalence of depression among Murang’a University of Technology (MUT) students. The specific objectives were to model the prevalence of depression among MUT students using logistic regression, random forest classifier and support vector machine algorithms, to assess how accurately these algorithms model that prevalence, and to determine the best machine learning technique for the task. A sample of 1,448 students from a population of 10,127 participated in the study by completing questionnaires on sociodemographic and other factors associated with depression. The questionnaires were administered via online platforms. Participants were selected using proportionate stratified random sampling, ensuring that a representative sample was drawn from each school. The data were analyzed using descriptive and inferential statistics. Depression was measured using the Patient Health Questionnaire (PHQ-9) scale. Using a PHQ-9 cut-off score of 10, 25.97% of students had depressive symptoms, comprising 19.61% with moderate symptoms and 6.35% with severe symptoms. 
Depressive symptoms were significantly more common among male students (OR = 1.86, p-value = 4.48 × 10^-5, 95% CI: 0.32, 0.92) and among students in their third (OR = 1.84, p-value = 0.00734, 95% CI: 0.17, 1.06) and fourth (OR = 2.17, p-value = 0.00407, 95% CI: 0.25, 1.31) years of study. The variables significantly related to depression were gender (OR = 0.60, p-value = 0.00363, 95% CI: -0.85, -0.17), social support network (OR = 0.40, p-value = 4.78 × 10^-10, 95% CI: -1.19, -0.62), financial situation (OR = 0.71, p-value = 0.0016, 95% CI: -0.55, -0.13) and past abuse, trauma and neglect (OR = 2.18, p-value < 2 × 10^-16, 95% CI: 0.60, 0.96). Confusion-matrix metrics were used to select the best machine learning algorithm for modeling depression prevalence among MUT students. The random forest algorithm outperformed the support vector machine and logistic regression, with an accuracy of 98.68%, a sensitivity of 95%, a specificity of 100%, a positive predictive value of 100% and a negative predictive value of 98.24%. The support vector machine followed closely, with logistic regression last. In conclusion, the machine learning models classified depression cases effectively based on a diverse set of features. The study recommends using the random forest (RF) algorithm to model the prevalence of depression among university students. Implementing targeted interventions founded on the identified risk and protective factors, and exploring the long-term outcomes of these interventions, would contribute to the evolving field of mental health research within academic settings. | en_US |