AI early warning system could help universities spot at-risk students sooner
AI could help universities spot struggling students weeks before final grades are known, but simpler machine learning may outperform deep learning when student datasets are small and imbalanced, according to a new study by Chen-Chung Chi of Tamkang University in Taiwan.
The study, "AI-Driven Sustainable Transformation of the Educational Supply Chain: Comparative Evaluation of Machine Learning Models for an Early Warning System and Design-Level Frameworks for Institutionalization and Impact Assessment," published in Sustainability, evaluates AI models for predicting student failure in a programming course. The research finds that a Random Forest model trained with SMOTE (Synthetic Minority Over-sampling Technique) detected at-risk students more reliably than two recurrent deep learning models, GRU (gated recurrent unit) and LSTM (long short-term memory). The result offers a practical early warning tool for higher education institutions seeking to reduce attrition and support sustainable student success.
Student attrition becomes a supply chain risk in higher education
The paper argues that universities often rely on delayed signals, including midterm and final examinations, to identify struggling students. By then, the best window for intervention may have closed. An early warning system can change that timing by identifying risk while there is still enough time for tutoring, advising or instructional adjustment.
Tamkang University already uses the Smart PASS platform, which brings together its learning management system, advising tools and student success functions. The study focuses on improving the existing Performance and Engagement Diagram system, which classifies students through fixed thresholds based on performance and engagement. This rule-based system has practical limits. Fixed thresholds can be sensitive to class size, outliers and early-semester sparsity, when many students have not yet generated enough activity data.
The proposed AI system is designed to replace the fixed classification step with a model-generated failure probability while preserving the broader dashboard and notification workflow.
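As a rough illustration of that design, the Python sketch below swaps only the classification step while the notification step stays the same. Everything in it is hypothetical: the field names, the cut-off values and the rule itself are assumptions for illustration, since the article does not give the real thresholds or the model's decision cut-off.

```python
from dataclasses import dataclass


@dataclass
class StudentWeek:
    performance: float       # hypothetical 0-100 score to date
    engagement: float        # hypothetical 0-100 activity index
    fail_probability: float  # hypothetical model output in [0, 1]


# Hypothetical cut-offs, chosen only for this example.
PERFORMANCE_CUTOFF = 60.0
ENGAGEMENT_CUTOFF = 50.0
PROBABILITY_CUTOFF = 0.5


def flag_by_rules(s: StudentWeek) -> bool:
    """Fixed-threshold step: flag when both measures fall below their cut-offs."""
    return s.performance < PERFORMANCE_CUTOFF and s.engagement < ENGAGEMENT_CUTOFF


def flag_by_model(s: StudentWeek) -> bool:
    """Model-based step: flag when the predicted failure probability is high."""
    return s.fail_probability >= PROBABILITY_CUTOFF


def notify(students, classify):
    """Downstream workflow is unchanged: return indices of students to contact."""
    return [i for i, s in enumerate(students) if classify(s)]


students = [
    StudentWeek(performance=55, engagement=40, fail_probability=0.8),
    StudentWeek(performance=62, engagement=30, fail_probability=0.7),
    StudentWeek(performance=90, engagement=85, fail_probability=0.05),
]
flagged_by_rules = notify(students, flag_by_rules)
flagged_by_model = notify(students, flag_by_model)
```

In this toy data the second student sits just above the fixed performance cut-off and is missed by the rule, while the probability-based step still flags them.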
Random Forest detected at-risk students earlier than deep learning models
The study tested learning trajectory data from 188 students in one programming course across four semesters. The first two semesters, covering 90 students, were used for training. The next two semesters, covering 98 students, were used for temporal validation, meaning the model was tested on later cohorts rather than a random split of the same pool.
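A temporal split of this kind can be sketched in a few lines of Python. The record layout and the toy data below are assumptions for illustration, not the study's actual schema.

```python
from collections import namedtuple

# Hypothetical record layout: one row per student.
Record = namedtuple("Record", ["student_id", "semester", "failed"])


def temporal_split(records, train_semesters):
    """Train on earlier semesters, validate on later ones (no random shuffling)."""
    train = [r for r in records if r.semester in train_semesters]
    valid = [r for r in records if r.semester not in train_semesters]
    return train, valid


records = [
    Record("s01", 1, False), Record("s02", 1, True),
    Record("s03", 2, False), Record("s04", 3, False),
    Record("s05", 3, True), Record("s06", 4, False),
]
train, valid = temporal_split(records, train_semesters={1, 2})
```

Because no later-semester student appears in the training set, the validation score reflects how the model would behave on a cohort it has not seen.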
The dataset included 30 students who failed, creating a strong class imbalance. Because the goal of an early warning system is to identify students at risk of failing, the study treated the fail category as the positive class. This made recall a key measure of whether the system was catching students who needed help.
Three models were compared: Random Forest with SMOTE, GRU and LSTM. SMOTE was used to address imbalance for the Random Forest model by creating synthetic minority examples in the training data. The deep learning models used replicated minority cases with small Gaussian noise. The results favored the Random Forest model. Across prediction weeks 6 to 16 and both validation semesters, Random Forest achieved 85.59 percent accuracy, 91.19 percent recall for failing students, 58.89 percent precision and a 70.36 percent F1 score.
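The two imbalance strategies can be contrasted with a simplified Python sketch. It is not the study's code: it assumes each student is a flat list of numeric features, the noise scale and neighbor count are arbitrary, and the interpolation function only mimics the core SMOTE idea of drawing a point on the line between a minority sample and one of its nearest minority neighbors. In practice a library implementation such as the SMOTE class in imbalanced-learn would normally be used.

```python
import random


def oversample_with_noise(minority_rows, n_new, sigma=0.01, seed=0):
    """Replicate random minority rows and add small Gaussian noise to each value."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_rows)
        synthetic.append([x + rng.gauss(0.0, sigma) for x in base])
    return synthetic


def smote_like_interpolation(minority_rows, n_new, k=2, seed=0):
    """Simplified SMOTE-style step: interpolate between a row and a near neighbor."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_rows)
        # Rank the other minority rows by squared Euclidean distance to the base row.
        others = [r for r in minority_rows if r is not base]
        others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(r, base)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # position along the segment between base and neighbour
        synthetic.append([a + u * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical feature vectors for three failing students.
minority = [[0.2, 0.1], [0.3, 0.0], [0.1, 0.2]]
noisy = oversample_with_noise(minority, n_new=5)
interpolated = smote_like_interpolation(minority, n_new=5)
```

Noise-based replication keeps new points clustered around existing students, whereas interpolation fills in the space between them. Either step would be applied to the training data only.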
Most importantly for intervention, the Random Forest model provided usable warnings by Week 6, with fail-recall of 87.86 percent. This means the system could identify most students who eventually failed while leaving roughly 12 weeks for instructors or advisors to intervene.
The LSTM and GRU models performed worse on the at-risk group despite stronger headline accuracy in some cases. During early weeks, both deep learning models often collapsed toward the majority class, meaning they tended to predict students as passing and missed failing students. LSTM became usable only around Week 14, while GRU remained less reliable for most of the semester.
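A short Python calculation shows why headline accuracy can mislead here. It uses the overall dataset counts reported in the article (30 failures among 188 students) purely as an illustration; the study's validation semesters had their own class mix, which the article does not give.

```python
def fail_metrics(y_true, y_pred):
    """Accuracy plus precision, recall and F1 with 'fail' (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# 30 failing students (1) and 158 passing students (0).
y_true = [1] * 30 + [0] * 158
# A model that has collapsed to the majority class predicts "pass" for everyone.
always_pass = [0] * 188
collapsed = fail_metrics(y_true, always_pass)
```

The collapsed predictor scores 158/188, or about 84 percent accuracy, yet its fail-recall is zero: it never flags a single student who needs help.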
Deep learning models are often assumed to be better for time-series data, but the study shows that this is not always true in small, imbalanced educational datasets. In this setting, a simpler and cheaper model was more useful for the real operational goal: catching students early enough to help them.
Weekly learning patterns matter for early intervention
The study also compared different ways of representing student activity data. Original weekly features captured what students did each week, cumulative features averaged activity over time, and a mixed approach combined both. Original weekly features produced the strongest sensitivity to failing students, with a fail-recall of 90.36 percent. Cumulative and mixed features improved precision and F1 scores but sacrificed some recall.
For student support, missing an at-risk student can be more costly than sending an unnecessary check-in to a student who is doing fine. This makes recall especially important. As a result, the study favors original weekly features when the priority is early detection.
The research also finds that cumulative features did not dilute information. In fact, they showed stronger average separation between passing and failing students. However, their smoother profile made them less responsive to sudden drops or changes in behavior. Original weekly features were more sensitive to abrupt signals that a student was slipping.
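The smoothing effect is easy to see in a small Python example. The weekly values are hypothetical (a homework submission rate that drops to zero in Week 6), the cumulative feature is assumed to be a running mean, and the mixed representation is assumed to be a simple concatenation of the two.

```python
def cumulative_mean(weekly):
    """Running average of a weekly activity series up to and including each week."""
    out, total = [], 0.0
    for week, value in enumerate(weekly, start=1):
        total += value
        out.append(total / week)
    return out


# Hypothetical homework submission rate: steady for five weeks, then a sudden stop.
weekly = [1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
cumulative = cumulative_mean(weekly)

# Assumed "mixed" representation: both views side by side.
mixed = weekly + cumulative

weekly_drop = weekly[4] - weekly[5]              # change seen by the weekly feature
cumulative_drop = cumulative[4] - cumulative[5]  # change seen by the cumulative feature
```

The weekly feature falls by the full 1.0 in Week 6, while the running mean only slips from 1.0 to 5/6. The same missed week would move a cumulative feature even less later in the semester.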
The study examined activity types logged in the iClass learning management system, including homework, forum participation, exams, custom instructor-defined activities and web links. Random Forest importance scores emphasized homework, forum and exam-related signals, while LSTM permutation analysis emphasized exams, homework and custom activities.
Both models agreed that web link click-through data had little predictive value. The study explains that opening a linked resource does not show how deeply a student engaged with the material, suggesting that universities should be careful about treating all digital traces as meaningful learning signals.
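Permutation analysis, the technique applied to the LSTM, measures how much a model's score drops when one feature's values are shuffled across students. The Python sketch below shows the idea with a toy stand-in model and made-up data; the feature names and the decision rule are assumptions for illustration and have nothing to do with the study's trained models.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows where the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, feature, n_repeats=20, seed=0):
    """Average accuracy drop after shuffling one feature column across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats


def toy_model(row):
    """Hypothetical rule: predict fail (1) on a low homework rate; ignores link clicks."""
    homework_rate, link_clicks = row
    return 1 if homework_rate < 0.5 else 0


# Columns: [homework submission rate, web link clicks]; labels: 1 = fail, 0 = pass.
rows = [[0.9, 3], [0.8, 0], [0.2, 5], [0.1, 1], [0.7, 2], [0.3, 4]]
labels = [0, 0, 1, 1, 0, 1]

homework_importance = permutation_importance(toy_model, rows, labels, feature=0)
link_importance = permutation_importance(toy_model, rows, labels, feature=1)
```

Shuffling the homework column hurts the toy model, while shuffling link clicks changes nothing because the model never uses them. A near-zero score of this kind is what marks a feature as having little predictive value.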
For instructors, the key takeaway is that consistent submission behavior, assessment performance and course-specific activity patterns are more useful than simple clicks. An ideal early warning system should therefore focus on learning behaviors that show effort, comprehension and task completion rather than superficial platform activity.
Implications and limitations for AI in student success systems
The findings suggest that universities do not need complex deep learning systems to begin improving early student support. A Random Forest model with careful imbalance handling may offer a more practical starting point for small course-level deployments. Many universities do not have large datasets, advanced AI teams or the infrastructure needed to train complex models. A lower-cost, reproducible approach that works on small datasets may be more realistic for early warning systems.
The study also proposes a three-tier institutionalization pathway:
- Level 1 involves instructor pilots, where individual teachers use the system and track whether flagged students receive support.
- Level 2 moves to departmental use, where academic advisors and curriculum teams can monitor risk patterns across courses.
- Level 3 embeds the system into the broader university platform for wider use.
The study also proposes a separate impact framework for future evaluation, covering student outcomes, resource efficiency, organizational learning and talent output. Notably, these are design-level proposals, not measured outcomes from the present study.
The study does not claim that the system has already improved graduation rates, reduced institutional costs or delivered measurable long-term social impact. It shows that early risk detection is technically feasible in a single course and provides a framework for testing broader effects later.
The study also acknowledges some limitations. It is based on one programming course at one university, with 188 students and only 30 failure cases, so the findings cannot automatically be generalized across disciplines, universities or learning management systems. The binary pass-fail outcome also limits the analysis. Student performance is more complex than a simple pass or fail label, and future work could examine different achievement levels or types of academic risk. The study also found semester-to-semester variation, showing that models may need regular updates as cohorts and teaching patterns change.
The deep learning models were tested with relatively simple imbalance handling. More advanced techniques, including focal loss, class-weighted training, 3D SMOTE, attention mechanisms or transformer-based models, could change the comparison in future research. Privacy and governance will also matter if such systems move from course pilots to campus-wide use. Early warning systems deal with sensitive student data and can affect how instructors, advisors and institutions perceive learners. The proposed rollout path implies that transparency, auditing and careful intervention design will be necessary.
First published in: Devdiscourse

