GPT-4 Tutoring in Nigeria Boosts English Scores, Offers Scalable, Cost-Effective Model

A World Bank-led study in Nigeria found that AI-powered tutoring using GPT-4 significantly improved students’ English proficiency, achieving gains equivalent to two years of traditional schooling. The low-cost, scalable intervention showed especially strong results for girls and digitally literate students.


CoE-EDP, VisionRI | Updated: 25-05-2025 09:38 IST | Created: 25-05-2025 09:38 IST

In a groundbreaking initiative led by the World Bank’s Education Global Department, with support from the Mastercard Foundation, a team of researchers has unveiled the remarkable potential of generative artificial intelligence (AI) to transform education in low-resource settings. Their study, From Chalkboards to Chatbots, tested whether Microsoft Copilot, powered by GPT-4, could serve as an effective virtual tutor for secondary school students in Nigeria. Implemented over six weeks in nine public schools in Benin City, the intervention used a randomized controlled trial (RCT) to examine how AI-powered after-school sessions impacted learning outcomes. The results show not only significant academic gains, especially in English, but also powerful insights into cost-effectiveness, equity, and scalability, positioning the initiative as a pioneering model for AI in education.

AI Tutoring Delivers Real Results

The intervention centered on after-school computer lab sessions where first-year senior secondary students engaged with Copilot for English language instruction. Over twelve 90-minute sessions, students interacted with the chatbot in pairs, guided by teachers trained to supervise without offering direct instruction. Instead, the sessions followed carefully designed prompts aligned with Nigeria’s national curriculum and rooted in learning science principles such as retrieval practice, elaborative interrogation, and contextual examples. Despite infrastructural setbacks like internet disruptions and power outages, students maintained high engagement.

The academic gains were striking. Students in the treatment group scored 0.31 standard deviations higher on the final standardized assessment than their control-group peers. In English, the program’s primary focus, students improved by 0.24 standard deviations, while their performance on school-wide third-term English exams rose by 0.21 standard deviations, indicating that the benefits extended beyond the immediate scope of the intervention. These effects place the program among the top-performing education interventions globally, especially in secondary education, where gains are typically harder to achieve.

Gender, Equity, and the Digital Divide

The study revealed compelling insights into who benefited most. While students across all performance levels saw improvements, the program had the strongest effects on girls, helping bridge pre-existing gender gaps in achievement. Girls, especially those from a single-gender school with lower baseline scores, showed higher learning gains relative to boys. Students with stronger prior academic performance also benefited more, as did those from higher socioeconomic backgrounds, likely due to greater familiarity with digital tools.

These findings point to both promise and peril. On the one hand, AI tutoring can close gender gaps; on the other, it risks exacerbating inequality if digital access and literacy aren’t addressed. The intervention clearly worked best for students already somewhat comfortable with technology, raising an urgent call for policies that improve infrastructure, digital access, and early exposure to technology, especially in rural and low-income areas.

Attendance Matters: The More, The Better

Beyond broad treatment effects, the researchers conducted a dose-response analysis to measure how learning gains varied with program attendance. The average student attended about 72% of sessions, and each additional day of attendance was associated with a score increase of 0.031 standard deviations. Extrapolating these gains suggests that a full academic year of AI-supported tutoring could lead to learning gains of up to 2.23 standard deviations, a projection that would be nearly unprecedented in education research.

Interestingly, the early sessions had little impact, as students were acclimating to the format and technology. But after this initial adjustment, gains grew steadily with each session. Even under a pessimistic scenario where students attended only half the sessions, predicted gains remained high, around 1.2 standard deviations, highlighting the importance of consistent participation and the power of sustained exposure to AI learning environments.
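
The dose-response arithmetic above can be sketched in a few lines of Python. This is a minimal illustration that assumes gains scale linearly with days attended, which the note about slow early sessions suggests is a simplification of the researchers' actual model. The function names are hypothetical; the only inputs taken from the article are the 0.031 standard deviation per-day estimate and the 2.23 standard deviation full-year projection.

```python
# Per-day gain reported in the article, in standard deviations (SD).
GAIN_PER_DAY_SD = 0.031


def projected_gain(days_attended: float, gain_per_day: float = GAIN_PER_DAY_SD) -> float:
    """Projected gain in SD, ASSUMING gains are linear in days attended."""
    if days_attended < 0:
        raise ValueError("days_attended must be non-negative")
    return days_attended * gain_per_day


def implied_days(target_gain_sd: float, gain_per_day: float = GAIN_PER_DAY_SD) -> float:
    """Days of attendance a given gain would imply under the same linear assumption."""
    return target_gain_sd / gain_per_day


# Days implied by the article's 2.23 SD full-year projection (linear assumption).
full_year_days = implied_days(2.23)

# Gain if a student attended only half of those days.
half_attendance_gain = projected_gain(full_year_days / 2)

print(f"Implied days for 2.23 SD: {full_year_days:.1f}")
print(f"Gain at half attendance: {half_attendance_gain:.3f} SD")
```

Under this linear assumption, the 2.23 standard deviation projection corresponds to roughly 72 days of attendance, and half of that attendance gives about 1.1 standard deviations. That is slightly below the roughly 1.2 reported for the pessimistic scenario, which suggests the study's projection is not purely linear.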

Unmatched Cost-Effectiveness in Education

Perhaps the most transformative takeaway is the program’s cost-efficiency. At $48 per student for the six-week pilot, the intervention generated learning equivalent to two years of traditional schooling in Nigeria. Scaling it to a full academic year would cost just $124 per student, yet deliver even higher learning returns. Using metrics such as Equivalent Years of Schooling (EYOS) and Learning-Adjusted Years of Schooling (LAYS), the researchers estimate between 0.3 and 0.9 LAYS per student, placing the intervention in the top tier globally in terms of impact per dollar.

The benefit-cost ratio ranged from 161 to 260, depending on how wage returns were calculated, far exceeding returns from traditional high-dosage tutoring programs in the United States, which often deliver ratios under 10. This impressive efficiency is largely thanks to the use of a freely available, off-the-shelf tool, minimizing overhead while eliminating the need for expensive adaptive software and proprietary content libraries.
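
To make the scale of these ratios concrete, the short Python sketch below converts a benefit-cost ratio into the benefit per student it implies. It assumes, purely for illustration, that the $48 pilot cost is the denominator of the reported ratios, which the article does not state. The function name is hypothetical, and the ratio of 10 stands in for the upper end of the US tutoring comparison.

```python
# Cost per student for the six-week pilot, as reported in the article (USD).
PILOT_COST_USD = 48.0


def implied_benefit(benefit_cost_ratio: float, cost_usd: float = PILOT_COST_USD) -> float:
    """Benefit per student implied by a benefit-cost ratio (benefit = ratio * cost)."""
    if cost_usd <= 0:
        raise ValueError("cost_usd must be positive")
    return benefit_cost_ratio * cost_usd


# Reported range of benefit-cost ratios, plus a comparison ratio of 10.
# ASSUMPTION: the $48 pilot cost is the denominator; the article does not confirm this.
for label, ratio in [("low estimate", 161), ("high estimate", 260), ("comparison ratio", 10)]:
    print(f"{label}: ratio {ratio} -> ${implied_benefit(ratio):,.0f} per student")
```

Read this way, a ratio of 161 would imply about $7,728 in benefits per student and a ratio of 260 about $12,480, against $480 for a program with a ratio of 10 at the same cost.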

A Vision for Scalable, Equitable AI in Education

The study offers compelling policy implications. First, it demonstrates that generative AI, when embedded within curriculum and supervised by trained teachers, can deliver personalized learning at scale. Importantly, the researchers argue that the success wasn’t just due to AI, but to the thoughtful integration of technology, pedagogy, and teacher oversight. This hybrid approach, combining AI with human facilitation, positions technology as a force multiplier rather than a replacement.

Second, the pilot reinforces the need for systemic investments in teacher training, digital infrastructure, and AI literacy, starting early in students’ education. Policymakers must ensure that AI doesn’t become yet another tool that benefits only the privileged. The potential is clear: with responsible use and inclusive design, AI can help democratize access to quality education, even in regions where teacher shortages and resource constraints have long held back progress.

The research is a blueprint for the future of learning in the Global South. As AI capabilities continue to evolve, this study sets a precedent for how countries can harness cutting-edge technology not just to catch up with global standards, but to leap ahead.

First published in: Devdiscourse