Swiss Federal Institute of Technology Zürich

06/09/2026 | News release | Distributed by Public on 06/10/2026 00:20

How AI chatbots become better learning coaches

Many AI systems answer questions in a matter of seconds - and, in the process, often prevent people from doing exactly what learning is all about: thinking for themselves. Machine learning expert Jakub Mačina is therefore developing models that don't provide pupils with finished solutions, but rather help them to develop their understanding step by step.

Good AI tutors ask questions rather than giving away the answer. Screenshot of TutorRL, the teaching model developed by researchers at ETH Zurich. (Image: Adobe Stock / Montage ETH Zurich)

In brief

  • ETH researchers are developing AI models that guide learners in their thinking rather than providing them with finished solutions.

  • The model known as TutorRL is freely available and, in the long term, is intended to act as a learning coach in mathematics and other STEM subjects.

  • The MathTutorBench benchmark, developed by the same team, evaluates how effectively language models combine technical expertise with pedagogical abilities.

Just five years ago, it was unthinkable that upper secondary pupils would be learning with AI on a regular basis. Today, this is an everyday reality for many. According to a representative survey from 2024, more than two-thirds of 12- to 19-year-olds in Switzerland regularly use AI for schoolwork. There are now specialised offerings for this purpose, such as Google's LearnLM or OpenAI's "Study mode", as well as AI tutors from a number of smaller, specialised providers, such as Khanmigo, "Synthesis Tutor" and "Squirrel AI". Does this mean that teachers will soon be replaced by artificial intelligence?

Guiding people's learning rather than simply answering questions

Postdoctoral researcher Jakub Mačina studies how large language models (LLMs) can be used for teaching and learning. He works at the interface between artificial intelligence and the learning sciences, collaborating with Mrinmaya Sachan, Professor of Computer Science, and learning scientist Manu Kapur.

Mačina wants to establish how LLMs can become pedagogically valuable learning coaches. "Our aim is not to replace teachers, but rather to use AI in teaching so that teachers can work more efficiently," says the researcher. Most LLMs are still poorly suited to learning, he explains: "They're fine-tuned to generate answers and solutions rather than to support users in the learning process." This runs counter to the aim of getting pupils to think for themselves and to engage actively with the subject matter. Even when LLMs are explicitly prompted to help with learning instead of providing a finished solution, the results are generally unsatisfactory, Mačina adds.

According to Mačina, good teachers have three key abilities: "They have expertise in their subject, they know where pupils hit stumbling blocks and where learning problems occur - and they have the pedagogical skills to guide pupils in solving these problems." To test how well various LLMs meet these criteria, Mačina worked with researchers from TU Darmstadt to develop a benchmark for mathematics teaching, known as MathTutorBench. Based on conversations with teachers and other data relating to the teaching process, the team developed a points system for specific teaching abilities that makes it possible to compare LLMs. MathTutorBench compares the LLMs' responses with those of teachers and rates them accordingly. The benchmark is open source and freely available to download, and researchers and developers of educational tools use it to compare the quality of different models.
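
One simple way to picture "comparing the responses with those of teachers" is as a win rate: the share of dialogue turns on which a judge prefers the model's reply to a teacher's reply. The sketch below illustrates only that general idea, under that assumption. `TutoringItem`, `win_rate` and the keyword-based `toy_judge` are hypothetical names invented here, and MathTutorBench's actual metrics and judge are defined by the benchmark itself, not by this code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class TutoringItem:
    """One dialogue context with a reference teacher reply and a model reply."""
    context: str
    teacher_reply: str
    model_reply: str


# Hypothetical judge interface: (context, candidate, reference) -> True if the
# candidate reply is preferred. MathTutorBench's real scoring may differ.
Judge = Callable[[str, str, str], bool]


def win_rate(items: Sequence[TutoringItem], judge: Judge) -> float:
    """Fraction of items on which the judge prefers the model's reply (ties count as losses)."""
    if not items:
        raise ValueError("need at least one item")
    wins = sum(judge(it.context, it.model_reply, it.teacher_reply) for it in items)
    return wins / len(items)


def toy_judge(context: str, candidate: str, reference: str) -> bool:
    """Toy stand-in for a learned judge: rewards questions, penalises revealed answers."""
    def score(reply: str) -> int:
        return int("?" in reply) - int("x =" in reply)
    return score(candidate) > score(reference)


items = [
    TutoringItem("Solve 2x + 3 = 7", "What could you do to both sides first?", "x = 2"),
    TutoringItem("Solve 5x = 20", "Try dividing.", "What operation undoes multiplying by 5?"),
]
print(win_rate(items, toy_judge))  # 0.5
```

In practice the judge would be a trained model rather than a keyword rule; the keyword rule only keeps the example self-contained.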

Mačina has used MathTutorBench to test learning-oriented LLMs from OpenAI and Google, among others. The tests revealed significant differences. "We often see that there's a trade-off between the various criteria - one model might perform very well in terms of mathematical expertise, but not in terms of its pedagogical abilities. In another model, it might be the other way around. They usually fail to achieve a balance." It is also striking, he says, that most models lose track and drift off course at some point during multi-step exchanges.
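
Why a lack of balance matters can be seen in a small calculation: an ordinary average of per-criterion scores can hide a weak criterion, whereas a harmonic mean pulls the overall score down whenever any single criterion is low. The scores below are invented for illustration and are not MathTutorBench results, and the harmonic mean is an illustrative choice, not necessarily how the benchmark aggregates its criteria.

```python
from statistics import harmonic_mean, mean
from typing import Mapping

# Invented per-criterion scores in [0, 1]; NOT MathTutorBench results.
scores = {
    "model_a": {"expertise": 0.9, "pedagogy": 0.3},  # strong maths, weak teaching
    "model_b": {"expertise": 0.6, "pedagogy": 0.6},  # balanced
}


def balanced_score(criteria: Mapping[str, float]) -> float:
    """Harmonic mean of criterion scores: stays low if any single criterion is low."""
    return harmonic_mean(list(criteria.values()))


for name, criteria in scores.items():
    values = list(criteria.values())
    print(f"{name}: mean={mean(values):.2f} balanced={balanced_score(criteria):.2f}")
# model_a: mean=0.60 balanced=0.45
# model_b: mean=0.60 balanced=0.60
```

Both hypothetical models have the same average, but only the balanced one keeps its score under the harmonic mean.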

"A better balance between expertise and teaching abilities than traditional LLMs"

In a second project with the same team, Mačina developed his own LLM that aims to strike a better balance between pedagogy and didactics on the one hand and technical expertise on the other. He trained the model by having a virtual pupil interact with a virtual teacher over multiple steps, which avoids the need for expensive training data. The model learns from these simulated interactions and from the feedback of a second model, which monitors the teaching and learning process and evaluates the virtual teacher's responses. This kind of training, in which a model gradually improves based on feedback on its own behaviour, is known as reinforcement learning.
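
The training loop described above can be caricatured in a few lines of code. This is a minimal toy sketch, not TutorRL's implementation: the three tutor moves, the simulated pupil's success probabilities and the judge's reward rule are all invented for illustration, the multi-step dialogue is collapsed into single tutor moves, and the "tutor" is a softmax policy over those moves updated with the textbook REINFORCE rule rather than a language model.

```python
import math
import random

# Hypothetical tutor moves; a real tutor model generates free text instead.
ACTIONS = ["reveal_solution", "give_hint", "ask_guiding_question"]


def simulated_pupil(action: str, rng: random.Random) -> bool:
    """Toy pupil: returns True if the pupil makes progress on their own."""
    p_progress = {"reveal_solution": 0.0, "give_hint": 0.4, "ask_guiding_question": 0.8}[action]
    return rng.random() < p_progress


def judge(action: str, pupil_progressed: bool) -> float:
    """Toy judge model: rewards pupil progress, penalises giving away the answer."""
    if action == "reveal_solution":
        return -1.0
    return 1.0 if pupil_progressed else 0.0


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_tutor(episodes: int = 2000, lr: float = 0.1, seed: int = 0) -> list[float]:
    """REINFORCE with a running-average baseline; returns the learned move probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(ACTIONS)
    baseline = 0.0
    for _ in range(episodes):
        probs = softmax(logits)
        a = rng.choices(range(len(ACTIONS)), weights=probs)[0]  # sample a tutor move
        reward = judge(ACTIONS[a], simulated_pupil(ACTIONS[a], rng))
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)  # slowly track the average reward
        # Gradient of log softmax(a) w.r.t. logit i is (1[i == a] - p_i).
        for i in range(len(logits)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


policy = train_tutor()
for move, p in zip(ACTIONS, policy):
    print(f"{move}: {p:.3f}")
```

Under these invented rewards, the policy learns to avoid revealing the solution and to prefer guiding questions, which mirrors the intent of the feedback loop described above without claiming to reproduce it.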

"The big advantage is that we don't need huge quantities of data and can make do with far smaller language models," Mačina explains. For comparison, the latest LLMs from OpenAI or Google are estimated to have hundreds of billions to trillions of parameters. Put simply, parameters are the numerical values a model learns during training, and their number is a rough indicator of its size and capacity. Mačina's model makes do with just seven billion parameters.
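
To give a sense of what these sizes mean in practice, the memory needed just to store a model's weights grows roughly linearly with its parameter count. The sketch below assumes 16-bit (2-byte) weights and ignores everything else a running model needs; the 500-billion figure is a hypothetical round number, not the size of any specific model.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Rough memory needed just to store a model's weights, in gigabytes (10**9 bytes).

    Assumes 16-bit weights (2 bytes each) by default and ignores activations,
    caches and other runtime overhead.
    """
    return num_params * bytes_per_param / 1e9


print(weight_memory_gb(7e9))    # 7-billion-parameter model -> 14.0
print(weight_memory_gb(500e9))  # hypothetical 500-billion-parameter model -> 1000.0
```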

"With our model, we see that there's a better balance between technical expertise and teaching abilities than with traditional LLMs." The model is also less prone to drifting off course, he adds - even in a learning interaction with 20 steps, it doesn't lose track. During the learning process, the model can also be asked to explain the reasons for certain answers and decisions. "This allows teachers to track and monitor the learning process," says Mačina.

Will there soon be an AI tutor for Master's students?

Mačina's LLM is now freely available under the name TutorRL and has already been downloaded more than a thousand times. "To date, TutorRL is one of the few LLMs that are freely accessible and optimised for learning," he says. However, he admits that the model has yet to be tested and evaluated with learners in a classroom setting, and he is currently looking for partner schools to do so. So far, the system also only works for teaching maths at upper-secondary and early Bachelor's level. Nevertheless, Mačina can well imagine that, in the longer term, the model could be used in other STEM (science, technology, engineering and mathematics) subjects and become powerful enough for use at Master's level.

In his view, however, the results are relevant not only to teaching but also, more broadly, to the further development of artificial intelligence. Collaborative problem-solving of the kind TutorRL supports will be essential in many areas of work in the future, as human judgement will remain vital. "What we really want is satisfactory collaboration between humans and the LLMs - not for the models to do the thinking for us," says Mačina.

References

Macina J, Daheim N, Hakimi I, Kapur M, Gurevych I, Sachan M: MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors. In: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. DOI: 10.18653/v1/2025.emnlp-main.11

Dinucu-Jianu D, Macina J, Daheim N, Hakimi I, Gurevych I, Sachan M: From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning. In: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. DOI: 10.18653/v1/2025.emnlp-main.15

Swiss Federal Institute of Technology Zürich published this content on June 09, 2026, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on June 10, 2026 at 06:20 UTC. If you believe the information included in the content is inaccurate or outdated and requires editing or removal, please contact us at [email protected]