Cornell University

04/06/2026 | Press release | Distributed by Public on 04/06/2026 08:54

Decoding great teaching and more: New app analyzes conversational data at scale

A new open-source app developed by the National Tutoring Observatory (NTO) offers researchers and practitioners a reliable and cost-effective way to analyze large datasets of text with the help of agentic AI, an advanced form of artificial intelligence that can make decisions and solve problems.

Co-developed with the instructional design and product engineering firm FreshCognate, the app allows users to upload transcripts, such as from tutoring sessions, and applies an agentic AI pipeline to annotate them. This type of analysis could identify key tutoring moves that boost student performance.

"Human experts have been the gold standard for 'coding' or annotating conversational data, but it is a costly and frankly arduous task for humans," said Rene Kizilcec, associate professor of information science in the Cornell Ann S. Bowers College of Computing and Information Science. "Effectively designed AI systems can excel at these kinds of repetitive tasks under human guidance."

The NTO is a collaboration of researchers, educators and tutoring platforms led by Kizilcec with researchers at the Massachusetts Institute of Technology and Carnegie Mellon University (CMU). The collaboration seeks to collect tutoring data at scale and create the largest open-access database of teaching data.

"In talking with researchers and tutoring providers, we quickly realized that everyone has different questions to investigate in the data. We needed a fast and scalable way to enable them to annotate and analyze the data from many perspectives," said Kizilcec, who directs the Future of Learning Lab. "In the context of tutoring, we don't really know what happens in the interaction between teachers and students that leads to meaningful improvements in student outcomes. There's a whole lot we can learn if people can analyze these data at scale and build tutoring products that better serve kids."

The new tool is named Sandpiper, a play on the word "pipeline" and also a nod to Cornell's Lab of Ornithology. The team will launch the app in an open-invitation webinar on April 9, when the tool will become publicly available.

Sandpiper allows users to upload thousands of transcripts, annotate them reliably with orchestrated AI models at the level of each session or even each utterance, and quickly refine their annotation instructions based on how closely the AI annotations agree with expert annotations. For example, in tutoring transcripts, the tool can identify when a tutor elicits deep thinking; offers assistance, like breaking up a task into smaller steps; or changes gears to tailor the assistance to a student's specific needs.
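The release does not say which agreement measure Sandpiper uses. As an illustration only, one common way to quantify how well AI annotations agree with expert annotations is Cohen's kappa, which adjusts raw agreement for the agreement two annotators would reach by chance. The sketch below assumes each utterance receives exactly one categorical label; the function and the label names ("elicit", "assist", "tailor", "other") are hypothetical and are not Sandpiper's codebook or API.

```python
from collections import Counter


def cohens_kappa(ai_labels, expert_labels):
    """Cohen's kappa for two annotators labeling the same utterances.

    Hypothetical helper for illustration; not part of Sandpiper.
    """
    if not ai_labels or len(ai_labels) != len(expert_labels):
        raise ValueError("label lists must be non-empty and the same length")
    n = len(ai_labels)
    # Observed agreement: share of utterances where AI and expert chose the same label.
    p_o = sum(a == e for a, e in zip(ai_labels, expert_labels)) / n
    # Chance agreement: computed from each annotator's label frequencies.
    ai_counts = Counter(ai_labels)
    expert_counts = Counter(expert_labels)
    p_e = sum(ai_counts[c] * expert_counts[c] for c in ai_counts) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every utterance.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical per-utterance tutoring-move labels for one short transcript.
expert = ["elicit", "assist", "assist", "other", "tailor", "other"]
ai = ["elicit", "assist", "other", "other", "tailor", "other"]

print(round(cohens_kappa(ai, expert), 3))  # prints 0.769
```

In a workflow like the one described above, a low score on a given label would be a cue to revise the annotation instructions for that label and re-run the comparison against the expert set.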

"This works for millions of hours of recording, with exponentially more text," said Rachel Slama, associate director of the Future of Learning Lab. "It was a really timely opportunity to harness the power of AI, but do it in a responsible, regimented way as a researcher-AI collaboration."

The NTO has already partnered with researchers at the University of Pennsylvania, Stanford, Vanderbilt and CMU who are test driving the app and giving feedback on how to improve its functionality.

"It's a game changer," said Ryan Baker, director of The Penn Center for Learning Analytics, which is piloting the app to analyze tutoring strategies. "The NTO's tools can speed up every part of the research process for us. We can do higher-precision research faster and better, from labeling data to building and validating our models to documenting our approach and findings in replicable ways."

Currently, the team is adding functionality to the app to anonymize uploaded transcripts, safely process audio, video and whiteboard data, and enable analysis of gaze tracking and other non-verbal cues. They are also making it easier to generate prompts for the AI models.

Cornell University published this content on April 06, 2026, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on April 06, 2026 at 14:54 UTC.