01/28/2026 | News release | Distributed by Public on 01/28/2026 14:02
Researchers at Stony Brook University are working to improve how artificial intelligence systems think through multi-step problems, which can help AI perform better in real-world environments.
The project, led by Jiawei "Joe" Zhou, assistant professor in the Department of Applied Mathematics and Statistics and the Department of Computer Science, and Niranjan Balasubramanian, associate professor in the Department of Computer Science, focuses on improving how AI thinks through long-horizon tasks.
Long-horizon tasks are challenging not only due to the number of steps involved, but also because they span extended amounts of time. The research is supported by $70,000 in cash funding and $50,000 in AWS promotional credits through an Amazon Research Award.
"Many real-world tasks require agents to reason over extended time horizons while interacting with complex environments," Zhou said. "As those reasoning chains grow longer, they tend to become more diffuse and error-prone, which ultimately limits how useful these systems can be in practice."
AI agents built on large language models (LLMs) generate and reason using human language. An AI system powered by a large language model might, for example, be asked to plan a trip by booking a flight, reserving a hotel and emailing the itinerary to a user. To complete the task, the system must reason through multiple steps, remember earlier decisions and adjust its plan along the way.
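The plan-act-remember cycle described above can be sketched as a simple agent loop. This is a toy illustration with stand-in stub functions, not the researchers' actual system; the function and variable names are assumptions made for clarity.

```python
# Minimal sketch of an LLM agent loop for a multi-step task such as trip
# planning. The "llm" and "environment" callables are hypothetical stubs.

def run_agent(goal, llm, environment, max_steps=10):
    """Iteratively decide the next step, act, observe, and remember."""
    memory = []  # earlier decisions the agent must keep in mind
    for _ in range(max_steps):
        action = llm(goal, memory)          # reason about the next step
        observation = environment(action)   # feedback from the world
        memory.append((action, observation))
        if observation == "done":
            break
    return memory

# Stub "model": follows a fixed trip-planning script.
PLAN = ["book_flight", "reserve_hotel", "email_itinerary"]

def stub_llm(goal, memory):
    return PLAN[len(memory)]

def stub_env(action):
    return "done" if action == "email_itinerary" else "ok"

trace = run_agent("plan a trip", stub_llm, stub_env)
```

In a real agent the stub model would be an LLM call and the environment would be live applications, which is exactly where long reasoning chains begin to drift.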
"This work exemplifies the kind of high-impact, interdisciplinary research that defines Stony Brook Engineering," said Andrew C. Singer, dean of the College of Engineering and Applied Sciences at Stony Brook University. "By tackling the fundamental challenge of how AI systems reason over time and complexity, this team is advancing capabilities that are essential not only for next-generation artificial intelligence, but for real-world applications where efficiency, adaptability, and reliability truly matter."
While today's AI models can perform impressively on short tasks, they often struggle when decisions must be made across extended sequences of actions. In real-world settings, AI agents must plan, adapt and respond to feedback over many steps.
Balasubramanian emphasized that simply increasing computing power is not a sustainable solution.
"Scaling up models alone isn't enough," Balasubramanian said. "What we really need are smarter reasoning strategies that allow agents to focus on what matters and discard unnecessary intermediate steps."
The research will build on AppWorld, a large-scale interactive environment developed by the Stony Brook team to simulate realistic digital tasks. AppWorld allows AI agents to operate across nine commonly used applications, such as email, payments, and productivity tools, through hundreds of application programming interfaces (APIs). APIs are tools that let different software programs communicate and share information with each other.
Tasks in AppWorld can require up to 40 steps and involve tens of millions of tokens (small pieces of text), which makes them particularly demanding for AI systems. Despite recent advances in language models, current success rates on these tasks remain under 50 percent.
"AppWorld lets us study what actually goes wrong when agents try to operate in realistic, interactive environments," Balasubramanian said. "It exposes the limits of long-range reasoning in a way that static benchmarks simply can't."
To address these challenges, the team is developing reinforcement learning (RL)-based methods that train AI agents to compress and refine their reasoning processes. Instead of retaining every intermediate step, agents learn to identify and preserve only the most relevant information needed to complete a task accurately.
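The pruning idea, keeping only the steps relevant to the task, can be illustrated with a toy heuristic. This sketch scores relevance with a simple keyword match; the project instead learns what to keep via reinforcement learning, so everything here is an illustrative assumption.

```python
# Toy sketch of reasoning-chain compression: discard intermediate steps
# that mention none of the task's keywords, always keeping the most
# recent step for continuity. The keyword heuristic is a stand-in for
# a learned relevance policy.

def compress_chain(steps, task_keywords, keep_last=1):
    """Return only the steps relevant to the task, plus the latest step(s)."""
    head, tail = steps[:-keep_last], steps[-keep_last:]
    relevant = [s for s in head
                if any(k in s.lower() for k in task_keywords)]
    return relevant + tail

chain = [
    "Searched flights to Boston",
    "Noted the weather forecast",        # irrelevant detour
    "Chose the 9am flight to Boston",
    "Reserved hotel near the airport",
]
compressed = compress_chain(chain, {"flight", "hotel", "boston"})
```

The compressed chain drops the weather detour while preserving the decisions the agent still needs, shrinking the context the model must carry forward.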
"Our goal is to teach agents how to think more efficiently, not only longer," Zhou said. "By compressing reasoning chains and pruning redundant steps, we can reduce computational cost while actually improving decision quality."
The approach rewards agents for both task success and reasoning efficiency, encouraging them to balance accuracy with speed and resource use. According to the researchers, this method also improves an agent's ability to adapt when conditions change mid-task.
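A reward of this shape, success credit minus a cost for reasoning spent, can be written in a few lines. The weighting and token budget below are illustrative assumptions, not values from the project.

```python
# Sketch of a reward balancing task success against reasoning cost.
# alpha and token_budget are hypothetical hyperparameters.

def reward(task_succeeded, tokens_used, token_budget=10_000, alpha=0.2):
    """Reward success, minus a penalty proportional to tokens consumed."""
    success_term = 1.0 if task_succeeded else 0.0
    efficiency_penalty = alpha * min(tokens_used / token_budget, 1.0)
    return success_term - efficiency_penalty

frugal = reward(True, 2_000)     # succeeds using few tokens
wasteful = reward(True, 10_000)  # succeeds but exhausts the budget
```

Because the frugal agent earns more than the wasteful one for the same success, an RL trainer optimizing this signal is pushed toward shorter, more focused reasoning.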
"We're especially interested in how agents recover from errors or unexpected feedback," Balasubramanian said. "Efficient reasoning allows them to revise decisions without having to replay or store everything they've done before."
Beyond performance gains, the project addresses growing concerns about the cost and scalability of large AI systems. By reducing token usage and memory demands, the researchers aim to make advanced AI agents more accessible and sustainable.
"All of our tools, datasets, and evaluation frameworks will be released openly," Zhou said. "We want this work to benefit the broader research community and accelerate progress in agentic AI."
With its emphasis on real-world interaction and efficiency, this research positions Stony Brook University at the forefront of efforts to make advanced AI agents more capable in complex situations.