NetScout Systems Inc.

09/22/2025 | Press release | Distributed by Public on 09/23/2025 08:07

How to Break Away from Legacy Troubleshooting and Accelerate MTTR

IT professionals reveal challenges they still face.

Eileen Haggerty
September 22nd, 2025

Two of our recent blog posts shared findings from a survey of 319 IT professionals conducted at Cisco Live 2025. The first covered the top troubleshooting surprises in the survey results, and the second detailed how observability can help reduce mean time to resolution (MTTR).

In this blog post, we build on the survey findings with additional insights and key takeaways you can use to guide your observability strategy.

How Frequent Are IT Disruptions?

IT system disruptions, technical glitches, and network problems happen every day around the world; some we've highlighted in recent blog posts, and others are emerging now:

  • Airline groundings due to outages lasting three hours
  • A software bug that caused a two-day outage that impacted several state agencies and required closing offices early
  • A central bank in Africa that experienced a 24-hour network failure impacting customer-facing services such as the real-time gross settlement (RTGS) services used by banks and retailers for money transfers and payments
  • An infrastructure technology company that had a DNS issue affecting its user community for more than an hour

These examples highlight the fact that no company is immune from unplanned, unexpected network disruptions.

Our blog post on MTTR revealed that problem resolution took a few hours or more for 80 percent of respondents from large companies. In fact, 25 percent shared that it took from a day to a week to restore service. That is simply too long.

How Much Time Is Spent on MTTK?

Mean time to knowledge (MTTK) is one of four stages in the MTTR process, so we asked the Cisco Live attendees participating in our survey what percentage of the total resolution time was spent on that stage. Not surprisingly, it was significant.

For companies with 10,000 or more employees, more than 45 percent indicated that the MTTK stage consumed between 26 and 50 percent of the overall time; about a quarter of the respondents said it took them 51 to 75 percent of the total time; and nearly 9 percent believed it took more than 75 percent of the troubleshooting time (see Figure 1).

Figure 1: Respondents in companies with 10,000 or more employees answered the question of what percentage of time was spent on the MTTK stage of the troubleshooting process at their company.

Keep in mind, these IT professionals have already noted that troubleshooting is often lengthy, with 80 percent saying it takes their organization a few hours or more.
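To make those shares concrete, here is a minimal sketch that converts the survey's answer ranges into hours for a single incident. The 8-hour incident length and the function name `mttk_hours` are illustrative assumptions, not figures or tooling from the survey.

```python
# Illustrative only: the 8-hour incident below is a hypothetical figure,
# not survey data. The percentage bands mirror the survey's answer ranges.

def mttk_hours(total_hours: float, low_pct: float, high_pct: float) -> tuple[float, float]:
    """Return the (low, high) hours spent in the MTTK stage for a share band."""
    if not 0 <= low_pct <= high_pct <= 100:
        raise ValueError("expected 0 <= low_pct <= high_pct <= 100")
    return (total_hours * low_pct / 100, total_hours * high_pct / 100)


# Answer ranges reported by respondents in companies with 10,000+ employees.
BANDS = {"26-50%": (26, 50), "51-75%": (51, 75), ">75%": (75, 100)}

if __name__ == "__main__":
    incident_hours = 8.0  # hypothetical incident duration
    for label, (low_pct, high_pct) in BANDS.items():
        low, high = mttk_hours(incident_hours, low_pct, high_pct)
        print(f"{label}: {low:.2f} to {high:.2f} of {incident_hours:.0f} hours")
```

Under that assumption, an organization in the 26 to 50 percent range would spend roughly two to four hours of an 8-hour incident just working out where and why the problem occurred.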

When Is It Time to Launch a "War Room"?

The increased complexity of today's infrastructure is due, in part, to the explosion in the number of vendors involved in an enterprise network. This has compounded the challenges of problem isolation and resolution when issues occur. When quick resolution eludes an IT organization, a long-used technique is to convene a war room that brings together multiple IT team members and the vendors involved in delivering the affected service.

Respondents in companies with 10,000 or more employees indicated that the majority of their organizations (73 percent) initiate war rooms within the first day of an unresolved problem (see Figure 2). War rooms have been criticized for ineffectiveness, protracted troubleshooting, and the lack of a single source of information for all participants to use. Despite this criticism, only 3.5 percent of participants reported that their companies had discontinued the use of war rooms as a troubleshooting tool, which was a surprising result.

Figure 2: The majority of respondents in companies with 10,000 or more employees (73 percent) initiate war rooms for troubleshooting within the first day.

What Data Sources Are Used for Troubleshooting?

The survey included questions on the data sources used by the participants' organizations. Metrics, events, logs, and traces (MELT) ranked as the No. 1 data source, identified by 42.1 percent of the participants. This was followed by packet captures, or pcaps (26.3 percent); flow data such as NetFlow (19.3 percent); and deep packet inspection, or DPI (9.6 percent).

Because there is a difference between what is used for troubleshooting and what IT professionals would like to use, the survey asked respondents whether it was important or valuable to leverage DPI in their observability process. Analysis of the responses showed that participants in companies with 10,000 or more employees placed a significantly higher value on DPI, with 41.6 percent responding that it is very important or valuable as part of their observability process (see Figure 3).

Figure 3: The majority of respondents in companies with 10,000 or more employees (83 percent) believe DPI is important or very important in understanding issues better and solving problems faster.

Need for Data to Monitor, Troubleshoot, and Optimize Infrastructure Environments

The survey did not directly ask the participants why they thought DPI was very important in their observability process. However, several factors could have contributed to this response.

  • The complexity and traffic volumes involved in networks supporting 10,000 or more employees may require DPI-level details.
  • The large number of vendors and point tools in networks supporting large employee communities has proven ineffective, inefficient, and at times conflicting. The IT process would benefit from DPI as a single source of truth across the overall infrastructure.
  • The lengthy outages and frequent disruptions have had an impact on revenue, costs, and reputation.
  • Tool sprawl has become costly and ineffective, prolonging troubleshooting and increasing MTTR.
  • War rooms have proven inefficient and lack collaboration among IT teams and vendors, particularly when there is not a single common set of data for all to rely on.

As a result, DPI proves more valuable in larger organizations, where complex infrastructures, numerous vendors and tools, and a growing number of applications and services mean greater risks to productivity, revenue, costs, customer service, and reputation when network disruptions take longer to identify and resolve.

In fact, we highlighted recent research by Enterprise Management Associates (EMA) in its April 2025 study "Enterprise Strategies for Hybrid, Multi-Cloud Networks" that touched on what data was used and what was needed by IT teams. EMA's research revealed that 49 percent of IT professionals in hybrid, multicloud environments view network flow data as critical for monitoring, troubleshooting, and optimizing their cloud networks. Another 38 percent identified packet data as equally important. An interesting revelation in this study was that only 29 percent of the respondents reported being fully satisfied with their current monitoring tools.

Summary and Actions

Because this post concludes our coverage of the survey conducted at Cisco Live 2025, we want to close with a review of the findings and suggest some actions to take in light of their implications. Some key results include the following:

  • All too often, employees, not existing monitoring tools, are reporting a network or application disruption (61.8 percent, according to IT executives surveyed).
  • It is taking too long to resolve these problems, which risks greater negative impact on revenue, costs, employee productivity, and customer care (81 percent of cases required at least a few hours or as long as a week to resolve the problem).
  • Part of this lengthy resolution time is likely due to excessive time being spent on discovering the "why" and "where" (mean time to knowledge, or MTTK) of the problem. (More than 70 percent of the respondents reported that 26 to 75 percent of the overall resolution time was spent in the MTTK stage, and 8.8 percent indicated it was more than 75 percent of the time.)
  • IT organizations use several different data sources in observability tools today; however, DPI is leveraged less than 10 percent of the time, according to participants in companies with 10,000 or more employees.

Yet when asked about the role of DPI in their observability process, employees in large organizations reported it as important or very important, with 83.2 percent affirming its value. Improving early-warning identification (mean time to identify, or MTTI) and speeding up troubleshooting with DPI (MTTK) were identified as essential actions that can dramatically reduce MTTR. Given that current processes and outcomes built on legacy tools and incomplete datasets are less than optimal, isn't it time to give NETSCOUT nGenius solutions for observability a try?

To learn more about observability strategies, read "Comprehensive Observability Dramatically Improves Troubleshooting Time."

NetScout Systems Inc. published this content on September 22, 2025, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on September 23, 2025 at 14:07 UTC. If you believe the information included in the content is inaccurate or outdated and requires editing or removal, please contact us at [email protected]