Dynatrace Inc.

08/13/2025 | Press release | Distributed by Public on 08/13/2025 09:52

Remediation intelligence: Accelerate MTTR with AI-powered context and knowledge

There's a hidden barrier to deep understanding of your revenue, performance, and bottom line that has nothing to do with tools or telemetry. It's organizational knowledge, the remediation know-how that's scattered across hundreds of documents, private notebooks, dashboards, and in the minds of your engineers. This implicit, loosely documented knowledge is invisible to machines and rarely timely for humans. So, when a high-priority incident hits, this knowledge gap fills your war rooms with engineers tasked with solving problems they don't own.

Delivering reliable, business-critical applications to production is more complex than ever. The growing complexity and granularity of modern software systems, together with the trend toward shifting more and more responsibilities to development teams (shift left), put increasing pressure on those teams.

In fact, traditional development teams now have a wider set of responsibilities; they must be highly skilled and educated in a multitude of domains. The responsibilities of these teams now range from specification, planning, testing, risk assessment, cost estimates, and deployment, to load testing, UI testing, integration testing, and on-call responsibilities for their software services.

While development teams need to be literate in all new technology stacks, cloud resources, and quality assurance methods, they also need to work with numerous tools.

The invisible bottleneck in your remediation process

The whole shift-left trend has pushed operational responsibility closer to development, expanding workloads to include on-call rotations, more frequent deployments, and growing expectations around uptime. When incidents occur, dozens of engineers are dragged into war rooms to perform analysis of the underlying root causes.

Reducing the Mean Time to Repair (MTTR) is essential to business continuity and customer satisfaction. Without access to the right knowledge at the right time, even skilled teams lose momentum. In high-pressure situations caused by critical incidents, it becomes more important than ever to ensure that all relevant information is shared with every role involved. This requires the most automated and intelligent methods available for effectively collecting, analyzing, and distributing information.

The on-call engineer's journey

Take Omar, an SRE. In the middle of the night, an automated voice call jolts him awake and summons him into a war room: a routine update has caused a spike in failed requests for a cloud-based payment service. It's a P1 incident. With each passing minute, merchants are losing value, support requests surge, and customers are complaining. Dozens of caffeinated engineers are already in the war room.

Logs point to timeouts, but this is just a symptom; the real problem is somewhere else. Reading every message and document would take hours, so Omar scans for summaries, key findings, and any mention of his team's services. Several hypotheses have already been tested, and one points to a potential issue in a backend service Omar's team owns. Meanwhile, customer complaints are beginning to surface from other time zones. The payment service is failing, and customer success managers are growing increasingly anxious.

And while a similar outage has happened before, Omar is not able to find any documentation or insights into how to remediate the issue.

Why organizational knowledge doesn't scale

When remediation history lives in documents, scattered across teams, formats, and platforms, engineers waste time searching instead of solving. Even well-documented incidents don't prevent recurrence if they're disconnected from future incidents. Even centralized platforms like Backstage don't help if they can't surface the right guidance at the right time.

Without a way to systematically identify, reuse, and scale this knowledge, it remains reactive. That's not just inefficient; it's a blocker to building intelligent automation and truly preventative operations. This is the hidden obstacle silently inflating your MTTR: buried knowledge that costs time, delays response, and drains focus from what really matters. This is what remediation intelligence solves.

Introducing remediation intelligence

Dynatrace has a long history of providing DevOps teams with AI-driven tools for anomaly detection, root cause identification, and incident impact assessment in complex application environments. Over the past decade, it has contributed to reducing MTTR by learning application behavior and analyzing dependencies in real time.

Building on this foundation, Dynatrace launched remediation intelligence, which adds a new element to the incident response process from alert through resolution. It assists engineers during remediation by combining Davis® AI root cause and impact analysis with input from global community knowledge and internal expertise. It integrates data such as logs, metrics, traces, and topological context into a single view and offers support for documenting post-incident reviews.

Figure 1. The problems page displays all important information, allowing you to directly access all incident-relevant error logs.

Close the knowledge gap: Embedded troubleshooting knowledge

What truly sets Dynatrace remediation intelligence apart is its ability to proactively surface relevant internal knowledge at the moment it's needed most. It adds an AI-guided assistive layer to the Problems app that brings implicit, organizational knowledge directly into the flow of incident response. Once a problem is detected, Davis AI scans the historical data, surfacing past remediation playbooks, troubleshooting dashboards, and notebooks that were used to resolve similar issues.

With troubleshooting guides, we introduce a context-aware guidance system, built on Davis AI, that connects current incidents with prior resolution paths. It makes organizational knowledge queryable, remediation patterns reusable, and every responder effective, even when they're solving an unfamiliar issue.

Figure 2. Review related documents from similar past incidents.

Remediation intelligence surfaces the most relevant remediation insights

When an incident occurs, Dynatrace excels at automatically analyzing and surfacing technical insights. It collects and organizes all relevant signals (logs, metrics, traces, and topology) into a single coherent problem. No fragmented alerts. No disconnected symptoms. Just one structured, AI-curated incident view. In parallel, Dynatrace AI scans all documents marked as troubleshooting-relevant. Using advanced semantic search and vector embeddings, it ranks and surfaces the most relevant past incidents, dashboards, notes, and postmortems, based on their similarity to the current problem. This is not just keyword matching; Dynatrace understands patterns, failure modes, and system relationships, surfacing ranked, high-similarity incidents in the problem view.
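To make the idea of ranking by embedding similarity concrete, here is a minimal sketch in Python. It is not Dynatrace's implementation: the `Document` class, the `rank_by_similarity` function, and the three-dimensional vectors are all hypothetical, and the embeddings are made-up numbers standing in for the output of a real embedding model. The sketch only illustrates the general technique of scoring documents by cosine similarity to the current problem and returning the closest ones.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Document:
    """Hypothetical record for a troubleshooting document."""
    title: str
    embedding: Sequence[float]  # assumed to be precomputed by some embedding model


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    problem_embedding: Sequence[float],
    documents: List[Document],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k (title, score) pairs, most similar first."""
    scored = [
        (doc.title, cosine_similarity(problem_embedding, doc.embedding))
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up example data: vectors are illustrative, not real embeddings.
EXAMPLE_DOCS = [
    Document("TSG: payment service timeouts", [0.9, 0.1, 0.0]),
    Document("TSG: disk pressure on build agents", [0.0, 0.2, 0.9]),
    Document("Postmortem: checkout latency spike", [0.7, 0.6, 0.1]),
]
EXAMPLE_PROBLEM = [1.0, 0.2, 0.0]

if __name__ == "__main__":
    for title, score in rank_by_similarity(EXAMPLE_PROBLEM, EXAMPLE_DOCS):
        print(f"{score:.3f}  {title}")
```

With this toy data, the payment timeout guide ranks first and the checkout postmortem second, while the unrelated build-agent guide falls outside the top two. In a real system the vectors would come from a learned model, which is what lets similar failure modes match even when they share no keywords.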

Figure 3. Example of a troubleshooting guide

From observability to trusted automation: The power of context-aware AI

The future of resilient, self-healing systems lies in the seamless integration of observability, AI, and organizational knowledge. When these elements come together, they form the foundation for trusted automation: a system that not only reacts to incidents but learns from them, adapts, and eventually prevents them altogether.

At the heart of this vision is context. Effective auto-remediation depends on the ability to precisely identify the root cause of an issue and understand its broader impact across the application stack. But automation doesn't stop at detection. By capturing and integrating the remediation strategies used by engineers, Dynatrace builds a living knowledge base. This organizational knowledge, when combined with AI-driven root cause analysis, allows the system to replicate proven remediation paths and suggest next steps with increasing accuracy.

With all relevant data and insights unified in a single platform, engineers gain a single-pane-of-glass view into their systems. This not only streamlines manual remediation efforts but also lays the groundwork for flexible, context-aware auto-remediation. Each incident you resolve adds to the knowledge base. Over time, as the system learns, it evolves from reactive automation to proactive incident prevention, anticipating issues before they escalate and taking preemptive action.

This is the vision Dynatrace is delivering: a future where engineers can trust automation not just to respond, but to understand, learn, and improve, turning every incident into a step toward greater system intelligence and reliability.

Empower your teams: Turn hard-won operational insights into scalable remediation power

Dynatrace remediation intelligence is ready to work for you today. To start benefiting, opt into Davis CoPilot®, the Dynatrace generative AI assistant. Once turned on, you'll need to configure Davis CoPilot to learn from a curated set of Dynatrace documents: specifically, Notebooks and Dashboards that are either created directly from detected problems or clearly labeled with the prefix "TSG" in their titles (short for Troubleshooting Guide).
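The eligibility rule described above can be expressed as a small check. The following Python sketch is an illustration only and does not use any Dynatrace API: the `Doc` class, its field names, and the `is_troubleshooting_relevant` function are hypothetical, and details such as case-sensitive matching and ignoring leading whitespace are assumptions rather than documented behavior. It can be handy for auditing which of your own document titles would qualify before you adopt the naming convention.

```python
from dataclasses import dataclass

TSG_PREFIX = "TSG"  # title prefix named in the article (Troubleshooting Guide)
ELIGIBLE_KINDS = {"notebook", "dashboard"}  # hypothetical labels for document types


@dataclass
class Doc:
    """Hypothetical description of a document; not a Dynatrace data model."""
    title: str
    kind: str
    created_from_problem: bool = False


def is_troubleshooting_relevant(doc: Doc) -> bool:
    """Apply the rule from the text: a Notebook or Dashboard qualifies if it was
    created from a detected problem or its title starts with "TSG".

    Assumption: the prefix match is case-sensitive and ignores leading spaces.
    """
    if doc.kind not in ELIGIBLE_KINDS:
        return False
    return doc.created_from_problem or doc.title.lstrip().startswith(TSG_PREFIX)


if __name__ == "__main__":
    candidates = [
        Doc("TSG: payment service timeouts", "notebook"),
        Doc("Weekly KPI overview", "dashboard"),
        Doc("Investigation notes", "notebook", created_from_problem=True),
    ]
    for doc in candidates:
        print(doc.title, "->", is_troubleshooting_relevant(doc))
```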

Davis CoPilot will analyze and learn from your team's historical remediation efforts, capturing valuable insights and strategies. It can then proactively suggest relevant documentation and guidance when similar incidents are detected in the future, helping your engineers respond faster and more effectively.

Figure 4. Turn on Davis CoPilot in Settings.

All data uploaded to Dynatrace remains strictly private, and all data and remediation insights remain within your tenant. Davis CoPilot treats your documents as strictly confidential and never shares or transfers this information outside your Dynatrace environment. Your team's knowledge stays private, secure, and entirely under your control, while still powering smarter, more context-aware automation.

Learn more about document suggestions and Dynatrace remediation intelligence in our documentation, and read about discovering relevant troubleshooting guides and how to create new ones.

Don't let organizational knowledge stay buried. Make it actionable. Make it scalable.
Dynatrace Inc. published this content on August 13, 2025, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on August 13, 2025 at 15:52 UTC.