Splunk Inc.

01/23/2025 | News release | Distributed by Public on 01/24/2025 12:29

Matching AI Strengths to Blue Team Needs

Much has been written about how AI, particularly Large Language Models (LLMs), will transform cybersecurity. Some say it'll be for the worse, and some say it'll be for the better. Although SURGe firmly believes that AI will end up helping defenders much more than it helps threat actors, it's sometimes hard to envision the exact form this help will, or should, take. Many cybersecurity vendors offer AI-enabled products, and some open source projects might also harness the power of AI in helpful ways, but "cybersecurity" is complex. It is made up of many individual functional areas, each with its own set of challenges. How do we decide which of these challenges are good candidates to solve with AI and which should be addressed by other means?

It all starts with taking a hard look at what LLMs are actually good at.

So what are LLMs good at?

Although LLMs can do many things, let's focus on just a few that matter most for security. Here are the capabilities we consider most relevant to a typical Blue Team, along with appropriate caveats for each. Keep in mind that depending on the problem(s) you want to solve, your list might be slightly different.

  • Document Summarization: Condensing large documents into shorter summaries by extracting key points, ideas, or facts. AI is generally very good at this, though its judgment of what's "key" may not always match yours.
  • Content Generation & Manipulation: The "generative" in generative AI. Creating new content where none existed, or manipulating existing content according to user instructions. Human review is required to catch hallucinations.
  • Language Translation: Translating text or code from one language to another. This applies almost equally to human languages and programming languages. Translation can be excellent, though accuracy depends heavily on the training data, and not all languages are equally represented.
  • Knowledge Augmentation & Retrieval: Providing alternative ways to access information, such as allowing a user to query or "chat with" a database of documents. Can allow less experienced users to operate at the level of their more experienced colleagues, though AI advice is never a replacement for human expertise.
  • Instruction Following: Tell the AI what to do step-by-step and have those instructions followed faithfully. Can be used as a simple form of automation itself, or combined with other LLM strengths to produce more advanced effects. The more complicated the instructions, though, the more likely the AI is to skip some or all of them.
  • Contextual Analysis & Interpretation: The ability of AI to extract "meaning" or draw conclusions from provided content. Without special training, the AI likely has only a superficial understanding, and it may miss deeper or less obvious implications.

How do AI's strengths match the needs of the Blue Team?

With a good list of AI strengths in mind, let's take a look at the problem from the opposite point of view: what challenges does my Blue Team face? We need to be careful here to set the scope to something large enough to be helpful, but small enough that we're not trying to solve every conceivable problem having to do with cybersecurity. To that end, consider the following high-level diagram of functional areas within a typical SOC that directly support its mission of detecting and responding to security incidents:


Figure 1: Typical SOC functions supporting incident detection and response


Because it captures the key cycle supporting the SOC's mission, this model gives us a lot to work with while still keeping the scope of the problem to a reasonable level.

Note that the "Automated Detection" function, though arguably the most mission-critical function of all, has historically been well served by signatures, rules, statistics, ML, and even AI. We're excluding that function so we can focus on those areas in which AI could provide some wins that might not have been feasible before.

Now, we can begin examining the challenges each functional area faces and determining where AI might be able to help.

Cyber Threat Intelligence (CTI)

An awful lot of threat intelligence work revolves around the consumption or production of plain text, whether it's collecting posts from a dark web forum, processing vendor-generated reports, or creating intelligence products for your own internal audiences. Since LLMs are all about language, they're a natural match. Document summarization can help your team consume more intelligence in the same amount of time, and language translation will expand your collection capabilities. Content generation can help produce intelligence products more quickly by automating the creation of "first draft" reports, and knowledge augmentation can help your analysts (or, if you are brave, your intel customers) drill down and get quick answers to pressing questions.
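Summarizing long vendor reports often runs into a model's context limit. One common workaround is "map-reduce" summarization: split the report into overlapping chunks, summarize each chunk, then summarize the combined partial summaries. The sketch below is illustrative only. `summarize_chunk` is a hypothetical stand-in for whatever LLM call your tooling provides, and chunk sizes are measured in characters rather than tokens for simplicity.

```python
from typing import Callable, List


def chunk_text(text: str, max_chars: int = 4000, overlap: int = 200) -> List[str]:
    """Split text into chunks of at most max_chars, each overlapping the previous one by `overlap` chars."""
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by `overlap` so a sentence spanning a boundary appears in both chunks
        start = end - overlap
    return chunks


def summarize_report(text: str, summarize_chunk: Callable[[str], str],
                     max_chars: int = 4000, overlap: int = 200) -> str:
    """Summarize each chunk, then summarize the joined partial summaries.

    summarize_chunk is a hypothetical callable wrapping your LLM of choice.
    """
    chunks = chunk_text(text, max_chars, overlap)
    if len(chunks) <= 1:
        # Short enough to summarize in a single call
        return summarize_chunk(text)
    partials = [summarize_chunk(chunk) for chunk in chunks]
    return summarize_chunk("\n".join(partials))
```

This sketch does only one reduce step; for very long reports the joined partial summaries may themselves exceed the model's limit, in which case they would need to be chunked again.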

Threat Hunting

Hunting involves a substantial amount of research (about actors and their techniques) as well as text generation (primarily in the form of reports and hunt documentation). These attributes make it similar to threat intelligence, and many of those use cases are also relevant here.

Where hunting really differs, though, is in the middle phase (which the PEAK framework refers to as the "Execute" phase). This is where the data analysis happens and where most of the insights are generated. While some of this may involve plain text, it's more likely to require data search and retrieval, data summarization, statistical analysis, visualization, or generating code to identify malicious activity that may be hiding in your data. LLMs are good at many of these things as well. In particular, off-the-shelf AI models are quite good at generating Python code or SPL (Splunk Search Processing Language) queries for more in-depth data analysis.
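As an illustration of the kind of first-draft analysis code an analyst might ask an LLM to produce during the Execute phase, here is a hand-written sketch of "stacking" (least-frequency-of-occurrence analysis): count how many distinct hosts ran each process and surface the rarest ones. The event format (dicts with `host` and `process` keys) and the default threshold are assumptions. The same idea could be expressed in SPL with something like `stats dc(host) by process`, though field names depend on your data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rare_processes(events: Iterable[Dict[str, str]], max_hosts: int = 1) -> List[Tuple[str, int]]:
    """Return (process, distinct_host_count) pairs for processes seen on at most max_hosts hosts.

    Each event is assumed (hypothetically) to be a dict with 'host' and 'process' keys.
    """
    hosts_by_process: Dict[str, Set[str]] = defaultdict(set)
    for event in events:
        # Lowercase process names, since Windows file names are case-insensitive
        hosts_by_process[event["process"].lower()].add(event["host"])
    rare = [(proc, len(hosts)) for proc, hosts in hosts_by_process.items() if len(hosts) <= max_hosts]
    # Rarest first, then alphabetical, so the output order is deterministic
    return sorted(rare, key=lambda item: (item[1], item[0]))
```

Rare does not mean malicious; the output is a list of leads for the hunter to investigate, not a verdict.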

Detection Engineering

Just as threat hunting is similar to threat intelligence, detection engineering is similar to, but distinct from, threat hunting. There is a substantial amount of research and analysis, but detection engineering often involves actually reproducing the "malicious" behavior in a sandbox environment in order to capture realistic data with which to create detections. LLMs can assist engineers by summarizing the steps necessary to replicate the malicious behavior, as well as translating those steps into exact commands for the specific environment in which they'll be executed.

On the back end of the process, similar to threat hunting, LLMs can also help with the creation of detection signatures or rules, since even many general off-the-shelf models are quite good at generating SPL, Python, YARA, or Sigma code. While these should probably not be pushed directly into production without extensive testing, providing a "first draft" to a detection engineer can be a quick, cheap, and effective way to accelerate the detection engineering process.
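Before an LLM-drafted Sigma rule even reaches testing, a cheap structural check can catch obvious mistakes. The sketch below assumes the rule has already been parsed from YAML into a Python dict (for example with PyYAML's `yaml.safe_load`) and checks only a few basics: the top-level `title`, `logsource`, and `detection` sections, a `condition` inside `detection`, and that every identifier named in the condition matches a defined search identifier. It is not a full Sigma validator (it does not understand deprecated aggregation expressions, for instance), and it is no substitute for running the rule against captured sandbox data.

```python
import fnmatch
import re
from typing import Any, Dict, List

# Words in a Sigma condition that are operators rather than search identifiers
CONDITION_KEYWORDS = {"and", "or", "not", "of", "all", "them"}


def check_sigma_draft(rule: Dict[str, Any]) -> List[str]:
    """Return structural problems found in an already-parsed Sigma rule (empty list if none found)."""
    problems = []
    for field in ("title", "logsource", "detection"):
        if field not in rule:
            problems.append(f"missing required field: {field}")
    detection = rule.get("detection", {})
    if not isinstance(detection, dict):
        problems.append("detection must be a mapping")
        return problems
    condition = detection.get("condition")
    if condition is None:
        problems.append("detection has no condition")
        return problems
    # A condition may be a single string or a list of strings
    conditions = condition if isinstance(condition, list) else [condition]
    search_ids = [key for key in detection if key != "condition"]
    for cond in conditions:
        for token in re.findall(r"[A-Za-z_][A-Za-z0-9_*]*", str(cond)):
            if token.lower() in CONDITION_KEYWORDS:
                continue
            # Identifiers such as 'selection_*' are wildcard patterns over search identifiers
            if not any(fnmatch.fnmatchcase(sid, token) for sid in search_ids):
                problems.append(f"condition references undefined identifier: {token}")
    return problems
```

An empty result only means the draft is structurally plausible; whether it actually fires on the behavior you reproduced is a separate question.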

Alert Triage / Incident Response / Digital Forensics

These three areas, though distinct in theory, are so tightly bound together that it is easier to consider them as a unit for our purposes. You can most easily see this interrelationship by considering the process of triaging and responding to alerts as a set of questions the analyst must answer:

  • What does this alert mean?
  • Was this an actual attack?
  • Was the attack successful?
  • What assets were affected?
  • What did the attacker do (or try to do)?
  • How should we respond?

LLMs can potentially help analysts answer each of these questions, though the questions get more difficult (and the answers more ambiguous) as you progress through the list.

It's easy to see how LLMs can help analysts figure out what an unfamiliar type of alert is trying to tell them. Although many teams have documentation to explain their most common alerts, it's rare that every possible alert is well documented. Simply sending the alert message to the LLM, perhaps with some basic supporting data such as the hostnames, IP addresses, and ports involved, can help generate a concise description of the type of activity the system detected, along with some background information to help the analyst make good decisions.
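A minimal sketch of that approach, assuming alerts arrive as Python dicts: keep only a handful of supporting fields and wrap them, together with the raw alert message, in a prompt asking for a plain-language explanation. The field names (`message`, `src_host`, `src_ip`, `dest_host`, `dest_ip`, `dest_port`) and the prompt wording are hypothetical, and sending the prompt to a model is left to whatever client your environment uses.

```python
from typing import Any, Dict, Sequence

# Hypothetical field names; map these to your own alert schema.
SUPPORTING_FIELDS: Sequence[str] = ("src_host", "src_ip", "dest_host", "dest_ip", "dest_port")


def build_triage_prompt(alert: Dict[str, Any], fields: Sequence[str] = SUPPORTING_FIELDS) -> str:
    """Build a prompt asking an LLM to explain what an unfamiliar alert means."""
    lines = [
        "You are assisting a SOC analyst with alert triage.",
        "Explain concisely what type of activity this alert describes,",
        "and give background that would help decide whether it is a real attack.",
        "",
        f"Alert message: {alert['message']}",
    ]
    # Include only the supporting fields that are actually present, to keep the prompt short
    for field in fields:
        if alert.get(field) not in (None, ""):
            lines.append(f"{field}: {alert[field]}")
    return "\n".join(lines)
```

Passing an explicit allow-list of fields, rather than the whole alert, keeps the prompt focused and makes it obvious exactly which data leaves your environment.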

A helpful implementation strategy would be to start by trying to answer the questions at the top of the list and gradually work your way to the end. Each individual question is a good milestone, but you might also consider treating them as groups. For example, the first three questions are the most important to the triage function, so addressing them as a group makes sense.

Reporting & Lessons Learned

In a way, this final phase is similar to the CTI phase, so many of the best use cases for LLMs are the same, or similar. This phase is all about documenting what you saw and did during the incident, usually producing plain text for management and stakeholder consumption. Having an LLM consume response notes (e.g., from incident ticket updates) and create a rough draft of interim or final reports can be a big time saver. Take care, though, when it comes to root cause analyses, lessons learned, and recommendations for improvement; these typically require detailed local knowledge and complex reasoning. For anything more than the simplest of incidents, you will need a human to review the LLM's output.
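One way to prepare response notes for drafting is sketched below, under the assumption that each ticket update is a dict with hypothetical `timestamp` (ISO 8601 with a UTC offset), `author`, and `text` keys. Updates are sorted into a timeline and wrapped in a prompt that asks for a draft and explicitly labels root cause, lessons learned, and recommendations as needing human review.

```python
from datetime import datetime
from typing import Dict, List


def build_report_prompt(incident_id: str, updates: List[Dict[str, str]]) -> str:
    """Turn incident ticket updates (hypothetical schema) into a chronological prompt for a draft report."""
    # Sort by parsed time rather than by string, so timestamps with different UTC
    # offsets are ordered correctly (assumes every timestamp includes an offset)
    ordered = sorted(updates, key=lambda u: datetime.fromisoformat(u["timestamp"]))
    timeline = [f"- {u['timestamp']} ({u['author']}): {u['text']}" for u in ordered]
    return "\n".join([
        f"Draft an incident report for {incident_id} using only the notes below.",
        "Include a summary, timeline, affected assets, and response actions.",
        "Label any root cause, lessons learned, or recommendations as DRAFT - REQUIRES HUMAN REVIEW.",
        "",
        "Response notes:",
        *timeline,
    ])
```

Asking the model to flag the judgment-heavy sections, rather than omit them, gives the reviewer a starting point while making clear which parts still need an accountable human.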

Getting the best fit

To get the most security bang from your AI investment, it's essential to first understand whether the problems you hope to solve are good fits for LLM technology. Start by taking a realistic look at what AI is actually good at. Pair this with an understanding of the processes and workflows your security team goes through regularly, and look for the ones that align with the most AI strengths. Whether you're buying a product, adopting an open source tool, or creating your own solution in-house, using a prioritization scheme like this will help ensure that you're targeting solutions that have the best chance of making your security operations more efficient and effective at a reasonable cost. Above all, don't expect the AI to perform perfectly, even with tasks for which it is well-suited. Properly used, LLMs can provide massive productivity gains, but it's still up to the accountable humans to ensure the quality of the final products.