Radware Ltd.

09/10/2025 | Press release | Distributed by Public on 09/10/2025 14:23

CVE is the new PoC

In a previous blog, I wrote about proof of concept (PoC) exploits and the risks involved in publishing them before a patch is available. But what if I told you that the PoC isn't even necessary today, and publishing the common vulnerabilities and exposures (CVE) description alone might be just a few prompts away from a working exploit?

Then & Now

In the old days (like two years ago), turning a CVE into a working exploit was a long and demanding process. You probably needed to be a security expert with a deep understanding of systems and code. And you often required advanced environments, extensive trial and error, and a significant investment of time and technical skill.

Today, with the rise of large language models (LLMs), and especially GPT-4, much of that expertise is no longer necessary. An LLM acting as an agent (not just a chatbot, but one that can interact with websites, run code, analyze results, adapt its strategy, and more) can take a CVE description and a decent prompt and generate a working PoC. With today's LLMs, yesterday's PoC is today's session with ChatGPT.

CVE = PoC

In April 2024, a research paper demonstrated how LLMs can autonomously exploit one-day vulnerabilities.

One-day vulnerabilities are security flaws that have already been publicly disclosed (usually through a CVE) but remain unpatched in many systems. In the paper, the researchers described an experiment: they took 15 one-day vulnerabilities and tested two key scenarios:

  1. With a CVE description:
    • Ten different LLMs were tested, and all but one failed completely. These included popular open-source models and GPT-3.5.
    • Only GPT-4 succeeded and the results were remarkable. It exploited 87% of the vulnerabilities, generating working PoCs for 13 out of 15 cases.
      It achieved this using only:
      • The CVE description
      • Access to tools like terminal, code execution and web browsing
      • A prompt that encouraged it not to give up and to try different paths
      • A framework called ReAct (via the LangChain platform) that allows GPT-4 to act as an agent

    This showed that you don't need sophisticated tools or deep expertise to create an exploit. You just need a CVE, plus encouragement for GPT-4 not to give up.

  2. Without a CVE description:

    After only GPT-4 succeeded in creating a PoC from a CVE, the researchers tested its ability to discover and exploit vulnerabilities without any prior information, such as the details a CVE provides. In this setting, the model exploited only one of the 15 vulnerabilities (7%).

To sum it up in a sentence: the CVE itself became the exploit blueprint, and GPT-4 was just there to fill in the missing code.
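The agent setup described above (tools, a persistence prompt, and the ReAct framework) boils down to a loop that alternates model-generated "Thought/Action" turns with tool-generated "Observation" turns. The toy sketch below illustrates that pattern only; it is not LangChain's actual API and certainly not exploit code. The scripted model and stub tools are invented for illustration.

```python
# Minimal sketch of a ReAct-style agent loop (Action -> Observation),
# the pattern frameworks like LangChain implement. `make_scripted_llm`
# is a harmless stand-in for a real model; the tools are stubs.

def make_scripted_llm(steps):
    """Return a fake LLM that replays a fixed list of responses."""
    it = iter(steps)
    return lambda prompt: next(it)

def react_loop(llm, tools, task, max_steps=5):
    """Feed the transcript to the model, run the tool it picks, repeat."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        response = llm(transcript)           # model proposes the next step
        transcript += response + "\n"
        if response.startswith("Finish:"):   # model decided it is done
            return response[len("Finish:"):].strip()
        # Expect "Action: tool_name[tool_input]"
        name, arg = response[len("Action:"):].strip().split("[", 1)
        observation = tools[name.strip()](arg.rstrip("]"))
        transcript += f"Observation: {observation}\n"
    return None

# Harmless stub tools standing in for terminal / web access.
tools = {
    "search": lambda q: f"3 results for '{q}'",
    "summarize": lambda t: "summary ready",
}

llm = make_scripted_llm([
    "Action: search[public advisory]",
    "Action: summarize[results]",
    "Finish: report complete",
])

print(react_loop(llm, tools, "triage a public advisory"))  # report complete
```

The "don't give up" prompt from the paper maps onto `max_steps` here: the loop keeps letting the model try new actions until it either finishes or runs out of turns.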

When CVEs Become Exploits

These new capabilities of LLMs have significant implications for the cybersecurity world.

Now that exploiting vulnerabilities no longer requires advanced programming skills, a new wave of attackers with limited technical knowledge may emerge, armed with little more than the right prompt.

Another critical challenge is the shrinking time window between publishing a CVE and deploying a patch. With LLMs capable of generating working exploits within minutes, vendors and defenders need to move much faster than before. Beyond speed and accessibility, LLM-generated exploits introduce deeper challenges: from blurring ethical boundaries to overwhelming traditional defense tools. In a world where AI can scan, craft, and adapt attacks in real time, defenders must rethink what preparedness really means.

XBOW

XBOW is an autonomous system developed by former teams from Microsoft, GitHub, and OpenAI, designed to simulate the work of elite white-hat hackers. Instead of relying on human penetration testers, XBOW starts with a general attack goal (e.g., "find RCE or XSS vulnerabilities") and independently orchestrates the full exploitation process. It combines LLMs with symbolic reasoning and decision trees to scan for attack surfaces, generate payloads, test them, refine the approach, and validate successful exploitation, all without human involvement. What makes it even more impressive is its ability to adapt in real time: changing tactics, evading defenses like endpoint detection and response (EDR) tools, and choosing alternative paths when needed.

HackerOne is a leading bug bounty platform where ethical hackers report security vulnerabilities to companies in exchange for rewards. On August 1, 2025, for the first time ever, a non-human topped HackerOne's leaderboard. It was XBOW. Reports stated that XBOW found over 1,000 vulnerabilities. While the reports didn't specify how many were zero-day or one-day vulnerabilities, other sources confirm that XBOW is capable of identifying true zero-days. Its false positive rate is also extremely low, meaning the system is highly accurate.

Conclusion

The boundaries between CVE and PoC are starting to blur. When a detailed vulnerability description can lead directly to a working exploit that's generated in minutes by a language model, we may need to rethink what responsible disclosure really means. Transparency is still a core value in cybersecurity, but in the age of autonomous AI agents, the cost of that transparency may be higher than we thought. If GPT-4 can already do this, then what should we expect now that GPT-5 is here? (Hint: It may not even need a description to build an exploit!) The line between "vulnerability discovered" and "vulnerability exploited" could disappear entirely and be replaced by a few seconds of processing and a prompt.

Posted in: Threat Intelligence