AI Can’t Hack It—Yet: Why Generative Models Still Struggle With Real Exploits

Cyber Jill
Jul 10
4 min read

It’s 2025, and large language models (LLMs) can draft corporate policies, pass the bar exam, and generate believable malware boilerplate. But when it comes to writing a working exploit? Not so much.

That’s the conclusion from a new study conducted by Forescout’s Vedere Labs, which put over 50 AI models—from polished commercial giants to sketchy underground variants—through a battery of real-world vulnerability research (VR) and exploit development (ED) tests. The verdict: “vibe hacking,” the idea that AI can autonomously break into systems, is still mostly a myth.

“From our analysis, it looks like so-called ‘vibe hacking’ is not a current threat,” said Michele Campobasso, lead researcher at Forescout. “But considering the pace at which LLMs evolve, we expect this technology to become more effective in solving more complex cognitive tasks.”

A Lab-Tested Reality Check for AI Exploits

Between February and April 2025, Forescout researchers simulated the mindset of an opportunistic attacker, testing open-source, commercial, and underground LLMs on two common VR tasks and two ED challenges derived from public datasets and cybersecurity wargames.

The results were sobering—even for defenders expecting the worst:

48% of models failed the first VR task, and 55% failed the second.

66% bombed the first ED challenge, while a staggering 93% failed the second.

Only three commercial models managed to produce a functional exploit for the most advanced task.

Despite widespread concern about uncensored or "weaponized" AIs—like DarkGPT, WormGPT, and EvilAI—most underground models choked on basic prompts, struggled with session memory, or returned broken code. Many were little more than repackaged versions of commercial tools like Claude or Gemini.

Open Source and Underground AIs: More Bark Than Bite

The study painted a bleak picture for attackers leaning on open-source or “black hat” models.

Open models were the worst performers across the board. Even those advertised as cybersecurity-specific failed to identify simple bugs or construct even trivial exploits. One, Lily-Cybersecurity-7B, repeatedly hallucinated nonexistent vulnerabilities—like XSS—in C code snippets.

Underground models—those circulating in Telegram channels or sold through shady marketplaces—fared only slightly better. Some succeeded at identifying vulnerabilities in tightly scoped code, but failed on follow-up tasks due to message length caps, formatting issues, or alignment blocks inherited from their commercial LLM underpinnings.

Ironically, these “unrestricted” tools often offered less usability than their better-regulated commercial cousins.

Commercial AIs: The Devil’s in the Prompt

Commercial LLMs performed the best overall, with models like Gemini 2.5 Pro, ChatGPT o3-mini-high, and DeepSeek V3 showing real promise—if you knew how to push them. These systems could, with the right steering, identify memory bugs and even craft working exploits against toy binaries.

Still, the success rate was modest. For instance, DeepSeek V3 produced a working ED2 exploit, but only on its fourth attempt—and only after the researchers manually corrected syntax issues and widened memory inspection windows.

Campobasso noted that while these exploits worked in lab conditions, they required significant human intervention. “We are still far from LLMs that can autonomously generate fully functional exploits,” he said.

The Alignment Loophole: Jailbreaks, Not Just for Phones

Despite fears of AI refusing to cooperate, alignment safeguards were surprisingly easy to bypass in many commercial models. A bit of rephrasing—asking the model to “audit” code rather than “exploit” it—was often enough to get results.

That said, alignment did sometimes get in the way. Microsoft’s Copilot flat-out refused to tackle any of the test cases. Gemini occasionally sanitized special characters, breaking exploit syntax. And ChatGPT o4 gave solid guidance but wouldn't write shellcode unless coaxed over multiple attempts.

Still, this friction was minimal compared to the benefits: better output formatting, more accurate debugging help, and actual working payloads—something no underground or open model consistently delivered.

AI Exploits Are Coming—But So Are AI Defenses

Campobasso doesn’t discount the threat posed by rapidly improving models. He pointed to reasoning-capable AIs and task-specific fine-tuning as key accelerants. But he also noted several obstacles that will likely delay full-blown “vibe hacking”:

Resources: High-performing models are still compute-heavy, limiting access to well-funded groups like state-sponsored actors.

Export controls: AI models capable of generating military-grade exploits may soon fall under Wassenaar-like restrictions.

Cognitive limits: Current models still struggle with multi-step reasoning, memory handling, and basic exploit logic.

More importantly, he emphasized, AI helps defenders too.

“At Forescout, we’re using AI to generate readable threat reports, integrate intelligence with tools like Microsoft Copilot, and deploy honeypots for ransomware tracking,” said Campobasso. “The same technology reshaping offense can—and must—be used to strengthen defense.”

What Should Defenders Do Now?

The study's takeaway: AI may eventually make vulnerability discovery and exploit development easier, but it hasn’t yet revolutionized cyber offense. Organizations should focus less on panicking about AI-powered attackers and more on getting the basics right.

Patch management. Segmentation. Least privilege. Zero Trust. These are still the best defenses—even against an AI-written exploit.

“Most real-world attacks still rely on known vulnerabilities, misconfigurations, and human error,” Campobasso noted. “So these tools may soon increase the number of attacks, but not their sophistication.”

In other words: AI isn’t rewriting the cyber playbook yet. But it’s learning fast. And defenders would do well to study just as hard.