The AI Security Delusion: Why GenAI Innovation Is Leaving Enterprise Defenses in the Dust

Jun 24, 2025
3 min read

Amid the frenzy to embed generative AI (genAI) into every crevice of modern enterprise, a new report from Cobalt delivers a hard truth: the cybersecurity safeguards meant to defend this transformative technology are woefully outdated—and attackers know it.

The State of LLM Security Report 2025, released today by offensive security leader Cobalt, reveals that while nearly three-quarters of IT leaders recognize genAI threats as their top concern, a full third still aren’t conducting regular security assessments on their large language model (LLM) deployments. It’s a gap that leaves a dangerously exposed attack surface in some of the most sensitive parts of modern digital infrastructure.

“Threat actors aren’t waiting around, and neither can security teams,” said Gunter Ollmann, CTO of Cobalt. “Our research shows that while genAI is reshaping how we work, it’s also rewriting the rules of risk.”

A Surge in Innovation, a Stall in Security

The survey of 450 security leaders and practitioners paints a picture of organizations charging headlong into genAI—94% reported a significant rise in adoption over the past year—while foundational security controls remain stuck in a pre-AI era. Over 36% of respondents confessed that the pace of genAI development has outstripped their teams' ability to secure it.

And the numbers from Cobalt’s penetration testing platform back that up: 32% of all LLM security findings were classified as “serious,” the highest rate across all asset categories. But only 21% of those serious flaws have actually been fixed—the lowest remediation rate of any tested environment.

Even when critical LLM vulnerabilities are patched, it’s usually only the easy ones. The average time to resolve a fixed high-severity LLM issue is just 19 days, suggesting teams are prioritizing low-hanging fruit while more complex and risky issues remain unaddressed.

The Industry Blind Spots

Cobalt’s report reveals that while some sectors like financial services are taking a more conservative, controlled approach—boasting one of the lowest serious vulnerability rates (11.2%)—others like manufacturing and education are falling behind. In education, for instance, 100% of respondents claimed genAI hadn’t outpaced their defenses, yet only 33% were conducting regular assessments, despite facing one of the highest rates of serious vulnerabilities (17.6%).

In manufacturing, where production uptime often trumps proactive security, nearly three-quarters of respondents expressed concern over traditional attack vectors like exploited vulnerabilities. Yet the sector showed a striking need for improved AI-specific testing.

The Nature of the Threat

The report’s case studies highlight the unique and evolving nature of LLM threats: from prompt injection attacks that trick models into exposing sensitive PII, to model denial-of-service (DoS) attacks that flood APIs with high-complexity prompts, causing systems to grind to a halt.

Even more alarming is what Cobalt calls “Excessive Agency”—instances where attackers coax models into exceeding their programmed boundaries through subtle, multi-step prompts, effectively weaponizing the model’s own reasoning process.

This isn’t about theoretical AI safety dilemmas. These are live, exploitable vulnerabilities inside enterprise environments—and they’re increasingly the preferred target of advanced attackers.

Perception vs. Reality

Perhaps the most damning finding isn’t in the vulnerabilities themselves, but in how organizations perceive them. While 76% of executives worry about long-term genAI threats like adversarial attacks, only 68% of practitioners share the same level of concern. Conversely, more frontline security engineers are focused on immediate operational risks like inaccurate data outputs—pointing to a disconnect between strategic foresight and day-to-day reality.

This division extends to remediation. Executives are more likely to believe they’re prepared for genAI’s risks—yet their teams are often under-resourced, looped in late, and facing immense pressure to ship new features without delay.

“Much like the rush to cloud adoption, genAI has exposed a fundamental gap between innovation and security readiness,” said Ollmann. “Mature controls were not built for a world of LLMs.”

The Path Forward

Cobalt is advocating for a shift away from reactive security audits toward proactive, human-led penetration testing tailored specifically for LLM systems. The company stresses the need for continuous testing—not just of the models themselves, but also the APIs, data pipelines, and third-party components that feed them.

Their advice to security leaders: start treating genAI like the high-value, high-risk asset it is. That means integrating offensive testing into development cycles, demanding security transparency from AI suppliers, and mandating close collaboration between security and AI/ML engineering teams.

Because, as the report makes clear, securing the AI-powered future won’t come from legacy controls or checkbox compliance. It will require security teams who can think—and act—with the same creativity and adaptability as the models they’re trying to protect.

TL;DR: Cobalt’s new LLM security report reveals an alarming gap between the pace of genAI adoption and enterprise security readiness. High-severity vulnerabilities in LLMs are being discovered faster than they’re fixed—and only human-led, proactive testing can close the gap before attackers do.