Phishing Kits Go Interactive, Letting Vishing Callers Control MFA Sessions in Real Time

Jan 25

Phishing has always borrowed from theater. What is changing now is the stage direction.

New research from Okta Threat Intelligence shows that modern phishing kits are no longer static web traps. They are interactive tools designed to work in lockstep with a human voice on the other end of the line. In these hybrid attacks, a caller guides a victim through a login flow in real time while dynamically controlling what the victim sees in their browser.

The result is a form of vishing that feels uncannily legitimate, even to security-aware users.

According to Okta, multiple intrusion groups are now using custom phishing kits built specifically for voice-based social engineering campaigns. These kits are sold as a service and are actively targeting users of major identity providers like Google, Microsoft, and Okta, as well as cryptocurrency platforms. Instead of relying on generic credential-harvesting pages, the tools give attackers live control over authentication sessions as they unfold.

“Once you get into the driver’s seat of one of these tools, you can immediately see why we are observing higher volumes of voice-based social engineering,” said Moussa Diallo, threat researcher at Okta Threat Intelligence. “Using these kits, an attacker on the phone to a targeted user can control the authentication flow as that user interacts with credential phishing pages. They can control what pages the target sees in their browser in perfect synchronization with the instructions they are providing on the call. The threat actor can use this synchronization to defeat any form of MFA that is not phishing-resistant.”

At the heart of these campaigns is what Okta describes as real-time session orchestration. Attackers begin with reconnaissance, learning which applications a target uses and which phone numbers are associated with IT support. They then call the victim while spoofing a trusted number and direct them to a phishing site that looks like a legitimate login page.

When the victim enters their credentials, those details are instantly relayed to the attacker, often through messaging platforms like Telegram. The attacker then attempts a real login to the legitimate service and sees which multifactor authentication challenge is triggered.

From there, the phishing kit updates the victim’s browser in real time to match the attacker’s verbal script.

If the attacker’s login attempt triggers a push challenge, the victim may suddenly see a page explaining that a push has just been sent and should be approved. If a one-time passcode is required, the site seamlessly pivots to request it. The web experience and the voice instructions reinforce each other, stripping away the friction that usually alerts users to fraud.

Even safeguards like number-matching push notifications offer limited protection in this scenario. Because the attacker is speaking directly to the victim, the attacker can simply tell the victim which number to select. As Okta notes, these methods are not phishing-resistant by design.

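By contrast, authentication approaches that rely on cryptographic binding between the user and the device, such as passkeys or Okta FastPass, break the attacker’s ability to proxy the session in real time. A passkey, for example, is scoped to the legitimate site’s domain, so it will not produce a usable response for a look-alike phishing page, no matter what the caller says.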
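The passkey case can be illustrated with a minimal sketch of the origin check a server performs on a WebAuthn sign-in. In WebAuthn, the browser reports the page's origin inside the client data that the authenticator's signature covers, so a response produced on a phishing domain fails verification. The sketch below is simplified and not taken from Okta's research: the domain names, the challenge value, and the helper names are hypothetical, and a real verifier would also validate the signature and RP ID hash, which are omitted here.

```python
import json

# Hypothetical relying-party origin; a real deployment would use its own login domain.
EXPECTED_ORIGIN = "https://login.example.com"


def client_data_is_acceptable(client_data_json: bytes, expected_challenge: str) -> bool:
    """Check the browser-reported fields of a WebAuthn assertion.

    Simplified for illustration: signature and RP ID hash checks are omitted.
    """
    data = json.loads(client_data_json)
    return (
        data.get("type") == "webauthn.get"
        and data.get("challenge") == expected_challenge
        and data.get("origin") == EXPECTED_ORIGIN
    )


def make_client_data(origin: str, challenge: str) -> bytes:
    """Illustrative stand-in for what a browser would report for a given page."""
    return json.dumps(
        {"type": "webauthn.get", "challenge": challenge, "origin": origin}
    ).encode()


challenge = "c29tZS1yYW5kb20tY2hhbGxlbmdl"  # placeholder challenge value
legit = make_client_data("https://login.example.com", challenge)
phished = make_client_data("https://login.example-support.com", challenge)

print(client_data_is_acceptable(legit, challenge))    # True
print(client_data_is_acceptable(phished, challenge))  # False
```

The victim never chooses which origin is reported, which is why a persuasive caller cannot talk them past this check the way they can with a push approval or a one-time passcode.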

Diallo believes this is only the beginning. As demand for voice-based fraud grows, both the tooling and the human expertise are being commoditized.

“Vishing is becoming such an in-demand area of expertise that, much like access to these kits, that expertise is also sold on an as-a-service basis,” he said.

Earlier phishing frameworks with basic real-time features are already being copied and refined into new kits built exclusively for callers. Where attackers once paid for broad platforms that targeted many services at once, a new market is emerging for bespoke control panels tailored to individual providers.

Security leaders say the findings reflect a broader shift in how attackers operate against identity systems. Among them is Cory Michal, CSO at AppOmni.

“This research usefully documents a tactic defenders have been battling for a while: Attackers blending vishing and phishing into a single, real-time ‘guided’ flow where the caller team can steer a victim through MFA prompts and adapt the web experience on the fly,” Michal said. “The underlying tradecraft isn’t brand new, but it’s still highly effective at scale.”

Michal also pointed to the central role identity now plays in enterprise security. As SaaS platforms increasingly function as the organizational control plane, identity providers have become the most valuable choke point for attackers. Compromising a single identity provider account often unlocks wide-reaching privileges and downstream data.

He noted that the rapid evolution of these kits mirrors a broader trend toward automation and AI-assisted development within criminal ecosystems, allowing attackers to iterate faster and lower the skill barrier for complex intrusions.

For defenders, the guidance is blunt. User awareness alone is not enough, and technical controls have to carry the weight.

“In a workplace context, there is no substitute for enforcing phishing resistance for access to resources,” Diallo said.

That means deploying phishing-resistant authentication by default, restricting access based on trusted networks and devices, and closely monitoring identity and SaaS audit logs for signs of abuse. Some financial institutions are also experimenting with live caller verification inside mobile apps, allowing users to confirm whether an incoming call is legitimate.
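As a rough illustration of the log-monitoring and trusted-network points above, the sketch below flags sign-ins that were completed with a phishable factor from outside a trusted network or from an unmanaged device. It is a heuristic under stated assumptions, not a method described by Okta: the `SignIn` fields, the network ranges, and the factor names are all hypothetical and would need to be mapped to whatever an identity provider's audit log actually records.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Hypothetical corporate ranges; replace with the organization's own networks.
TRUSTED_NETWORKS = [ip_network("10.0.0.0/8"), ip_network("203.0.113.0/24")]

# Factors a caller can talk a victim through (assumed labels, not a vendor schema).
PHISHABLE_FACTORS = {"push", "otp", "sms"}


@dataclass
class SignIn:
    user: str
    source_ip: str
    factor: str           # e.g. "push", "otp", "passkey"
    device_managed: bool  # whether the device is known to the organization


def is_suspicious(event: SignIn) -> bool:
    """Flag phishable-factor sign-ins from untrusted networks or unmanaged devices."""
    on_trusted_network = any(
        ip_address(event.source_ip) in network for network in TRUSTED_NETWORKS
    )
    trusted_context = on_trusted_network and event.device_managed
    return event.factor in PHISHABLE_FACTORS and not trusted_context


events = [
    SignIn("alice", "10.1.2.3", "push", True),         # office network, managed device
    SignIn("bob", "198.51.100.7", "push", False),      # unknown network and device
    SignIn("carol", "198.51.100.9", "passkey", False), # phishing-resistant factor
]

flagged = [event.user for event in events if is_suspicious(event)]
print(flagged)  # ['bob']
```

A rule like this will produce false positives for remote workers, so it is better suited to triage and alerting than to automatic blocking. Enforcing phishing-resistant factors removes the need for the heuristic in the first place.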

The attacks may feel new, but the lesson is familiar. As authentication becomes more sophisticated, so does the social engineering wrapped around it. When the browser and the voice on the phone tell the same convincing story, trust becomes the weakest link.