Massive Data Leak Exposes 2.6 Million Duolingo Users, Raising Concerns of Targeted Phishing

Cyber Jack
Aug 23, 2023

Updated: Oct 10, 2023

A data leak has hit the language learning giant Duolingo: scraped data belonging to 2.6 million users has surfaced on a notorious hacking forum. The exposed information could allow cybercriminals to carry out targeted phishing attacks against those users.

With a staggering 74 million monthly users globally, Duolingo stands as one of the largest platforms in the language learning sphere. However, its reputation is now challenged by the leakage of user data that occurred earlier this year.

The scraped dataset, originally offered for sale at $1,500 on the now-defunct Breached hacking forum in January 2023, contains a mix of public login names, real names, email addresses, and internal data associated with Duolingo's services. Although real names and login names are part of a user's public profile, the inclusion of email addresses poses a more alarming threat, potentially facilitating targeted attacks.

Upon investigation, Duolingo acknowledged that the scraped data was sourced from public profiles but failed to address the inclusion of private email addresses, which adds a layer of sensitivity to the exposed information.

The leaked dataset recently resurfaced on a revamped version of the Breached hacking forum, available for just $2.13. The release was met with a post on the forum stating, "Today I have uploaded the Duolingo Scrape for you to download, thanks for reading and enjoy!"

This breach was executed by exploiting an exposed application programming interface (API), which has been openly shared since March 2023. The API allows users to input a username and retrieve JSON output containing a user's public profile information. Alarmingly, it can also be used to verify if an email address corresponds to a valid Duolingo account.
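To make the distinction between public and private fields concrete, here is a minimal Python sketch of how a profile lookup could be hardened. It is not Duolingo's code or API: the `USERS` store, the field names, and both function names are hypothetical, chosen only for illustration. The idea is that a username lookup returns only an allowlist of public fields, while an email lookup answers the same way whether or not the address is registered, so it cannot be used to confirm accounts.

```python
from typing import Optional

# Hypothetical in-memory user store; field names are illustrative only
# and are not taken from any real service.
USERS = {
    "maria_es": {
        "username": "maria_es",
        "name": "Maria Lopez",
        "email": "maria@example.com",
        "is_moderator": True,
    },
}

# Allowlist of fields considered safe to expose on a public profile.
PUBLIC_FIELDS = ("username", "name")


def public_profile(username: str) -> Optional[dict]:
    """Return only allowlisted fields for a username, or None if unknown."""
    user = USERS.get(username)
    if user is None:
        return None
    return {field: user[field] for field in PUBLIC_FIELDS}


def lookup_by_email(email: str) -> dict:
    """Respond identically whether or not the email is registered."""
    # The lookup result is deliberately not reflected in the response,
    # so a caller cannot confirm that an address belongs to an account.
    return {"status": "ok", "users": []}
```

With this shape, `public_profile("maria_es")` would expose the login name and real name but never the email address or permission flags, and `lookup_by_email` gives an attacker nothing to compare between registered and unregistered addresses.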

Despite being alerted to the API's abuse in January, Duolingo has yet to secure it, leaving it accessible to anyone on the web. This vulnerability enabled threat actors to feed large quantities of email addresses, potentially sourced from previous breaches, into the API and confirm their association with Duolingo accounts. This process ultimately led to the compilation of a dataset containing both public and non-public details.
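Bulk checking of email addresses like this depends on being able to send very large numbers of requests. One common mitigation is per-client rate limiting. The following is a rough Python sketch of a sliding-window limiter, offered as an illustration under stated assumptions rather than a description of anything Duolingo runs: the class name, the `client_id` key (which might be an IP address or API token), and the limits are all hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # client_id -> request timestamps

    def allow(self, client_id):
        """Return True if this request is within the client's budget."""
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over budget: caller should reject the request
        hits.append(now)
        return True
```

A limiter like this would not stop a determined attacker who spreads requests across many addresses, but it raises the cost of feeding millions of emails through a single endpoint.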

Furthermore, another actor shared their own scraped data acquired through the API, emphasizing the significance of certain fields that indicate users with elevated permissions. These users are viewed as more valuable targets for phishing campaigns.

The prevalence of scraped data and its dismissal as harmless by companies is a recurring issue. While companies often argue that most of the data is public, combining public and private data heightens the risks and can lead to legal violations. For example, Facebook's 2021 leak exposed the data of 533 million users, leading to a €265 million ($275.5 million) fine from the Irish Data Protection Commission. Likewise, a Twitter API bug recently led to the exposure of user data, prompting an investigation by the same authority. Despite such incidents, the vulnerability of APIs and scraped data remains a contentious and unresolved challenge.

Richard Bird, Chief Security Officer at Traceable AI, shared the frustration that can come from a data breach caused by API security gaps:

“As both a customer of Duolingo and a security professional, I find this breach to be irritating and inexcusable. As a customer, I don't need one more company exercising poor stewardship over the data that I've entrusted them with. I want to enjoy my experience learning Spanish without having to worry about who has my information. As a security professional, I find it unconscionable that Duolingo hasn't moved to secure the API that allowed the breach in the first place.

“Duolingo's delay or inattention to fixing the source of the API-related data scraping is really unacceptable. Failing to act with urgency on an API breach like this is an open invitation to the bad guys to just keep picking and poking at your systems and processes trying to find a bigger payday. If companies like Duolingo are informed of a problem and then drag their feet in addressing it, it sends the ‘weak prey’ signal to the bad actors of the world.”