GitGuardian Report Shows Over Two Million Secrets Detected on Public GitHub in 2020; Growing 20% YoY
GitGuardian, a cybersecurity startup specializing in securing software development with automated secrets detection and remediation, today announced the results of its 2021 State of Secrets Sprawl on GitHub report. The report, which is based on GitGuardian’s continuous monitoring of every commit pushed to public GitHub, shows an alarming 20% year-over-year growth in the number of secrets found. A growing volume of sensitive data, or secrets, such as API keys, private keys, certificates, and usernames and passwords, ends up publicly exposed on GitHub. This puts corporate security at risk, as the vast majority of organizations either ignore the problem or are poorly equipped to cope with it.
According to the report, 15% of leaks on GitHub occur within public repositories owned by organizations, and 85% occur in developers’ personal repositories. Secrets in either kind of repository can be personal or corporate, and this is where the risk lies for organizations: some of their corporate secrets are exposed publicly through the personal repositories of current or former developers.
Types of Secrets Found
27.6% Google keys
15.9% Development tools (Django, RapidAPI, Okta,
15.4% Data storage (MySQL, Mongo, Postgres,...)
12% Other (including CRM, Cryptos, identity providers, payments systems, monitoring)
11.1% Messaging systems (Discord, Sendgrid, Mailgun, Slack, Telegram, Twilio…)
8.4% Cloud provider (AWS, Azure, Google, Tencent, Alibaba…)
6.7% Private keys
1.9% Social network
0.8% Version Control Platform (GitHub, GitLab)
0.4% Collaboration tools (Asana, Atlassian, Jira, Trello, Zendesk...)
Top 10 File Extensions
As you might expect, given the many programming languages, frameworks and coding practices in use around the world, a very long list of file extensions can contain secrets. Here is a view of the top 10.
The top 10 file extensions account for 81% of all results.
The top 3 account for over 56% of results.
9.6% Environment variables file
GitHub is more than ever “The Place to Be” for developers when it comes to innovating, collaborating and networking. GitHub gathers more than 50 million developers working on their personal and/or professional projects. When 60 million repositories are created in a year and nearly two billion contributions added, some risks arise for companies even if they don’t use GitHub or open source their code, because their developers do.
As architectures move to the cloud and rely more on components and applications, the growing number of commits and the wider use of digital authentication credentials have increased the number of secrets detected. To compound the problem, companies are pushing for shorter release cycles, developers have many technologies to master, and the complexity of enforcing good security practices increases with the size of the organization, the number of repositories, the number of developer teams and their geographical spread.
As Anne Hardy, CISO of Talend, a GitGuardian customer, puts it: “We launched an audit using GitGuardian, and several leaked secrets were brought to our attention. What was very interesting and what we didn't anticipate was that most of the alerts came from the personal code repositories of our developers.”
Companies cannot eliminate the risk of secrets exposure even if they put centralized secrets management systems in place. Solutions exist to automate secrets detection and put proper remediation in place, but the market is far from mature on this subject. “The reality is most organizations are operating blind. Most leaks of organizations’ credentials on public GitHub occur on developers’ personal repositories, where organizations often have no visibility, let alone the authority to enforce any kind of preventive security measures,” says Jeremy Thomas, CEO of GitGuardian. Companies need to scan not only public repositories but also private ones to prevent lateral movement by malicious actors.
Some best practices can be followed to limit the risk of secrets exposure or the impact of a leaked credential:
Never store unencrypted secrets in .git repositories
Don’t share your secrets unencrypted in messaging systems like Slack
Store secrets safely
Restrict API access and permissions
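As one illustration of keeping secrets out of source code, the following minimal Python sketch reads a credential from an environment variable instead of hardcoding it. The variable name EXAMPLE_API_KEY and the function load_api_key are hypothetical names chosen for this example; they do not come from the report or from any GitGuardian product.

```python
import os


def load_api_key(name: str = "EXAMPLE_API_KEY") -> str:
    """Return the secret held in the environment variable `name`.

    The variable name is a placeholder for illustration. The value is
    supplied at runtime, so it never appears in committed source code.
    """
    value = os.environ.get(name)
    if not value:
        # Fail loudly rather than falling back to a hardcoded default.
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

The environment is only one possible source; a dedicated secrets manager would follow the same principle of injecting the value at runtime.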
Following these practices is not sufficient on its own, however: companies also need to secure the software development life cycle (SDLC) with automated secrets detection.
When choosing a secrets detection solution, companies need to take into account:
Capacity to monitor developers’ personal repositories
Secrets detection performance - Accuracy, precision & recall
Integration with remediation workflows
Easy collaboration between Developers, Threat Response and Ops teams.
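To make the precision and recall criterion concrete, here is a minimal Python sketch that scores a deliberately naive detector against a small hand-labeled sample. It is not GitGuardian's engine: the single regular expression (an assumption based on the common 20-character "AKIA" format of AWS access key IDs), the sample lines and their labels are all invented for illustration.

```python
import re

# Illustrative pattern only: matches strings shaped like an AWS access key ID.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def detect(line: str) -> bool:
    """Flag a line if it contains something shaped like an AWS access key ID."""
    return AWS_KEY_ID.search(line) is not None


def precision_recall(predicted, actual):
    """Compute (precision, recall) from parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # flagged and truly secret
    fp = sum(1 for p, a in pairs if p and not a)    # flagged but harmless
    fn = sum(1 for p, a in pairs if a and not p)    # secret but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labeled sample: (line, is_real_secret)
SAMPLES = [
    ('aws_key = "AKIAIOSFODNN7EXAMPLE"', False),  # placeholder value, not a live key
    ('aws_key = "AKIAABCDEFGHIJKLMNOP"', True),   # treated as a real key in this sample
    ('password = "hunter2"', True),               # a secret the pattern cannot see
    ('name = "hello"', False),
]

predicted = [detect(line) for line, _ in SAMPLES]
actual = [label for _, label in SAMPLES]
precision, recall = precision_recall(predicted, actual)
```

On this sample the pattern flags one placeholder that is not a real secret and misses the password, which is why both the share of alerts that are genuine (precision) and the share of genuine secrets that are caught (recall) matter when comparing solutions.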
GitGuardian’s secrets detection engine has been running in production since 2017, analyzing billions of commits coming from GitHub. Since day one, GitGuardian has trained and benchmarked its algorithms against open source code. This has allowed GitGuardian to build a language-agnostic secrets detection engine that integrates new secrets, or new ways of declaring secrets, quickly while keeping the number of false positives low. GitGuardian has developed the most extensive library of specific detectors and is able to detect more than 200 different types of secrets.
Download the State of Secrets Sprawl on GitHub report here
GitGuardian is a cybersecurity startup solving the issue of secrets sprawling through source code, a widespread problem that leads to some credentials ending up in compromised places or even in the public space. The company solves this issue by automating secrets detection for Application Security and Data Loss Prevention purposes. GitGuardian helps developers, ops, security and compliance professionals secure software development and define and enforce policies consistently and globally across all their systems. GitGuardian solutions monitor public and private repositories in real time, detect secrets, and raise alerts to allow investigation and quick remediation.