
Secrets Exposed: Why modern development, open source repositories spill secrets en masse

The CircleCI breach and other recent hacks expose why the secrets problem is so prolific

Paul Roberts
Paul Roberts, Content Lead at ReversingLabs.

For software development teams, the warning just after the New Year from DevOps platform vendor CircleCI to immediately rotate any secrets they had stored on the company’s continuous integration platform was worse than a nightmare. It was more like one of those horror films in which the police tell you that those creepy phone calls are coming from inside the house! 

As in those movies, the threat for organizations is real. In the case of CircleCI, for example, the world learned that, in December 2022, as-yet unidentified assailants compromised a development system used by a remote CircleCI engineer, planted malware and stole data, including customer environment variables, tokens, and keys from CircleCI’s production systems. The stolen information was subsequently used to access third-party systems of at least five CircleCI customers (those were the customers who actually disclosed breaches to CircleCI).

CircleCI is just one prominent data point in an ongoing epidemic of spilled and stolen secrets, as malicious actors scour public and private code repositories for an easy path into sensitive IT environments. Behind that epidemic lie larger forces, including changes to application development practices, tooling, and architectures that increasingly put sensitive development secrets and development organizations at risk of compromise.

Here's what your team needs to know about why secrets are leaking, to better understand the problem. In our next post in this Secrets Exposed series, you'll learn about the how.

[ Secrets Exposed Special Report: How hackers are gaining access to software secrets | How to mitigate risk from secrets leaks — and prevent future breaches ]

Secrets in software are the worst kept secrets

The hack of CircleCI was just the latest in a string of attacks on CI/CD platforms aimed at stealing sensitive credentials and information.

In 2021, for example, a flaw in the Travis CI platform exposed secrets stored in hundreds of thousands of open-source projects that use the platform. A report the next year found that tens of thousands of user tokens were likewise exposed through the Travis CI API, providing unfettered access to more than 770 million historical clear-text logs. Some of those contained stored credentials and other secrets. 

In 2022, several major corporations were affected by the leak of secrets and sensitive information stored in code repositories. In March, for example, the Lapsus$ hacking group leaked hundreds of gigabytes of internal source code belonging to Samsung and Nvidia. An analysis of the leaked Samsung code by GitGuardian found that over 6,000 secrets stored in the code were exposed in that leak.

Then, in October 2022, Toyota revealed that credentials for a database containing personal information on hundreds of thousands of customers were left exposed in an open-source repository associated with a contractor who had worked on the company’s telematics application. Five years went by before the leak was detected. And in December 2022, Okta revealed that malicious actors accessed the company’s private GitHub repositories and copied code associated with the company's Workforce Identity Cloud (WIC), an enterprise-facing access and identity management tool.

More recently, in January 2023, a group of researchers who analyzed the cybersecurity of vehicle telematics systems reported that they also discovered multiple incidents of credentials and access tokens exposed in telematics applications and source code repositories that could be used to send unauthorized commands to vehicles, as well as access sensitive data and systems on automakers’ and suppliers’ networks. 

Changing dev tools, practices drive secrets spills

What explains the sudden interest in secrets exposed via code repositories? Much of the shift is due to changes in the application development ecosystem, including the shift from waterfall to agile and DevOps methodologies over the past two decades. 

With that came an embrace of more modular applications, a greater reliance on open-source and shared, third-party code in application development, and the emergence of large, public code repositories such as GitHub, PyPI, and npm.

Development activity and source code control moved from tightly protected, corporate-owned networks and development assets to shared, cloud-based development platforms and source code repositories. Those platforms turbo-charged development, collaboration and code reuse — and made it easy to move code seamlessly between projects. 

David Neuman, Senior Analyst at TAG Cyber, said the ubiquity of software repos is key.

“[Repositories have become] like LinkedIn for developers. They’re where they live and work, and they’re a very busy space.”
David Neuman

The changes to the development ecosystem brought with them new cyber risks that development organizations have, until recently, overlooked. These include:

1. Repositories are where the secrets are 

The ease of access and sheer amount of code residing on shared platforms like GitHub, npm and PyPI make them rich targets for attackers looking for a back door into security-conscious organizations. That includes scouring open-source repositories for secrets, and leveraging these repositories to push malware to the systems of unsuspecting developers.

In August 2022, for example, a malicious actor forked 35,000 open-source projects on GitHub, inserting a malicious backdoor into the forked projects in an effort to fool would-be adopters.

2. Leaks that last for months

The sheer number of exposed secrets lurking in both public and private code repositories makes them low-hanging fruit for attackers. Stolen or compromised credentials were the most common cause of data breaches and can go unnoticed for months, according to IBM’s 2022 Cost of a Data Breach Report (registration required). Breaches involving such credentials took an average of 327 days to identify and contain.

GitHub itself informed partners of over 1.7 million potential secrets exposed in public repositories in 2022 in an effort to prevent the misuse of those tokens, the company said. 

The amount of activity on these platforms (there were over 1 billion GitHub code commits in 2021 alone) can make it difficult for development organizations to “see the forest for the trees” when it comes to risks like “back-doored” open-source modules or leaked secrets lurking in code commits, said Neuman. For example, GitGuardian scanned more than 1 billion GitHub commits and found over 2 million exposed secrets, which amounts to thousands each day.

3. Blurred dev team boundaries 

The inevitable blurring between personal and professional, open and closed also frustrates efforts to avoid breaches. According to GitGuardian’s data, for example, 85% of secret leaks occurred on developers’ personal repositories, whereas 15% occurred in repositories managed by corporations. The exposed secrets included access keys to Google resources (27%); development tools like Django, RapidAPI, and Okta (16%); and cloud providers like AWS, Azure, Google and Tencent (8%), GitGuardian reported.

Karlo Zanki, a threat researcher at ReversingLabs, who spends his time analyzing open source repositories for security risks, explained the scope of the threat.  

“We come across a lot of secret leaks; it’s not that uncommon.”
Karlo Zanki

Zanki said some kinds of credentials aren’t hard to find. Discovering actual user credentials is challenging, because phony or placeholder credentials produce a high “false positive” rate, but detecting something like AWS or Google access tokens is easier.
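To make that contrast concrete, here is a minimal Python sketch of why prefixed tokens are easier to spot than passwords. It is an illustration under stated assumptions, not the method ReversingLabs or GitGuardian uses. The only facts it relies on are the publicly documented shapes of AWS access key IDs (an "AKIA" prefix plus 16 characters) and Google API keys (an "AIza" prefix plus 35 characters). The names scan_text, looks_like_placeholder and PLACEHOLDER_HINTS, along with the confidence labels, are hypothetical.

```python
import re

# Fixed-format tokens: a distinctive prefix and length make matches reliable.
TOKEN_PATTERNS = {
    "aws_access_key_id": re.compile(r"(?<![0-9A-Z])AKIA[0-9A-Z]{16}(?![0-9A-Z])"),
    "google_api_key": re.compile(
        r"(?<![0-9A-Za-z_-])AIza[0-9A-Za-z_-]{35}(?![0-9A-Za-z_-])"
    ),
}

# Generic credentials have no fixed shape, so all we can match is the assignment.
PASSWORD_ASSIGNMENT = re.compile(
    r"""(?i)(?:password|passwd|pwd)\w*\s*[:=]\s*["']([^"']+)["']"""
)

# Hypothetical hints for phony values; a real scanner would need a richer list.
PLACEHOLDER_HINTS = ("example", "changeme", "placeholder", "your", "dummy", "xxx")


def looks_like_placeholder(value):
    """Guess whether a quoted value is a stand-in rather than a live credential."""
    lowered = value.lower()
    if any(hint in lowered for hint in PLACEHOLDER_HINTS):
        return True
    if value.startswith("<") and value.endswith(">"):
        return True
    # Values such as "aaaa" or "0000" use almost no distinct characters.
    return len(set(value)) <= 2


def scan_text(text):
    """Return (kind, line_number, confidence) tuples for suspicious lines."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in TOKEN_PATTERNS.items():
            for _ in pattern.finditer(line):
                findings.append((kind, line_no, "high"))
        for match in PASSWORD_ASSIGNMENT.finditer(line):
            placeholder = looks_like_placeholder(match.group(1))
            confidence = "low" if placeholder else "medium"
            findings.append(("password_assignment", line_no, confidence))
    return findings
```

In this sketch a prefixed token is always reported with high confidence, while a password assignment never rises above medium, which mirrors the false positive problem Zanki describes.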

4. Secrets can disappear in as little as 20 seconds

The same features that make platforms like GitHub so powerful for accessing, analyzing and sharing code also make them easy for both good actors and bad to spot security and privacy lapses. By one estimate, the median time to discovery for a key leaked to GitHub is 20 seconds, with detection times ranging from half a second to over four minutes. In other words: By the time most development teams realize they have accidentally exposed secrets, those secrets have almost certainly been detected by a malicious actor. 

It’s time for action on secrets

Awareness of software supply chain risk is growing, along with attacks on the software supply chain. In a survey by Dimensional Research of more than 300 employees at firms engaged in software development, for example, more than two-thirds of respondents (69%) identified threats and malware lurking in open-source repositories as a risk. More than half (58%) named CI/CD tooling flaws as a risk.

However, that growing awareness has yet to produce tangible results in terms of how organizations manage that risk. The handling of development secrets is just one clear example. Despite numerous breaches resulting from exposed development secrets, malicious actors continue to hit pay dirt when scouring open-source repositories for stored credentials, access tokens and other development secrets. That exposed information then contributes to downstream attacks against development organizations and, eventually, their customers.

To change that, development organizations must first understand that malicious actors are on the prowl for secrets stored in code and take action to limit that risk. That means focusing energy and resources in areas that have proven to contribute to secrets disclosure. 

For example, organizations should undertake audits of their existing private and public code repositories, looking for exposed secrets in code and for ways to eliminate stored secrets or remediate the risk they pose. They should also assess overlaps between private, corporate repositories and code repositories maintained by individual developers working within or on behalf of their organization. Finally, organizations need to understand the speed with which secrets disclosure happens and look for ways to prevent secrets leaks from happening — or to greatly narrow the window of exposure and prevent damaging leaks.
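One way to narrow that window is to check changes before they are ever committed. The Python sketch below shows the idea under stated assumptions: it reads the staged changes with `git diff --cached --unified=0` and flags added lines that match two publicly documented token shapes (AWS access key IDs and Google API keys). The function names and the single combined pattern are hypothetical, and a production scanner would cover far more secret types.

```python
import re
import subprocess
import sys

# Illustrative pattern only: AWS access key IDs and Google API keys.
SECRET_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}|AIza[0-9A-Za-z_-]{35}")

# Hunk header of a unified diff, e.g. "@@ -10,0 +11,2 @@"; group 1 is the
# first line number of the hunk in the new version of the file.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def find_secrets_in_diff(diff_text):
    """Return (path, line_number) for each added line that looks like a secret."""
    hits = []
    path = None
    new_line_no = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            in_hunk = False  # a new file section starts with header lines
        elif not in_hunk and line.startswith("+++ "):
            path = line[4:]
            if path.startswith("b/"):
                path = path[2:]
        elif HUNK_HEADER.match(line):
            in_hunk = True
            new_line_no = int(HUNK_HEADER.match(line).group(1))
        elif in_hunk and line.startswith("+"):
            if SECRET_PATTERN.search(line[1:]):
                hits.append((path, new_line_no))
            new_line_no += 1
        elif in_hunk and line.startswith(" "):
            new_line_no += 1  # context lines also advance the new-file counter
        # Removed lines ("-") are skipped: they are leaving the codebase.
    return hits


def main():
    """Scan staged changes; a non-zero return value should block the commit."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = find_secrets_in_diff(diff)
    for path, line_no in hits:
        print(f"possible secret in {path}:{line_no}", file=sys.stderr)
    return 1 if hits else 0
```

A script saved as a Git pre-commit hook could end with `sys.exit(main())`, so that a non-zero exit status stops the commit. Because removing a secret from the latest commit does not remove it from history, a check like this complements rather than replaces repository audits and credential rotation.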
