Safeguard Your Private Data, Programmers: Discover Secret Scanning
Previously on...
In my previous blog posts, you may have noticed my growing interest in security and privacy topics. Of course, there is also my long-standing passion for DevOps. In my latest blog post, I outlined how .NET 6 offers possibilities to store sensitive data outside the committable configuration, e.g. appsettings.json. In this post, I will combine all three topics and discuss possibilities for cleaning the Git history of sensitive information.
Context
Secrets may be stored intentionally for convenience, but they can also be hidden in text files, Slack messages, and debug application logs, or may simply be forgotten. Git's design promotes the free distribution of code and that makes it easy for secrets to be leaked.
Attackers can exploit those leaks to gain access to sensitive information, and they can use it to target personal services as well. Code reviews will not detect secrets buried in Git's history.
I want to give a real-world scenario where storing a token leads to data leakage, discussed in the following article: 7 Real-Life Data Breaches Caused by Insider Threats | Ekran System. Let me take the case of Slack.
In December 2022, Slack's security team detected suspicious activity on the company's GitHub account, revealing that a malicious actor had gained unauthorized access to the company's resources by stealing Slack employees' tokens.
The investigation of this cybersecurity incident showed that the perpetrators stole Slack's private code repositories, which often contain sensitive information. Slack did not disclose the type of information stolen, nor did it disclose who the vendor was or what services or products they provided. This decision may be due to several factors, such as preserving the integrity of ongoing investigations, preventing additional harm, or maintaining the confidentiality of affected parties. Publicly revealing certain details could inadvertently aid the perpetrators or heighten the risk of subsequent attacks.
Protect the remote Git repository
These days a lot of tooling exists to help you keep sensitive data out of your Git repositories. I mention Microsoft Defender for DevOps (for Azure DevOps), GitLeaks for AWS/Azure and Advanced Security on GitHub.
Microsoft Defender for DevOps
Microsoft has a product called Microsoft Defender for DevOps that offers an extension for Azure DevOps.
Defender for DevOps empowers developers to prioritize critical code fixes with Pull Request annotations and to assign developer ownership by triggering custom workflows.
Enabling it is quite easy. The following page explains how:
Configure the Microsoft Security DevOps Azure DevOps extension | Microsoft Learn
GitLeaks
I will not discuss the AWS part. I am Azure focussed, so that is what I discuss in this post.
Here you find a summary of the video
As a developer, I do not always have the rights or the power to change the environment to leverage an enterprise tool like Microsoft Defender for DevOps.
However, there is a public repository available on GitHub called GitLeaks. Let me copy and paste their definitions:
Gitleaks is a SAST tool for detecting and preventing hardcoded secrets like passwords, api keys, and tokens in git repos. Gitleaks is an easy-to-use, all-in-one solution for detecting secrets, past or present, in your code.
I encourage you to read my blog post where I position SAST in the development lifecycle.
The following blog post is a good read: A Developer's Guide to Using Gitleaks to Detect Hardcoded Secrets.
According to that post, the tool is ISO-27001 compliant and works on public, private, remote, or local repositories.
Gitleaks has two main commands: detect and protect.
The detect command scans the repository, including its history, for hardcoded secrets and can generate a report listing the findings.
The protect command scans changes that have not been committed yet, which makes it a good fit for a pre-commit hook that blocks commits introducing secrets.
In a development environment
I can set up GitLeaks on my developer computer using Docker, Go or use Make to create my build. The beauty of GitLeaks is that I can set it up as a pre-commit hook.
After configuring my pre-commit hook, I can try to commit a client secret and the following will be the result.
➜ git commit -m "this commit contains a secret"
Detect hardcoded secrets.................................................Failed
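The output above comes from a hook runner. As a minimal sketch of the same idea with a plain Git hook, the script below installs a pre-commit hook in a throwaway repository. It assumes a Gitleaks v8 binary is on the PATH when the hook runs and that the protect command and its --staged flag are available in that version (newer releases may prefer a different command, so check gitleaks --help). The repository location is hypothetical.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the sketch does not touch a real project.
repo="$(mktemp -d)"
git init --quiet "$repo"

# Plain Git pre-commit hook.
# Assumption: gitleaks v8 is installed and on PATH when the hook fires.
hook="$repo/.git/hooks/pre-commit"
mkdir -p "$(dirname "$hook")"
cat > "$hook" <<'EOF'
#!/bin/sh
# Scan only the staged changes. gitleaks exits non-zero when it finds a
# secret, and a non-zero exit from this hook makes Git abort the commit.
exec gitleaks protect --staged --verbose
EOF
chmod +x "$hook"

echo "hook installed at $hook"
```

A plain hook like this lives only in my local clone, so every team member has to install it separately.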
In an Azure Pipeline
Regularly scanning code with GitLeaks and integrating the scan into the Azure DevOps pipeline can prevent merges that contain hardcoded secrets and keep them from moving into production, protecting an organization's valuable data and reputation.
I can use Gitleaks as an extension from the Visual Studio Marketplace. When reading that page, I noticed that the author JoostVoskuil gives credit to a colleague of mine: Jesse Houwing - Xebia | Xpirit.
Mark Patton gives a good view on how to use GitLeaks in Azure DevOps.
The post provides instructions for adding the task and configuration with the appropriate parameters, including the path to the repository, report output location, and thresholds for failing a build.
GitHub Advanced Security: Secret Scanning
GitHub offers what they call Advanced Security. One of its features is called "Secret Scanning": GitHub scans every public repository for known types of secrets to avoid any misuse of accidentally committed secrets. Secret scanning searches through all Git history on all branches present in your GitHub repository for secrets. Any strings that match patterns provided by secret scanning partners, other service providers, or defined by organizations or users are reported as alerts in the Security tab of repositories.
The secret scanning feature offers alerts for partners, which run automatically on public repositories and public npm packages to alert service providers about leaked secrets on GitHub.com. Another part of this feature is that the identified secrets are encrypted with symmetric encryption, both in transit and at rest.
Secret scanning alerts for users are available for free on all public repositories. Organizations using GitHub Enterprise Cloud with a license for GitHub Advanced Security can also enable secret scanning alerts for users on their private and internal repositories. An alert is also sent to contributors who committed the secret if they haven't ignored the repository.
If you want to know more about GitHub Advanced Security, go visit this blog post about Advanced Security. It is written by my Xebia | Xpirit colleague Rob Bos, who is known as an authority when it comes to GitHub.
How to manually undo my mistake
There are multiple strategies to undo a mistake after doing a git push. I can opt for removing the file history or replacing text. I list filter-branch, filter-repo and BFG as possible options. GitHub offers a page about what to do when you committed a secret and you want to remove it.
First things first: Revoke your secret
If I have leaked a secret and pushed it to a remote repository like GitHub or Azure DevOps, it is imperative that I consider it leaked. I should revoke it immediately.
GitGuardian has a blog post on this matter. I want to give credit to Rob for bringing this to my attention.
If I remove the secret from Git's history, it may still be accessible to bad actors who obtained it earlier. The window of abuse should be as small as possible! I can still rewrite the history and do the cleanup, but security-wise, I need to revoke the secret first. GitGuardian offers a GitHub page where I can find a good overview of where and how I can revoke a secret.
GitHub offers a functionality called "push protection". With it enabled, GitHub will not accept a push that contains a recognized secret, and the secret will thus not leak. This functionality will be available on Azure DevOps as well. I can enable push protection on GitHub for my public repos for free.
Remove file history
If I have accidentally committed and pushed an appsettings.json file containing a ClientSecret to a Git repository, it is essential to remove or revoke the ClientSecret immediately.
First, remove the ClientSecret from the appsettings.json file of the local repository.
Commit the changes and push them to the remote Git repository:
git add .
git commit -m "Removed ClientSecret from appsettings"
git push
However, the ClientSecret still exists in the repository history, and it is accessible through Git commands like git log or git checkout.
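To make this concrete, here is a minimal sketch that reproduces the situation in a throwaway repository. The secret value, user name and e-mail address are made up for the demo. It commits a secret, "removes" it in a second commit, and then uses the pickaxe search (git log -S) to show that the history still remembers it.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a local identity, so no global config is needed.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit 1: the secret slips in (hypothetical value).
printf '{ "ClientSecret": "s3cr3t-demo-value" }\n' > appsettings.json
git add appsettings.json
git commit --quiet -m "Add appsettings"

# Commit 2: the "fix" that only cleans the latest snapshot.
printf '{ "ClientSecret": "" }\n' > appsettings.json
git commit --quiet -am "Removed ClientSecret from appsettings"

# The working tree no longer contains the secret
# (grep -c prints 0 and exits non-zero, hence the "|| true").
in_worktree="$(grep -c 's3cr3t-demo-value' appsettings.json || true)"

# The pickaxe search lists commits that added or removed the string.
commits_with_secret="$(git log --all --oneline -S 's3cr3t-demo-value' | wc -l | tr -d ' ')"

echo "matches in working tree: $in_worktree"
echo "commits touching the secret: $commits_with_secret"
```

The working tree is clean, yet the history still lists commits that contain the secret, and anyone with a clone can check out the one that introduced it.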
To remove the ClientSecret from the Git repository history:
Identify the hash of the Git commit that added the appsettings.json file with the ClientSecret. I can do this by running the git log command and finding the commit that added the file.
Run the following command to remove the file from the commit history:
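git filter-branch --tree-filter 'rm -f appsettings.json' --prune-empty HEAD
With HEAD as the last argument, this command removes the file from every commit reachable from the current branch. To rewrite all branches, replace HEAD with -- --all.
A forced push (push --force) will overwrite the remote repository's history: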
git push --force
The command overwrites the existing commit history on the remote. This can lead to loss of data and confusion for other team members, whose local clones still contain the old history.
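Before force-pushing, it helps to confirm that the rewrite did what I expected. The sketch below runs in a throwaway repository with made-up file contents. Note that it swaps in the --index-filter variant from the Git documentation instead of --tree-filter (it avoids checking out every commit) and passes -- --all so that every branch is rewritten. It then counts the commits on any branch that still touch appsettings.json.

```shell
#!/bin/sh
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

# One commit with a harmless file and the file holding the secret.
printf 'demo\n' > README.md
printf '{ "ClientSecret": "s3cr3t-demo-value" }\n' > appsettings.json
git add .
git commit --quiet -m "Initial commit with secret"
git branch feature   # a second branch that also contains the file

# Commits on any local branch that touch the file, before the rewrite.
before="$(git log --branches --oneline -- appsettings.json | wc -l | tr -d ' ')"

# Skip the interactive warning delay that recent Git versions add.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Drop the file from the index of every commit on every ref.
git filter-branch --force \
  --index-filter 'git rm --cached --ignore-unmatch --quiet appsettings.json' \
  --prune-empty -- --all >/dev/null 2>&1

# Same count after the rewrite.
after="$(git log --branches --oneline -- appsettings.json | wc -l | tr -d ' ')"

echo "commits touching appsettings.json: before=$before after=$after"
```

filter-branch keeps backup refs under refs/original, which is why the check only looks at local branches. The old commits stay in the local repository until those backups are deleted and Git garbage-collects them.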
Replace text
Luckily, there are other options available, such as replacing text. Read it on the site of GitGuardian or read the following summary:
Download and install git-filter-repo.
Create replacements.txt with, on the left of ==>, the text that I want to replace and, on the right side of ==>, the text that I want to replace it with: toreplace==>replacewith.
With a concrete example: '123abc'==>ENV['AUTH_TOKEN'].
'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c'==>ENV['AUTH_TOKEN']
Use git filter-repo --replace-text ../replacements.txt --force to replace the sensitive text throughout the history.
Force push:
- Use git push --all --tags --force for remote repositories
- Use
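The steps above can be sketched as a small script. It writes the replacements file with the example rule from this post and, only when git-filter-repo is found on the PATH, applies it to a throwaway repository. The file name settings.rb, its content and the demo identity are made up for illustration.

```shell
#!/bin/sh
set -eu

workdir="$(mktemp -d)"
rules="$workdir/replacements.txt"

# One rule per line: <text to find>==><replacement>.
# The quoted heredoc delimiter keeps the shell from expanding anything.
cat > "$rules" <<'EOF'
'123abc'==>ENV['AUTH_TOKEN']
EOF

rule_count="$(grep -c '==>' "$rules")"
echo "wrote $rule_count rule(s) to $rules"

# Only attempt the rewrite when git-filter-repo is actually installed.
if command -v git-filter-repo >/dev/null 2>&1; then
  git init --quiet "$workdir/demo"
  cd "$workdir/demo"
  git config user.name "Demo User"
  git config user.email "demo@example.com"

  # Hypothetical source file with a hardcoded token.
  printf "token = '123abc'\n" > settings.rb
  git add settings.rb
  git commit --quiet -m "Add token"

  # Rewrite the history, then print the rewritten file content.
  git filter-repo --replace-text "$rules" --force
  git --no-pager show HEAD:settings.rb
else
  echo "git-filter-repo not found; skipping the rewrite"
fi
```

Because the quotes are part of the rule, only the quoted literal is matched, so the replacement slots into the source code without leaving stray quotes behind.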
BFG Repo Cleaner
There is an alternative for the filter-branch and filter-repo commands that seems quite popular.
The BFG is a simpler and faster alternative to git-filter-branch for cleansing unwanted data from your Git repository history, such as removing large files or sensitive information like passwords and credentials. While git-filter-branch is a powerful tool with capabilities beyond what BFG can offer, BFG excels at these tasks thanks to its speed and simplicity.
The following post covers this in detail: Removing sensitive data from your Git history with BFG - DEV Community
I will give a summary on how to clean sensitive information using the BFG.
Use --replace-text to clean strings from your repo history. Each string will be rewritten as "***REMOVED***" by default. This is a two-step process.
- Create a file passwords.txt. Add a line for each secret that needs to be removed.
fooPassword1
barPassword2
ey...
- Execute bfg --replace-text
Execute the command with a reference to passwords.txt and the repository in question, here called foobar.git
$ bfg --replace-text passwords.txt foobar.git
Outro
I had initially used filter-branch for my .NET 6 Configuration post to erase sensitive info from my history. But the more I delved into the topic, the more I realised there's a lot more to it than meets the eye. My next move is to put my research to the test and show what commands I used by attaching some before and after screenshots to illustrate the effects.
Written by
Kristof Riebbels
I am a backend developer from Belgium. My first language is C#. Mostly working on the .NET tech stack. PowerShell is the script language that I am most familiar with. I love automating stuff. Tools you work with should be tools that you like to work with :). Loving the DevOps scene as well. At the moment, my platform of choice is Azure, but looking at GitHub these days as well. I do have some experience with TypeScript, but that is not my strongest suit. Working with Rider and ReSharper, so thanks JetBrains for making great tools :)