Is Your LLM Leaking Sensitive Data? A Developer’s Guide to Preventing Sensitive Information Disclosure
Your data has been exposed—and not because of a classic bug, but because your LLM accidentally leaked it. Sensitive information disclosure is a growing concern, especially with the rise of Large Language Models (LLMs) in our apps. This vulnerability occurs when your model unintentionally ingests confidential data, like PII or SSNs, or reveals sensitive information, such as user passwords, API keys, or proprietary algorithms. For developers building AI applications, particularly in sectors like healthcare and finance, addressing sensitive information disclosure vulnerabilities is critical.
To put the urgency into context, the OWASP Foundation in October 2023, reported that sensitive information disclosure is one of the top 10 most common vulnerabilities in LLM applications.
Let’s explore the nature of sensitive information disclosure in the context of LLMs and review the relevant OWASP guidelines. We’ll also show you how to mitigate these risks by using Pangea, a security API platform that helps you remove PII and implement access control in your AI application.
Understanding Sensitive Information Disclosure
Sensitive information disclosure occurs when an application or system inadvertently exposes confidential information, such as users’ Social Security numbers and other personal details, in plain text. Not only is it a significant risk to store users’ personal information in plain text, but it could also potentially lead to substantial financial losses for businesses in the event of a data breach.
Unlike traditional web applications, where vulnerabilities may arise from code flaws or configuration errors, LLMs introduce unique challenges due to their complex and often opaque inner workings. Where did this training data come from? What is in it?
OWASP's Top 10 vulnerabilities for LLMs highlights sensitive information disclosure as a significant risk, listed as LLM06. This vulnerability can occur when an LLM inadvertently reveals sensitive information due to improper data sanitization, lack of robust input validation, or overfitting during the training process. OWASP recommends several best practices to mitigate this risk, including rigorous data sanitization and redaction, restricting access to sensitive data during training, and implementing robust authorization or access controls to data (OWASP Top 10 AI Security).
Examples of this Vulnerability
Consider an LLM trained on a dataset containing customer support transcripts. If the training data is not properly sanitized, the model might inadvertently learn and reproduce sensitive information, such as credit card numbers or personal addresses, in its responses. Another scenario might involve a model that, due to overfitting, memorizes specific user queries containing users’ personal data, which it could later regurgitate when prompted.
Typical Vulnerability Points
Training Data Leakage: When sensitive data is included in the training set without proper anonymization or exclusion, the model might unintentionally expose this data during operation. This is especially problematic in systems where LLMs interact with other users or systems that handle sensitive data.
- Real-world Incident: Google C4 dataset, which is used as training data for many LLMs, including Meta’s LLAMA models. This dataset contains voter registration information from the states of Colorado and Florida (Washington Post).
No User Input Validation: When an LLM processes user input (also known as model inference), it often uses the input to improve its responses through a machine learning process called reinforcement learning. However, it is common for users to inadvertently insert personal information while trying to give the LLM more context for a question. As a result, the lack of input validation could further lead to sensitive information leakage in the future.
- Real-world Incident: A CyberHaven study, after the launch of ChatGPT, found that 11% of data employees put into ChatGPT was confidential information (ranging from PII and Personal Health Information (PHI) to source code disclosure)
Prompt Injection Attacks: Attackers can craft specific inputs designed to manipulate the LLM into disclosing sensitive information. This is particularly dangerous when the LLM lacks robust input validation and filtering mechanisms.
Misconfigurations and Access Controls: Inadequate access controls can lead to scenarios where unauthorized users gain access to sensitive information stored within or processed by the LLM.
Why should I care about it?
Sensitive information disclosure isn’t just a compliance issue or a box to check for security audits—it's a problem that can have immediate, tangible impacts on your work as a developer. Imagine you’re deploying an LLM in a production environment, and due to a small oversight, your model starts leaking API keys, user credentials, or even confidential business data. Or even worse, lets say your LLM app takes in user PII like SSNs and uses user inputs to improve itself using re-enforcement learning. Now the model can potentially spit out customer SSNs to any bad actor. That’s not just a bug in your app; it could lead to significant security breaches, forcing you to scramble for fixes while under the immense pressure of a major data breach.
When sensitive information gets exposed, the repercussions extend beyond just the immediate fallout. For instance, attackers could exploit leaked data to gain unauthorized access to other parts of the system, setting off a chain reaction of vulnerabilities that weren't even on your radar. Worse, these breaches could undermine your users' trust in the system you helped build, erasing months or years of hard work with a single vulnerability.
Mitigation Strategies
Data Redaction: Ensure that all sensitive information is properly sanitized before being included in any LLM inputs from users and training datasets. This includes removing or anonymizing PII and other confidential data from users’ inputs.
Input Validation and Output Filtering: Implement robust validation mechanisms to ensure that only appropriate inputs are processed by the LLM. Additionally, filter outputs to prevent the accidental disclosure of sensitive information.
Secure by Design Practices: Adopt security by design principles, ensuring that all phases of LLM development, from data collection to deployment, include stringent security checks and balances. This could include adding tools such as audit logs for all inputs and outputs in an LLM app. Learn more about it in the Secure by Design hub.
Implementing Strong Access Controls: Effective access control is critical in preventing unauthorized users from accessing sensitive information. This includes enforcing the principle of least privilege, implementing relationship-based access controls (ReBAC) and Role-based access controls (RBAC) frameworks, and regularly auditing access logs to detect and respond to unauthorized activities
Pangea’s Redact API allows you to redact any PII, PHI, API keys, and much more from your LLM inputs or training data sets through its advanced NLP and Regex engine.
In addition, Pangea’s AuthZ service allows you to add ReBAC and RBAC authorization controls to implement strong access control policies in your LLM app.
Getting Started with Pangea Redact:
Adding the Pangea Redact API is a quick way to prevent most forms of PII from being inputted into your LLM and it’s easy to setup.
Step 1: Signup for an account on pangea.cloud
Head over to pangea.cloud and create an account for free. Then in the developer console, enable the “Redact” service and grab the newly-created “Pangea Token” from the dashboard. Paste this token in your .env file.
Step 2: Add a redact API call to your LLM app
At every instance where user input is being sent to an LLM, call the Pangea redact API and then send out the sanitized response to the LLM. Here’s an example in python with OpenAI GPT 3.5 as the LLM:
from pangea.services import Redact
redact = Redact("<PANGEA_REDACT_TOKEN>")
def clean_prompt(prompt): return redact.redact(text=prompt).result.redacted_text
...
...
# OpenAI generates text with a redacted prompt
completion = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{
"role": "user",
"content": redacted_prompt
}
]
)
You can find full working code-examples in Python and JavaScript
Here's a video walk through of the same 👇
Conclusion
Sensitive information disclosure is a vulnerability that developers building LLM and GenAI apps can’t ignore. By understanding how these leaks happen, using tools and techniques to prevent users’ confidential information from getting leaked, and applying solid mitigation strategies, you can significantly reduce the chances of exposing sensitive user information. As AI security keeps evolving, staying updated and proactive is the best way to make sure your systems are secure by design.
Pangea’s Redact and AuthZ API allow developers to get confidential data redaction and robust access control setup in just a few minutes. If you are interested in trying out these services, you can start here for free.
Subscribe to my newsletter
Read articles from Pranav Shikarpur directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by