How to Encrypt Sensitive Data, Including Embeddings, at Rest and in Transit (Azure OpenAI)

Encrypting sensitive data both at rest and in transit is critical when working with large language model (LLM) services such as Azure OpenAI, especially when you generate embeddings from that data.

Below, I'll explain how to achieve this with code examples using a sample sensitive text dataset. We'll cover:

  1. Encrypting Data at Rest: Using Azure Blob Storage with client-side encryption.

  2. Encrypting Data in Transit: Ensuring secure communication with Azure services using HTTPS/TLS.

  3. Encrypting Embeddings: Encrypting embeddings before storing them.

  4. Managing Encryption Keys: Using Azure Key Vault for key management.


Prerequisites

  • Azure Subscription: An active Azure account.

  • Azure Storage Account: For storing encrypted data.

  • Azure Key Vault: For managing encryption keys.

  • Azure OpenAI Access: To use OpenAI services on Azure.

  • Python Environment: With necessary packages installed.


1. Sample Sensitive Text Dataset

Let's start with a sample sensitive dataset:

sensitive_data = [
    "Patient Name: Alice Smith, SSN: 123-45-6789, Diagnosis: Hypertension",
    "Patient Name: Bob Johnson, SSN: 987-65-4321, Diagnosis: Type 2 Diabetes"
]

2. Encrypting Data at Rest

a. Setting Up Azure Key Vault for Key Management

First, create an encryption key in Azure Key Vault.

# Install necessary packages
!pip install azure-keyvault-keys azure-identity

from azure.core.exceptions import ResourceNotFoundError
from azure.identity import DefaultAzureCredential
from azure.keyvault.keys import KeyClient
from azure.keyvault.keys.crypto import CryptographyClient, EncryptionAlgorithm

# Replace with your Key Vault URL
key_vault_url = "https://<your-key-vault-name>.vault.azure.net/"

credential = DefaultAzureCredential()
key_client = KeyClient(vault_url=key_vault_url, credential=credential)

# Get the encryption key, or create a 2048-bit RSA key if it doesn't exist yet
key_name = "encryption-key"
try:
    key = key_client.get_key(key_name)
except ResourceNotFoundError:
    key = key_client.create_rsa_key(key_name, size=2048)

b. Encrypting the Data Locally

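Use the key to encrypt your sensitive data before storing it. RSA-OAEP can only encrypt small payloads (at most 214 bytes with a 2048-bit key), which is enough for these short records but not for larger documents.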

import base64

# Initialize Cryptography Client
crypto_client = CryptographyClient(key=key, credential=credential)

encrypted_data = []

for data in sensitive_data:
    data_bytes = data.encode('utf-8')
    encrypt_result = crypto_client.encrypt(EncryptionAlgorithm.rsa_oaep, data_bytes)
    encrypted_content = base64.b64encode(encrypt_result.ciphertext).decode('utf-8')
    encrypted_data.append(encrypted_content)
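Why only small payloads? With OAEP padding, a k-byte RSA modulus and an h-byte hash leave room for at most k - 2h - 2 bytes of plaintext. The sketch below checks that limit locally with the `cryptography` package; the throwaway in-memory RSA key is an assumption standing in for the Key Vault key, and `max_oaep_plaintext` / `oaep_encrypt_ok` are hypothetical helpers. It uses SHA-256 (190 bytes for 2048-bit keys, matching Key Vault's RSA-OAEP-256); the `rsa_oaep` algorithm used above is SHA-1, which allows 214 bytes.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa


def max_oaep_plaintext(key_size_bits: int, hash_len_bytes: int) -> int:
    """Largest plaintext (in bytes) RSA-OAEP accepts: k - 2*hLen - 2."""
    return key_size_bits // 8 - 2 * hash_len_bytes - 2


def oaep_encrypt_ok(public_key, size: int) -> bool:
    """Try to RSA-OAEP/SHA-256 encrypt `size` zero bytes; report whether it worked."""
    oaep = padding.OAEP(
        mgf=padding.MGF1(algorithm=hashes.SHA256()),
        algorithm=hashes.SHA256(),
        label=None,
    )
    try:
        public_key.encrypt(b"\x00" * size, oaep)
        return True
    except ValueError:  # raised when the plaintext is too large for the key
        return False


# Throwaway local key standing in for a 2048-bit Key Vault RSA key (assumption)
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

limit = max_oaep_plaintext(2048, 32)  # SHA-256 digests are 32 bytes
embedding_size = 1536 * 4             # a 1536-dimension float32 vector

print("limit:", limit)                                         # limit: 190
print("190 bytes:", oaep_encrypt_ok(public_key, limit))        # True
print("191 bytes:", oaep_encrypt_ok(public_key, limit + 1))    # False
print("embedding:", oaep_encrypt_ok(public_key, embedding_size))  # False
```

Short records like the sample rows fit comfortably; anything larger needs a different approach, such as envelope encryption with a symmetric data key.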

c. Storing Encrypted Data in Azure Blob Storage

!pip install azure-storage-blob

from azure.core.exceptions import ResourceExistsError
from azure.storage.blob import BlobServiceClient

# Replace with your Storage Account connection string
connection_string = "<your-storage-account-connection-string>"
container_name = "encrypted-data"

blob_service_client = BlobServiceClient.from_connection_string(connection_string)
container_client = blob_service_client.get_container_client(container_name)

# Create the container if it doesn't exist; let any other error surface
try:
    container_client.create_container()
except ResourceExistsError:
    pass  # Container already exists

# Upload encrypted data
for idx, encrypted_content in enumerate(encrypted_data):
    blob_name = f"patient_record_{idx}.enc"
    blob_client = container_client.get_blob_client(blob_name)
    blob_client.upload_blob(encrypted_content, overwrite=True)

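Note: Azure Storage encrypts data at rest by default using Microsoft-managed keys. Encrypting the data before sending it to Azure Blob Storage adds a second layer: anyone who gains read access to the storage account or its blobs sees only ciphertext unless they can also use your Key Vault key.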


3. Encrypting Data in Transit

When communicating with Azure services, ensure that all connections use HTTPS, which encrypts data in transit using TLS.

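# Build the client from an explicit https:// account URL, authenticating with the
# Entra ID credential created earlier (the identity needs a Storage Blob Data role)
account_url = "https://<your-storage-account-name>.blob.core.windows.net"
blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)

The Azure SDKs use HTTPS endpoints by default; if you build clients from a connection string, check that it specifies DefaultEndpointsProtocol=https. On the storage account itself, keep "Secure transfer required" enabled and set the minimum TLS version to 1.2 so that plain-HTTP and legacy-TLS requests are rejected. If you're making REST API calls directly, ensure the endpoints start with https://.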
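If you call an Azure REST endpoint directly instead of through an SDK, a small guard can refuse plain-HTTP URLs and anything older than TLS 1.2. The sketch below uses only the Python standard library; `require_https` and `strict_tls_context` are hypothetical helpers, not part of any Azure SDK.

```python
import ssl
from urllib.parse import urlparse


def require_https(url: str) -> str:
    """Return the URL unchanged, or raise if it would not be protected by TLS."""
    if urlparse(url).scheme.lower() != "https":
        raise ValueError(f"Refusing non-HTTPS endpoint: {url}")
    return url


def strict_tls_context() -> ssl.SSLContext:
    """TLS context that verifies certificates and hostnames and rejects TLS < 1.2."""
    ctx = ssl.create_default_context()  # CERT_REQUIRED and check_hostname=True
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

For example, `urllib.request.urlopen(require_https(url), context=strict_tls_context())` would only proceed over a verified TLS 1.2 (or newer) connection.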


4. Using Azure OpenAI Securely

a. Setting Up the Azure OpenAI Client

!pip install openai

import os
from openai import AzureOpenAI

# Set up the Azure OpenAI client (openai>=1.0 SDK); keep the API key out of source code
client = AzureOpenAI(
    azure_endpoint="https://<your-resource-name>.openai.azure.com/",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # use an API version your resource supports
)

b. Generating Embeddings Securely

The request goes over HTTPS, so the text is encrypted in transit, but Azure OpenAI has to process the plaintext to compute the embedding.

# Generate embeddings for the sensitive data
embeddings = []

for data in sensitive_data:
    response = client.embeddings.create(
        input=data,
        model="text-embedding-ada-002",  # the name of your embedding model deployment
    )
    embedding = response.data[0].embedding
    embeddings.append(embedding)

c. Encrypting Embeddings Before Storage

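Treat embeddings as being as sensitive as the text they came from: research on embedding inversion attacks has shown that the original text can, in some cases, be largely reconstructed from its embedding. Encrypt them before storage, keeping in mind that encrypted vectors must be decrypted before they can be used for similarity search.

RSA-OAEP alone won't work here: a 1536-dimension text-embedding-ada-002 vector stored as float32 is 6,144 bytes, far above the 214-byte RSA-OAEP limit of a 2048-bit key, so crypto_client.encrypt would fail. Instead, use envelope encryption: encrypt each embedding locally with a fresh AES-256-GCM data key, and wrap (encrypt) only that 32-byte data key with the Key Vault key.

import json
import os
import numpy as np
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from azure.keyvault.keys.crypto import KeyWrapAlgorithm

encrypted_embeddings = []

for embedding in embeddings:
    embedding_bytes = np.array(embedding, dtype=np.float32).tobytes()
    # Encrypt locally with a fresh 256-bit AES-GCM data key and a random 96-bit nonce
    data_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    ciphertext = AESGCM(data_key).encrypt(nonce, embedding_bytes, None)
    # Wrap only the small data key with the Key Vault key
    wrapped_key = crypto_client.wrap_key(KeyWrapAlgorithm.rsa_oaep_256, data_key).encrypted_key
    # Keep wrapped key, nonce and ciphertext together as base64 strings in one JSON record
    record = {"wrapped_key": wrapped_key, "nonce": nonce, "ciphertext": ciphertext}
    encrypted_embeddings.append(json.dumps(
        {name: base64.b64encode(value).decode("utf-8") for name, value in record.items()}))

d. Storing Encrypted Embeddings

# Store the encrypted embedding records (JSON strings) in Blob Storage
for idx, encrypted_embedding in enumerate(encrypted_embeddings):
    blob_name = f"embedding_{idx}.enc"
    blob_client = container_client.get_blob_client(blob_name)
    blob_client.upload_blob(encrypted_embedding, overwrite=True)

5. Decrypting Data When Needed

a. Decrypting the Data

from azure.keyvault.keys.crypto import EncryptionAlgorithm

decrypted_data = []

for encrypted_content in encrypted_data:
    ciphertext = base64.b64decode(encrypted_content)
    decrypt_result = crypto_client.decrypt(EncryptionAlgorithm.rsa_oaep, ciphertext)
    data = decrypt_result.plaintext.decode('utf-8')
    decrypted_data.append(data)

b. Decrypting Embeddings

decrypted_embeddings = []

for encrypted_embedding in encrypted_embeddings:
    record = {name: base64.b64decode(value) for name, value in json.loads(encrypted_embedding).items()}
    # Unwrap the data key with the Key Vault key
    data_key = crypto_client.unwrap_key(KeyWrapAlgorithm.rsa_oaep_256, record["wrapped_key"]).key
    # Decrypt and authenticate locally; AES-GCM raises InvalidTag if the record was altered
    embedding_bytes = AESGCM(data_key).decrypt(record["nonce"], record["ciphertext"], None)
    embedding = np.frombuffer(embedding_bytes, dtype=np.float32).tolist()
    decrypted_embeddings.append(embedding)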
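Because RSA-OAEP on its own is limited to a few hundred bytes, large payloads such as embeddings are usually protected with envelope encryption: a random AES data key encrypts the payload, and only that data key is encrypted ("wrapped") with the Key Vault key. To try the pattern without an Azure subscription, here is a local sketch in which an in-memory RSA key stands in for the Key Vault key (an assumption; with Key Vault, the wrap and unwrap steps would go through `CryptographyClient.wrap_key` and `unwrap_key`). The helpers `envelope_encrypt` and `envelope_decrypt` are hypothetical.

```python
import base64
import json
import os

import numpy as np
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# RSA-OAEP with SHA-256, the same scheme as Key Vault's RSA-OAEP-256
OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Local stand-in for the Key Vault key, i.e. the key-encryption key (KEK)
kek = rsa.generate_private_key(public_exponent=65537, key_size=2048)


def envelope_encrypt(plaintext: bytes, kek_public) -> str:
    """Encrypt with a fresh AES-256-GCM data key, then wrap that key with the KEK."""
    data_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    record = {
        "wrapped_key": kek_public.encrypt(data_key, OAEP),
        "nonce": nonce,
        "ciphertext": AESGCM(data_key).encrypt(nonce, plaintext, None),
    }
    return json.dumps({k: base64.b64encode(v).decode("utf-8") for k, v in record.items()})


def envelope_decrypt(blob: str, kek_private) -> bytes:
    """Unwrap the data key with the KEK, then decrypt and authenticate the payload."""
    record = {k: base64.b64decode(v) for k, v in json.loads(blob).items()}
    data_key = kek_private.decrypt(record["wrapped_key"], OAEP)
    return AESGCM(data_key).decrypt(record["nonce"], record["ciphertext"], None)


# Round-trip a random 1536-dimension float32 vector (the size of an ada-002 embedding)
vector = np.random.default_rng(0).random(1536, dtype=np.float32)
blob = envelope_encrypt(vector.tobytes(), kek.public_key())
restored = np.frombuffer(envelope_decrypt(blob, kek), dtype=np.float32)
print(np.array_equal(vector, restored))  # True
```

A fresh data key per record keeps records independent; sharing one data key across a batch would reduce the number of unwrap calls, at the cost of exposing more data if that single key leaks.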

6. Ensuring Secure Key Management

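  • Key Rotation: Regularly rotate your encryption keys in Azure Key Vault; a rotation policy can do this automatically. Previous key versions stay usable for decrypting existing data unless you disable them.

  • Access Control: Grant Key Vault access only to the people and services that need it, and only for the key operations they need (for example, wrap and unwrap for the application's identity), using Azure RBAC or access policies.

  • Monitoring: Enable diagnostic logging for Key Vault so that every key operation is recorded, and review or alert on those logs.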
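Envelope encryption also makes rotation cheap: when the key-encryption key changes, only the small wrapped data key in each record needs to be re-wrapped, and the bulk ciphertext is left untouched. The sketch below shows this locally; the two in-memory RSA keys are assumptions standing in for an old and a new Key Vault key version, and `rewrap` is a hypothetical helper. With Key Vault, you would unwrap with a `CryptographyClient` for the old version and wrap with one for the new version.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Stand-ins for two versions of the same Key Vault key (assumption)
old_kek = rsa.generate_private_key(public_exponent=65537, key_size=2048)
new_kek = rsa.generate_private_key(public_exponent=65537, key_size=2048)


def rewrap(record: dict, old_private, new_public) -> dict:
    """Re-wrap the data key under the new KEK; nonce and ciphertext are reused as-is."""
    data_key = old_private.decrypt(record["wrapped_key"], OAEP)
    return {**record, "wrapped_key": new_public.encrypt(data_key, OAEP)}


# Encrypt a record whose data key is wrapped with the old key version
data_key = AESGCM.generate_key(bit_length=256)
nonce = os.urandom(12)
record = {
    "wrapped_key": old_kek.public_key().encrypt(data_key, OAEP),
    "nonce": nonce,
    "ciphertext": AESGCM(data_key).encrypt(nonce, b"Diagnosis: Hypertension", None),
}

rotated = rewrap(record, old_kek, new_kek.public_key())

# The bulk ciphertext is unchanged; the data key is now recovered with the new key
recovered_key = new_kek.decrypt(rotated["wrapped_key"], OAEP)
print(rotated["ciphertext"] == record["ciphertext"])  # True
print(AESGCM(recovered_key).decrypt(rotated["nonce"], rotated["ciphertext"], None))
# b'Diagnosis: Hypertension'
```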


7. Additional Security Measures

a. Use Private Endpoints

Establish a private link to Azure services to ensure that data does not traverse the public internet.

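Private endpoints are configured in the Azure portal, with the Azure CLI, or with infrastructure-as-code templates rather than in application code. Once the endpoint and its private DNS zone are in place, the service's normal hostname resolves to a private IP address inside your virtual network.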
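A quick sanity check, run from a VM or container inside the virtual network, is to resolve the service hostname and confirm every returned address is private (as defined by Python's `ipaddress` module). This standard-library sketch only shows where DNS points, not that traffic cannot take another route; `all_private` and `resolves_privately` are hypothetical helpers.

```python
import ipaddress
import socket


def all_private(addresses) -> bool:
    """True if there is at least one address and every one is in a private range."""
    ips = [ipaddress.ip_address(a.split("%")[0]) for a in addresses]  # drop IPv6 zone ids
    return bool(ips) and all(ip.is_private for ip in ips)


def resolves_privately(hostname: str, port: int = 443) -> bool:
    """Resolve `hostname` with the local DNS settings and check every returned address."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return all_private({info[4][0] for info in infos})
```

For example, `resolves_privately("<your-storage-account-name>.blob.core.windows.net")` should return True from inside the VNet once the private endpoint and DNS zone are configured.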

b. Enforce Network Security Groups (NSGs)

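Use NSG rules to allow only the traffic your application needs (by source, destination, port, and protocol) to and from the subnets that host your resources, and deny everything else.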

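c. Enable Microsoft Defender for Cloud

Use Microsoft Defender for Cloud (formerly Azure Defender) to get security recommendations and threat alerts for your resources; for example, Defender for Storage can alert on unusual access to your storage accounts.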


8. Summary

By encrypting sensitive data and embeddings before storing them and ensuring that all communications are encrypted, you significantly enhance the security of your data when using Azure OpenAI services.

Key Takeaways:

  • Encrypt Data at Rest: Use client-side encryption in addition to Azure's default encryption.

  • Encrypt Data in Transit: Always use HTTPS/TLS for communication.

  • Encrypt Embeddings: Treat embeddings as sensitive data.

  • Manage Keys Securely: Use Azure Key Vault for key management.

  • Secure Access: Limit and monitor access to sensitive resources.


Important Notes

  • Performance Impact: Encrypting and decrypting data adds computational overhead. Assess the performance implications for your application.

  • Compliance: Ensure that your encryption methods comply with relevant regulations, such as HIPAA in the US for patient health records like the sample data used here.

  • Error Handling: Include robust error handling in production code.
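To put the performance impact in perspective, you can time the local part of the work. The sketch below (a hypothetical helper built on `timeit` and the `cryptography` package) measures AES-256-GCM encryption of one embedding-sized payload; results depend on your hardware, so no figures are given here. Key Vault operations such as `decrypt` and `unwrap_key` are network round trips and will typically cost far more than local encryption, so measure them separately in your own environment.

```python
import os
import timeit

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def aes_gcm_seconds_per_op(payload_size: int, repeats: int = 1000) -> float:
    """Average seconds to AES-256-GCM encrypt one payload, including nonce generation."""
    aead = AESGCM(AESGCM.generate_key(bit_length=256))
    payload = os.urandom(payload_size)
    # A fresh random nonce per call, as real encryption requires
    total = timeit.timeit(lambda: aead.encrypt(os.urandom(12), payload, None), number=repeats)
    return total / repeats


per_embedding = aes_gcm_seconds_per_op(1536 * 4)  # one float32 ada-002 embedding
print(f"AES-256-GCM per 6,144-byte embedding: {per_embedding * 1e6:.1f} microseconds")
```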


Written by

Sai Prasanna Maharana