Securing Your Data: A Deep Dive into nxs-data-anonymizer
Data anonymization is a process for ensuring privacy, compliance, and security. It allows organizations to use and share data while protecting sensitive information. The need for secure data handling has led to the development of powerful anonymization tools like nxs-data-anonymizer, a solution designed to anonymize database dumps for PostgreSQL, MySQL, MariaDB, and Percona.
In this article, we’ll explore key features of nxs-data-anonymizer and how it supports privacy-focused data management, based on insights from various sources that discuss anonymization principles, tool configurations, and best practices.
The Importance of Data Anonymization
Data anonymization is essential for protecting personal data when shared or used for development, testing, or analysis. Stripping identifiable information from datasets enables engineers and data scientists to work on realistic data without risking exposure to sensitive details. This practice has far-reaching implications, particularly in fields that rely on heavy data usage, such as healthcare, finance, and technology.
Core Features
As one of the leading tools for data anonymization, nxs-data-anonymizer focuses on database dumps and offers several features designed to maintain data integrity while anonymizing sensitive information. It enables users to anonymize fields in various databases using filters, templates, and custom scripts.
- Flexible data faking based on:
  - Go templates and the Sprig template library; you can also use the values of other columns in the same row to build more flexible rules;
  - External commands you can execute to generate table field values;
- Security enforcement rules;
- Linking cells across the database so they receive the same generated values;
- Stream data processing;
- Easy integration into your CI/CD (a pipeline sketch follows below).
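To illustrate the CI/CD point, here is a minimal sketch of a GitLab CI job that streams a production dump through the anonymizer and stores the result as an artifact. The job name, image, the PROD_DB_URL variable, the config file name, and the -t/-c flags are assumptions for illustration only, and the nxs-data-anonymizer binary is assumed to be available on the runner; check the project README for the exact CLI options.

# Hypothetical GitLab CI job: produce an anonymized dump for staging.
anonymize-dump:
  image: postgres:16            # assumed image that provides pg_dump;
                                # nxs-data-anonymizer is assumed to be in PATH
  script:
    # Stream the production dump through nxs-data-anonymizer and keep the
    # anonymized result as a job artifact for later import into staging.
    - pg_dump "$PROD_DB_URL" | nxs-data-anonymizer -t pgsql -c anonymizer.conf.yml > dump_anonymized.sql
  artifacts:
    paths:
      - dump_anonymized.sql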
How nxs-data-anonymizer Works
Here’s a basic example of how nxs-data-anonymizer is configured:
variables:
  adminPassword:
    type: template
    value: "preset_admin_password"
  adminAPIKey:
    value: "preset_admin_api_key"

filters:
  public.users:
    columns:
      username:
        value: "{{ if eq .Values.username \"admin\" }}{{ .Values.username }}{{ else }}user_{{ .Values.id }}{{ end }}"
      password:
        type: command
        value: /path/to/script.sh
        unique: true
      api_key:
        value: "{{ if eq .Values.username \"admin\" }}{{ .Variables.adminAPIKey }}{{ else }}{{- randAlphaNum 50 | nospace | lower -}}{{ end }}"
        unique: true
This configuration manages user credentials and API keys in the public.users table, ensuring that sensitive data such as passwords and API keys is either generated dynamically or taken from predefined variables. Unique values are enforced to prevent duplicates.
In the variables block, adminPassword and adminAPIKey store preset values that can be reused throughout the configuration. adminPassword is defined as a template type, and adminAPIKey holds a predefined API key for the administrator.
In the filters for the public.users table:
- username: The value is generated conditionally. If the username is "admin", the original name is kept; otherwise a new username is built from the user's ID (e.g., user_1, user_2).
- password: The value is produced by an external script located at /path/to/script.sh and is guaranteed to be unique for each user (a hypothetical example of such a script is sketched after this list).
- api_key: The value is set conditionally. If the username is "admin", the adminAPIKey variable is used; otherwise a unique 50-character alphanumeric API key is generated.
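As an illustration of the external-command approach, below is a minimal, hypothetical script of the kind a type: command filter might point to. It assumes the tool reads the command's stdout and uses it as the new field value; the exact contract (including how row values are passed to the command) should be verified against the project documentation.

#!/usr/bin/env python3
# Hypothetical external command for a `type: command` filter.
# Assumption: nxs-data-anonymizer takes this script's stdout as the new
# value of the `password` field; verify the exact contract in the docs.
import secrets
import string

ALPHABET = string.ascii_letters + string.digits

# Print a random 24-character token to stdout, without a trailing newline.
print("".join(secrets.choice(ALPHABET) for _ in range(24)), end="")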
The Role of Filters and Foreign Keys
Filters in nxs-data-anonymizer are used to define how specific columns in your database should be anonymized. They provide a flexible way to specify rules for data transformation, ensuring that sensitive information is replaced with anonymized values. Filters can be applied at different levels, including tables, columns, and data types.
Key Components of Filters
- Table-specific filters define rules for specific tables.
- Column-specific filters define rules for specific columns within a table.
- Type-specific filters define rules based on data types, using regular expressions.
- Default filters apply fallback rules to columns and types that have no specific filter defined.
Foreign keys in a database are used to maintain referential integrity between tables. They ensure that the relationships between tables are preserved, such as ensuring that a value in one table corresponds to a valid value in another table.
When anonymizing data, it is crucial to handle foreign keys properly to maintain the integrity of the database. nxs-data-anonymizer provides mechanisms to ensure that linked columns across different tables are anonymized consistently.
Practical Use Cases
The link block in the nxs-data-anonymizer configuration is used to keep data consistent across different database cells: cells that contained the same value before anonymization receive the same value after it. A block can cover multiple tables and columns, and a common rule is applied to generate the new values for all of them.
In a database with user information, the link function can ensure that user IDs in different tables (e.g., Orders and Contact Information) remain consistent after anonymization. The same principle can be used to apply block linking to data in any sector: Fintech, Foodtech, Medtech, etc. In healthcare databases, patient IDs can link to multiple tables (e.g., patient histories, appointments, prescriptions). The link function ensures that the anonymized patient ID remains unchanged in all these tables, preserving the links between the data.
Configuring the Link Block
The Link Block ensures data consistency across tables. Here’s an example of a Link Block configuration:
security:
  policy:
    tables: skip
    columns: skip

link:
  - rule:
      value: "{{ randInt 1 50 }}"
      unique: true
    with:
      authors:
        - id
      posts:
        - author_id

filters:
  authors:
    columns:
      first_name:
        value: "{{- randAlphaNum 20 -}}"
      last_name:
        value: "{{- randAlphaNum 20 -}}"
      birthdate:
        value: "1999-12-31"
      added:
        value: "2000-01-01 12:00:00"
  posts:
    columns:
      id:
        value: "{{ randInt 1 100 }}"
        unique: true
In our example, the id column in the authors table is linked to the author_id column in the posts table. The order of the tables in the dump does not affect data replacement: once the linked values have been generated while anonymizing one table, they are not generated again when the next table is processed; they are carried over from the corresponding column of the first table.
The security block lets you skip anonymization for tables and columns that are not described in the filters. This is useful when the original data is needed for further work or when the data is not sensitive.
The rule block specifies that a random value between 1 and 50 should be generated for the linked columns.
The unique property ensures that the generated value is unique across all specified columns.
The with block lists the tables and columns to be linked. In our case, the id column in the authors table and the author_id column in the posts table will share the same generated value after anonymization.
The remaining columns described in filters do not need to be linked to one another, so their anonymization can use random values or specific static values.
Conclusion
Data anonymization is a big part of modern data management, particularly in sectors where privacy and compliance are paramount. Tools like nxs-data-anonymizer are essential for anonymizing large datasets while preserving data integrity and consistency.
As organizations continue to face growing data privacy challenges, adopting such anonymization tools will become increasingly critical for protecting user privacy, ensuring compliance, and enabling secure data usage for development, testing, and analysis.
If you have any ideas, questions, or even suggestions about this case or our tool in general, you can contact us in the Telegram chat, in the comments, or make a pull request on GitHub! Your feature may be next on the list for implementation.