Performance Battle: C# vs Rust for PII Scrubbing
In software development, performance is often a crucial factor that can make or break an application's success. In this article, we look at a sub-problem that comes up widely in business applications: removing Personally Identifiable Information (PII) from captured data.
The article compares the performance of C# and Rust on this task. The underlying problem in PII scrubbing is efficient regex matching.
Regex replace using C#
We will use the System.Text.RegularExpressions namespace for regex replacement. Regex.Replace replaces every match of a pattern in the input text with a replacement string.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;
using Newtonsoft.Json;

// Patterns and Pattern are the JSON model classes from the full source on GitHub.
public static void RegexTest()
{
    // Read the JSON file containing the regex patterns and replacements
    var patterns = JsonConvert.DeserializeObject<Patterns>(File.ReadAllText("patterns.json"));
    // Read the input text from the TXT file
    var inputText = File.ReadAllText("input.txt");
    string resultText = inputText;
    // Map each regex pattern to its replacement label
    Dictionary<string, string> replacements = new Dictionary<string, string>();
    foreach (Pattern pattern in patterns.patterns)
    {
        replacements.Add(pattern.pattern, pattern.label);
    }
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    // Iterate over the regex patterns and replace the matches in the resultText
    foreach (var kvp in replacements)
    {
        var regex = new Regex(kvp.Key);
        resultText = regex.Replace(resultText, kvp.Value);
    }
    stopwatch.Stop();
    var duration = stopwatch.ElapsedMilliseconds;
    Console.WriteLine("Original text:");
    Console.WriteLine(inputText);
    Console.WriteLine("\nReplaced text:");
    Console.WriteLine(resultText);
    Console.WriteLine("\nTime taken: " + duration + " ms");
}
For the complete working code, visit GitHub.
Regex replace using Rust
We will use the regex crate for regular expressions and its replace_all method to perform the replacements.
There are two versions of the Rust code; we will look at both.
Unoptimized
use regex::Regex;
use std::collections::HashMap;
use std::fs;
use std::time::Instant;

// `Patterns` is the serde-deserializable model type from the full source on GitHub.
fn regex_test() {
    // Read the JSON file containing the regex patterns and replacements
    let patterns_json = fs::read_to_string("patterns.json").expect("Failed to read patterns.json");
    let patterns: Patterns = serde_json::from_str(&patterns_json).expect("Failed to parse JSON");
    // Read the input text from the TXT file
    let input_text = fs::read_to_string("input.txt").expect("Failed to read input.txt");
    // Map each regex pattern to its replacement label
    let mut replacements = HashMap::new();
    for pattern in patterns.patterns {
        replacements.insert(pattern.pattern, pattern.label);
    }
    let mut result_text = input_text.to_string();
    // Measure the time taken to perform the regex replacements
    let start_time = Instant::now();
    // Iterate over the regex patterns and replace the matches in the result_text.
    // Each regex is compiled inside the timed loop.
    for (pattern, replacement) in replacements.iter() {
        let regex = Regex::new(pattern).unwrap();
        result_text = regex.replace_all(&result_text, replacement.to_string()).to_string();
    }
    let end_time = Instant::now();
    let duration = end_time - start_time;
    // println!("Original text: {}", input_text);
    // println!("Replaced text: {}", result_text);
    println!("Time taken: {:?} and size of text file: {} KB", duration, (input_text.len() as f64) / 1024.0);
}
For the complete working code, visit GitHub.
Optimized
In the optimized version we compile the regular expressions once, before the timed replacement loop, and reuse the compiled regexes for the replacements. This keeps the cost of regex compilation out of the replacement loop.
use regex::Regex;
use std::collections::HashMap;
use std::fs;
use std::time::Instant;

// `Patterns` is the serde-deserializable model type from the full source on GitHub.
fn regex_test() {
    // Read the JSON file containing the regex patterns and replacements
    let patterns_json = fs::read_to_string("patterns.json").expect("Failed to read patterns.json");
    let patterns: Patterns = serde_json::from_str(&patterns_json).expect("Failed to parse JSON");
    // Read the input text from the TXT file
    let input_text = fs::read_to_string("input.txt").expect("Failed to read input.txt");
    // Map each regex pattern to its replacement label
    let mut replacements = HashMap::new();
    for pattern in patterns.patterns {
        replacements.insert(pattern.pattern, pattern.label);
    }
    let mut result_text = input_text.to_string();
    // Compile the regex patterns outside the loop
    // OPTIMIZATION
    let regexes: Vec<Regex> = replacements
        .keys()
        .map(|pattern| Regex::new(pattern).unwrap())
        .collect();
    // Measure the time taken to perform the regex replacements
    let start_time = Instant::now();
    // Iterate over the compiled regexes and replace the matches in the result_text
    for (regex, replacement) in regexes.iter().zip(replacements.values()) {
        result_text = regex.replace_all(&result_text, replacement.to_string()).to_string();
    }
    let end_time = Instant::now();
    let duration = end_time - start_time;
    // println!("Original text: {}", input_text);
    // println!("Replaced text: {}", result_text);
    println!("Time taken: {:?} and size of text file: {} KB", duration, (input_text.len() as f64) / 1024.0);
}
For the complete working code, visit GitHub.
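One detail worth noting about both Rust versions above: they store the patterns in a HashMap, and Rust does not specify the iteration order of a HashMap, so the order in which patterns are applied can differ between runs. If two patterns can match overlapping text, the scrubbed output may then differ too. The sketch below shows one possible way to keep the order fixed by holding the rules in a slice. It is an illustration only, not the article's code: `Rule` and `scrub` are hypothetical names, and literal `str::replace` is used as a stand-in for `Regex::replace_all` so that the example needs only the standard library.

```rust
/// One scrubbing rule: the text to find and the label that replaces it.
/// (Hypothetical type; the article's `Pattern` holds a regex string instead.)
struct Rule {
    needle: &'static str,
    label: &'static str,
}

/// Applies the rules in the order they appear in the slice.
fn scrub(input: &str, rules: &[Rule]) -> String {
    let mut text = input.to_string();
    for rule in rules {
        // Stand-in for `regex.replace_all(&text, label)`.
        text = text.replace(rule.needle, rule.label);
    }
    text
}

fn main() {
    // Order matters here: the more specific rule runs before the general one.
    let rules = [
        Rule { needle: "john@example.com", label: "<EMAIL>" },
        Rule { needle: "john", label: "<NAME>" },
    ];
    let out = scrub("john wrote from john@example.com", &rules);
    println!("{}", out); // prints "<NAME> wrote from <EMAIL>"
}
```

With the two rules swapped, the same input would become `<NAME> wrote from <NAME>@example.com`, which is why a fixed order can matter for overlapping patterns.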
Performance Comparison
Key points used to get these performance numbers:
10 regex replacement patterns were used.
Each data point is the average of 20 runs.
These numbers may vary with system specifications.
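The averaging over repeated runs could be done with a small helper like the one below. This is a minimal sketch using only the standard library, not the harness actually used for the article: `average_duration` is a hypothetical name, and the closure uses a literal `str::replace` as a stand-in workload where the real benchmark would call the regex replacement loop.

```rust
use std::time::{Duration, Instant};

/// Runs `work` `runs` times and returns the mean wall-clock duration.
/// `runs` must be greater than zero.
fn average_duration<F: FnMut()>(runs: u32, mut work: F) -> Duration {
    assert!(runs > 0, "at least one run is required");
    let mut total = Duration::ZERO;
    for _ in 0..runs {
        // Only the closure body is timed; setup stays outside.
        let start = Instant::now();
        work();
        total += start.elapsed();
    }
    total / runs
}

fn main() {
    // Stand-in input and workload (assumptions for illustration).
    let input = "call me at 555-0100 ".repeat(1_000);
    let mut total_len = 0usize;
    let mean = average_duration(20, || {
        let scrubbed = input.replace("555-0100", "<PHONE>");
        // Use the result so the work is not trivially discarded.
        total_len += scrubbed.len();
    });
    println!("Mean over 20 runs: {:?} ({} bytes scrubbed in total)", mean, total_len);
}
```

Building the input before calling the helper keeps file reading and other setup out of the measured time.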
Key Takeaways:
Both variations of the Rust code demonstrate better performance than the C# implementation.
The optimized Rust code excels in performance, consistently outperforming the other implementations.
The initial regex compilation in Rust is responsible for the performance difference between the optimized and unoptimized Rust code.
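To see how much of a run is setup and how much is replacement, the two phases can be timed separately. The following is a rough sketch under stated assumptions, not the article's measurement code: `timed` is a hypothetical helper, building a list of literal rules stands in for compiling the regexes, and `str::replace` stands in for `replace_all`, so that the example runs with the standard library alone.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

fn main() {
    let input = "id 123-45-6789, phone 555-0100\n".repeat(10_000);

    // Setup phase: stand-in for compiling the regex patterns.
    let (rules, setup_time) = timed(|| {
        vec![
            ("123-45-6789".to_string(), "<SSN>".to_string()),
            ("555-0100".to_string(), "<PHONE>".to_string()),
        ]
    });

    // Replacement phase: stand-in for the `replace_all` loop.
    let (scrubbed, replace_time) = timed(|| {
        let mut text = input.clone();
        for (needle, label) in &rules {
            text = text.replace(needle.as_str(), label);
        }
        text
    });

    println!("setup: {:?}, replace: {:?}", setup_time, replace_time);
    println!("scrubbed {} bytes", scrubbed.len());
}
```

Reporting the two durations side by side makes it visible whether a difference between two variants comes from setup or from the replacements themselves.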
Summary
We also tried pre-compiling the regexes in C#, but that made no difference to the performance. Based on the results, Rust performed considerably better than C# for PII scrubbing across the input sizes tested.
Written by Priyank Singh