Performance Battle: C# vs Rust for PII Scrubbing

Priyank Singh

In software development, performance can make or break an application's success. In this article, we look at a sub-problem that comes up widely in business applications: removing Personally Identifiable Information (PII) from captured data.

This article compares the performance of C# and Rust on that task. The underlying problem in PII scrubbing is efficient regex matching.

Regex replace using C#

We will use the System.Text.RegularExpressions namespace. Its Regex.Replace method replaces every match of a pattern in the input text with a replacement string.

public static void RegexTest()
{
    // Read the JSON file containing the regex patterns and replacements
    var patterns = JsonConvert.DeserializeObject<Patterns>(File.ReadAllText("patterns.json"));

    // Read the input text from the TXT file
    var inputText = File.ReadAllText("input.txt");
    string resultText = inputText;

    Dictionary<string, string> replacements = new Dictionary<string, string>();

    foreach (Pattern pattern in patterns.patterns)
    {
        replacements.Add(pattern.pattern, pattern.label);
    }

    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();

    // Iterate over the regex patterns and replace the matches in the resultText
    foreach (var kvp in replacements)
    {
        var regex = new Regex(kvp.Key);
        resultText = regex.Replace(resultText, kvp.Value);
    }

    stopwatch.Stop();
    var duration = stopwatch.ElapsedMilliseconds;

    Console.WriteLine("Original text:");
    Console.WriteLine(inputText);
    Console.WriteLine("\nReplaced text:");
    Console.WriteLine(resultText);
    Console.WriteLine("\nTime taken: " + duration + " ms");
}

For the complete working code, see GitHub.

Regex replace using Rust

We will use the regex crate for regular expressions and its replace_all method to perform the replacements.

There are two versions of the Rust code; we will look at both.

Unoptimized

fn regex_test() {
    // Read the JSON file containing the regex patterns and replacements
    let patterns_json = fs::read_to_string("patterns.json").expect("Failed to read patterns.json");
    let patterns: Patterns = serde_json::from_str(&patterns_json).expect("Failed to parse JSON");

    // Read the input text from the TXT file
    let input_text = fs::read_to_string("input.txt").expect("Failed to read input.txt");

    let mut replacements = HashMap::new();

    for pattern in patterns.patterns {
        replacements.insert(pattern.pattern, pattern.label);
    }

    let mut result_text = input_text.to_string();

    // Measure the time taken to perform the regex replacements
    let start_time = Instant::now();

    // Iterate over the regex patterns and replace the matches in the result_text
    for (pattern, replacement) in replacements.iter() {
        let regex = Regex::new(pattern).unwrap();
        result_text = regex.replace_all(&result_text, replacement.to_string()).to_string();
    }

    let end_time = Instant::now();
    let duration = end_time - start_time;

    // println!("Original text: {}", input_text);
    // println!("Replaced text: {}", result_text);
    println!("Time taken: {:?} and size of text file: {:.2} KB", duration, (input_text.len() as f64)/1024.0);
}

For the complete working code, see GitHub.

Optimized

In the optimized version, we compile all the regular expressions once, before the replacement loop, instead of constructing each Regex inside it. This moves regex compilation out of the timed section, so the loop only performs matching and replacement.

fn regex_test() {
    // Read the JSON file containing the regex patterns and replacements
    let patterns_json = fs::read_to_string("patterns.json").expect("Failed to read patterns.json");
    let patterns: Patterns = serde_json::from_str(&patterns_json).expect("Failed to parse JSON");

    // Read the input text from the TXT file
    let input_text = fs::read_to_string("input.txt").expect("Failed to read input.txt");

    let mut replacements = HashMap::new();

    for pattern in patterns.patterns {
        replacements.insert(pattern.pattern, pattern.label);
    }

    let mut result_text = input_text.to_string();

    // Compile the regex patterns outside the loop
    // OPTIMIZATION
    let regexes: Vec<Regex> = replacements
        .keys()
        .map(|pattern| Regex::new(pattern).unwrap())
        .collect();

    // Measure the time taken to perform the regex replacements
    let start_time = Instant::now();

    // Iterate over the regex patterns and replace the matches in the result_text
    for (regex, replacement) in regexes.iter().zip(replacements.values()) {
        result_text = regex.replace_all(&result_text, replacement.to_string()).to_string();
    }

    let end_time = Instant::now();
    let duration = end_time - start_time;

    // println!("Original text: {}", input_text);
    // println!("Replaced text: {}", result_text);
    println!("Time taken: {:?} and size of text file: {:.2} KB", duration, (input_text.len() as f64)/1024.0);
}

For the complete working code, see GitHub.

Performance Comparison

How these performance numbers were obtained:

  • 10 regex patterns were used for replacement

  • Each data point is the average of 20 runs

  • These numbers may vary with system specifications
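
The averaging step described above can be sketched as a small helper. This is a minimal sketch and not the harness used for the published numbers: `average_duration` is a hypothetical name, and the closure stands in for whatever replacement loop is being measured. It uses only the Rust standard library.

```rust
use std::time::{Duration, Instant};

/// Runs `work` `runs` times and returns the mean wall-clock duration.
/// Hypothetical helper for illustration; it is not part of the
/// original benchmark code.
fn average_duration<F: FnMut()>(runs: u32, mut work: F) -> Duration {
    assert!(runs > 0, "at least one run is required");
    let mut total = Duration::ZERO;
    for _ in 0..runs {
        // Time each run separately so only `work` itself is measured.
        let start = Instant::now();
        work();
        total += start.elapsed();
    }
    // Dividing a Duration by a u32 yields the mean duration.
    total / runs
}
```

Calling `average_duration(20, || { /* replacement loop */ })` once per input size would produce one averaged data point per size, under the assumption that each run starts from the same input text.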

Key Takeaways:

  • Both variations of the Rust code demonstrate better performance than the C# implementation.

  • The optimized Rust code excels in performance, consistently outperforming the other implementations.

  • The difference between the optimized and unoptimized Rust code comes from regex compilation: the unoptimized version compiles each pattern inside the timed loop, while the optimized version compiles them beforehand.

Summary

We also tried pre-compiling the regexes in C#, but that made no difference to its performance. Based on these results, Rust performed better than C# at PII scrubbing across the input sizes we tested.
