My Research Journey into Rust & Performance: Solving the 1BRC Challenge ⚡️

Jagdish Parihar

A little over a year ago, I got curious about the 1 Billion Row Challenge (1BRC). It seemed like the perfect playground to test Rust’s performance chops — 1 billion weather station measurements, aggregate per-city statistics (min, max, average), and do it as fast as possible.

At that time, I went down a rabbit hole of Rust performance research, experimenting with naïve approaches, multithreading, and low-level optimizations. I never wrote about it back then, but looking back, the lessons are worth sharing. So here’s my journey: from 12 minutes → 2 minutes → 10 seconds.


Stage 1: The Naïve Rust Approach — 12 Minutes ⏳

I began with a straightforward solution:

  • Load the file into a string.

  • Split by newline.

  • Parse each line into city;temperature.

  • Aggregate results in a HashMap<String, CityStats>.

It was idiomatic Rust, safe, and simple. But it took 12 minutes to finish.

This stage gave me a baseline, but it was clear that high-level string parsing was eating performance alive.
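In code, that first pass looked roughly like this (a sketch reconstructed from memory, not the exact original; `naive_aggregate` takes the file contents as a `&str` for brevity, and `CityStats` just holds the running min/max/sum/count):

```rust
use std::collections::HashMap;

#[derive(Debug)]
struct CityStats {
    min: f32,
    max: f32,
    sum: f32,
    count: f32,
}

// In the real program the input came from fs::read_to_string(path),
// which is exactly the "load the file into a string" step above.
fn naive_aggregate(contents: &str) -> HashMap<String, CityStats> {
    let mut map: HashMap<String, CityStats> = HashMap::new();

    for line in contents.lines() {
        // Each line has the form "city;temperature".
        if let Some((city, value)) = line.split_once(';') {
            let val: f32 = value.parse().expect("invalid temperature");
            let entry = map.entry(city.to_string()).or_insert(CityStats {
                min: val,
                max: val,
                sum: 0.0,
                count: 0.0,
            });
            entry.min = entry.min.min(val);
            entry.max = entry.max.max(val);
            entry.sum += val;
            entry.count += 1.0;
        }
    }
    map
}
```

Every line here allocates or copies: `read_to_string` materializes the whole file, `lines()` and `split_once` walk UTF-8 text, and `city.to_string()` clones the key on every lookup. That is where the 12 minutes went.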


Stage 2: Embracing Concurrency — 2 Minutes 🚀

My next line of research was parallelism. Rust provides great abstractions like std::thread::scope and Arc<Mutex<T>>, so I divided the file into chunks aligned on newline boundaries, one per thread. Each thread processed its own slice of the file, and the partial results were then merged into a global HashMap.

The speedup was dramatic — down to ~2 mins.

This was my first “wow” moment: Rust’s fearless concurrency makes scaling across CPU cores approachable and safe. But something was still bothering me — parsing overhead.
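The chunking-and-merging pattern can be sketched like this (an illustrative reconstruction, not my original code; it counts lines per city to keep the example short, and merges per-thread maps after `join` instead of sharing one map behind `Arc<Mutex<T>>`, which avoids lock contention):

```rust
use std::collections::HashMap;
use std::thread;

// Split `data` into roughly `n` chunks whose boundaries fall just past a
// newline byte, so no line is ever split between two threads.
fn chunk_on_newlines(data: &[u8], n: usize) -> Vec<&[u8]> {
    let approx = (data.len() / n.max(1)).max(1);
    let mut chunks = Vec::with_capacity(n);
    let mut start = 0;
    for _ in 0..n {
        if start >= data.len() {
            break;
        }
        let mut end = (start + approx).min(data.len());
        // Advance `end` until the previous byte is a newline (or we hit EOF).
        while end < data.len() && data[end - 1] != b'\n' {
            end += 1;
        }
        chunks.push(&data[start..end]);
        start = end;
    }
    if start < data.len() {
        chunks.push(&data[start..]); // leftover tail
    }
    chunks
}

// Each scoped thread aggregates its own chunk; partials are merged at the end.
fn parallel_line_counts(data: &[u8], n_threads: usize) -> HashMap<String, usize> {
    let chunks = chunk_on_newlines(data, n_threads);
    let mut partials = Vec::new();
    thread::scope(|s| {
        let handles: Vec<_> = chunks
            .into_iter()
            .map(|chunk| {
                s.spawn(move || {
                    let mut local: HashMap<String, usize> = HashMap::new();
                    for line in chunk.split(|&b| b == b'\n') {
                        if let Some(pos) = line.iter().position(|&b| b == b';') {
                            let city = String::from_utf8_lossy(&line[..pos]).into_owned();
                            *local.entry(city).or_insert(0) += 1;
                        }
                    }
                    local
                })
            })
            .collect();
        for h in handles {
            partials.push(h.join().unwrap());
        }
    });
    let mut merged: HashMap<String, usize> = HashMap::new();
    for p in partials {
        for (city, c) in p {
            *merged.entry(city).or_insert(0) += c;
        }
    }
    merged
}
```

`std::thread::scope` is what makes this pleasant: the threads can borrow the file buffer directly, with the compiler guaranteeing they finish before the buffer goes away.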


Stage 3: Researching Parsing Costs → Working with Bytes — 10 Seconds ⚡️

I dug deeper into how Rust handles strings and UTF-8. My research led me to an important insight:

Strings are expensive. Bytes are cheap.

Every conversion to String or &str was adding overhead. So I restructured my code to work directly on raw u8 arrays. Instead of treating the file as text, I processed byte slices and converted only when strictly necessary.

This optimization delivered another order-of-magnitude win, cutting execution time from ~2 minutes down to ~10 seconds.

At this point, profiling showed something surprising:

  • ~4s = actual computation.

  • ~6s = just loading data from the SSD.

That meant I had reached the I/O limit of my hardware. Any further improvement would require tricks like memory-mapped files (mmap), SIMD parsing, or asynchronous I/O.


Lessons Learned 📚

This wasn’t just about solving a coding challenge — it was a research journey into Rust’s performance model.

  1. Naïve is necessary. My 12-min baseline gave me something to measure against.

  2. Concurrency matters, but parsing dominates. Threads gave me my first big win, but eliminating string parsing was the real breakthrough.

  3. I/O is king. Once your code is fast enough, the bottleneck shifts from CPU to hardware.

  4. Rust shines in performance-critical paths. Working with raw bytes in a safe way is exactly where Rust feels both low-level and empowering.


Code Snapshot: Processing Data with Bytes

Here’s the core of my final approach:

use std::collections::hash_map::Entry;
use std::collections::HashMap;

struct CityStats {
    min: f32,
    max: f32,
    count: f32,
    sum: f32,
}

fn process_data(data: &[u8]) -> HashMap<String, CityStats> {
    let mut map: HashMap<String, CityStats> = HashMap::new();

    for segment in data.split(|&byte| byte == b'\n') {
        // A trailing newline yields one empty segment; skip it.
        if segment.is_empty() {
            continue;
        }
        // Convert to &str only here, once per line, at the last moment.
        let mut parts = std::str::from_utf8(segment).unwrap().split(';');

        if let (Some(city), Some(value)) = (parts.next(), parts.next()) {
            let val = value.parse::<f32>().unwrap();
            match map.entry(city.to_string()) {
                Entry::Occupied(mut e) => {
                    let s = e.get_mut();
                    s.count += 1.0;
                    s.sum += val;
                    s.min = s.min.min(val);
                    s.max = s.max.max(val);
                }
                Entry::Vacant(e) => {
                    e.insert(CityStats { min: val, max: val, count: 1.0, sum: val });
                }
            }
        }
    }

    map
}

Closing Thoughts 💡

This project was less about “solving 1BRC” and more about understanding Rust at the performance frontier.

I started with high-level Rust (strings, safe iteration) and ended up optimizing down to raw bytes. Along the way, I learned how multithreading, memory access patterns, and I/O limits interact in real-world workloads.

Right now, my solution runs in 10 seconds, where 6 seconds are I/O bound. That means the core algorithm is blazing fast — and any further speedup requires going beyond CPU optimizations into system-level tricks.

This experience has convinced me: Rust isn’t just about safety. It’s about giving you the tools to write code that’s as fast as your hardware will allow.


Written by

Jagdish Parihar

I am a software developer, primarily working with Node.js, GraphQL, React, and MongoDB.