My Research Journey into Rust & Performance: Solving the 1BRC Challenge ⚡️

A little over a year ago, I got curious about the 1 Billion Row Challenge (1BRC). It seemed like the perfect playground to test Rust’s performance chops — 1 billion weather station measurements, aggregate per-city statistics (min, max, average), and do it as fast as possible.
At that time, I went down a rabbit hole of Rust performance research, experimenting with naïve approaches, multithreading, and low-level optimizations. I never wrote about it back then, but looking back, the lessons are worth sharing. So here’s my journey — from 12 minutes → 2 mins → 10 seconds.
Stage 1: The Naïve Rust Approach — 12 Minutes ⏳
I began with a straightforward solution:
Load the file into a string.
Split by newline.
Parse each line into `city;temperature`.
Aggregate results in a `HashMap<String, CityStats>`.
It was idiomatic Rust, safe, and simple. But it took 12 minutes to finish.
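For context, that first version looked roughly like this. It's a sketch rather than my exact code: the `CityStats` struct mirrors the one in the snapshot at the end of this post, and the input path is assumed.

```rust
use std::collections::HashMap;
use std::fs;

// Sketch of the naive shape, assuming the standard 1BRC "city;temperature"
// line format. CityStats matches the struct in the final snapshot below.
struct CityStats { min: f32, max: f32, sum: f32, count: f32 }

fn naive(path: &str) -> HashMap<String, CityStats> {
    // The whole file becomes one heap-allocated String: simple, but costly
    // at a billion rows.
    let contents = fs::read_to_string(path).expect("failed to read input");
    let mut map: HashMap<String, CityStats> = HashMap::new();
    for line in contents.lines() {
        if let Some((city, value)) = line.split_once(';') {
            let v: f32 = value.parse().expect("bad temperature");
            let s = map.entry(city.to_string()).or_insert(CityStats {
                min: v, max: v, sum: 0.0, count: 0.0,
            });
            s.min = s.min.min(v);
            s.max = s.max.max(v);
            s.sum += v;
            s.count += 1.0;
        }
    }
    map
}
```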
This stage gave me a baseline, but it was clear that high-level string parsing was eating performance alive.
Stage 2: Embracing Concurrency — 2 Minutes 🚀
My next line of research was parallelism. Rust provides great abstractions like `std::thread::scope` and `Arc<Mutex<T>>`, so I divided the file into per-thread chunks aligned on newline boundaries (so no record is ever split across threads). Each thread processed its own slice of the file and then merged its results into a global `HashMap`.
The speedup was dramatic — down to ~2 mins.
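Here is a simplified sketch of that chunking-and-merging pattern. Assumptions to note: `process_chunk` is a stand-in for the per-slice parser (Stage 1 logic applied to one chunk), and a plain `Mutex` suffices because scoped threads can borrow it directly, no `Arc` required.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

struct CityStats { min: f32, max: f32, sum: f32, count: f32 }

// Stand-in for the per-slice parser: Stage 1 logic applied to one chunk.
fn process_chunk(chunk: &[u8]) -> HashMap<String, CityStats> {
    let mut map = HashMap::new();
    for line in std::str::from_utf8(chunk).unwrap().lines() {
        if let Some((city, value)) = line.split_once(';') {
            let v: f32 = value.parse().unwrap();
            let s = map.entry(city.to_string())
                .or_insert(CityStats { min: v, max: v, sum: 0.0, count: 0.0 });
            s.min = s.min.min(v);
            s.max = s.max.max(v);
            s.sum += v;
            s.count += 1.0;
        }
    }
    map
}

fn process_parallel(data: &[u8], workers: usize) -> HashMap<String, CityStats> {
    let global = Mutex::new(HashMap::new());
    let approx = data.len() / workers;
    thread::scope(|scope| {
        let mut start = 0;
        for i in 0..workers {
            // Extend each chunk to the next newline so no record is split.
            let mut end = if i == workers - 1 {
                data.len()
            } else {
                (start + approx).min(data.len())
            };
            while end < data.len() && data[end] != b'\n' {
                end += 1;
            }
            let slice = &data[start..end];
            let global = &global;
            scope.spawn(move || {
                let local = process_chunk(slice);
                // Merge the thread-local map into the shared one: one lock
                // per thread, not one lock per line.
                let mut g = global.lock().unwrap();
                for (city, s) in local {
                    let e = g.entry(city).or_insert(CityStats {
                        min: s.min, max: s.max, sum: 0.0, count: 0.0,
                    });
                    e.min = e.min.min(s.min);
                    e.max = e.max.max(s.max);
                    e.sum += s.sum;
                    e.count += s.count;
                }
            });
            start = (end + 1).min(data.len());
        }
    });
    global.into_inner().unwrap()
}
```

Merging once per thread, rather than locking on every line, keeps contention on the shared map negligible.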
This was my first “wow” moment: Rust’s fearless concurrency makes scaling across CPU cores approachable and safe. But something was still bothering me — parsing overhead.
Stage 3: Researching Parsing Costs → Working with Bytes — 10 Seconds ⚡️
I dug deeper into how Rust handles strings and UTF-8. My research led me to an important insight:
Strings are expensive. Bytes are cheap.
Every conversion to `String` or `&str` was adding overhead. So I restructured my code to work directly on raw `u8` slices. Instead of treating the file as text, I processed byte slices and converted to strings only when strictly necessary.
This optimization cut execution time by more than an order of magnitude, from ~2 mins to ~10s.
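To make the idea concrete, here is a sketch of byte-level record splitting (`split_record` is a hypothetical helper, not lifted from my code): the separator search runs on raw bytes, and UTF-8 validation is deferred to the one small piece that actually needs string parsing.

```rust
// Hypothetical helper: find ';' by scanning raw bytes, and only then touch
// UTF-8. The city stays a byte slice; a caller converts it to a String only
// when inserting a brand-new key into the map.
fn split_record(segment: &[u8]) -> Option<(&[u8], f32)> {
    let sep = segment.iter().position(|&b| b == b';')?;
    let city = &segment[..sep];
    let value = std::str::from_utf8(&segment[sep + 1..]).ok()?;
    Some((city, value.parse().ok()?))
}
```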
At this point, profiling showed something surprising:
~4s = actual computation.
~6s = just loading data from the SSD.
That meant I had reached the I/O limit of my hardware. Any further improvement would require tricks like memory-mapped files (`mmap`), SIMD parsing, or asynchronous I/O.
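I only researched these rather than implementing them, but as a sketch, memory-mapping via the external `memmap2` crate would look roughly like this (file name assumed): the OS pages the file in lazily instead of copying it into a buffer up front.

```rust
use memmap2::Mmap; // external crate: memmap2
use std::fs::File;

fn main() -> std::io::Result<()> {
    let file = File::open("measurements.txt")?;
    // Safety: the mapping stays valid only as long as the file is not
    // truncated or modified underneath us.
    let mmap = unsafe { Mmap::map(&file)? };
    let data: &[u8] = &mmap; // the whole file as a byte slice, paged on demand
    println!("mapped {} bytes", data.len());
    Ok(())
}
```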
Lessons Learned 📚
This wasn’t just about solving a coding challenge — it was a research journey into Rust’s performance model.
Naïve is necessary. My 12-min baseline gave me something to measure against.
Concurrency matters, but parsing dominates. Threads gave me my first big win, but eliminating string parsing was the real breakthrough.
I/O is king. Once your code is fast enough, the bottleneck shifts from CPU to hardware.
Rust shines in performance-critical paths. Working with raw bytes in a safe way is exactly where Rust feels both low-level and empowering.
Code Snapshot: Processing Data with Bytes
Here’s the core of my final approach:
```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Running stats per city; count and sum as f32 to keep averaging simple.
struct CityStats {
    min: f32,
    max: f32,
    sum: f32,
    count: f32,
}

fn process_data(data: &[u8]) -> HashMap<String, CityStats> {
    let mut map: HashMap<String, CityStats> = HashMap::new();
    // Split on raw newline bytes; the file is never materialized as a String.
    for segment in data.split(|&byte| byte == b'\n') {
        // UTF-8 validation happens per line, only where &str parsing is needed.
        let mut parts = std::str::from_utf8(segment).unwrap().split(';');
        if let (Some(city), Some(value)) = (parts.next(), parts.next()) {
            let val = value.parse::<f32>().unwrap();
            match map.entry(city.to_string()) {
                Entry::Occupied(mut e) => {
                    let s = e.get_mut();
                    s.count += 1.0;
                    s.sum += val;
                    s.min = s.min.min(val);
                    s.max = s.max.max(val);
                }
                Entry::Vacant(e) => {
                    e.insert(CityStats { min: val, max: val, count: 1.0, sum: val });
                }
            }
        }
    }
    map
}
```
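And a minimal way to drive it (hypothetical usage, with the input path assumed): `std::fs::read` pulls the file in as raw bytes, so there is no UTF-8 pass over the whole input.

```rust
fn main() -> std::io::Result<()> {
    // Read the whole file as bytes; no String allocation for the input.
    let data = std::fs::read("measurements.txt")?;
    for (city, s) in process_data(&data) {
        println!("{city}: min={:.1} max={:.1} avg={:.1}", s.min, s.max, s.sum / s.count);
    }
    Ok(())
}
```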
Closing Thoughts 💡
This project was less about “solving 1BRC” and more about understanding Rust at the performance frontier.
I started with high-level Rust (strings, safe iteration) and ended up optimizing down to raw bytes. Along the way, I learned how multithreading, memory access patterns, and I/O limits interact in real-world workloads.
Right now, my solution runs in ~10 seconds, of which ~6 seconds are pure I/O. That means the core algorithm is blazing fast, and any further speedup requires going beyond CPU optimizations into system-level tricks.
This experience has convinced me: Rust isn’t just about safety. It’s about giving you the tools to write code that’s as fast as your hardware will allow.