i made csv-parser 1.3x faster (sometimes)

Johannes Naylor
6 min read

why even bother?

A few months ago I was working on a data processing pipeline that needed to churn through some larger-ish CSV files: 7K+ rows, 1MB+ each. Not "big data," but big enough that I'd get impatient.

I was using the csv-parser library, which works fine. It's been around forever, has a clean API, and does what I want it to. But I kept watching my scripts grind through these files and thinking "there's gotta be a faster way."

Enter Rust. I'd been wanting an excuse to dive deeper into Rust and N-API, so I figured why not try to build a drop-in replacement for csv-parser that's actually faster?

💡 Spoiler alert: it worked, but with some interesting caveats.

the performance story

Here's the thing about performance: it's never as simple as "X is faster than Y." Context matters a lot.

where it actually helps

For the files I was dealing with, the results were pretty solid:

| Dataset | Rows | Size | csv-parser | fast-csv-parser | Speedup |
| --- | --- | --- | --- | --- | --- |
| large-dataset.csv | 7,268 | 1.1MB | 59ms | 47ms | 🚀 1.26x faster |
| option-maxRowBytes.csv | 4,577 | 700KB | 36ms | 29ms | 🚀 1.24x faster |

That's a legit 20-30% speedup on the files that actually matter. Not earth-shattering, but definitely noticeable when you're processing hundreds of these files.

where it doesn't help (and why)

Small files? Not so much. There's about a 0.1ms overhead from the Node.js ↔ Rust boundary that makes tiny files slower to process. It's not a huge deal in absolute terms (we're talking 0.1ms vs 0.2ms), but the ratio looks bad.

The crossover point is around 1KB files. Below that, you're paying the overhead tax. Above that, you start seeing the benefits.
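
If you're curious where the crossover lands for your own data, a quick timing harness is all you need. Here's a rough sketch (not from the project's benchmark suite) that times both parsers on the same file using Node's built-in perf_hooks:

// rough benchmark sketch, assuming both packages are installed
const fs = require('fs')
const { performance } = require('perf_hooks')

function time(makeParser, file) {
  return new Promise((resolve) => {
    const start = performance.now()
    fs.createReadStream(file)
      .pipe(makeParser())
      .on('data', () => {}) // drain rows so the stream keeps flowing
      .on('end', () => resolve(performance.now() - start))
  })
}

async function main() {
  const file = process.argv[2]
  const js = await time(require('csv-parser'), file)
  const rust = await time(require('fast-csv-parser'), file)
  console.log(`csv-parser: ${js.toFixed(1)}ms  fast-csv-parser: ${rust.toFixed(1)}ms`)
}

main()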

the technical approach

Building this was actually pretty straightforward thanks to some nice tools and new programming models.

llms are surprisingly good at porting

So, Claude did most of the heavy lifting. I was inspired by this post about using LLMs to port C to Rust and figured I'd try a similar approach for JavaScript to Rust (even though there's no fuzzing here at all 😐).

The process was basically:

  1. Give Claude the original csv-parser source code

  2. Ask it to implement the same functionality in Rust with N-API bindings (making very sure to tell it that all the existing tests needed to pass)

  3. Point out compatibility issues and edge cases as I found them

  4. Let it iterate until the tests passed

It did great. It picked up on subtle behaviors around UTF encoding, error handling, and edge cases that would have taken me weeks to discover manually.

This isn't to say it was perfect. I still had to guide it through the compatibility requirements and fix some issues. But having an AI that could read through thousands of lines of JavaScript, understand the intent, and translate it to idiomatic Rust saved me probably months of work.

the rust implementation

I used napi-rs, which makes building Node.js addons with Rust simple enough. The Rust code is almost 1:1 with the JS version and doesn't pull in any other crates to handle the parsing.

The architecture is basically (there's a simplified sketch of the wrapper layering right after this list):

  1. JavaScript wrapper maintains 100% API compatibility with csv-parser

  2. N-API bridge handles the Node.js ↔ Rust interface

  3. Rust core does the actual CSV parsing

  4. Cross-platform binaries pre-built for all major platforms
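
To make the layering concrete, here's a rough sketch of how the JavaScript wrapper can sit in front of the native addon. The createParser / parseChunk / finish names and the binding path are placeholders for illustration, not the real N-API exports:

// simplified sketch of the wrapper layering
// (native function names and binding path are hypothetical)
const { Transform } = require('stream')
const native = require('./native-binding.node') // placeholder path to the compiled addon

function csv(options = {}) {
  // the Rust core owns the parser state
  const parser = native.createParser(options)

  return new Transform({
    readableObjectMode: true, // bytes in, row objects out
    transform(chunk, _encoding, callback) {
      // hand raw bytes across the N-API boundary, push parsed rows back out
      for (const row of native.parseChunk(parser, chunk)) {
        this.push(row)
      }
      callback()
    },
    flush(callback) {
      // emit whatever the Rust side is still holding at end of input
      for (const row of native.finish(parser)) {
        this.push(row)
      }
      callback()
    }
  })
}

module.exports = csv

The important bit is that the stream machinery lives entirely on the JavaScript side; the boundary only sees buffers going in and plain row objects coming out.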

keeping it compatible

The biggest challenge wasn't making it fast; it was making it a true drop-in replacement. csv-parser has a really well-designed API that handles edge cases gracefully. Things like:

  • Custom headers and transformations

  • UTF-16 encoding detection (which I actually improved)

  • Error handling that matches exactly

  • All the same events and options

  • Stream/Transform support, which N-API doesn't provide, so the whole streaming layer had to be rebuilt by hand in the JavaScript wrapper

I spent way more time on compatibility than performance. But it was worth it because you can literally just change your import:

// Before
const csv = require('csv-parser')

// After
const csv = require('fast-csv-parser')

// Your existing code works unchanged
const fs = require('fs')
const results = []

fs.createReadStream('data.csv')
  .pipe(csv())
  .on('data', (data) => results.push(data))
  .on('end', () => {
    console.log(results)
  })

real-world usage

I still don't really use this in production (and idk if I can recommend it to anyone yet), but I have used it for some simple projects. The main benefits I've seen:

  1. ETL pipelines: when you're processing hundreds of files, that 20-30% speedup adds up

  2. Data import features: users uploading CSV files get faster feedback

  3. Batch processing: background jobs finish quicker

The overhead on small files hasn't been an issue in practice because most real-world use cases involve files that are at least a few KB.

lessons learned

llms are game-changers for porting

The biggest lesson here wasn't about Rust or performance; it was about how effective LLMs are at code translation. Claude essentially did what would have been a couple weeks of painstaking manual work in a few hours of back-and-forth.

This matches what I've been seeing in other domains. That fuzzing article talks about using LLMs to automatically port C libraries to Rust by having the AI write fuzz tests to validate behavior. Similar concept here but instead of manually understanding every edge case, you let the AI figure it out and then validate the results.

I think we're going to see a lot more of this. Why manually port libraries between languages when an LLM can do 90% of the work and you just need to clean up the remaining issues?

rust + node.js is pretty great

The napi-rs ecosystem has matured a lot. Building cross-platform binaries, handling different Node.js versions, managing memory between languages: it all mostly just works now.

cross-compiling + distribution isn’t perfect

Getting this project into the npmjs package ecosystem was a bit of a pain in the ass. napi-rs has decent documentation (e.g. https://napi.rs/docs/cross-build/summary + https://napi.rs/docs/deep-dive/release) but I still struggled with getting everything working in CI:

I ended up removing several deployment targets because they kept breaking in CI and never worked correctly. Maybe it was something wrong on my end, but it would've been nice to do all of this without CI at all, or with something really dumbed down like a single npm run build-all.

overhead matters

The Node.js ↔ Rust boundary isn't free. For this use case, it's about 0.1ms per invocation. That's nothing for large files but everything for tiny ones.
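
You can ballpark that fixed cost on your own machine by parsing a trivially small input many times and averaging. This measures stream setup plus the boundary crossing rather than the boundary alone, but it's enough to see the floor. A rough sketch, assuming fast-csv-parser is installed:

// rough sketch: average the fixed per-call cost on a tiny input
const { Readable } = require('stream')
const { performance } = require('perf_hooks')
const csv = require('fast-csv-parser') // swap in 'csv-parser' to compare

function parseOnce() {
  return new Promise((resolve) => {
    Readable.from([Buffer.from('a,b\n1,2\n')])
      .pipe(csv())
      .on('data', () => {})
      .on('end', resolve)
  })
}

async function main() {
  const runs = 1000
  const start = performance.now()
  for (let i = 0; i < runs; i++) await parseOnce()
  console.log(`~${((performance.now() - start) / runs).toFixed(3)}ms per parse`)
}

main()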

utf-16 was a bonus

While building this, I ended up adding better UTF-16 support with automatic BOM detection. That wasn't planned but turned out to be useful for some international data sources.
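
BOM sniffing itself is simple: peek at the first few bytes before deciding how to decode. Something along these lines (a sketch of the idea, not the code actually shipped):

// sketch of BOM detection (illustrative, not the repo's actual implementation)
function detectEncoding(firstChunk) {
  if (firstChunk.length >= 2) {
    if (firstChunk[0] === 0xff && firstChunk[1] === 0xfe) return 'utf-16le'
    // note: Node only decodes utf16le natively; BE input needs byte-swapping first
    if (firstChunk[0] === 0xfe && firstChunk[1] === 0xff) return 'utf-16be'
  }
  if (firstChunk.length >= 3 &&
      firstChunk[0] === 0xef && firstChunk[1] === 0xbb && firstChunk[2] === 0xbf) {
    return 'utf-8' // UTF-8 with BOM
  }
  return 'utf-8' // no BOM: assume UTF-8
}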

conclusion

Was it worth building? For my use case, absolutely. The performance gains are real and meaningful for the files I process regularly. Plus it was a fun way to learn more about Rust and N-API.

Should you use it? Depends on what you're processing:

  • Large files (>10KB): Probably yes, especially if you're already using csv-parser

  • Lots of small files: Probably not, the overhead adds up

  • Bundle size critical: Stick with csv-parser (it's pure JS)

The best part is that it's truly a drop-in replacement. You can try it risk-free and see if it helps your specific use case.

All the code is on GitHub and it's published on npm. Give it a shot if you're processing CSV files and let me know how it works for you. PRs are welcome :)


This was a fun side project that scratched both my performance optimization and Rust learning itches. What I cannot build I do not understand.
