Boost JSON Lines Processing with a High-Performance Rust Library


As a developer working with large datasets, I've often found myself wrestling with JSON Lines (JSONL) files. Whether it's processing log files, handling data exports, or working with streaming APIs, JSONL has become ubiquitous in the data world. However, I noticed a gap in the Rust ecosystem for efficient, async-first JSONL processing. That's why I decided to create async-jsonl - and I'm excited to share what I've built!
What is JSON Lines (JSONL)?
Before diving into my library, let's quickly cover what JSONL actually is. JSON Lines is a text format where each line is a valid JSON object, separated by newline characters. It looks like this:
{"id": 1, "name": "Alice", "age": 30}
{"id": 2, "name": "Bob", "age": 25}
{"id": 3, "name": "Charlie", "age": 35}
Why JSONL is Awesome
Streamable: You can process one line at a time without loading the entire file into memory
Appendable: Easy to add new records to the end of a file
Fault-tolerant: One corrupted line doesn't break the entire dataset
Tool-friendly: Works great with Unix tools like cat, head, tail, and grep
Language-agnostic: Supported across virtually every programming language
These advantages make JSONL perfect for log files, data pipelines, and any scenario where you're dealing with large volumes of structured data.
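The streamable and fault-tolerant properties are easy to demonstrate with nothing but the standard library. The sketch below (plain std Rust, no JSON crate; the starts-with/ends-with check is a deliberately naive stand-in for real JSON parsing) reads a JSONL buffer one line at a time and skips a corrupted line without aborting:

```rust
use std::io::{BufRead, BufReader, Cursor};

// Count records in a JSONL buffer, skipping lines that fail a
// (deliberately naive) validity check. Returns (valid, skipped).
fn count_records(data: &str) -> (usize, usize) {
    let reader = BufReader::new(Cursor::new(data.as_bytes()));
    let (mut valid, mut skipped) = (0, 0);
    for line in reader.lines() {
        let line = line.expect("read error");
        // Stand-in for real JSON parsing; a proper program would
        // hand each line to a JSON library here.
        if line.starts_with('{') && line.ends_with('}') {
            valid += 1;
        } else {
            skipped += 1; // one bad line does not abort the stream
        }
    }
    (valid, skipped)
}

fn main() {
    let data = "{\"id\": 1, \"name\": \"Alice\"}\nnot json\n{\"id\": 2, \"name\": \"Bob\"}\n";
    let (valid, skipped) = count_records(data);
    println!("valid={} skipped={}", valid, skipped); // valid=2 skipped=1
}
```

Because each record ends at a newline, the reader never needs more than one line in memory at a time, which is exactly what makes the format appendable and streamable.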
Enter async-jsonl: Built for Performance and Simplicity
While working on various Rust projects that needed to process large JSONL files, I kept running into the same issues:
Existing solutions weren't async-first
Memory usage would explode with large files
Error handling was either too strict (one bad line kills everything) or too loose
Type safety was often sacrificed for convenience
So I built async-jsonl to solve these problems. Here's what makes it special:
Async/Await from the Ground Up
Built on Tokio, async-jsonl is designed for high-performance async I/O. No blocking the event loop here!
use async_jsonl::Jsonl;
use futures::StreamExt;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let jsonl = Jsonl::from_path("data.jsonl").await?;
    let lines: Vec<_> = jsonl.collect().await;
    for line_result in lines {
        let line = line_result?;
        println!("Raw JSON: {}", line);
    }
    Ok(())
}
Memory-Efficient Streaming
Process files of any size without loading everything into memory. The streaming approach means you can handle gigabyte files with minimal RAM usage.
use async_jsonl::{Jsonl, JsonlDeserialize};
use futures::StreamExt;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct LogEntry {
    timestamp: String,
    level: String,
    message: String,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let jsonl = Jsonl::from_path("huge_log_file.jsonl").await?;
    let mut stream = jsonl.deserialize::<LogEntry>();
    // Process one record at a time - O(1) memory usage!
    while let Some(entry_result) = stream.next().await {
        let entry = entry_result?;
        if entry.level == "ERROR" {
            println!("Error at {}: {}", entry.timestamp, entry.message);
        }
    }
    Ok(())
}
Type-Safe with Serde Integration
Full integration with serde means you get compile-time type checking and zero-cost deserialization.
use async_jsonl::{Jsonl, JsonlDeserialize};
use futures::StreamExt;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct User {
    id: u64,
    name: String,
    email: String,
    #[serde(default)]
    is_verified: bool,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let jsonl = Jsonl::from_path("users.jsonl").await?;
    let users = jsonl.deserialize::<User>();
    let results: Vec<_> = users.collect().await;
    for user_result in results {
        let user = user_result?;
        println!("{:?}", user);
    }
    Ok(())
}
Resilient Error Handling
One of my favorite features: the library continues processing even when individual lines fail to parse. Perfect for real-world data that's not always perfect.
use async_jsonl::{Jsonl, JsonlValueDeserialize};
use futures::StreamExt;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let jsonl = Jsonl::from_path("messy_data.jsonl").await?;
    let mut stream = jsonl.deserialize_values();
    let mut valid_records = 0;
    let mut error_count = 0;
    while let Some(result) = stream.next().await {
        match result {
            Ok(value) => {
                valid_records += 1;
                // Process valid JSON
                println!("Valid record: {}", value);
            }
            Err(e) => {
                error_count += 1;
                eprintln!("Skipping invalid line: {}", e);
                // Keep going!
            }
        }
    }
    println!("Processed {} valid records, {} errors", valid_records, error_count);
    Ok(())
}
Flexible Input Sources
Whether you're reading from files, memory, or any AsyncRead source, async-jsonl has you covered.
use async_jsonl::Jsonl;
use std::io::Cursor;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let data = r#"{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}
{"id": 3, "name": "Charlie"}
"#;
    // Read from memory
    let reader = Cursor::new(data.as_bytes());
    let jsonl = Jsonl::new(reader);
    // Or read from a file
    let file_jsonl = Jsonl::from_path("data.jsonl").await?;
    Ok(())
}
Advanced Features for Power Users
The library also includes some advanced features I found myself needing repeatedly:
Line Counting Without Full Processing:
use async_jsonl::{Jsonl, JsonlReader};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let jsonl = Jsonl::from_path("massive_file.jsonl").await?;
    let count = jsonl.count().await?;
    println!("File contains {} records", count);
    Ok(())
}
The Technical Journey
Building this library taught me a lot about async Rust and streaming APIs. Some key decisions I made:
Tokio as the Foundation: Built on tokio::io for maximum compatibility with the async ecosystem
Stream-based Architecture: Using the futures crate's Stream trait for composable, lazy processing
Zero-copy Where Possible: Minimizing allocations while maintaining safety
Comprehensive Error Types: Using anyhow for ergonomic error handling without sacrificing information
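The stream-based architecture is worth a moment of illustration. Since async Stream code needs the futures crate, here is a std-only analogue using Iterator (this is my own sketch, not the library's API): each adapter stage is lazy and composes without buffering the whole input, which is the same shape the Stream-based design gives you over async I/O. The substring match on "level" is a hypothetical, deliberately crude filter, not real JSON handling.

```rust
// A std-only analogue of the Stream-based design: each line is
// produced lazily, and adapters compose without buffering the input.
fn error_lines(input: &str) -> impl Iterator<Item = &str> {
    input
        .lines()
        .filter(|l| !l.trim().is_empty()) // tolerate blank lines
        // Crude stand-in for deserializing and checking a field:
        .filter(|l| l.contains("\"level\":\"ERROR\""))
}

fn main() {
    let log = "{\"level\":\"INFO\",\"msg\":\"ok\"}\n\
               {\"level\":\"ERROR\",\"msg\":\"boom\"}\n\
               {\"level\":\"ERROR\",\"msg\":\"again\"}\n";
    // Nothing is scanned until the iterator is consumed.
    let errors: Vec<_> = error_lines(log).collect();
    println!("{} error lines", errors.len()); // 2 error lines
}
```

The async version swaps Iterator for Stream and .filter for stream combinators, but the key property is identical: work happens per item, on demand.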
Real-World Performance
In my testing with large datasets (>1GB JSONL files), async-jsonl consistently shows:
Memory usage: Constant regardless of file size
Throughput: Competitive with the fastest JSONL parsers in any language
CPU efficiency: Low overhead thanks to Rust's zero-cost abstractions
Try It Yourself!
The library is available on crates.io and the source is on GitHub. Here's how to get started:
[dependencies]
async-jsonl = "0.3.1"
tokio = { version = "1.0", features = ["full"] }
futures = "0.3"
serde = { version = "1.0", features = ["derive"] }
anyhow = "1.0"
I'd love to hear how you use it in your projects! Whether you're processing logs, handling data pipelines, or working with APIs, async-jsonl is designed to make your life easier.
What do you think? Have you worked with JSONL files before? What features would you find most useful in a JSONL processing library? Let me know in the comments below!