Boost JSON Lines Processing with a High-Performance Rust Library

As a developer working with large datasets, I've often found myself wrestling with JSON Lines (JSONL) files. Whether it's processing log files, handling data exports, or working with streaming APIs, JSONL has become ubiquitous in the data world. However, I noticed a gap in the Rust ecosystem for efficient, async-first JSONL processing. That's why I decided to create async-jsonl - and I'm excited to share what I've built!

What is JSON Lines (JSONL)?

Before diving into my library, let's quickly cover what JSONL actually is. JSON Lines is a text format where each line is a complete, valid JSON value (typically an object), and records are separated by newline characters. It looks like this:

{"id": 1, "name": "Alice", "age": 30}
{"id": 2, "name": "Bob", "age": 25}
{"id": 3, "name": "Charlie", "age": 35}

Why JSONL is Awesome

  1. Streamable: You can process one line at a time without loading the entire file into memory

  2. Appendable: Easy to add new records to the end of a file (see the sketch below)

  3. Fault-tolerant: One corrupted line doesn't break the entire dataset

  4. Tool-friendly: Works great with Unix tools like cat, head, tail, and grep

  5. Language-agnostic: Supported across virtually every programming language

These advantages make JSONL perfect for log files, data pipelines, and any scenario where you're dealing with large volumes of structured data.
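
To make the "Appendable" point concrete, here's a minimal sketch of appending one record to a JSONL file using plain tokio and serde_json rather than async-jsonl itself (serde_json is an extra dependency beyond the ones listed later, and the Record struct and file name are just placeholders):

use serde::Serialize;
use tokio::fs::OpenOptions;
use tokio::io::AsyncWriteExt;

#[derive(Serialize)]
struct Record {
    id: u64,
    name: String,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Each record is one self-contained line, so appending never
    // requires rewriting or re-parsing the rest of the file.
    let record = Record { id: 4, name: "Dana".to_string() };
    let mut line = serde_json::to_string(&record)?;
    line.push('\n');

    let mut file = OpenOptions::new()
        .create(true)
        .append(true)
        .open("data.jsonl")
        .await?;
    file.write_all(line.as_bytes()).await?;

    Ok(())
}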

Enter async-jsonl: Built for Performance and Simplicity

While working on various Rust projects that needed to process large JSONL files, I kept running into the same issues:

  • Existing solutions weren't async-first

  • Memory usage would explode with large files

  • Error handling was either too strict (one bad line kills everything) or too loose

  • Type safety was often sacrificed for convenience

So I built async-jsonl to solve these problems. Here's what makes it special:

🚀 Async/Await from the Ground Up

Built on Tokio, async-jsonl is designed for high-performance async I/O. No blocking the event loop here!

use async_jsonl::Jsonl;
use futures::StreamExt;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let jsonl = Jsonl::from_path("data.jsonl").await?;

    let lines: Vec<_> = jsonl.collect().await;
    for line_result in lines {
        let line = line_result?;
        println!("Raw JSON: {}", line);
    }

    Ok(())
}

💾 Memory Efficient Streaming

Process files of any size without loading everything into memory. The streaming approach means you can handle gigabyte files with minimal RAM usage.

use async_jsonl::{Jsonl, JsonlDeserialize};
use futures::StreamExt;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct LogEntry {
    timestamp: String,
    level: String,
    message: String,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let jsonl = Jsonl::from_path("huge_log_file.jsonl").await?;
    let mut stream = jsonl.deserialize::<LogEntry>();

    // Process one record at a time - O(1) memory usage!
    while let Some(entry_result) = stream.next().await {
        let entry = entry_result?;
        if entry.level == "ERROR" {
            println!("🚨 Error at {}: {}", entry.timestamp, entry.message);
        }
    }

    Ok(())
}

🔒 Type-Safe with Serde Integration

Full integration with serde means you get compile-time type checking and deserialization straight into your own structs, with no manual parsing glue.

use async_jsonl::{Jsonl, JsonlDeserialize};
use futures::StreamExt;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct User {
    id: u64,
    name: String,
    email: String,
    #[serde(default)]
    is_verified: bool,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let jsonl = Jsonl::from_path("users.jsonl").await?;
    let users = jsonl.deserialize::<User>();

    let results: Vec<_> = users.collect().await;
    for user_result in results {
        let user = user_result?;
        println!("{:?}", user);
    }

    Ok(())
}

🛡️ Resilient Error Handling

One of my favorite features: the library continues processing even when individual lines fail to parse. Perfect for real-world data that's not always perfect.

use async_jsonl::{Jsonl, JsonlValueDeserialize};
use futures::StreamExt;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let jsonl = Jsonl::from_path("messy_data.jsonl").await?;
    let mut stream = jsonl.deserialize_values();

    let mut valid_records = 0;
    let mut error_count = 0;

    while let Some(result) = stream.next().await {
        match result {
            Ok(value) => {
                valid_records += 1;
                // Process valid JSON
                println!("Valid record: {}", value);
            }
            Err(e) => {
                error_count += 1;
                eprintln!("Skipping invalid line: {}", e);
                // Keep going!
            }
        }
    }

    println!("Processed {} valid records, {} errors", valid_records, error_count);
    Ok(())
}

📖 Flexible Input Sources

Whether you're reading from files, memory, or any AsyncRead source, async-jsonl has you covered.

use async_jsonl::Jsonl;
use std::io::Cursor;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let data = r#"{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}
{"id": 3, "name": "Charlie"}
"#;

    // Read from memory
    let reader = Cursor::new(data.as_bytes());
    let jsonl = Jsonl::new(reader);

    // Or read from a file
    let file_jsonl = Jsonl::from_path("data.jsonl").await?;

    Ok(())
}
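
As one more illustration, here is a rough sketch of reading JSONL from standard input, assuming Jsonl::new accepts a buffered async reader the same way it accepts the Cursor above (treat the BufReader wrapping as illustrative rather than required):

use async_jsonl::{Jsonl, JsonlValueDeserialize};
use futures::StreamExt;
use tokio::io::BufReader;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // BufReader<Stdin> implements both AsyncRead and AsyncBufRead,
    // so it should satisfy whichever reader bound Jsonl::new expects.
    let reader = BufReader::new(tokio::io::stdin());
    let jsonl = Jsonl::new(reader);

    // Stream each line as a serde_json::Value, e.g. `cat data.jsonl | ./app`
    let mut stream = jsonl.deserialize_values();
    while let Some(value) = stream.next().await {
        println!("{}", value?);
    }

    Ok(())
}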

🎯 Advanced Features for Power Users

The library also includes some advanced features I found myself needing repeatedly:

Line Counting Without Full Deserialization:

use async_jsonl::{Jsonl, JsonlReader};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let jsonl = Jsonl::from_path("massive_file.jsonl").await?;
    let count = jsonl.count().await?;

    println!("File contains {} records", count);
    Ok(())
}

The Technical Journey

Building this library taught me a lot about async Rust and streaming APIs. Some key decisions I made:

  1. Tokio as the Foundation: Built on tokio::io for maximum compatibility with the async ecosystem

  2. Stream-based Architecture: Using the futures crate's Stream trait for composable, lazy processing (see the sketch after this list)

  3. Zero-copy Where Possible: Minimizing allocations while maintaining safety

  4. Ergonomic Error Handling: Using anyhow so failures are easy to propagate without losing context
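
To show what the stream-based design buys you in practice, here's a rough sketch that chains ordinary futures combinators onto the deserialize stream shown earlier; the LogEntry shape and file name are placeholders:

use async_jsonl::{Jsonl, JsonlDeserialize};
use futures::StreamExt;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct LogEntry {
    level: String,
    message: String,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let jsonl = Jsonl::from_path("app.jsonl").await?;

    // Because deserialize() yields an ordinary Stream, the usual
    // combinators compose lazily: drop unparseable lines, keep only
    // error-level entries, and stop after the first ten matches.
    let first_errors: Vec<LogEntry> = jsonl
        .deserialize::<LogEntry>()
        .filter_map(|res| async move { res.ok() })
        .filter(|entry| {
            let is_error = entry.level == "ERROR";
            async move { is_error }
        })
        .take(10)
        .collect()
        .await;

    for entry in &first_errors {
        println!("{}: {}", entry.level, entry.message);
    }

    Ok(())
}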

Real-World Performance

In my testing with large datasets (>1GB JSONL files), async-jsonl consistently shows:

  • Memory usage: Constant regardless of file size

  • Throughput: Competitive with the fastest JSONL parsers in any language

  • CPU efficiency: Low overhead thanks to Rust's zero-cost abstractions

Try It Yourself!

The library is available on crates.io and the source is on GitHub. Here's how to get started:

[dependencies]
async-jsonl = "0.3.1"
tokio = { version = "1.0", features = ["full"] }
futures = "0.3"
serde = { version = "1.0", features = ["derive"] }
anyhow = "1.0"

I'd love to hear how you use it in your projects! Whether you're processing logs, handling data pipelines, or working with APIs, async-jsonl is designed to make your life easier.


What do you think? Have you worked with JSONL files before? What features would you find most useful in a JSONL processing library? Let me know in the comments below!
