Rust DataFrame Alternatives to Polars: Meet Elusion v4.0.0

The Rust ecosystem has seen tremendous growth in data processing libraries, with Polars leading the charge as a blazingly fast DataFrame library.

However, a new contender has emerged that takes a fundamentally different approach to data engineering and analysis: Elusion.

While Polars focuses on pure performance and memory efficiency with its Apache Arrow-based columnar engine, Elusion positions itself as equally dedicated to performance and memory efficiency, also built on Apache Arrow and DataFusion, while offering a comprehensive data engineering platform that prioritizes flexibility, ease of use, and integration capabilities alongside high performance.

Architecture Philosophy: Different Approaches to the Same Goals

Polars: Performance-First Design

Polars is written from scratch in Rust, designed close to the machine and without external dependencies. It's based on Apache Arrow's memory model, providing cache-efficient columnar data structures, and focuses on:

  • Ultra-fast query execution with SIMD optimizations
  • Memory-efficient columnar processing
  • Lazy evaluation with query optimization
  • Streaming for out-of-core processing

Elusion: Flexibility-First Design

Elusion takes a different approach, prioritizing developer experience and integration capabilities:

  • Core Philosophy: "Elusion wants you to be you!"

Unlike traditional DataFrame libraries, Elusion does not enforce specific patterns or chaining orders when constructing queries. You can build your queries in ANY SEQUENCE that makes sense to you, writing functions in ANY ORDER, and Elusion ensures consistent results regardless of the function call order.

Loading files into DataFrames:

Regular Loading: ~4.95 seconds for complex queries on 900k rows

CustomDataFrame::new()

Streaming Loading: ~3.62 seconds for the same operations

CustomDataFrame::new_with_stream()

Performance improvement: ~26.9% faster with the streaming approach
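The reported improvement follows directly from the two timings above. A quick sanity check of the arithmetic:

```rust
// Verify the quoted speedup figure from the two timings above
// (4.95s regular vs. 3.62s streaming on the same 900k-row workload).
fn speedup_percent(regular_secs: f64, streaming_secs: f64) -> f64 {
    (regular_secs - streaming_secs) / regular_secs * 100.0
}

fn main() {
    let improvement = speedup_percent(4.95, 3.62);
    // Rounds to ~26.9%, matching the figure above.
    println!("streaming is {:.1}% faster", improvement);
}
```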

  • Polars approach:
let df = LazyFrame::scan_csv("data.csv", ScanArgsCSV::default())?
    .filter(col("amount").gt(100))
    .select([col("customer"), col("amount")])
    .collect()?;
  • Elusion approach - flexible ordering:
let df = CustomDataFrame::new("data.csv", "sales").await?
    .filter("amount > 100")           
    .select(["customer", "amount"]) 
    .elusion("result").await?;

// Or reorder as you see fit - same result
let df = CustomDataFrame::new("data.csv", "sales").await?
    .select(["customer", "amount"])   // Select first
    .filter("amount > 100")           // Filter second
    .elusion("result").await?;

Polars Basic file loading:

let df = LazyFrame::scan_csv("data.csv", ScanArgsCSV::default())?
    .collect()?;

// Parquet with options
let df = LazyFrame::scan_parquet("data.parquet", ScanArgsParquet::default())?
    .collect()?;

Elusion Data Loading - Comprehensive Sources:

use elusion::prelude::*;

Local files with auto-recognition

let df = CustomDataFrame::new("data.csv", "sales").await?;
let df = CustomDataFrame::new("data.xlsx", "sales").await?;  // Excel support
let df = CustomDataFrame::new("data.parquet", "sales").await?;

Streaming for large files (currently only supports .csv files)

let df = CustomDataFrame::new_with_stream("large_data.csv", "sales").await?;

Load entire folders

let df = CustomDataFrame::load_folder(
    "/path/to/folder",
    Some(vec!["csv", "xlsx"]), // Filter file types or `None` for all types
    "combined_data"
).await?;

Azure Blob Storage (currently supports csv and json files)

let df = CustomDataFrame::from_azure_with_sas_token(
    "https://account.blob.core.windows.net/container",
    "sas_token",
    Some("folder/file.csv"), //or keep `None` to take everything from folder
    "azure_data"
).await?;

SharePoint

let df = CustomDataFrame::load_from_sharepoint(
    "tenant-id",
    "client-id", 
    "https://company.sharepoint.com/sites/Site",
    "Documents/data.xlsx",
    "sharepoint_data"
).await?;

REST API to DataFrame

let api = ElusionApi::new();

api.from_api_with_headers(
    "https://api.example.com/data",
    headers,
    "/path/to/output.json"
).await?;

let df = CustomDataFrame::new("/path/to/output.json", "api_data").await?;

Database connections

let postgres_df = CustomDataFrame::from_postgres(&conn, query, "pg_data").await?;

let mysql_df = CustomDataFrame::from_mysql(&conn, query, "mysql_data").await?;

Polars: Structured Approach

Polars requires logical ordering

let result = df
    .lazy()
    .filter(col("amount").gt(100))
    .group_by([col("category")])
    .agg([col("amount").sum().alias("total")])
    .sort("total", SortMultipleOptions::default())
    .collect()?;

Elusion: Any-Order Flexibility

All of these produce the same result:

Traditional order:

let result1 = df
    .select(["category", "amount"])
    .filter("amount > 100")
    .agg(["SUM(amount) as total"])
    .group_by(["category"])
    .order_by(["total"], ["DESC"])
    .elusion("result").await?;

Filter first

let result2 = df
    .filter("amount > 100")
    .agg(["SUM(amount) as total"])
    .select(["category", "amount"])
    .group_by(["category"])
    .order_by(["total"], ["DESC"])
    .elusion("result").await?;

Aggregation first

let result3 = df
    .agg(["SUM(amount) as total"])
    .filter("amount > 100")
    .group_by(["category"])
    .select(["category", "amount"])
    .order_by(["total"], ["DESC"])
    .elusion("result").await?;

All produce identical results!
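One way such order-independence can work is for each chained call to merely record its clause, with the terminal call assembling the clauses into a fixed canonical order. The toy builder below is my own sketch of that principle, not Elusion's actual internals:

```rust
// Toy illustration of order-independent query building: each method only
// records its clause, and `build` always renders clauses in canonical SQL
// order, so call order cannot change the result. This is NOT Elusion's
// real implementation, just a sketch of the design idea.
#[derive(Default)]
struct QueryBuilder {
    select: Vec<String>,
    filter: Option<String>,
    agg: Vec<String>,
    group_by: Vec<String>,
    order_by: Option<String>,
}

impl QueryBuilder {
    fn select(mut self, cols: &[&str]) -> Self {
        self.select = cols.iter().map(|c| c.to_string()).collect();
        self
    }
    fn filter(mut self, pred: &str) -> Self {
        self.filter = Some(pred.to_string());
        self
    }
    fn agg(mut self, exprs: &[&str]) -> Self {
        self.agg = exprs.iter().map(|e| e.to_string()).collect();
        self
    }
    fn group_by(mut self, cols: &[&str]) -> Self {
        self.group_by = cols.iter().map(|c| c.to_string()).collect();
        self
    }
    fn order_by(mut self, col: &str, dir: &str) -> Self {
        self.order_by = Some(format!("{col} {dir}"));
        self
    }
    fn build(&self, table: &str) -> String {
        // Clauses are emitted in canonical order regardless of the
        // order in which the builder methods were called.
        let mut cols = self.select.clone();
        cols.extend(self.agg.clone());
        let mut sql = format!("SELECT {} FROM {table}", cols.join(", "));
        if let Some(f) = &self.filter {
            sql.push_str(&format!(" WHERE {f}"));
        }
        if !self.group_by.is_empty() {
            sql.push_str(&format!(" GROUP BY {}", self.group_by.join(", ")));
        }
        if let Some(o) = &self.order_by {
            sql.push_str(&format!(" ORDER BY {o}"));
        }
        sql
    }
}

fn main() {
    let a = QueryBuilder::default()
        .select(&["category"])
        .filter("amount > 100")
        .agg(&["SUM(amount) as total"])
        .group_by(&["category"])
        .order_by("total", "DESC")
        .build("sales");
    // Same clauses, completely different call order: identical SQL.
    let b = QueryBuilder::default()
        .order_by("total", "DESC")
        .group_by(&["category"])
        .agg(&["SUM(amount) as total"])
        .filter("amount > 100")
        .select(&["category"])
        .build("sales");
    assert_eq!(a, b);
    println!("{a}");
}
```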

Advanced Features: Where Elusion Shines

  • Built-in Visualization and Reporting

Create interactive dashboards:
let plots = [
    (&line_plot, "Sales Timeline"),
    (&bar_chart, "Category Performance"),
    (&histogram, "Distribution Analysis"),
];

let tables = [
    (&summary_table, "Summary Stats"),
    (&detail_table, "Transaction Details")
];

CustomDataFrame::create_report(
    Some(&plots),
    Some(&tables),
    "Sales Analysis Dashboard",
    "dashboard.html",
    Some(layout_config),
    Some(table_options)
).await?;
  • Automated Pipeline Scheduling

Schedule data engineering pipelines:
let scheduler = PipelineScheduler::new("5min", || async {
    // Load from Azure
    let df = CustomDataFrame::from_azure_with_sas_token(
        azure_url, sas_token, Some("folder/"), "raw_data"
    ).await?;

    // Process data
    let processed = df
        .select(["date", "amount", "category"])
        .agg(["SUM(amount) as total", "COUNT(*) as transactions"])
        .group_by(["date", "category"])
        .order_by(["date"], ["ASC"])
        .elusion("processed").await?;

    // Write results
    processed.write_to_parquet(
        "overwrite",
        "output/processed_data.parquet",
        None
    ).await?;

    Ok(())
}).await?;
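At its core, a fixed-interval scheduler like this boils down to a loop that runs the job and sleeps between runs. The std-only sketch below illustrates that idea; the "5min"-style interval parsing is my own assumption about the format, not Elusion's actual API:

```rust
use std::thread;
use std::time::Duration;

// Illustrative interval parser: "5min" -> Duration. The string format
// mirrors the "5min" argument shown above, but these parsing rules are
// an assumption, not Elusion's implementation.
fn parse_interval(s: &str) -> Option<Duration> {
    if let Some(mins) = s.strip_suffix("min") {
        return mins.parse::<u64>().ok().map(|m| Duration::from_secs(m * 60));
    }
    if let Some(secs) = s.strip_suffix("sec") {
        return secs.parse::<u64>().ok().map(Duration::from_secs);
    }
    None
}

// Minimal fixed-interval loop: run the job, sleep, repeat.
fn run_every<F: FnMut()>(interval: &str, runs: usize, mut job: F) {
    let every = parse_interval(interval).expect("unrecognized interval");
    for _ in 0..runs {
        job();
        thread::sleep(every);
    }
}

fn main() {
    assert_eq!(parse_interval("5min"), Some(Duration::from_secs(300)));
    // Run a trivial job twice at a (shortened) interval.
    run_every("1sec", 2, || println!("pipeline tick"));
}
```

A production scheduler would of course run the job asynchronously (as Elusion does with an async closure) and handle errors between ticks; this sketch only shows the timing skeleton.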

Advanced JSON Processing

Elusion can handle complex JSON structures with arrays and objects:

let df = CustomDataFrame::new("complex_data.json", "json_data").await?;

If you have JSON fields/columns in your files, you can explode them:

  • Extract simple JSON fields:
let simple = df.json([
    "metadata.'$timestamp' AS event_time",
    "metadata.'$user_id' AS user",
    "data.'$amount' AS transaction_amount"
]);
  • Extract from JSON arrays:
let complex = df.json_array([
    "events.'$value:id=purchase' AS purchase_amount",
    "events.'$timestamp:id=login' AS login_time",
    "events.'$status:type=payment' AS payment_status"
]);
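To make the selector strings above easier to read, here is one way to interpret the `column.'$key' AS alias` syntax. The parsing below is purely my own illustration of the grammar, not how Elusion processes these selectors:

```rust
// Illustrative parser for selector strings like
// "metadata.'$timestamp' AS event_time" -> (column, json key, alias).
// The exact grammar Elusion uses is an assumption here; this only shows
// one plausible reading of the syntax.
fn parse_selector(s: &str) -> Option<(String, String, String)> {
    let (path, alias) = s.split_once(" AS ")?;
    let (column, key) = path.split_once(".'$")?;
    let key = key.strip_suffix('\'')?;
    Some((column.to_string(), key.to_string(), alias.to_string()))
}

fn main() {
    let (col, key, alias) =
        parse_selector("metadata.'$timestamp' AS event_time").unwrap();
    assert_eq!(
        (col.as_str(), key.as_str(), alias.as_str()),
        ("metadata", "timestamp", "event_time")
    );
}
```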

When to Choose Which

  • Choose Polars when:
    • Pure performance is the top priority
    • You prefer structured, optimized query patterns
    • Memory efficiency is critical
    • You need minimal dependencies

  • Choose Elusion when:
    • You need integration flexibility (cloud storage, APIs, databases)
    • Developer experience and query flexibility matter
    • You want built-in visualization and reporting
    • You need automated pipeline scheduling
    • You work with diverse data sources (Excel, SharePoint, REST APIs)
    • You prefer intuitive, any-order query building

Installation and Getting Started

  • Polars
[dependencies]
polars = { version = "0.50.0", features = ["lazy"] }
  • Elusion
[dependencies]
elusion = "4.0.0"
tokio = { version = "1.45.0", features = ["rt-multi-thread"] }
  • Elusion with specific features
elusion = { version = "4.0.0", features = ["dashboard", "azure", "postgres"] }

Rust version requirement:

Polars: >= 1.80
Elusion: >= 1.81

Real-World Example: Sales Data Analysis

Polars Implementation:

use polars::prelude::*;

let df = LazyFrame::scan_csv("sales.csv", ScanArgsCSV::default())?
    .filter(col("amount").gt(100))
    .group_by([col("category")])
    .agg([
        col("amount").sum().alias("total_sales"),
        col("amount").mean().alias("avg_sale"),
        col("customer_id").n_unique().alias("unique_customers")
    ])
    .sort("total_sales", SortMultipleOptions::default().with_order_descending(true))
    .collect()?;

println!("{}", df);

Elusion Implementation:

use elusion::prelude::*;

#[tokio::main]
async fn main() -> ElusionResult<()> {
    // Load data (flexible source)
    let df = CustomDataFrame::new("sales.csv", "sales").await?;

    // Build query in any order that makes sense to you
    let analysis = df
        .filter("amount > 100")                
        .agg([                                    
            "SUM(amount) as total_sales",
            "AVG(amount) as avg_sale", 
            "COUNT(DISTINCT customer_id) as unique_customers"
        ])
        .group_by(["category"])                
        .order_by(["total_sales"], ["DESC"])       
        .elusion("sales_analysis").await?;

    // Display the result, if you like
    analysis.display().await?;

    // Create visualization
    let bar_chart = analysis.plot_bar(
        "category",
        "total_sales", 
        Some("Sales by Category")
    ).await?;

    // Generate report
    CustomDataFrame::create_report(
        Some(&[(&bar_chart, "Sales Performance")]),
        Some(&[(&analysis, "Summary Table")]),
        "Sales Analysis Report",
        "sales_report.html",
        None,
        None
    ).await?;

    Ok(())
}

Conclusion

Elusion v4.0.0 represents a paradigm shift in DataFrame libraries, prioritizing developer experience, integration flexibility, and comprehensive data engineering capabilities. The choice between Polars and Elusion depends on your priorities:

For raw computational performance and memory efficiency: Polars
For comprehensive data engineering with flexible development: Elusion

Elusion's "any-order" query building, extensive integration capabilities, built-in visualization, and automated scheduling make it particularly attractive for teams that need to work with diverse data sources and want a more intuitive development experience.

Both libraries showcase the power of Rust in the data processing space, offering developers high-performance alternatives to traditional Python-based solutions. The Rust DataFrame ecosystem is thriving, and having multiple approaches ensures that different use cases and preferences are well-served.

Try Elusion v4.0.0 today:

cargo add elusion@4.0.0

For more information and examples, visit the Elusion GitHub repository and join the growing community of Rust data engineers who are discovering the flexibility and power of any-order DataFrame operations.

Written by

Borivoj Grujicic