Why Cloud Logs Use NDJSON Instead of JSON Arrays

Ramaprasath A

Introduction

A few days ago, I opened my Azure Function App log file. To my surprise, it didn’t look like a typical JSON file with square brackets and commas. Instead, it was line after line of JSON objects, like this:

{ "time": "2025-09-01T10:00:00Z", "level": "INFO", "message": "Function started" }
{ "time": "2025-09-01T10:05:00Z", "level": "ERROR", "message": "Timeout occurred" }

This format is called NDJSON (Newline Delimited JSON), and it’s everywhere in modern cloud platforms - Azure, AWS, GCP, Elasticsearch, BigQuery, and more.

In this blog, let’s explore:

  • What NDJSON is.

  • Why cloud platforms use it for logs.

  • How it makes log processing scalable and efficient.

  • How to parse and convert NDJSON in code.

What is NDJSON?

NDJSON = Newline Delimited JSON. It’s just one JSON object per line.

JSON Array Example

[
  { "id": 1, "msg": "First log" },
  { "id": 2, "msg": "Second log" }
]

NDJSON Example

{ "id": 1, "msg": "First log" }
{ "id": 2, "msg": "Second log" }

NDJSON isn’t valid JSON as a whole file, but it’s valid JSON line by line.
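
You can verify that claim in a few lines. A minimal Python sketch: parsing the two objects as one document fails, while parsing line by line succeeds.

import json

ndjson_text = '{ "id": 1, "msg": "First log" }\n{ "id": 2, "msg": "Second log" }\n'

# Parsing the whole blob fails - two top-level values are not one valid JSON document.
try:
    json.loads(ndjson_text)
except json.JSONDecodeError as err:
    print("Whole-file parse failed:", err)

# Parsing line by line works - each line is a complete JSON object.
records = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
print(records)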

Why Logs Use NDJSON Instead of JSON Arrays

Cloud logs aren’t stored as pretty JSON arrays for a reason.

Logs are Naturally Streaming Data

  • Logs are generated continuously, not in batches.

  • Each log event is independent.

Appending a Line is Cheaper & Faster

If logs were a JSON array:

  • Every append would mean rewriting the tail of the file: overwrite the closing bracket, add a comma and the new object, then close the array again.

  • Very expensive at scale.

With NDJSON:

  • Just append a new line.

  • Super efficient for high-volume logs.
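
Here’s what that append-only write looks like in practice - a minimal Python sketch (the file name and log fields are illustrative, not a real Azure API):

import json
from datetime import datetime, timezone

def append_log(path, level, message):
    # json.dumps produces a single line; opening in "a" mode appends it
    # without touching any bytes already in the file.
    event = {
        "time": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
    }
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")

append_log("logs.ndjson", "INFO", "Function started")
append_log("logs.ndjson", "ERROR", "Timeout occurred")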

Easy for Downstream Processing

  • Azure Log Analytics: NDJSON → stream-friendly.

  • BigQuery: NDJSON → row per line.

  • Elasticsearch: NDJSON → bulk indexing format.

  • Datadog: NDJSON → native support.

Tools can process logs in parallel without parsing giant arrays.
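
Because every line stands alone, a consumer can stream a log file one record at a time instead of loading it whole. A minimal sketch, assuming a local logs.ndjson file:

import json

def iter_events(path):
    # Generator: yields one parsed event per line, so memory use stays
    # flat no matter how large the log file grows.
    with open(path) as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

errors = [e for e in iter_events("logs.ndjson") if e.get("level") == "ERROR"]
print(f"{len(errors)} error events found")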

Real-Time Example: Azure Function Logs

Azure Functions generate hourly logs in files like PT1H.json. Although the extension is .json, the content is NDJSON:

{ "time": "2025-09-01T10:00:00Z", "level": "INFO", "message": "Function started" }
{ "time": "2025-09-01T10:01:30Z", "level": "INFO", "message": "Database connected" }
{ "time": "2025-09-01T10:05:00Z", "level": "ERROR", "message": "Timeout error" }

This design choice makes logs stream-friendly and scalable.

Processing NDJSON in Code

Sometimes, you need to convert NDJSON into a valid JSON array.

JavaScript Example

const fs = require('fs');

// Split on newlines (handling Windows \r\n too), drop blank lines,
// then join the objects with commas inside [ ... ].
const jsonArray = "[" + fs.readFileSync("logs.ndjson", "utf-8")
  .split(/\r?\n/)
  .map(line => line.trim())
  .filter(Boolean)
  .join(",") + "]";

const parsed = JSON.parse(jsonArray);
console.log(parsed);

Python Example

import json

# Parse each non-empty line as its own JSON object.
with open("logs.ndjson") as f:
    data = [json.loads(line) for line in f if line.strip()]

print(json.dumps(data, indent=2))

Open-Source Packages for NDJSON

Instead of writing your own parser, you can use existing libraries. A few well-known options:

  • ndjson (npm) → streaming parse and serialize for Node.js.

  • jsonlines (PyPI) → a simple reader/writer API for Python.

  • jq (CLI) → processes a stream of JSON values by default, so it handles NDJSON out of the box.
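
For example, reading and writing NDJSON with the jsonlines package (pip install jsonlines) - a minimal sketch, with the file names as placeholders:

import jsonlines

# Reading: the reader yields one parsed object per line.
with jsonlines.open("logs.ndjson") as reader:
    for event in reader:
        print(event["level"], event["message"])

# Writing: each write() call emits one object as one line.
with jsonlines.open("output.ndjson", mode="w") as writer:
    writer.write({"id": 1, "msg": "First log"})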

Benefits of NDJSON in Logging

  • Scalable → handle millions of log lines.

  • Streaming-friendly → no need to batch logs.

  • Cost-efficient → append-only, no rewriting files.

  • Tool-friendly → easy to integrate with Elastic, BigQuery, Azure Log Analytics.

Conclusion

Logs are not batch data - they’re real-time event streams.
That’s why cloud platforms (Azure, AWS, GCP) store them as NDJSON instead of JSON arrays.

NDJSON makes logs faster to write, easier to process, and scalable for modern analytics systems.

So next time you open a .json log file that looks “different,” remember - it’s probably NDJSON in disguise, optimised for scale and streaming.

Thanks for reading!
I’d love to hear your thoughts - have you seen NDJSON in your systems, or do you still work with JSON arrays for logs? Drop your experiences in the comments.

Follow me here on Hashnode (and on LinkedIn) for more blogs about Distributed Systems, Cloud, and Agentic AI.
