Understanding Serialization and the Need for Data Serialization Formats

Raj
4 min read

In the modern world of software development, applications are constantly exchanging data. Whether it's a mobile app fetching user details from a server, a microservice communicating with another service, or a database storing structured information, data transfer is a crucial aspect of how systems function. However, raw data is not always easy to transfer efficiently. This is where serialization comes in.

What is Serialization?

Serialization is the process of converting complex data structures—like objects, lists, or maps—into a format that can be easily stored or transmitted and later reconstructed into its original form. The goal of serialization is to ensure that data can move seamlessly between different systems, regardless of the programming language or platform they use.

Serialization is like taking apart your toy car, packing all the parts into a box with instructions on how to reassemble it, and then sending that box to your friend. When they receive it, they can unpack the box and rebuild the car exactly as it was.

In technical terms:

  • Serialization is converting an object (like a JavaScript object, Python dictionary, or a database entry) into a format that can be easily stored or transmitted (like JSON, XML, or binary data).

  • Deserialization is the process of unpacking that data and rebuilding the original object.

JSON vs. ProtoBuf

Here, we'll discuss two widely used data serialization techniques: JSON and Protocol Buffers (ProtoBuf).

1. JSON

JSON (JavaScript Object Notation) is a lightweight, human-readable format commonly used for data exchange. It represents data as key-value pairs, making it easy to understand and debug. However, JSON can be inefficient in terms of storage and processing due to its textual nature and redundant key-value structure.

For example, consider:

package main

import (
    "encoding/json"
    "fmt"
    "log"
)

type User struct {
    ID    int    `json:"id"`
    Name  string `json:"name"`
    Email string `json:"email"`
}

func main() {
    user := User{
        ID:    1,
        Name:  "John Doe",
        Email: "john.doe@example.com",
    }

    jsonData, err := json.Marshal(user)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(string(jsonData)) 
}

JSON Output: {"id":1,"name":"John Doe","email":"john.doe@example.com"}

JSON serialization converts everything into a textual representation, which is then transmitted as bytes. On the receiving end, the client must reverse the process by parsing the JSON text back into a struct.

While JSON's readability makes it ideal for scenarios where human interaction is needed, it comes at the cost of performance due to extensive string manipulation and redundant data storage.

Therefore, JSON is best suited for applications where ease of debugging is prioritized over efficiency.


2. ProtoBuf

ProtoBuf (Protocol Buffers) is a highly efficient data serialization format designed for direct binary transmission. Unlike JSON, which requires conversion into a string format before transmission, ProtoBuf stores data in a compact binary representation, eliminating the need for extra transformation.

This efficiency comes from its schema-based structure, where only field numbers are stored instead of field names, significantly reducing the overall data size compared to JSON or XML. As a result, both serialization and deserialization are much faster, making ProtoBuf an ideal choice for high-performance applications where speed and efficiency are critical.

One of ProtoBuf's key advantages is its ability to work seamlessly across multiple programming languages, including Go, Python, Java, C++, JavaScript, and more. The process begins with defining the data structure in a .proto file, which acts as a schema for the data. This schema specifies message types, field numbers, and data types in a structured format.

For example, a simple ProtoBuf schema for a User message might look like this:

syntax = "proto3";

message User {
  int32 id = 1;
  string name = 2;
  string email = 3;
}

Once the schema is defined, the ProtoBuf compiler (protoc) is used to generate language-specific code. The compiler translates the .proto file into source code that provides automatic serialization, deserialization, and data access methods.

To use ProtoBuf in a specific language, the appropriate compiler plugin must be installed. For Go, for example, code generation looks like this:

protoc --go_out=. --go_opt=paths=source_relative user.proto

Generates a .pb.go file, which includes Go struct definitions and helper methods for encoding/decoding.

Once compiled, applications can use the generated code to serialize data into binary format for transmission or store it efficiently. Similarly, receiving applications can decode the binary message back into structured objects with minimal processing overhead.
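As a rough sketch of what that usage looks like in Go: the snippet below assumes the code generated from `user.proto` is importable as a package named `userpb` (the import path is hypothetical and depends on your module layout), and that the `google.golang.org/protobuf` runtime is available. It will not compile without those generated files.

```go
package main

import (
	"fmt"
	"log"

	"google.golang.org/protobuf/proto"

	// Hypothetical import path for the code protoc generated from user.proto.
	userpb "example.com/myapp/userpb"
)

func main() {
	user := &userpb.User{
		Id:    1,
		Name:  "John Doe",
		Email: "john.doe@example.com",
	}

	// Serialize directly to ProtoBuf's compact binary format.
	data, err := proto.Marshal(user)
	if err != nil {
		log.Fatal(err)
	}

	// Deserialize the binary message back into a structured object.
	decoded := &userpb.User{}
	if err := proto.Unmarshal(data, decoded); err != nil {
		log.Fatal(err)
	}

	fmt.Println(decoded.GetName())
}
```

Because the wire format carries field numbers rather than field names, `data` here is typically far smaller than the equivalent JSON output shown earlier.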

The serialized data is already in a compact binary format, making transmission faster and more efficient. However, unlike JSON, which is human-readable but less optimized for speed, ProtoBuf's binary structure prioritizes performance over readability.
