DDIA - Chapter 4 - Encoding and Evolution - Thoughts and notes
The fourth chapter, the last chapter of part 1 of the DDIA book. Lessgo!
What does software do? It moves data around. That is too simple an explanation, of course: there are logical layers in between that make the data reach the right place in the right form. What that right form is, is up to you to decide.
There have been many formats in which one could transfer data. My favourite is JSON. Why? Because it is human readable. :')
But data requirements keep changing due to new features or refactors. How do you maintain the consistency of data then? Old code may not be able to read newer data, and newer code may not be able to read older data.
There are two keywords for this:
Backward Compatibility - newer code can read older data written by older code
Forward Compatibility - older code can read newer data written by newer code
Back-populating (rewriting old data into the new format) helps. But a programmer needs to plan ahead before releasing changes that might break backward/forward compatibility to prod.
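The two directions are easy to mix up, so here is a minimal sketch in Python (the record shape and the `nickname` field are made-up examples, not from the book): new code tolerates a missing field, and old code ignores an unknown one.

```python
import json

# Old data, written before the "nickname" field existed.
old_record = json.loads('{"name": "Vivek", "id": 1}')

# New data, written after the field was added.
new_record = json.loads('{"name": "Vivek", "id": 1, "nickname": "vk"}')

# Backward compatibility: new code reads old data by
# falling back to a default when the new field is missing.
def new_reader(record):
    return record.get("nickname", record["name"])

# Forward compatibility: old code reads new data by simply
# ignoring fields it does not know about.
def old_reader(record):
    return record["name"]

print(new_reader(old_record))  # old data still readable -> "Vivek"
print(old_reader(new_record))  # unknown "nickname" ignored -> "Vivek"
```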
How to encode data? (You gotta interchange it right? So how do you encode it?)
in-memory objects - your garden variety data structures (arrays, structs, trees etc)
In-memory data that is converted to bytes so that it can be sent over the network or written to a file.
The conversion of in-memory data to bytes is called encoding, serialisation, or marshalling, and the other way around is called decoding, deserialisation, or unmarshalling.
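The round trip looks like this in Python, using JSON as the encoding (the `user` dict is just a throwaway example):

```python
import json

# In-memory object: a garden-variety dict.
user = {"name": "Vivek", "interests": ["reading", "writing"]}

# Encoding (serialisation): in-memory object -> bytes.
encoded = json.dumps(user).encode("utf-8")

# Decoding (deserialisation): bytes -> in-memory object.
decoded = json.loads(encoded.decode("utf-8"))

assert decoded == user  # the round trip preserves the data
```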
Avoid using language-specific encoders like Java's built-in serialisation or pickle (Python), because they tie you to a single language, they are bad with versioning, and there is a security vulnerability of remote arbitrary code execution.
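That code-execution risk is not theoretical. A rough sketch of why: pickle lets a class dictate how it is rebuilt via `__reduce__`, so a crafted payload can make merely *loading* the bytes call an attacker-chosen function (here a harmless `str.upper`, but it could be anything):

```python
import pickle

class Sneaky:
    def __reduce__(self):
        # "to rebuild me, call str.upper('pwned')" -
        # unpickling executes this call
        return (str.upper, ("pwned",))

payload = pickle.dumps(Sneaky())
result = pickle.loads(payload)  # no Sneaky object comes back...
print(result)                   # ...instead the call ran: "PWNED"
```

This is why you should never unpickle data from an untrusted source.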
The most popular encoding formats are JSON, XML and their binary forms.
JSON and XML don't support raw binary data. People work around this with base64 encoding and pass the data as a string, which increases the data size by about 33%.
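You can see that overhead directly: base64 turns every 3 bytes into 4 text characters.

```python
import base64

blob = bytes(300)  # 300 bytes of binary data
as_text = base64.b64encode(blob)

print(len(blob), len(as_text))  # 300 -> 400, i.e. ~33% bigger
```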
The goal is to keep the byte count low. There are many ways to do that, but not everyone agrees on a common one. There are Thrift and Protocol Buffers, which decrease the size of the data while maintaining a schema. The author then talks about Avro, I won't be explaining it here, info about Avro is plentiful on the internet.
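A rough sketch of why schema-based encodings are smaller (this uses plain `struct` packing as a stand-in, not Thrift or Protocol Buffers themselves, and a made-up record): JSON repeats the field names in every record, while a schema lets both sides agree on the layout so only the values go on the wire.

```python
import json
import struct

record = {"userName": "Martin", "favoriteNumber": 1337}

# JSON carries the field names inside every record.
as_json = json.dumps(record).encode("utf-8")

# With an agreed schema ("a length-prefixed UTF-8 string,
# then an unsigned 32-bit int"), only the values are sent.
name = record["userName"].encode("utf-8")
as_binary = struct.pack(f"<B{len(name)}sI", len(name), name,
                        record["favoriteNumber"])

print(len(as_json), len(as_binary))  # the binary form is much smaller
```

The catch, which is the whole point of the chapter: the binary bytes are meaningless without the schema, so evolving that schema safely becomes the real problem.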
The book talks about modes of dataflow next:
dataflow through databases (data outlives code)
dataflow through REST, SOAP, RPC
message passing (Kafka, RabbitMQ)
And I'm going to stop here.
Written by
Vivek Khatri