Delta Lake (Parquet) vs JSON Formats for Storage
Introduction
Fast storage and retrieval of data are vital for maintaining a competitive edge, enhancing user experience, and facilitating efficient decision-making, especially in a fast-paced digital environment where responsiveness and scalability are paramount.
These are some of my observations from working with these file formats.
Why JSON files?
JavaScript Object Notation (JSON) is a file format that uses human-readable text to store and transmit data objects.
Easy to use: JSON objects can be created and read with a few lines of code.
Widely accepted: JSON is used by most web services and is supported by nearly every programming language.
Less complex: JSON files are plain text, so they are simpler to work with than Parquet files.
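As a small illustration of the "easy to use" point, here is a minimal sketch using Python's built-in json module. The order record and its field names are made up for this example.

```python
import json

# Hypothetical record, used only for illustration.
order = {"order_id": 1, "customer": "Asha", "items": ["pen", "notebook"], "total": 12.5}

# Serialize the dict to human-readable JSON text.
text = json.dumps(order, indent=2)
print(text)

# Parse the text back into a Python dict.
restored = json.loads(text)
print(restored == order)
```

No schema or extra library is needed, which is a large part of why JSON is convenient for small data.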
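Why Delta Lake (Parquet) files?
Apache Parquet is an open-source columnar file format built for fast processing of large and complex data. Delta Lake stores its data as Parquet files and adds a transaction log on top of them.
Columnar format: Data is stored column by column, so a query can read only the columns it needs instead of scanning every full row.
Easy rollback: Delta Lake versions every change to the table, so the table can be rolled back to an earlier version if wrong data is written.
Custom data partitioning: Data can be partitioned by the distinct values of a chosen column, which makes the layout easy to inspect and lets queries that filter on that column skip unrelated files.
Compression: Values in the same column share a type and are often similar, so columnar storage compresses well.
ACID properties: Delta Lake provides ACID transactions on top of the Parquet files.
Scalability: It can handle large amounts of data efficiently thanks to its scalable metadata handling and data versioning capabilities.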
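The columnar, partitioning, and compression points above can be sketched with plain Python. This is only a toy model under stated assumptions: it uses dicts, lists, and zlib rather than real Parquet files, and the sample rows and the to_columns helper are hypothetical.

```python
import json
import zlib

# Hypothetical sample rows, used only for illustration.
rows = [
    {"id": i, "country": "IN" if i < 500 else "US", "amount": i % 7}
    for i in range(1000)
]

def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [r[name] for r in records] for name in records[0]}

columns = to_columns(rows)

# Column projection: summing "amount" touches a single list in the
# columnar layout, while the row layout walks every full record.
total_from_columns = sum(columns["amount"])
total_from_rows = sum(r["amount"] for r in rows)

# Partitioning: group rows by the distinct values of one column, so a
# query for a single country only has to look at that group.
partitions = {}
for r in rows:
    partitions.setdefault(r["country"], []).append(r)

# Serialize both layouts and compare sizes before and after compression.
row_bytes = json.dumps(rows).encode("utf-8")
col_bytes = json.dumps(columns).encode("utf-8")
print("raw bytes:       ", len(row_bytes), "row vs", len(col_bytes), "columnar")
print("compressed bytes:", len(zlib.compress(row_bytes)), "row vs",
      len(zlib.compress(col_bytes)), "columnar")
```

Real Parquet goes further than this sketch, with typed binary encodings and per-column statistics, but the idea of keeping similar values next to each other is the same.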
The diagram below shows how Delta Lake files and folders are structured.
The JSON files record when new data was added. Each insertion creates a new Parquet file and a new JSON file, and a reader is pointed at the whole table folder (my_table) rather than at individual files.
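To make that behaviour concrete, here is a toy simulation in plain Python. It is not the real Delta Lake implementation: the commit and read_table helpers are hypothetical, the data files are written as JSON instead of Parquet so the sketch needs no extra libraries, and each log entry is heavily simplified. It does follow the real convention of a _delta_log folder with zero-padded, 20-digit JSON file names.

```python
import json
import tempfile
import time
from pathlib import Path

def commit(table: Path, records: list) -> int:
    """Write records as a new data file plus a new JSON log entry."""
    log_dir = table / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    version = len(list(log_dir.glob("*.json")))
    data_name = f"part-{version:05d}.json"  # a real Delta table writes .parquet here
    (table / data_name).write_text(json.dumps(records))
    # Simplified log entry: which file was added, and when.
    entry = {"add": {"path": data_name}, "timestamp": int(time.time() * 1000)}
    # Zero-padded names keep the log files sorted in commit order.
    (log_dir / f"{version:020d}.json").write_text(json.dumps(entry))
    return version

def read_table(table: Path, version=None) -> list:
    """Replay the log up to `version` (latest if None) and load the listed files."""
    log_files = sorted((table / "_delta_log").glob("*.json"))
    if version is not None:
        log_files = log_files[: version + 1]
    records = []
    for log_file in log_files:
        path = json.loads(log_file.read_text())["add"]["path"]
        records.extend(json.loads((table / path).read_text()))
    return records

with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "my_table"
    commit(table, [{"id": 1}, {"id": 2}])  # version 0
    commit(table, [{"id": 3}])             # version 1
    latest = read_table(table)
    first_version = read_table(table, version=0)
    layout = sorted(p.relative_to(table).as_posix()
                    for p in table.rglob("*") if p.is_file())

print(layout)
print(latest)
print(first_version)
```

Reading with version=0 ignores the second commit, which is the idea behind rolling back to an earlier state: old files are kept, and the log decides which of them belong to a given version.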
Conclusion
When it comes to handling large-scale data, a Delta Lake table (or the plain Parquet format) is usually the best choice. For small datasets, on the other hand, JSON is the better option.
Note: For any corrections, please reach out to me through the social links in the navbar.
Written by
Aaron Jevil Nazareth
I am a developer from India and have built multiple websites using Next.js and TypeScript. I am currently exploring the machine learning field and would love to learn as much as possible. I also take up freelancing gigs occasionally.