Understanding PostgreSQL JSON Usage and Shared Detoast Optimization

Application Scenarios of JSONB

PostgreSQL has two data types related to JSON: JSON and JSONB. This article introduces JSONB which is also recommended for you.

Suppose that all records in a table have attributes attr1..attr10, but only some records have attributes attr11..attr20, and attributes attr21 and attr22 may be added in the future. Creating a table with a JSON column would be very suitable in this case.

create table t1 (attr1 numeric,  attr2 int, ..  attr10 text, extra jsonb);

The data of attr11..attr20 is stored as key-value pairs in the extra attribute. If attr21 and attr22 need to be added in the future, they can be directly stored in the extra attribute without any modification to the table structure.

In addition to using the JSON data type, other solutions are available. For example, set extra to a text/binary type, and serialize and deserialize on the client. The drawback of this approach is that the data needs to be transferred to the client for computation, which prevents the use of more advanced scanning methods and increases network transmission. However, using JSON/JSONB can avoid these issues.

select extra->'attr11' from t1 where extra->>'att20' = 'apple';

We can filter and project data on the server to reduce network transmission. In addition, we can also create an index in extra->>'att20' to accelerate the filter conditions.

Another extreme design is to store all the attributes in a JSONB type and create a table structure as follows:

create table t2(data jsonb)

This solution is also not optimal, because all records need to record the attributes from attr1 to attr10 repeatedly. For example, for data in the same row, the storage methods of t1 and t2 are as follows:

t1:  1|2|3|..|'20'|{'attr21': 2, 'attr22': 3}
t2:  {'attr1': 1, 'attr2': 2, 'attr3': 3, ... 'attr21': 2, 'attr22': 3}

The design of t2 will result in storage and network transmission overhead. Both of these two problems can be mitigated by compression, but compression/decompression comes at a cost.

PostgreSQL supports the simultaneous use of JSONB data type and traditional data types, which is also a major feature, and leveraging this flexibility can bring significant benefits. Let's continue with the previous example: Suppose that later we find that attr11 exists in all records, so we can modify the table structure to:

create table t1 (attr1 numeric,  attr2 int, ..  attr10 text, attr11 text,  extra jsonb);

Then, just a few minor changes to the application are needed. Compared with the database that only supports document type, this modification does not require the application to add new data sources or manage new connections but only needs to modify the access mode of attr11.

Shared Detoast Datum

For a JSON data type, the toast capability should be used even if the data is slightly larger. Take the data of t1 as an example. The general working principle can be simplified as follows:

t1: 
  1|2|3|..|{pointer-x}

pg_toast_{t1}
  pointer-x| 1 | {attr11:.., attr12: ...}
  pointer-x| 2 | {attr16:.., attr20: ...}

Only when we need to access the real value of the pointer-x, do we "assemble" the value of {attr11: 11, ..., attr20: 20}, and this process is called detoast, which is also relatively resource-consuming.

For the following SQL:

select extra->'attr11'  from t1 where  extra->>'att20'  = 'apple';

Both extra->>'att20' and extra->'attr11' need to access the complete extra data. In the native PostgreSQL, the detoast is performed twice. The following statement will detoast 4 times. When extra is slightly larger, a lot of time will be spent on the detoast process. shared detoast datum is designed to solve this problem. For the same data, we will only detoast once.

select extra->'attr11', extra->'attr12' 
from t1
where extra->>'att20'  = 'apple' 
  and extra->>'attr19' = 'cloud';

Introduction to PolarDB for PostgreSQL

PolarDB for PostgreSQL is a cloud-native relational database service developed by Alibaba Cloud. It is 100% compatible with PostgreSQL and highly compatible with Oracle syntax (supported by the public cloud edition). It utilizes a Shared-Storage architecture, decoupling computing from storage. PolarDB for PostgreSQL offers ultrahigh elasticity, millisecond-level latency, HTAP capabilities, and enterprise-level database features such as high reliability, high availability, and elastic scaling. Additionally, with large-scale parallel computing capability, it can manage mixed OLTP and OLAP workloads.

Discover what and how you can use our products to build >>

Introduction to Application Scenarios of PostgreSQL JSON and Optimization of Shared Detoast

Application Scenarios of JSONB

Shared Detoast Datum

Introduction to PolarDB for PostgreSQL

Subscribe to my newsletter

Lara Lee

Lara Lee