Checking If Delta Table in Fabric is V-order Optimized

Sandeep Pawar

V-Order

V-order is a write-time optimization to the parquet file format. When a Delta table is created using any of the Fabric engines (Dataflow Gen2, Spark notebooks, Pipelines, DWH), it is automatically V-Order'd. This not only helps reduce the size of the table but can also significantly improve Direct Lake dataset read performance. While it's not required, V-order is highly recommended for the fastest Direct Lake and Delta read performance. For more on V-order, read this official article.

However, there is no direct way to identify whether a table already has V-order. There are three ways to check; I will show the two easy ones here and cover the third in a future blog post. When a Delta table is created, the transaction log records a metadata property related to V-order.

I created two Delta tables in a Fabric lakehouse, one with V-order and one without, by changing the Spark configuration.

Manually

You can manually inspect the transaction logs in the Lakehouse: go to Lakehouse > table name > right-click > View files > _delta_log and open the latest .json file. If the table is not V-Order'd, the VORDER property is either set to false or missing entirely. If the table is V-Order'd, the property is set to true.
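If you want to script the manual check above, here is a minimal standard-library sketch. It assumes the lakehouse is mounted so that `_delta_log` is reachable through the local file system, and that the property appears as a key named `VORDER` somewhere inside one of the JSON actions of a commit; the exact nesting is not assumed, so the helper searches the whole action. `vorder_in_latest_commit` and `_find_key` are hypothetical names, not part of any Fabric API.

```python
import json
import os


def _find_key(obj, key):
    """Depth-first search for `key` in nested dicts/lists; returns its value or None."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                return v
            found = _find_key(v, key)
            if found is not None:
                return found
    elif isinstance(obj, list):
        for item in obj:
            found = _find_key(item, key)
            if found is not None:
                return found
    return None


def vorder_in_latest_commit(delta_log_dir):
    """Return the VORDER value from the latest commit file, or None if it is missing.

    Assumption: the V-order flag is stored under a key literally named "VORDER".
    """
    # Commit files use a zero-padded version number (e.g. 00000000000000000003.json),
    # so a plain string sort puts them in version order.
    commits = sorted(
        name for name in os.listdir(delta_log_dir)
        if name.endswith(".json") and name[:-5].isdigit()
    )
    if not commits:
        return None
    with open(os.path.join(delta_log_dir, commits[-1]), encoding="utf-8") as f:
        for line in f:  # each non-empty line holds one JSON action
            line = line.strip()
            if not line:
                continue
            value = _find_key(json.loads(line), "VORDER")
            if value is not None:
                return value
    return None
```

Usage would look like `vorder_in_latest_commit('//lakehouse/default/Tables/<table_name>/_delta_log')`; a result of `None` corresponds to the "missing" case described above.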

Programmatically

If you have several Delta tables or many transaction logs, or if you need to check V-order as part of a DataOps validation process, you can use pyarrow to check the parquet schema metadata. This reads only the metadata, so the table data does not need to be loaded. Below is the Python code I used:

def check_vorder(table_name_path):
    '''
    Author: Sandeep Pawar | fabric.guru |  Jun 6, 2023

    Provide table_name_path as '//lakehouse/default/Tables/<table_name>'
    Returns True if the Delta table is V-ordered, False if the V-order
    property is set to false or missing, and None if the path does not exist.

    You must first mount the lakehouse to use the local filesystem API.
    '''
    import os
    import pyarrow.dataset as ds

    if not os.path.exists(table_name_path):
        print(f'{os.path.basename(table_name_path)} does not exist')
        return None

    # pyarrow ignores files and folders starting with '_' or '.', so _delta_log is skipped.
    # The schema is inferred from the parquet files (by default only the first one is inspected);
    # schema.metadata is None when the file carries no key-value metadata at all.
    metadata = ds.dataset(table_name_path, format='parquet').schema.metadata or {}
    value = metadata.get(b'com.microsoft.parquet.vorder.enabled')

    # Metadata keys and values are bytes, e.g. b'true'
    return value is not None and value.decode().lower() == 'true'
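For the DataOps case with several tables, a small sketch that runs a checker such as `check_vorder` across every table folder and lists the tables that are not V-ordered. `check_all_tables` and `non_vordered` are hypothetical helpers. The checker is passed in as a parameter so the sketch itself does not depend on pyarrow; it assumes the checker returns `True` for V-ordered tables and that each Delta table is a direct subfolder of the mounted `Tables` directory.

```python
import os


def check_all_tables(tables_root, checker):
    """Apply `checker` to every table folder under `tables_root`.

    Returns a dict mapping table name -> checker result.
    Assumption: each Delta table is a direct subfolder of `tables_root`.
    """
    results = {}
    for name in sorted(os.listdir(tables_root)):
        path = os.path.join(tables_root, name)
        if os.path.isdir(path):  # skip stray files; only folders are tables
            results[name] = checker(path)
    return results


def non_vordered(results):
    """Names of tables whose result is not exactly True (False, None, or anything else)."""
    return [name for name, ok in results.items() if ok is not True]
```

In a validation step this could be used as `results = check_all_tables('//lakehouse/default/Tables', check_vorder)` followed by `assert not non_vordered(results)`, which fails the run if any table lacks V-order.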

There is another, more robust method using Spark that can detect individual parquet files in the Delta table that are not V-Order'd, but I will cover that in a future blog post.
