Create Shortcuts with PySpark in Microsoft Fabric

In today’s fast-evolving data landscape, managing shortcuts efficiently is critical for smooth access to your data when you work with Microsoft Fabric. As your data ecosystem grows, automating tasks such as creating Fabric shortcuts becomes essential. This guide explores how to handle these tasks with PySpark notebooks and Semantic Link Labs.

Whether you're moving shortcuts between environments or setting up shortcuts in bulk, you're in the right place. Let’s get started! 🚀

Shoutout to Semantic Link Labs 🙌

Before we dive into the scenarios, I want to give a big shoutout to Michael Kovalsky and the contributing team for creating Semantic Link Labs. Its many functions have made these tasks much easier to manage. Thanks, Michael! 👏

I'll skip the basics of what shortcuts are and their inner workings and jump straight to the point: bulk creation of shortcuts. If you need a fundamental overview of shortcuts, you can find it here.

Parameters and Configuration

To ensure a smooth shortcut migration or bulk creation process, you'll need the following parameters:

source_lakehouse: The name of the source lakehouse (or environment) where the tables reside.
source_workspace: The workspace associated with the source lakehouse (or environment).
destination_lakehouse: The name of the destination lakehouse where shortcuts will be created.
destination_workspace: The workspace associated with the destination.
prefix: (Optional) A prefix to prepend to shortcut names for better organization and clarity.

These parameters ensure the scripts know exactly where to retrieve the data from and where to create the shortcuts.
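One way to keep these parameters together is a small config object that fails early when a required value is blank. The sketch below is illustrative only: the `ShortcutConfig` class and its `as_kwargs` helper are hypothetical names, not part of Semantic Link Labs, and the workspace and lakehouse names are placeholders.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ShortcutConfig:
    """Hypothetical container for the five parameters described above."""
    source_lakehouse: str
    source_workspace: str
    destination_lakehouse: str
    destination_workspace: str
    prefix: str = ""  # optional; an empty string keeps the original name

    def __post_init__(self):
        # Every field except the optional prefix must be a non-blank string.
        for name, value in asdict(self).items():
            if name != "prefix" and not str(value).strip():
                raise ValueError(f"'{name}' must not be empty")

    def as_kwargs(self) -> dict:
        """Return the values as keyword arguments for a batch function."""
        return asdict(self)


config = ShortcutConfig(
    source_lakehouse="SRC_LH_00",
    source_workspace="SRC_WS_00",
    destination_lakehouse="TGT_LH_00",
    destination_workspace="TGT_WS_00",
)
print(config.as_kwargs())
```

The resulting dictionary can be unpacked with `**config.as_kwargs()` into any function that accepts these five names.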

Why Shortcuts? 🔄

Shortcuts are an incredibly useful tool when managing data environments like lakehouses. They are essentially metadata pointers that provide access to tables without duplicating data. The advantages of using shortcuts include:

Data Sharing: Shortcuts allow you to share and access data across environments, whether it's lakehouses, workspaces, or other systems.
Organizational Clarity: Shortcuts help you organize your tables better, making large datasets more manageable.
Reducing Redundancy: They prevent unnecessary duplication of data, saving valuable storage resources.

By efficiently managing shortcuts, you can significantly streamline your workflows, especially in complex or large-scale migrations.

Scenario 1: Automated Shortcut Migration with Validation ✅

Overview

In this scenario, we're migrating shortcuts from a source environment (lakehouse/workspace) to a destination one. It involves making sure that:

No Duplicate Shortcuts: Shortcuts that already exist in the destination are not recreated.
Naming Conventions: You can optionally prepend a prefix to shortcut names to keep naming consistent (the script below supports a prefix; a suffix can be handled the same way).
Error Handling: The script retries failed calls and records errors per shortcut, so a single failure doesn’t stop the whole migration.
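The duplicate check boils down to a set-membership test on the final shortcut name. Here is a minimal sketch in plain Python, using made-up shortcut names and a hypothetical `plan_shortcuts` helper (the optional suffix is included purely for illustration):

```python
from typing import Iterable, List, Set, Tuple


def plan_shortcuts(
    source_names: Iterable[str],
    existing_names: Set[str],
    prefix: str = "",
    suffix: str = "",
) -> Tuple[List[str], List[str]]:
    """Split candidate shortcut names into (to_create, skipped)."""
    to_create, skipped = [], []
    seen = set(existing_names)  # copy so the caller's set is not modified
    for name in source_names:
        new_name = f"{prefix}{name}{suffix}"
        if new_name in seen:
            skipped.append(new_name)
        else:
            to_create.append(new_name)
            seen.add(new_name)  # also guards against repeats within the source list
    return to_create, skipped


to_create, skipped = plan_shortcuts(
    source_names=["customers", "orders", "orders"],
    existing_names={"stg_customers"},
    prefix="stg_",
)
print(to_create)  # ['stg_orders']
print(skipped)    # ['stg_customers', 'stg_orders']
```

Comparing on the final name (after the prefix is applied) matters: comparing on the source name would miss a shortcut that already exists under its renamed form.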

In my case, the tables in my source lakehouse had different prefixes, which required me to modify them with a different prefix or suffix when creating the shortcuts in the destination. This ensures consistency across different environments.
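A minimal sketch of that kind of renaming, assuming the source tables carry prefixes such as `raw_` or `dev_` (invented for this example) that should be swapped for a single destination prefix. The `normalize_name` helper is hypothetical, and `str.removeprefix` needs Python 3.9 or later.

```python
from typing import Iterable


def normalize_name(
    name: str,
    old_prefixes: Iterable[str],
    new_prefix: str = "",
    new_suffix: str = "",
) -> str:
    """Strip the first matching old prefix, then apply the new prefix and suffix."""
    for old in old_prefixes:
        if name.startswith(old):
            name = name.removeprefix(old)
            break  # strip at most one prefix
    return f"{new_prefix}{name}{new_suffix}"


source_tables = ["raw_customers", "dev_orders", "products"]
renamed = [
    normalize_name(t, ("raw_", "dev_"), new_prefix="prod_")
    for t in source_tables
]
print(renamed)  # ['prod_customers', 'prod_orders', 'prod_products']
```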

Semantic Link Labs Functions Used:

sempy_labs.list_shortcuts
sempy_labs.lakehouse.create_shortcut_onelake
The remaining functions and snippets mainly handle failures and sanitize inputs.

Code Implementation (First install Semantic Link Labs - %pip install semantic-link-labs)

# Author: Nalaka
import time
from typing import Dict, Tuple

import sempy_labs

def safe_api_call(func, retries=3, delay=2, **kwargs):
    """
    Safely calls a function with retry logic.
    """
    for attempt in range(retries):
        try:
            return func(**kwargs)
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < retries - 1:
                time.sleep(delay)
            else:
                raise

def create_shortcuts_batch(
    source_lakehouse: str,
    source_workspace: str,
    destination_lakehouse: str,
    destination_workspace: str,
    prefix: str = ""
) -> Dict[str, Tuple[bool, str]]:
    """
    Create shortcuts automatically, ensuring the correct table name is used from the OneLake Path column.
    """
    results = {}

    print("Fetching shortcuts from source workspace...")
    shortcuts_df = safe_api_call(
        sempy_labs.list_shortcuts,
        lakehouse=source_lakehouse,
        workspace=source_workspace
    )

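    # Ensure the source returned shortcuts and every column used below is present
    required_columns = {'Shortcut Name', 'OneLake Path', 'Source Item Name', 'Source Workspace Name'}
    if shortcuts_df.empty or not required_columns.issubset(shortcuts_df.columns):
        print(f"No shortcuts found, or missing required columns: {sorted(required_columns)}.")
        return results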

    # Fetch existing shortcuts in the destination workspace
    print("Fetching existing shortcuts from destination workspace...")
    destination_shortcuts_df = safe_api_call(
        sempy_labs.list_shortcuts,
        lakehouse=destination_lakehouse,
        workspace=destination_workspace
    )
    existing_shortcuts = set(destination_shortcuts_df['Shortcut Name']) if not destination_shortcuts_df.empty else set()

    # Iterate through shortcut names
    print("Processing shortcuts...")
    for _, row in shortcuts_df.iterrows():
        try:
            # Extract actual table name from OneLake Path (removing 'Tables/' prefix)
            actual_table_name = row['OneLake Path'].removeprefix('Tables/') #Change accordingly

            # Prepare shortcut name with optional prefix
            new_shortcut_name = f"{prefix}{row['Shortcut Name']}"

            # Skip if the shortcut already exists
            if new_shortcut_name in existing_shortcuts:
                results[row['Shortcut Name']] = (True, f"Shortcut '{new_shortcut_name}' already exists. Skipped.")
                continue

            # Create shortcut using details from the row
            safe_api_call(
                sempy_labs.lakehouse.create_shortcut_onelake,
                table_name=actual_table_name, 
                source_lakehouse=row['Source Item Name'],  # Source lakehouse from the row
                source_workspace=row['Source Workspace Name'],  # Source workspace from the row
                destination_lakehouse=destination_lakehouse,
                destination_workspace=destination_workspace,
                shortcut_name=new_shortcut_name
            )

            success_msg = (
                f"Successfully created shortcut: {new_shortcut_name}\n"
                f"Source: {row['Source Workspace Name']}/{row['Source Item Name']}\n"
                f"Destination: {destination_workspace}/{destination_lakehouse}"
            )
            results[row['Shortcut Name']] = (True, success_msg)

        except Exception as e:
            # Log the error with detailed information
            error_details = f"Shortcut: {row['Shortcut Name']}\n" \
                            f"Source Lakehouse: {row['Source Item Name']}\n" \
                            f"Source Workspace: {row['Source Workspace Name']}\n" \
                            f"Error: {str(e)}"
            results[row['Shortcut Name']] = (False, error_details)
            print(f"Failed to process: {error_details}")

    return results

def print_results_summary(results: Dict[str, Tuple[bool, str]]) -> None:
    """
    Print detailed summary of shortcut creation results.
    """
    print("\n=== EXECUTION SUMMARY ===")
    print(f"Total shortcuts processed: {len(results)}")

    successful = sum(1 for status, _ in results.values() if status)
    failed = len(results) - successful

    print(f"Successful: {successful}")
    print(f"Failed: {failed}")

    if failed > 0:
        print("\n=== FAILED OPERATIONS ===")
        for table, (status, message) in results.items():
            if not status:
                print(f"\nTable: {table}")
                print(message)

    if successful > 0:
        print("\n=== SUCCESSFUL OPERATIONS ===")
        for table, (status, message) in results.items():
            if status:
                print(f"\nTable: {table}")
                print(message)

if __name__ == "__main__":
    config = {
        # Source location to list shortcuts
        "source_lakehouse": "SRC_LH_00",
        "source_workspace": "SRC_WS_00",

        # Destination location for new shortcuts
        "destination_lakehouse": "TGT_LH_00",
        "destination_workspace": "TGT_WS_00",

        "prefix": ""  # Optional prefix for destination shortcut name
    }

    # Measure execution time
    start_time = time.time()

    print("Starting shortcut creation process...")
    results = create_shortcuts_batch(**config)

    end_time = time.time()
    print_results_summary(results)

    print(f"\nExecution time: {end_time - start_time:.2f} seconds")

Explanation

This script helps:

Fetch Shortcuts: Retrieves the shortcuts from the source and checks for any that already exist in the destination.
Validation and Renaming: Ensures that no duplicates are created, with the option to add a prefix for easy organization. The table name is taken from the OneLake Path with the leading “Tables/” removed.
Error Handling: Uses retry logic with the safe_api_call function to handle temporary issues, ensuring smooth migration.
Create Shortcuts: Finally, it loops through the list of source shortcuts and creates them at the destination.
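To see how the retry wrapper behaves, the sketch below repeats `safe_api_call` from the script and runs it against a deliberately flaky stand-in function. `flaky_list_shortcuts` is invented for this demo and does not call Fabric; the delay is set to zero so it runs instantly.

```python
import time


def safe_api_call(func, retries=3, delay=2, **kwargs):
    """Call func(**kwargs), retrying on any exception up to `retries` times."""
    for attempt in range(retries):
        try:
            return func(**kwargs)
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < retries - 1:
                time.sleep(delay)
            else:
                raise  # out of attempts: surface the last error to the caller


calls = {"count": 0}


def flaky_list_shortcuts(lakehouse: str, workspace: str):
    """Stand-in for an API call: fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return [f"{workspace}/{lakehouse}/Tables/orders"]


result = safe_api_call(
    flaky_list_shortcuts,
    retries=3,
    delay=0,
    lakehouse="SRC_LH_00",
    workspace="SRC_WS_00",
)
print(result)          # ['SRC_WS_00/SRC_LH_00/Tables/orders']
print(calls["count"])  # 3
```

Because `retries` and `delay` are consumed by the wrapper itself, a wrapped function that has parameters with those same names would need a different calling convention.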

Use Case

Ideal for migrating shortcuts between environments (e.g., from Dev/Test to production).
Helpful when maintaining naming consistency during the migration process.

What Does This Scenario Solve?

This process is perfect for environments where shortcuts need to be moved between different workspaces while maintaining data integrity and ensuring that shortcuts are not duplicated. It ensures that:

The right shortcuts are created with the right names.
Shortcuts that already exist under the same name in the destination are skipped rather than recreated.
Errors are handled through retry mechanisms to prevent interruptions.

It’s a flexible way to migrate shortcuts without unnecessary manual intervention.

Sample output of the above code