Sync Multiple Fabric Workspaces With GitHub Using Semantic Link Labs

Sandeep Pawar

I've been living on the edge. I use my personal Fabric trial capacity a lot for testing and learning. As the number of days left in the trial dwindles to single digits, I start praying to the Fabric gods to renew the trial. :D So far they have been renewing my capacity, so thank you for that. I have 80+ workspaces and hundreds, if not thousands, of items, so I wanted to automatically create a private GitHub repo for each workspace, connect the workspace to the repo's main branch and commit the items. I chose GitHub because that’s where I keep my personal projects. The process would be the same for Azure DevOps repos.

Pre-requisites:

  • GitHub Personal Access Token

    You will need a PAT to create a connection to the repos. I created a classic token to use it for all repos. Choose whatever is best for your scenario and limit the scope as required.

  • Connection Id

    Once you have the PAT, create a cloud connection in Fabric to generate a connection Id. Choose GitHub - Source control as the connection type. This is under Settings > Manage Connections in Fabric.

  • You need to be an Admin of the workspaces you want to sync.

  • Enable GitHub and ADO in tenant settings.

  • Install PyGithub and Semantic Link Labs in a Fabric Python notebook

Code:

Here is the logic, change it as needed:

  • Get a list of Fabric workspaces

  • Get a list of workspaces that are already git enabled

  • Make a list of workspaces that need to be sync’d

  • For each of the above workspaces:

    • create and initialize a repo in GitHub with a README.md. I name the repo fabric_lower_case_workspace_name

    • create a connection to the above repo in Fabric using Semantic Link Labs

    • wait 60s

    • get the latest commit hash

    • get the ids of the items you want to commit. In my case, I only want to sync notebooks. Change this to include other supported items.

    • commit the items if items exist using Semantic Link Labs
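The repo naming step above can be sketched as a small helper. This is only an illustration: `to_repo_name` is a hypothetical function, not part of PyGithub or Semantic Link Labs, and it assumes you want every character outside letters, digits, `.`, `-` and `_` replaced rather than only spaces. It also collapses runs of such characters into a single underscore, which differs slightly from a plain space-to-underscore replacement.

```python
import re

def to_repo_name(workspace_name: str, prefix: str = "fabric_") -> str:
    """Build a repo name of the form fabric_lower_case_workspace_name (hypothetical helper)."""
    # Lower-case the name, then replace each run of characters outside
    # a-z, 0-9, '.', '_' and '-' with a single underscore.
    slug = re.sub(r"[^a-z0-9._-]+", "_", workspace_name.strip().lower())
    # Drop separators left dangling at either end.
    slug = slug.strip("_.-")
    # Fall back to a placeholder if nothing usable is left.
    return prefix + (slug or "workspace")

print(to_repo_name("Sales Analytics"))
print(to_repo_name("Dev / Test (v2)"))
```

Two different workspace names can map to the same repo name with this kind of rule, so it may be worth checking for collisions before creating repos.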

Note that there can be a delay in sync depending on the number of items, item types, etc., so adjust the time.sleep(n) wait period based on your scenario. You can customize this further and make the logic more granular, but for my case this works well. I just need it for backup more than anything else.
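If tuning fixed sleeps gets tedious, one alternative is to poll for a condition with a timeout. The sketch below is a generic helper, not something from Semantic Link Labs: `wait_until`, `timeout` and `interval` are names I made up for illustration, and you would supply your own check function.

```python
import time
from typing import Callable

def wait_until(check: Callable[[], bool], timeout: float = 120.0, interval: float = 5.0) -> bool:
    """Call check() repeatedly until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            # Give up: the condition never became true within the timeout.
            return False
        time.sleep(interval)

# Toy usage: the condition becomes true on the third check.
state = {"checks": 0}

def ready() -> bool:
    state["checks"] += 1
    return state["checks"] >= 3

print(wait_until(ready, timeout=5, interval=0.01))
```

In the script, the check could be a lambda that tests whether the workspace Id shows up in the "Workspace Id" column of labs.admin.list_git_connections(), which is what the retry loop in the code does by hand.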

# !pip install PyGithub semantic-link-labs --q

from github import Github, UnknownObjectException
import sempy.fabric as fabric
import time
import sempy_labs as labs

# Authenticate to GitHub with the PAT and set the Fabric connection Id
g = Github("github_pat_xxxxxx")
conn_id = "xxxxxxxxxxxxxx"

def create_and_initialize_github_repo(repo_name):
    """ Return the repo if it already exists, otherwise create a private repo with a README.md """
    user = g.get_user()
    try:
        repo = user.get_repo(repo_name)
        print(f"Repository '{repo_name}' already exists.")
    except UnknownObjectException:
        # GitHub returned 404, so the repo does not exist yet: create it.
        # Other errors (bad token, rate limit) are not swallowed here.
        repo = user.create_repo(
            name=repo_name,
            description="Repository for workspace " + repo_name,
            private=True
        )
        repo.create_file("README.md", "Initial commit", "# " + repo_name)
        print(f"Repository '{repo_name}' created and initialized with README.md.")
    return repo

def get_latest_commit_hash(repo_name):
    """ Fetch the latest commit hash from GitHub """
    owner = g.get_user().login
    repo = g.get_repo(f"{owner}/{repo_name}")
    commits = repo.get_commits()
    return commits[0].sha if commits.totalCount > 0 else None

# only Fabric workspaces
workspaces_df = fabric.list_workspaces().query('Type!="AdminInsights" and `Is On Dedicated Capacity`==True')
git_connections_df = labs.admin.list_git_connections()
# workspaces without git enabled
workspaces_without_git = workspaces_df[~workspaces_df["Id"].isin(git_connections_df["Workspace Id"])]

for index, row in workspaces_without_git.iterrows():
    workspace_name = row["Name"]
    formatted_repo_name = "fabric_" + workspace_name.replace(" ", "_").lower()

    # Create (or reuse) GitHub repository
    repo = create_and_initialize_github_repo(formatted_repo_name)

    # Resolve workspace details
    resolved_workspace_name = fabric.resolve_workspace_name(workspace_name)
    workspace_id = fabric.resolve_workspace_id(workspace_name)

    # Connect the workspace to GitHub
    labs.connect_workspace_to_github(
        owner_name=g.get_user().login,
        repository_name=formatted_repo_name,
        branch_name="main",
        directory_name="/",
        connection_id=conn_id,
        workspace=workspace_id
    )
    print(f"🟢 The '{resolved_workspace_name}' workspace has been connected to the '{formatted_repo_name}' GitHub repository.")

    # Wait until the Git connection is detected
    max_retries = 10
    retry_interval = 5  # seconds
    connection_ready = False
    for attempt in range(max_retries):
        current_connections = labs.admin.list_git_connections()
        if workspace_id in current_connections["Workspace Id"].values:
            print(f"Git connection detected for workspace '{resolved_workspace_name}'.")
            connection_ready = True
            break
        else:
            print(f"Waiting for Git connection to initialize for '{resolved_workspace_name}' (Attempt {attempt+1}/{max_retries})...")
            time.sleep(retry_interval)
    if not connection_ready:
        print(f"Git connection not initialized for workspace '{resolved_workspace_name}'. Skipping sync.")
        continue

    time.sleep(60)
    remote_commit_hash = labs.initialize_git_connection(workspace=workspace_id)

    print(f"🟢 Git connection initialized. Remote commit hash: {remote_commit_hash}")

    time.sleep(60)  # Adjust timing as needed

    # Get item ids and commit only the Notebooks
    print("Fetching all item IDs for commit...")
    workspace_items = fabric.list_items(workspace=workspace_id).query('Type=="Notebook"')
    item_ids = list(workspace_items["Id"])

    if not item_ids:
        print(f"❌ No notebooks found in workspace '{resolved_workspace_name}'. Skipping commit.")
        continue

    print(f"🟢 Found {len(item_ids)} notebooks to commit.")

    # Commit the selected items to Git; report success only if the call did not raise
    try:
        labs.commit_to_git(
            comment="Sync workspace with main branch",
            item_ids=item_ids,
            workspace=workspace_id
        )
    except Exception as e:
        print(f"❌ Commit failed for workspace '{resolved_workspace_name}': {e}")
    else:
        print(f"✅ Workspace '{resolved_workspace_name}' has been successfully committed to Git.")

Result:

Admittedly, getting this to work wasn’t very straightforward. So, if you run into issues, I won’t be surprised. If you do improve the code, I would love to know. Thanks!

Written by

Sandeep Pawar

Microsoft MVP with expertise in data analytics, data science and generative AI using Microsoft data platform.