Analyzing Semantic Model Logs Using Fabric Workspace Monitoring

Sandeep Pawar
12 min read

Workspace Monitoring was one of my favorite announcements at MS Ignite ‘24 this week. It logs events from Fabric items such as semantic models, Eventhouse, and GraphQL to a KQL database that’s automatically provisioned and managed in that workspace. Currently it’s limited to these three item types, but hopefully others (especially Spark and pipelines) will be added soon. Read the announcement by Varun Jain (PM, Microsoft) for details. A few additional points/thoughts:

  • Currently it’s per workspace, but it will be available per capacity in the future (reference)

  • By default the retention period is 30 days, but it will be configurable in the future (reference)

  • The Eventhouse is read-only, so you cannot change any policies, turn on OneLake availability, or select which items or item types to track. You get all or nothing.

  • During the preview, users will not be charged, but billing details will be available later. In my opinion, for a SaaS platform this should either be included as part of the platform or be configurable, to avoid hefty CU consumption, especially on smaller capacities.

  • It only logs the activities, not CU consumption. Hopefully, similar to Workspace Monitoring, we will get Capacity Monitoring to see the full picture of events and the capacity consumed at a granular level. This is a great start.

  • I am not aware of any F SKU restrictions.

  • Hopefully there will be more documentation and guidance from Microsoft on understanding the event types, etc. In the Ignite session, a Power BI report was shown with all the details; I am looking forward to that. When that report becomes available, I will update my queries below to make sure they align with Microsoft’s official metrics.

In this blog, I will focus on the semantic model logs only and will cover other items in the future. Use the queries below as a starting point, refer to the official documentation, and apply your own experience when analyzing the logs. You can save these KQL queries as query sets and reuse them, or build a Real-Time Dashboard to track events and create alerts.

Ingestion Latency

Although the official blog says events arrive within 10-15 minutes, in my tests the majority of events were ingested in under 2 minutes. As activity and volume increase, ingestion might get slower, but it’s already very impressive.

// what's the latency of ingestion
SemanticModelLogs
| extend diff = datetime_diff('second', ingestion_time(), Timestamp)
| project diff
| summarize num_of_events =count() by bin(diff, 30)
| render columnchart
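If you pull the raw latency values into a notebook, the same 30-second binning the KQL `bin(diff, 30)` performs is a one-liner in pandas. A minimal sketch on made-up latency values (the numbers are fabricated for illustration):

```python
import pandas as pd

# Fabricated ingestion latencies in seconds, standing in for the "diff" column above
latencies = pd.Series([12, 45, 70, 95, 110, 30, 20, 150, 61, 88])

# Mirror KQL's bin(diff, 30): floor each latency down to a 30-second bucket
buckets = (latencies // 30) * 30

# Count events per bucket, like "summarize num_of_events = count() by bin(diff, 30)"
num_of_events = buckets.value_counts().sort_index()
print(num_of_events)
```

This is handy when you want to chart the distribution with your plotting library of choice instead of `render columnchart`.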

The Base Query

The KQL query below should cover 98.231% of cases to analyze semantic model performance, refresh details, activities, resource consumption, and more. I have parsed the EventText, ApplicationContext, and Identity columns, which are of dynamic type and contain useful details. I won't go into the specifics of these columns and their importance in this blog, but Chris Webb already has many blogs on this topic that are a treasure trove of information and insights. Also look at the column descriptions here.

// All logs for the last 7 days and parsed for extracting performance metrics
let result = SemanticModelLogs
| where Timestamp > ago(7d)
| extend app = tostring(parse_json(ApplicationContext))
| project   Timestamp, 
            ItemName,
            ItemKind,
            DatasetMode,
            OperationName,
            OperationDetailName,
            ApplicationContext,
            ApplicationName,
            EventText,
            Identity,
            ExecutingUser,
            WorkspaceName, 
            CapacityId, 
            datasetid = extract_json("$.DatasetId", app), 
            reportId = extract_json("$.Sources[0].ReportId", app), 
            visualId = extract_json("$.Sources[0].VisualId", app), 
            usersession = extract_json("$.Sources[0].HostProperties.UserSession", app)
| extend parsedEventText = parse_json(EventText)
| extend EventText_timeStart = tostring(parsedEventText.timeStart),
         EventText_timeEnd = tostring(parsedEventText.timeEnd),
         EventText_durationMs = toint(parsedEventText.durationMs),
         EventText_queryProcessingCpuTimeMs = toint(parsedEventText.queryProcessingCpuTimeMs),
         EventText_totalCpuTimeMs = toint(parsedEventText.totalCpuTimeMs),
         EventText_executionDelayMs = toint(parsedEventText.executionDelayMs),
         EventText_approximatePeakMemConsumptionKB = toint(parsedEventText.approximatePeakMemConsumptionKB),
         EventText_commandType = tostring(parsedEventText.commandType),
         EventText_discoverType = toint(parsedEventText.discoverType),
         EventText_PeakMemoryKB = toint(extract(@"PeakMemory:\s*(\d+)\s*KB", 1, EventText)),
         EventText_MashupCPUTime_ms = toint(extract(@"MashupCPUTime:\s*(\d+)\s*ms", 1, EventText)),
         EventText_MashupPeakMemory_KB = toint(extract(@"MashupPeakMemory:\s*(\d+)\s*KB", 1, EventText)),
         EventText_queryResultRows = toint(parsedEventText.queryResultRows),
         EventText_directQueryRequestCount = toint(parsedEventText.directQueryRequestCount),
         EventText_directQueryTotalRows = toint(parsedEventText.directQueryTotalRows),
         EventText_datasourceConnectionThrottleTimeMs = toint(parsedEventText.datasourceConnectionThrottleTimeMs),
         EventText_directQueryConnectionTimeMs = toint(parsedEventText.directQueryConnectionTimeMs),
         EventText_directQueryExecutionTimeMs = toint(parsedEventText.directQueryExecutionTimeMs),
         EventText_directQueryIterationTimeMs = toint(parsedEventText.directQueryIterationTimeMs),
         EventText_directQueryTotalTimeMs = toint(parsedEventText.directQueryTotalTimeMs),
         EventText_directQueryTimeoutMs = toint(parsedEventText.directQueryTimeoutMs),
         EventText_tabularConnectionTimeoutMs = toint(parsedEventText.tabularConnectionTimeoutMs)
| extend WaitTimeMs = toint(extract(@"WaitTime:\s*(\d+)\s*ms", 1, EventText))
| extend upnj = tostring(parse_json(Identity))
| extend claims_upn= extract_json("$.claims.upn", upnj), effectiveclaims_upn= extract_json("$.effectiveClaims.upn", upnj)
| project Timestamp, ItemName, ItemKind, DatasetMode, OperationName, OperationDetailName, ApplicationName, EventText, ExecutingUser,claims_upn, effectiveclaims_upn,  WorkspaceName, CapacityId, datasetid, reportId, visualId, usersession, 
          EventText_timeStart, EventText_timeEnd, EventText_durationMs, EventText_queryProcessingCpuTimeMs, EventText_totalCpuTimeMs, EventText_executionDelayMs, EventText_approximatePeakMemConsumptionKB, EventText_commandType, EventText_discoverType, EventText_PeakMemoryKB, EventText_MashupCPUTime_ms, EventText_MashupPeakMemory_KB, EventText_queryResultRows, EventText_directQueryRequestCount, EventText_directQueryTotalRows, EventText_datasourceConnectionThrottleTimeMs, EventText_directQueryConnectionTimeMs, EventText_directQueryExecutionTimeMs, EventText_directQueryIterationTimeMs, EventText_directQueryTotalTimeMs, EventText_directQueryTimeoutMs, EventText_tabularConnectionTimeoutMs, WaitTimeMs;
result

Semantic Models That Are Refreshed But Never Queried

In Power BI, if a semantic model with scheduled refresh is not queried for two months, the refresh is automatically paused. In the query below, I attempt to find Import semantic models that have been refreshed (scheduled or on demand) but have not been queried by any user in the last 30 days, so we don't have to wait two months. I identify these by filtering activities on the usersession column.

// queries last 30 days
let result = SemanticModelLogs
| where Timestamp > ago(30d)
| where isnotempty( ItemName)
| extend app = tostring(parse_json(ApplicationContext))
| project   Timestamp, 
            ItemName,
            DatasetMode,
            OperationName,
            ExecutingUser,
            usersession=extract_json("$.Sources[0].HostProperties.UserSession", app);

// no user sessions
let import_no_user_sessions = result 
| summarize count(usersession) by DatasetName=ItemName, usersession, DatasetMode
| where DatasetMode =="Import"
| summarize isEmptyCount = countif(isempty(usersession)), totalCount = count() by DatasetName
| where isEmptyCount == totalCount
| project DatasetName;

// datasets that have ProgressReportEnd indicating a refresh finished (successful or otherwise)
let refreshed_models = result
| where OperationName == "ProgressReportEnd" and isnotempty( ItemName)
| distinct DatasetName = ItemName;

// Inner join of the two above to get refreshed import datasets without user sessions
let refreshed_but_not_used = import_no_user_sessions
| join kind=inner (refreshed_models) on DatasetName
| project DatasetName;
refreshed_but_not_used
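The same three-step logic (Import models with only empty user sessions, models with a finished refresh, then the intersection) can be reproduced in pandas if you have already pulled the events into a notebook. A minimal sketch on a fabricated event log, with column names matching the projection above:

```python
import pandas as pd

# Fabricated event log mimicking the projected columns in the KQL above
events = pd.DataFrame({
    "ItemName":      ["Sales", "Sales", "HR", "HR", "Finance"],
    "DatasetMode":   ["Import"] * 5,
    "OperationName": ["ProgressReportEnd", "QueryEnd", "ProgressReportEnd",
                      "ProgressReportEnd", "QueryEnd"],
    "usersession":   ["", "abc-123", "", "", "xyz-789"],
})

# Import models where every logged event has an empty usersession
sessions = events[events["DatasetMode"] == "Import"].groupby("ItemName")["usersession"]
all_empty = sessions.apply(lambda s: (s == "").all())
no_user_sessions = set(all_empty[all_empty].index)

# Models with at least one ProgressReportEnd (a refresh finished)
refreshed = set(events.loc[events["OperationName"] == "ProgressReportEnd", "ItemName"])

# Intersection: refreshed but never queried by a user
refreshed_but_not_used = sorted(no_user_sessions & refreshed)
print(refreshed_but_not_used)  # → ['HR']
```

In this toy data, "Sales" was refreshed but also queried, and "Finance" was only queried, so only "HR" qualifies.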

Who is querying the semantic model and how?

Using the ApplicationName, we can identify how users are querying and interacting with the model:

// last 7 days
SemanticModelLogs
| where Timestamp > ago(7d) 
      and isnotempty(ApplicationName) 
      and isnotempty(ExecutingUser) 
      and isnotempty(ItemName)
| sort by Timestamp
| summarize sessioncount = dcount(XmlaSessionId) by ExecutingUser, datasetName = ItemName, ApplicationName
| sort by sessioncount

As you might expect from me, I queried a model 2439 times using Semantic Link 😁. PowerBIEIM is Power BI Enterprise Information Management; it’s used for sensitivity labels, lineage, monitoring, etc. Below are some of the other application names:

You can see TE, Q&A, Report Server (‘Analyze In Excel’), Web Modeling, etc. This is excellent for identifying the medium used to query the model.

Which reports are users accessing, when are they accessing them, and how often?

The logs show the reportId of the report used to access the semantic model. Below, I extract the reportId for each dataset and when it was accessed, in 30-minute bins. You can extend this further to get the number of concurrent users over time.

// in the last 30 days
let result = SemanticModelLogs
| where Timestamp > ago(30d) and 
        isnotempty( ApplicationContext)
| extend app = tostring(parse_json( ApplicationContext))
| project   Timestamp, 
            WorkspaceName,
            ItemName,
            ItemKind,
            DatasetMode,
            EventText,
            ExecutingUser,
            CapacityId, 
            datasetid=extract_json("$.DatasetId", app), 
            reportId=extract_json("$.Sources[0].ReportId", app), 
            visualId=extract_json("$.Sources[0].VisualId", app), 
            usersession=extract_json("$.Sources[0].HostProperties.UserSession", app);
result 
| where isnotempty( ItemName) and isnotempty( reportId) and isnotempty( usersession)
| summarize  total_unique_sessions = dcount(usersession) by datasetName = ItemName, reportId, bin(Timestamp, 30m)
| sort by datasetName, total_unique_sessions

You can also use this pattern to identify the day of the week and hour of the day when the models/reports are accessed (or refreshed), for load-balancing and demand-planning purposes.

# overall activity initiated by users, last 7 days
# query_kusto and kusto_to_df are defined in the API section below; seaborn renders the heatmap
import seaborn as sns

cluster_url = "https://<>.z0.kusto.fabric.microsoft.com"
database = "Monitoring Eventhouse"

kustoQuery = f"""
let result = SemanticModelLogs
| where Timestamp > ago(7d) and isnotempty( ItemName) and (isnotempty( ExecutingUser) and ExecutingUser != "Power BI Service")
| extend app = tostring(parse_json(ApplicationContext))
| project   Timestamp, 
            WorkspaceName,
            ItemName,
            ItemKind,
            ExecutingUser,
            usersession = extract_json("$.Sources[0].HostProperties.UserSession", app);
result            
| extend localdt = datetime_add('minute', -360, Timestamp)  // Convert UTC to Central
| summarize sessions = count() by actdate = format_datetime(localdt,'yy-MM-dd'), hour = format_datetime(localdt, 'HH')
| sort by actdate, hour, sessions asc  

"""
results = query_kusto(cluster_url, database, kustoQuery, token)
df = kusto_to_df(results)
heatmap_data = df.pivot(index="hour", columns="actdate", values="sessions")
sns.heatmap(heatmap_data, annot=False, fmt=".0f", cmap="coolwarm", cbar=False);

All DAX Queries and Their Durations

Below, I extract all the unique DAX queries executed (directly, or indirectly via a report), their median duration and median CPU time, and CPU time as a % of total duration. You can identify queries by report and visual, set alerts, and optimize before users complain. Use the base query I shared above to get the visual IDs. In my previous blog, I shared how to get the visual IDs for each report.

// All DAX queries executed in the last 7 days for a dataset called "Profit Analysis"
SemanticModelLogs
| where Timestamp > ago(7d)
      and ItemKind == "Dataset"
      and ItemName == "Profit Analysis"
      and OperationDetailName == "DAXQuery"
      and OperationName == "QueryEnd"
| extend CleanedDAX = replace_string(EventText, tostring(extract(@"(\[WaitTime:.*?ms\])", 1, EventText)), "")
| project DatasetName = ItemName, WorkspaceId, DurationMs, CpuTimeMs, CleanedDAX
| summarize median_duration = percentile(DurationMs, 50), median_cputime = percentile(CpuTimeMs, 50) by DatasetName, CleanedDAX, WorkspaceId
| extend pct_cpu = round(todouble(median_cputime) / todouble(median_duration),2)
| sort by pct_cpu, median_cputime, median_duration

This can also be used to pinpoint visuals/DAX that need optimization. Remember, DAX is only one part of the optimization equation; you will still need to use Performance Analyzer to see all the other details and performance metrics. This is one more tool in our tool chest.
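Once the summarized output is in a notebook, flagging candidates for an alert is straightforward. A minimal sketch on fabricated numbers; the thresholds (5 seconds, 80% CPU) are arbitrary and should be tuned to your workload:

```python
import pandas as pd

# Fabricated output of the summarize step above
df = pd.DataFrame({
    "CleanedDAX":      ["EVALUATE T1", "EVALUATE T2", "EVALUATE T3"],
    "median_duration": [1200, 15000, 400],   # ms
    "median_cputime":  [300, 14000, 100],    # ms
})
df["pct_cpu"] = (df["median_cputime"] / df["median_duration"]).round(2)

# Arbitrary alert rule: long-running AND CPU-bound queries
slow = df[(df["median_duration"] > 5000) & (df["pct_cpu"] > 0.8)]
print(slow["CleanedDAX"].tolist())  # → ['EVALUATE T2']
```

You could feed a list like this into a Real-Time Dashboard alert or a scheduled notebook that notifies you when new queries cross the thresholds.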

Which columns are used?

Chris Webb published a blog last week where he used Semantic Link Labs and the CALCDEPENDENCY function to get all the columns used by a DAX query. The logs give us all the DAX queries, so let’s use Chris’s method and the magic of Semantic Link Labs to find all the columns touched in the last 30 days. Read Chris’s blog for details on how it works before you start removing unused columns.

I will query the monitoring Eventhouse using the API method I blogged about and use Semantic Link Labs to iterate over the DAX queries.

Using API:

The method below can be used in both PySpark notebooks and Python notebooks (when the latter become available).

import requests
import json
import pandas as pd
from datetime import datetime

def query_kusto(cluster_url: str, database: str, query: str, token: str):
    """
    Function to query Fabric Kusto db using REST API

    Args:
        cluster_url: Full cluster URL (e.g., 'https://guid.kusto.fabric.microsoft.com')
        database: Database name
        query: KQL query
        token: Bearer token
    """

    headers = {
        'Authorization': f'Bearer {token}',
        'Content-Type': 'application/json',
        'Accept': 'application/json'
    }

    body = {
        'db': database,
        'csl': query
    }

    response = requests.post(
        f'{cluster_url}/v1/rest/query',
        headers=headers,
        json=body
    )
    # surface HTTP errors early instead of failing later while parsing the body
    response.raise_for_status()

    return response.json()

# Example for a dataset called Profit Analysis
cluster_url = "https://<>.z0.kusto.fabric.microsoft.com"
database = "Monitoring Eventhouse"
kustoQuery = f"""
// All DAX queries executed in the last 7 days for a dataset called "Profit Analysis"
SemanticModelLogs
| where Timestamp > ago(7d)
      and ItemKind == "Dataset"
      and ItemName == "Profit Analysis"
      and OperationDetailName == "DAXQuery"
      and OperationName == "QueryEnd"
| extend CleanedDAX = replace_string(EventText, tostring(extract(@"(\[WaitTime:.*?ms\])", 1, EventText)), "")
| project DatasetName = ItemName, WorkspaceId, DurationMs, CpuTimeMs, CleanedDAX
| summarize median_duration = percentile(DurationMs, 50), median_cputime = percentile(CpuTimeMs, 50) by DatasetName, CleanedDAX, WorkspaceId
| extend pct_cpu = round(todouble(median_cputime) / todouble(median_duration),2)
| sort by pct_cpu, median_cputime, median_duration

"""
token = notebookutils.credentials.getToken(cluster_url)

try:
    results = query_kusto(cluster_url, database, kustoQuery, token)
except Exception as e:
    print(f"Error querying Kusto: {str(e)}")


def kusto_to_df(results):
    """
    Convert Kusto query results to pandas DataFrame with appropriate data types.
    Unrecognized types default to string.

    Args:
        results (dict): Kusto query results containing Tables, Columns, and Rows

    Returns:
        pandas.DataFrame: DataFrame with properly typed columns
    """
    # Extract columns and data types
    columns_info = results['Tables'][0]['Columns']
    rows = results['Tables'][0]['Rows']

    # Create a df
    df = pd.DataFrame(rows, columns=[col['ColumnName'] for col in columns_info])

    # dtypes, only numeric, datetime and string. Dynamic and other will be cast as string
    for col_info in columns_info:
        col_name = col_info['ColumnName']
        data_type = col_info['DataType']

        try:
            if data_type == 'DateTime':
                df[col_name] = pd.to_datetime(df[col_name])
            elif data_type in ['Int64', 'Int32', 'Long']:
                df[col_name] = pd.to_numeric(df[col_name], errors='coerce').fillna(0).astype('int64')
            elif data_type == 'Real' or data_type == 'Double':
                df[col_name] = pd.to_numeric(df[col_name], errors='coerce')
            else:
                # Convert any other type to string, change as needed
                df[col_name] = df[col_name].astype(str)
        except Exception as e:
            print(f"Warning: Could not convert column {col_name} to {data_type}, defaulting to string: {str(e)}")
            df[col_name] = df[col_name].astype(str)

    return df

df = kusto_to_df(results)
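For reference, the v1 REST endpoint returns a Tables/Columns/Rows payload, which is what kusto_to_df consumes. A minimal synthetic example of that shape (all values fabricated), showing the core of the conversion:

```python
import pandas as pd

# Fabricated payload matching the v1 REST response shape the helper expects
results = {
    "Tables": [{
        "TableName": "Table_0",
        "Columns": [
            {"ColumnName": "DatasetName",     "DataType": "String"},
            {"ColumnName": "median_duration", "DataType": "Int64"},
        ],
        "Rows": [["Profit Analysis", 1250], ["Sales", 430]],
    }]
}

# Build the DataFrame from the first table's column names and rows
cols = [c["ColumnName"] for c in results["Tables"][0]["Columns"]]
df = pd.DataFrame(results["Tables"][0]["Rows"], columns=cols)
print(df.shape)  # → (2, 2)
```

Having a small payload like this around is also useful for unit-testing kusto_to_df without hitting the Eventhouse.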

Get a list of all the columns accessed and their memory footprint:

Below, I get the DAX dependencies, thanks to Michael Kovalsky’s Semantic Link Labs. Next, you would join this with all the columns from the dataset to identify the unused columns (be careful with this part); I will leave that up to you. Depending on the number of DAX queries you have, it may take a while, because Semantic Link Labs also scans the model with Vertipaq Analyzer.
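The join itself is a standard left anti-join in pandas. A minimal sketch on fabricated frames: `all_columns` stands in for the full column list from the model, `used_columns` for the dependency output above, and the "Object Name" key is a placeholder column name, not necessarily what Semantic Link Labs returns:

```python
import pandas as pd

# Fabricated lists: all model columns vs. columns touched by logged DAX queries
all_columns = pd.DataFrame(
    {"Object Name": ["Sales[Amount]", "Sales[Qty]", "Date[Year]", "Sales[Notes]"]})
used_columns = pd.DataFrame(
    {"Object Name": ["Sales[Amount]", "Date[Year]"]})

# Left anti-join: columns in the model that never appear in a logged query
merged = all_columns.merge(used_columns.drop_duplicates(),
                           on="Object Name", how="left", indicator=True)
unused = merged.loc[merged["_merge"] == "left_only", "Object Name"].tolist()
print(unused)  # → ['Sales[Qty]', 'Sales[Notes]']
```

Remember that "unused in the last 30 days of logs" is not the same as "safe to delete": columns referenced by relationships, RLS, or calculated objects may never show up in query dependencies.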

%pip install semantic-link-labs
import pandas as pd
import sempy_labs as labs

df = kusto_to_df(results)
columns_accessed = []

for i, x in df.iterrows():
    df_cols = labs.get_dax_query_dependencies(dataset=x['DatasetName'], workspace=x['WorkspaceId'], dax_string=x["CleanedDAX"])
    columns_accessed.append(df_cols)

result_df = pd.concat(columns_accessed, ignore_index=True)
distinct_result_df = result_df.drop_duplicates().sort_values('Total Size', ascending=False).reset_index(drop=True)
distinct_result_df

We can also get the memory of the columns used by the DAX query as a fraction of the model's total memory:

import sempy_labs as labs

model_size = labs.get_semantic_model_size(dataset="Profit Analysis")

df.loc[:, "column_memory"] = df.apply(lambda x: labs.get_dax_query_memory_size(dataset=x['DatasetName'], workspace=x['WorkspaceId'], dax_string=x["CleanedDAX"]), axis=1)

df.loc[:, "pct_memory"] = df["column_memory"] / model_size

display(df)

This is helpful for Direct Lake models, which load columns on demand.

Using Kusto Spark Connector

You can use the Spark connector too, but honestly the API method is faster and better.


kustoQuery = f"""
// All DAX queries executed in the last 7 days for a dataset called "Profit Analysis"
SemanticModelLogs
| where Timestamp > ago(7d)
      and ItemKind == "Dataset"
      and ItemName == "Profit Analysis"
      and OperationDetailName == "DAXQuery"
      and OperationName == "QueryEnd"
| extend CleanedDAX = replace_string(EventText, tostring(extract(@"(\[WaitTime:.*?ms\])", 1, EventText)), "")
| project DatasetName = ItemName, WorkspaceId, DurationMs, CpuTimeMs, CleanedDAX
| summarize median_duration = percentile(DurationMs, 50), median_cputime = percentile(CpuTimeMs, 50) by DatasetName, CleanedDAX, WorkspaceId
| extend pct_cpu = round(todouble(median_cputime) / todouble(median_duration),2)
| sort by pct_cpu, median_cputime, median_duration

"""
kustoUri = "https://xxxxxxx.kusto.fabric.microsoft.com"
database = "Monitoring Eventhouse"
accessToken = mssparkutils.credentials.getToken(kustoUri)
kustoDf  = spark.read\
    .format("com.microsoft.kusto.spark.synapse.datasource")\
    .option("accessToken", accessToken)\
    .option("kustoCluster", kustoUri)\
    .option("kustoDatabase", database)\
    .option("kustoQuery", kustoQuery).load()

display(kustoDf)

I want to thank Varun Jain and Anshul Sharma at Microsoft for helping me and answering my questions. Also, thanks to my colleagues Albert Paulraj and Philip Abadir for their help.

I have some great examples of how to use this for optimizing semantic models, but that will have to wait for another time.

Notes

  • I am not sure how to filter out default semantic models; if anyone knows, please let me know. These default models are too chatty and generate too many activities.

  • As mentioned above, these scripts are correct to the best of my knowledge, so be sure to verify before taking any action. If you see anything incorrect, I would appreciate it if you let me know.

  • Unless I missed something, Direct Lake did not show up in DatasetMode; I need to confirm this.

  • I have a script to identify if a visual is in DirectQuery mode in a Direct Lake dataset, which is useful for tracking fallback.

  • You can use Phil Seamark’s Visualize Dataset Refresh method to see partition level details for past refreshes. In the base query above, I extracted the start time and end time.

  • To combine logs from other workspaces, use cross-cluster queries, as I show here

  • I am not 100% sure if and how CPU time correlates with the CUs consumed. CPU time is one part of it, but I'm not sure whether it can be used as a proxy, and if so, how.

  • As mentioned above, the retention period is 30 days and cannot currently be changed. If you need to keep the data longer, be sure to copy it to another Eventhouse.

  • Power BI Log Analytics Template by Rui Romano already has a report that’s a great reference.
