Extracting Spark Event Logs in Fabric for Monitoring & Optimization

I wrote a blog a couple of weeks ago about extracting Spark driver logs using the REST API. In this blog, I will share how to call the APIs to get the Spark event logs and parse them for Spark performance metrics. This can be used for debugging, optimizing, and monitoring Spark applications in Fabric. Note that in this case I am retrieving the logs for an application that has already finished running. If you want real-time monitoring, use the Spark emitter + Eventstream instead. Workspace Monitoring in Fabric does not yet include Spark logs.
Steps:
1. Get the application id and Livy id of the application (notebook or SJD)
2. Retrieve the event log zip and save it to a lakehouse
3. Extract the event log JSON and save it to a lakehouse
4. Parse the event log for Spark metrics
Get Session Info
Use the get_latest_session_info() function from my previous blog. It returns the latest application id and Livy id, which are required for the subsequent API calls. Note that if you want ids for all previous sessions, modify the function accordingly (a sketch is included after the snippet below).
## Fabric Python or Pyspark notebook
## Function from https://fabric.guru/extracting-fabric-spark-driver-logs-using-api
import sempy.fabric as fabric

notebook_id = "996b5d64-xxxxxxxx-ff2ba76d0fc8"
workspace_id = fabric.get_notebook_workspace_id()  # or replace with your workspace id

session = get_latest_session_info(notebook_id, workspace_id)
livy_id = session['livyId']
app_id = session['applicationId']
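If you want ids for all previous sessions rather than just the latest one, the same Livy Sessions API can list every session for the notebook. Below is a minimal sketch of what that could look like; get_all_sessions is a hypothetical helper (not from the previous blog), and the response field names may differ, so verify them against the API docs linked in the code.

## Sketch only: assumes the List Livy Sessions endpoint documented at
## https://learn.microsoft.com/en-us/rest/api/fabric/spark/livy-sessions
## get_all_sessions is a hypothetical helper; verify the response fields before using the ids
import sempy.fabric as fabric

def get_all_sessions(notebook_id, workspace_id):
    client = fabric.FabricRestClient()
    endpoint = f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/notebooks/{notebook_id}/livySessions"
    response = client.get(endpoint)
    if response.status_code != 200:
        return []
    return response.json().get("value", [])  # one item per session, each carrying its Livy id and application id

for s in get_all_sessions(notebook_id, workspace_id):
    print(s)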
Retrieve Spark Event Log
The Spark event log can be several hundred MB or even a few GB, so attach a default lakehouse and save the zip file to it.
import os
import zipfile
import glob
import sempy.fabric as fabric

def get_spark_event_log(notebook_id, workspace_id, output_path, livy_id=None, app_id=None):
    """
    Sandeep Pawar | fabric.guru | May 19, 2025
    Gets Spark event logs and saves them to the specified path in an attached lakehouse.
    """
    if not livy_id or not app_id:
        session_info = get_latest_session_info(notebook_id, workspace_id)
        if not session_info:
            return "Error: Could not retrieve session info"
        livy_id = session_info.get('livyId')
        app_id = session_info.get('applicationId')

    if os.path.isdir(output_path) or output_path.endswith('/'):
        os.makedirs(output_path, exist_ok=True)
        output_path = os.path.join(output_path, f"spark_log_{app_id}.zip")
    else:
        os.makedirs(os.path.dirname(output_path), exist_ok=True)

    client = fabric.FabricRestClient()
    # refer to https://learn.microsoft.com/en-us/rest/api/fabric/spark/livy-sessions for API details
    # /1/ below is the attempt id (first attempt of the application)
    endpoint = f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/notebooks/{notebook_id}/livySessions/{livy_id}/applications/{app_id}/1/logs"

    try:
        print(f"Retrieving logs from endpoint: {endpoint}")
        response = client.get(endpoint)
        if response.status_code != 200:
            return f"Error: Received status code {response.status_code}. Response: {response.text}"
        with open(output_path, "wb") as f:
            f.write(response.content)
        file_size = len(response.content)
        return f"Successfully saved event logs ({file_size/1024/1024:.2f} MB) to {output_path}"
    except Exception as e:
        return f"Error retrieving logs: {str(e)}"
Extract Event Log
The zip file contains the event log, which needs to be extracted. The function below unzips the file and saves it to the specified path in the lakehouse.
## Be sure to attach a default lakehouse
# I assumed it's .zip, adjust if other zip compression in the future
def unzip_spark_log(zip_path, extract_path):
    """
    Sandeep Pawar | fabric.guru | May 19, 2025
    Extracts Spark event log zip to specified directory.
    """
    if os.path.isdir(zip_path):
        zip_files = glob.glob(os.path.join(zip_path, "*.zip"))
        if not zip_files:
            return f"Error: {zip_path} is a directory and no zip files were found within it"
        zip_path = zip_files[0]
        print(f"Using zip file: {zip_path}")

    if not os.path.exists(zip_path):
        return f"Error: Zip file {zip_path} doesn't exist"

    os.makedirs(extract_path, exist_ok=True)

    try:
        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
            file_list = zip_ref.namelist()
            zip_ref.extractall(extract_path)
        return f"Extracted {len(file_list)} files to {extract_path}: {', '.join(file_list)}"
    except Exception as e:
        return f"Error extracting logs: {str(e)}"
Example:
##python or pyspark notebook
output_path = "/lakehouse/default/Files/rawlogs"
get_spark_event_log(notebook_id, workspace_id, output_path, livy_id=livy_id, app_id=app_id)  # saves the zip file
unzip_spark_log(output_path, output_path)  # unzips the zip file
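To find the exact name of the extracted event log file (you will need it in the next step), you can simply list the extraction folder. A minimal check, assuming the same rawlogs path as above:

import os
print(os.listdir("/lakehouse/default/Files/rawlogs"))  # shows the extracted event log file name(s)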
Parse log for spark metrics:
Parse the event log JSON to get the performance metrics. Below I am extracting the stage-level metrics across all stages. From these you can get metrics like data spill, GC time, shuffle read/write, CPU time, idle time, memory used, etc., which can help you optimize the Spark application (more on this later).
%%pyspark
## pyspark notebook
df = spark.read.json("Files/rawlogs/application_xxxxxxx")

df1 = df.filter("Event='SparkListenerStageCompleted'").select("`Stage Info`.*")
df1.createOrReplaceTempView("t2")

df2 = spark.sql("select `Submission Time`, `Completion Time`, `Number of Tasks`, `Stage ID`, t3.col.* from t2 lateral view explode(Accumulables) t3")
df2.createOrReplaceTempView("t4")

result = spark.sql("select Name, sum(Value) as value from t4 group by Name order by Name")
display(result)
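The aggregate above gives you totals but not which stage is responsible. A per-stage pivot of the same accumulables makes spill-heavy or GC-heavy stages easy to spot. This is only a sketch reusing the t4 view created above; the internal.metrics.* accumulator names are assumptions based on the standard Spark event log schema and can vary by Spark version, so inspect the distinct Name values in t4 first.

%%pyspark
## Sketch: per-stage breakdown of a few common accumulators (metric names assumed, verify against your log)
stage_breakdown = spark.sql("""
    select `Stage ID`,
           sum(case when Name = 'internal.metrics.executorRunTime'    then Value end) as run_time_ms,
           sum(case when Name = 'internal.metrics.jvmGCTime'          then Value end) as gc_time_ms,
           sum(case when Name = 'internal.metrics.memoryBytesSpilled' then Value end) as memory_spill_bytes,
           sum(case when Name = 'internal.metrics.diskBytesSpilled'   then Value end) as disk_spill_bytes
    from t4
    group by `Stage ID`
    order by `Stage ID`
""")
display(stage_breakdown)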
Similar to the stage-level metrics, you can also get task-level metrics, or aggregate metrics for all jobs in a workspace, and so on. There are a couple of other interesting APIs which I will cover in future blogs.
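For example, here is a minimal sketch that pulls task-level metrics from the same event log DataFrame; the column names assume the standard Spark event log schema for SparkListenerTaskEnd events, so adjust them if your log differs.

%%pyspark
## Sketch: task-level metrics from SparkListenerTaskEnd events (field names assumed from the standard event log schema)
from pyspark.sql.functions import col

tasks = (
    df.filter("Event='SparkListenerTaskEnd'")
      .select(
          "`Stage ID`",
          "`Task Info`.`Task ID`",
          "`Task Info`.`Executor ID`",
          "`Task Metrics`.`Executor Run Time`",
          "`Task Metrics`.`JVM GC Time`",
          "`Task Metrics`.`Memory Bytes Spilled`",
          "`Task Metrics`.`Disk Bytes Spilled`",
      )
)
# longest-running tasks first, useful for spotting skew
display(tasks.orderBy(col("Executor Run Time").desc()).limit(20))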
I will share more on how to interpret these metrics and how they can provide insights into the application. Note that Fabric offers many built-in monitoring capabilities. APIs give you the ability to access detailed metrics and create additional custom metrics.
You can also download the event log manually from the Spark History Server UI.
I would like to thank Jenny Jiang from Microsoft for answering my questions.