Quick Test: Finding Power BI Report Pages With Errors Using Semantic Link Labs and LLM

Sandeep Pawar
5 min read

As the title says, it’s a test. I wanted to experiment with something based on a discussion I had. The user was using Semantic Link Labs' awesome ReportWrapper to find Power BI report pages with broken visuals. (You can use this notebook to learn more.) However, ReportWrapper requires the report to be in .pbir format, and in this case the reports were in .pbix. So, I thought of an alternative approach: using an LLM 😁

Recipe

  • Export the Power BI report as an image to a lakehouse. I have written about this before.

  • Extract all the pages of the report as PNGs

  • Use an LLM to detect broken visuals

  • Save the pages with errors

Using LLM

There are several ways to do this:

  • Use multimodal embeddings: convert the report image to an embedding vector and do a similarity search. (E.g., I used the Jina CLIP v2 embedding model and ran a similarity search with the query "this report has an error message with gray background". It worked decently but wasn’t very robust. Plus, it can’t identify which visual has the error and was prone to errors.)

  • Multimodal AI search: similar to the above, but instead of a text query, perform the similarity search using embeddings of known error messages. This also wasn’t very accurate.

  • Use a multimodal model: query the image directly with a multimodal LLM. This worked well with several different LLMs; however, gemini-2.5 worked the best without any additional complexity (and it’s free for testing :D). Below I show how to do that.
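For reference, the embedding-based approaches above boil down to a cosine-similarity comparison between each page's image embedding and a query embedding. A minimal sketch with made-up vectors standing in for model output (in practice these would come from a multimodal model such as Jina CLIP; the threshold is also an assumption you would tune):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings standing in for real model output: in practice, embed each
# report page image and the text query
# "this report has an error message with gray background".
query_embedding = np.array([0.9, 0.1, 0.3])
page_embeddings = {
    "page1.png": np.array([0.88, 0.12, 0.31]),  # similar -> likely has an error
    "page2.png": np.array([-0.2, 0.9, 0.1]),    # dissimilar -> likely clean
}

threshold = 0.9  # assumed cutoff; needs tuning per model
flagged = [
    name for name, emb in page_embeddings.items()
    if cosine_similarity(query_embedding, emb) >= threshold
]
print(flagged)
```

This also illustrates the limitation mentioned above: a page-level similarity score can flag a page, but it cannot tell you which visual on the page is broken.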

Using Google Gemini

Gemini 1.5 and later models are multimodal, i.e. they work with text, audio, images, etc. Unlike many similar models, Gemini is also a very capable object-detection model: you can ask it to return bounding boxes for the objects you are interested in. Below, I use that capability to identify the error messages. I tested it on several reports, and it worked 100% of the time (on the reports I tested). You will need to generate a Gemini API key (free).

💡
As always, do your due diligence before sending any sensitive data to AI service providers. In my case, since I am using the free tier, I am aware that Google will use the data for training; it’s written in their terms of service. Use caution and be aware of the risks.

Extract Report Pages

You will need to enable the tenant setting that allows exporting reports as images. Attach a lakehouse to the Fabric notebook and specify the report and lakehouse details.

%pip install semantic-link-labs google-genai -q
from sempy_labs.report import export_report

export_report(
    report="Sales and Returns-error",
    workspace="Sales",
    export_format="PNG",
    lakehouse="MyLakehouse",
    lakehouse_workspace="Sales",
)

Detect Visuals with Errors

The prompt is simple:

💡
'Analyze this image and return a JSON response with: 1) "has_error_messages": true/false if image contains error messages, 2) "coordinates": array of bounding boxes [ymin, xmin, ymax, xmax] for each error message found. Error messages typically have a black x in a circle with a gray box.'
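With that prompt, the model is expected to reply with JSON roughly of this shape (the coordinate values here are made up for illustration; Gemini normalizes bounding boxes to a 0–1000 scale):

```json
{
  "has_error_messages": true,
  "coordinates": [
    [120, 340, 260, 610],
    [700, 80, 840, 350]
  ]
}
```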
import json
import re
import os
import zipfile
import glob
from google import genai
from PIL import Image, ImageDraw

def detect_error_messages(zip_path, api_key, output_dir=None):
    """Extract the report page PNGs from the exported zip and use Gemini
    to flag pages whose visuals show error messages."""
    if output_dir is None:
        output_dir = os.path.join(os.path.dirname(zip_path), "extracted")

    os.makedirs(output_dir, exist_ok=True)

    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(output_dir)


    png_files = glob.glob(os.path.join(output_dir, "*.png"))

    client = genai.Client(api_key=api_key)
    results = []

    for image_path in png_files:
        try:
            image = Image.open(image_path)

            response = client.models.generate_content(
                model='gemini-2.5-pro-preview-06-05',
                contents=[
                    'Analyze this image and return a JSON response with: 1) "has_error_messages": true/false if image contains error messages, 2) "coordinates": array of bounding boxes [ymin, xmin, ymax, xmax] for each error message found. Error messages typically have a black x in a circle with a gray box.',
                    image
                ]
            )

            result = _extract_json_result(response.text)
            result["image_path"] = image_path
            result["filename"] = os.path.basename(image_path)

            if result.get("has_error_messages") and result.get("coordinates"):
                base_name = os.path.splitext(image_path)[0]
                ext = os.path.splitext(image_path)[1]
                output_path = f"{base_name}-error{ext}"

                _draw_bounding_boxes(image_path, result["coordinates"], output_path)
                result["output_path"] = output_path

            results.append(result)

        except Exception as e:
            results.append({
                "image_path": image_path,
                "filename": os.path.basename(image_path),
                "error": str(e),
                "has_error_messages": False,
                "coordinates": []
            })

    return results

def _extract_json_result(text):
    json_pattern = r'\{[^{}]*"has_error_messages"[^{}]*\}'
    match = re.search(json_pattern, text, re.DOTALL)

    if match:
        try:
            return json.loads(match.group())
        except json.JSONDecodeError:
            pass

    coord_pattern = r'\[\s*\d+\s*,\s*\d+\s*,\s*\d+\s*,\s*\d+\s*\]'
    coordinates = [json.loads(match) for match in re.findall(coord_pattern, text)]

    return {
        "has_error_messages": len(coordinates) > 0,
        "coordinates": coordinates
    }

def _draw_bounding_boxes(image_path, coordinates, output_path):
    image = Image.open(image_path)
    draw = ImageDraw.Draw(image)
    width, height = image.size

    # Gemini returns coordinates normalized to a 0-1000 scale;
    # scale them to pixel positions on the actual image.
    for ymin, xmin, ymax, xmax in coordinates:
        x1 = int(xmin * width / 1000)
        y1 = int(ymin * height / 1000)
        x2 = int(xmax * width / 1000)
        y2 = int(ymax * height / 1000)

        dash_length = 10

        for x in range(x1, x2, dash_length * 2):
            draw.line([(x, y1), (min(x + dash_length, x2), y1)], fill='red', width=5)
            draw.line([(x, y2), (min(x + dash_length, x2), y2)], fill='red', width=5)

        for y in range(y1, y2, dash_length * 2):
            draw.line([(x1, y), (x1, min(y + dash_length, y2))], fill='red', width=5)
            draw.line([(x2, y), (x2, min(y + dash_length, y2))], fill='red', width=5)

    image.save(output_path)



api_key = "AIzaSyAxxxxxxxxxxxxxxxxxxxxxxxx"
zip_path = "/lakehouse/default/Files/Sales and Returns-error.zip"

results = detect_error_messages(zip_path, api_key)

The function above returns, for each page, a JSON object with the keys "has_error_messages" and "coordinates". If a visual with an error is detected, has_error_messages is true and the bounding-box coordinates of the error are returned. The function also draws a red dashed rectangle around the visuals with errors. I used Claude to refine the function.
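To act on the output, you can filter the returned list for flagged pages. A small sketch over a hypothetical results list (the dict structure matches what detect_error_messages returns; filenames and paths are made up):

```python
# Hypothetical output of detect_error_messages (structure as described above)
results = [
    {"filename": "Page1.png", "has_error_messages": False, "coordinates": []},
    {"filename": "Page2.png", "has_error_messages": True,
     "coordinates": [[120, 340, 260, 610]],
     "output_path": "/lakehouse/default/Files/extracted/Page2-error.png"},
]

# Collect the pages Gemini flagged as containing error messages
pages_with_errors = [r["filename"] for r in results if r.get("has_error_messages")]
print(f"{len(pages_with_errors)} page(s) with broken visuals: {pages_with_errors}")
```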

Test

  1. You can see that this report has many pages. Some pages have errors like the one below; some don’t.

  2. The function saved the report pages as PNGs and extracted them:

    1. In the extracted folder, any pages with errors were saved with bounding boxes drawn on them.

    2. Here is what the output looks like:

      You can automate this process and send email alerts if errors are detected - all using semantic link labs.

      I do not expect anyone to use this approach given the prerequisites and risks - but I just wanted to test it for my own sake. It works!
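For the email-alert idea, composing the message can be sketched with the standard library (the sender and recipient below are placeholders, and actually sending would need smtplib with your server's details; this is not the semantic link labs API, just a generic sketch):

```python
from email.message import EmailMessage

def build_alert(pages_with_errors):
    """Compose an alert email listing report pages with broken visuals."""
    msg = EmailMessage()
    msg["Subject"] = f"Power BI report check: {len(pages_with_errors)} page(s) with errors"
    msg["From"] = "alerts@example.com"   # placeholder sender
    msg["To"] = "bi-team@example.com"    # placeholder recipient
    msg.set_content(
        "Broken visuals detected on:\n" + "\n".join(pages_with_errors)
    )
    return msg

msg = build_alert(["Page2.png", "Page5.png"])
print(msg["Subject"])
```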

If you have a .pbir report, the Semantic Link Labs approach along with John Kerski’s solution and Kurt’s solution are more practical.
