Real-Time Intelligence in Action: Tracking Prime-Time News Channels' YouTube Live Streams with Microsoft Fabric RTI

Table of contents
- Why Microsoft Fabric RTI for YouTube Stream Analytics?
- Project Scope: Sri Lankan News Ecosystem
- Architecture
- Step 01: Understanding YouTube API Economics and Workarounds
- Step 02: Eventhouse Setup - KQL Database Configuration
- Step 03: Setting up Eventstream | YT_Live_ES
- Step 04: PySpark Notebook Implementation
- Summarized Flow of the Entire Notebook
- Step 05: Inside Eventhouse & KQL DB - Creating Visualizations
- Setting the Table: Our Data Landing Zone
- Wrapping Up: From Promise to Production

A month ago, I promised in my previous LinkedIn post that I'd show you how to track YouTube live streams from Sri Lanka's prime-time news channels using the Real-Time Intelligence (RTI) capabilities in Microsoft Fabric, and here we are. This blog takes you through how I built a streaming data solution that tracks live viewer metrics in real time using Fabric, the YouTube APIs, and a PySpark/Python notebook.
Why Microsoft Fabric RTI for YouTube Stream Analytics?
RTI isn't just another event streaming platform—it's built for scenarios where milliseconds matter. Unlike traditional ETL pipelines that batch-process data every few hours, RTI creates a continuous data pipeline that can ingest, transform, and analyze YouTube API responses as they arrive.
What makes this particularly interesting is RTI's native integration with KQL (Kusto Query Language) databases, which can handle semi-structured JSON payloads from YouTube APIs without complex schema transformations. Plus, the EventStream connectors eliminate the need for custom message brokers.
For comprehensive fundamentals, Microsoft's official documentation provides excellent coverage: Microsoft Fabric Real-Time Intelligence Documentation.
Project Scope: Sri Lankan News Ecosystem
I targeted my home country Sri Lanka's major news broadcasters during their prime-time slots: lunch (11:45-13:00), evening (18:30-19:45), and night (21:00-22:00). The hypothesis? YouTube concurrent viewers could serve as a leading indicator for overall channel engagement trends.
Here's the kicker: while YouTube Live may represent only 10-20% of a channel's total viewership (TV plus on-demand replays of the live streams), it often shows engagement spikes 30-60 minutes before they appear in traditional viewership metrics, which offers a crucial early signal of overall viewer preference and taste. This makes real-time YouTube analytics surprisingly valuable for media monitoring and competitive intelligence.
Architecture
Here's how the pieces fit together:
The architecture leverages Microsoft Fabric's strength: seamless integration between compute (PySpark notebooks), messaging (Eventstreams), and analytics (KQL databases).
Step 01: Understanding YouTube API Economics and Workarounds
YouTube Data API v3 (Free Tier) pricing creates interesting constraints. It gives you 10,000 quota units daily, but here's the catch:
| Operation | Cost | Reality Check |
| --- | --- | --- |
| `videos.list` | 1 unit | Cheap metadata extraction |
| `search.list` | 100 units | Expensive discovery - only 100 calls/day |
| `liveBroadcasts.list` | 50 units | Moderate live stream lookup |
Additional info : https://developers.google.com/youtube/v3/getting-started
Monitor with quota console: https://console.cloud.google.com/apis/api/youtube.googleapis.com/quotas
For my use case (checking 8 channels every 60 seconds for 300 minutes daily), I'd need ~2,400 search operations. That's 240,000 units—24x over the free limit.
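If you want to sanity-check that math, here's the back-of-the-envelope calculation (the constants simply mirror the scenario above):

```python
CHANNELS = 8                 # news channels monitored
WINDOW_MINUTES = 300         # total prime-time coverage per day
CHECKS_PER_MINUTE = 1        # one poll per channel every 60 seconds
SEARCH_COST = 100            # quota units per search.list call
DAILY_QUOTA = 10_000         # free-tier allowance

searches = CHANNELS * WINDOW_MINUTES * CHECKS_PER_MINUTE   # 2,400 calls
units = searches * SEARCH_COST                             # 240,000 units
print(units / DAILY_QUOTA)                                 # 24.0x over the limit
```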
Solution: I bypassed the expensive `search.list` call by scraping each channel's YouTube `/streams` page directly. This drops the cost from 100 units to ~1 unit per channel check. The trade-off? You're now dependent on YouTube's HTML structure, which could break with UI updates.
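Here's a minimal sketch of that scraping step. The regex over the embedded page data is my assumption and may need tweaking whenever YouTube changes its markup:

```python
import re
import requests

def get_live_video_id(channel_handle: str) -> str | None:
    """Scrape a channel's /streams page and pull the first live/upcoming video ID."""
    url = f"https://www.youtube.com/{channel_handle}/streams"
    html = requests.get(url, timeout=10).text
    # Hypothetical pattern: live tiles embed an 11-character videoId
    # in the initial page data.
    match = re.search(r'"videoId":"([\w-]{11})"', html)
    return match.group(1) if match else None
```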
Tip: Use the hybrid approach—scrape for discovery, then use official APIs for metadata. This gives you the best of both worlds: cost efficiency and data reliability.
Note down the API key somewhere safe; we'll need it later.
Step 02: Eventhouse Setup - KQL Database Configuration
We create a new Eventhouse to hold the KQL database and its tables (refer to my previous blog posts, which explain several different use cases with custom endpoints).
I created an Eventhouse named `ISS_DB_001` with a KQL database of the same name. In production, use descriptive, meaningful names. Trust me, future-you will appreciate the clarity when managing multiple data sources.
Step 03: Setting up Eventstream | YT_Live_ES
Creating a new Eventstream is very straightforward; after that, we have to add a source. Since I am planning to retrieve YouTube API data, I used a Custom Endpoint as my source.
The Eventstream (`YT_Live_ES`) acts as the ingestion gateway. When you create a Custom Endpoint source, Fabric generates a Service Bus connection string; this becomes your programmatic entry point for streaming data.
Once the Custom Endpoint is added as a source, we obtain the Connection string-primary key (protocol: Event Hub). Note this down as well.
Step 04: PySpark Notebook Implementation
Okay, now we need to create a notebook and start writing the code to retrieve data from the API and push the payload to the `YT_Live_ES` Eventstream.
Let me break down the core PySpark notebook implementation, focusing on what each key section accomplishes:
Configuration Setup (Lines 13-20)
Line 13 (100 minutes): Since I have many channels to cover across different time belts (lunch, evening, night), I set 100 minutes to ensure complete coverage of any prime-time news slot.
Line 14 (SAS Connection): This is the EventStream connection string we got from the custom endpoint setup.
Line 15 (API Key): The YouTube API key we saved earlier for authentication.
Line 17 (Channel Handles): These are the Sri Lankan news channels I want to track.
Line 20 (Skip IDs): Some channels transmit more than one live stream simultaneously, so I skip the non-news stream IDs to focus only on news broadcasts.
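For context, the configuration cell looks roughly like this. The channel handles match the ones discussed in this post, but the connection string, API key, and skip IDs below are placeholders:

```python
RUN_MINUTES = 100                      # total tracking window per run
SAS_CONNECTION_STR = "<Eventstream custom-endpoint connection string>"
YOUTUBE_API_KEY = "<YouTube Data API v3 key>"

CHANNEL_HANDLES = [                    # Sri Lankan news channels to track
    "@HiruNewsOfficial",
    "@AdaDeranaNews",
    # ... remaining channels
]
SKIP_VIDEO_IDS = {"abc123def45"}       # non-news streams to ignore (hypothetical ID)
```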
Fetching Stream Details via API
This function takes a list of video IDs, retrieves detailed information from the YouTube API, and formats it into a clean, structured dataset that includes:
- Stream metadata (title, channel, URL)
- Live viewer count
- Publishing and retrieval timestamps
- Stream status (live or ended)
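Here's a condensed sketch of that helper under my assumptions: `fetch_stream_details` is a hypothetical name, and the field mapping follows the public shape of the `videos.list` response:

```python
from datetime import datetime, timezone
import requests

def fetch_stream_details(video_ids: list[str], api_key: str) -> list[dict]:
    """Call videos.list for up to 50 IDs and flatten the response into rows."""
    resp = requests.get(
        "https://www.googleapis.com/youtube/v3/videos",
        params={
            "part": "snippet,liveStreamingDetails",
            "id": ",".join(video_ids),
            "key": api_key,
        },
        timeout=10,
    ).json()

    now = datetime.now(timezone.utc).isoformat()
    rows = []
    for item in resp.get("items", []):
        live = item.get("liveStreamingDetails", {})
        rows.append({
            "stream_id": item["id"],
            "Title": item["snippet"]["title"],
            "Channel_Title": item["snippet"]["channelTitle"],
            "channel_id": item["snippet"]["channelId"],
            "video_url": f"https://www.youtube.com/watch?v={item['id']}",
            "published_at": item["snippet"]["publishedAt"],
            "Live_Viewers": int(live.get("concurrentViewers", 0)),
            "Retrieval_DT": now,                       # retrieval timestamp
            "status": "Ended" if "actualEndTime" in live else "Live",
            "actual_end_time": live.get("actualEndTime"),
        })
    return rows
```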
Sending Data to Fabric EventStream
This function handles the critical task of streaming our processed data to Microsoft Fabric's EventStream service. It:
- Extracts the entity path from the connection string
- Creates a Service Bus client connection
- Converts each message to JSON
- Sends the batch of messages to the EventStream
- Properly closes the connection to prevent resource leaks
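A minimal sketch of that sender, assuming the `azure-servicebus` SDK (which is what the generated Service Bus connection string implies); `send_to_eventstream` is my naming:

```python
import json
from azure.servicebus import ServiceBusClient, ServiceBusMessage

def send_to_eventstream(connection_str: str, rows: list[dict]) -> None:
    """Push a batch of JSON messages to the Eventstream custom endpoint."""
    # Assumption: the connection string embeds the entity path,
    # e.g. "...;EntityPath=es_xxxx".
    entity_path = dict(
        part.split("=", 1) for part in connection_str.split(";") if "=" in part
    )["EntityPath"]

    client = ServiceBusClient.from_connection_string(connection_str)
    try:
        with client.get_queue_sender(queue_name=entity_path) as sender:
            batch = sender.create_message_batch()
            for row in rows:
                batch.add_message(ServiceBusMessage(json.dumps(row)))
            sender.send_messages(batch)
    finally:
        client.close()  # release the AMQP connection to avoid resource leaks
```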
Core Tracking Loop
This block:
- Sets a 100-minute runtime limit
- Checks each channel for live streams
- Combines currently discovered streams with previously tracked ones
- Fetches detailed metadata for all tracked streams
- Filters for news-related content only (using both the English "news" and Sinhala "ප්රවෘත්ති" keywords)
- Updates the tracking state (adding new streams, removing ended ones)
- Sends the filtered data to the Fabric EventStream
- Waits 60 seconds before the next cycle
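Pulling it together, the loop looks roughly like this sketch, reusing the hypothetical helpers and configuration from above:

```python
import time

active_streams: set[str] = set()        # video IDs currently being tracked
deadline = time.time() + RUN_MINUTES * 60

while time.time() < deadline:
    # Discover live streams on every channel via the cheap /streams scrape.
    found = {vid for h in CHANNEL_HANDLES if (vid := get_live_video_id(h))}
    tracked = (active_streams | found) - SKIP_VIDEO_IDS

    if tracked:
        rows = fetch_stream_details(list(tracked), YOUTUBE_API_KEY)
        # Keep only live, news-related streams (English or Sinhala keyword).
        news = [
            r for r in rows
            if r["status"] == "Live"
            and ("news" in r["Title"].lower() or "ප්රවෘත්ති" in r["Title"])
        ]
        # Update tracking state: new streams added, ended ones dropped.
        active_streams = {r["stream_id"] for r in news}
        if news:
            send_to_eventstream(SAS_CONNECTION_STR, news)

    time.sleep(60)                      # wait for the next polling cycle
```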
The full code is in this GitHub gist.
Summarized Flow of the Entire Notebook
- Channel Setup: defines the list of Sri Lankan news channels to monitor (`@HiruNewsOfficial`, `@AdaDeranaNews`, etc.).
- Getting Live Video IDs:
  - Fetches HTML content from the `/streams` page of each channel using `requests`.
  - Extracts the current or upcoming live video ID using regex.
  - Skips any videos explicitly marked in the `SKIP_VIDEO_IDS` set.
- Fetching Stream Metadata:
  - Calls the `videos.list` endpoint of the YouTube API for all found video IDs.
  - Collects structured metadata such as title, channel name, viewer count, and live status.
- Filtering Relevant Streams:
  - Keeps only streams that are live (not ended), contain "news" or "ප්රවෘත්ති" in the video title, and are not explicitly skipped.
- Sending to Microsoft Fabric EventStream:
  - Converts the filtered stream data into JSON payloads.
  - Sends them to the configured Fabric EventStream.
- Continuous Monitoring:
  - Runs for a configured time duration (default: 100 minutes).
  - Repeats checks every 60 seconds.
  - Uses in-memory tracking (`active_streams`) to handle ongoing live streams.
When the notebook is executed, the payload is pushed to the Eventstream, as we can verify below.
Step 05: Inside Eventhouse & KQL DB - Creating Visualizations
Once our YouTube live stream data is flowing into the Fabric Eventhouse and the KQL database, it's time for the fun part – turning this real-time data into actionable visualizations. Microsoft Fabric provides two compelling approaches for creating insights: KQL Dashboards and Power BI integration. Let me walk you through how I implemented both, and why you might choose one over the other.
Setting the Table: Our Data Landing Zone
After our EventStream delivers data to the KQL database, it populates a table named `YTubeLive0001TBL`. This table contains everything we need:
Approach #1: Real-time Dashboards using KQL querysets
For real-time operational monitoring where every second counts, I built a Real-time Dashboard directly in the Fabric interface. Here's how:
From the KQL Database interface, I clicked 'Real-time Dashboard' and named it "Daily News Tracker".
For my first tile, I created a KQL query to show which channels were currently broadcasting in different time belts:
Next, I added a visualization showing real-time viewer competition across channels:
During breaking news events, I could literally watch the numbers climb in real time as viewers flocked to different channels. What makes Real-time Dashboards special is their simplicity and speed. With just a few clicks, I created a monitoring system that provides actionable insights with minimal latency. When news breaks and every minute counts, this approach shines.
In my case, I ended up using both approaches complementarily: Real-time Dashboards during live broadcasts for operational monitoring, and Power BI for deeper analysis and broader sharing of insights afterward.
Real-time Dashboard
Below is the KQL queryset I used to derive the above information, for example for the evening news belt:
```kusto
YTubeLive0001TBL
| where retrieval_date == startofday(now())
    and retrieval_time_only between (time(18:00:00) .. time(19:45:00))
    and (tolower(Title) has "news" or Title has "ප්රවෘත්ති")
    and status == 'Live'
| project platform, stream_id, Title, Channel_Title, published_at, video_url, Live_Viewers, Retrieval_DT, retrieval_date, retrieval_time_only, channel_id, status, actual_end_time
| sort by Retrieval_DT desc nulls first
```
Approach #2: Power BI Desktop Integration - The Analysis Powerhouse
While Real-time Dashboards are perfect for real-time monitoring, I needed more sophisticated visualizations and analytical capabilities for deeper insights. This is where Power BI Desktop comes in: I used DirectQuery connection mode since I need minimal latency, and set the page to auto-refresh every 30 seconds.
- Lunch-time news - viewer trends by channel and time on 21-05-2025
- Evening-time news - viewer trends by channel and time on 21-05-2025
- Night-time news - viewer trends by channel and time on 21-05-2025
- Image showing report interactivity and slicing/filtering options in Power BI.
Wrapping Up: From Promise to Production
The completed project assets in my Fabric workspace:
Building this real-time YouTube analytics system taught me that the most interesting technical challenges often come from working within constraints—API limits, HTML parsing fragility, and cost optimization. Microsoft Fabric RTI provided the infrastructure backbone that let me focus on the data logic rather than complex ETL pipelines.
The hybrid scraping approach proved more resilient than expected, maintaining 99%+ uptime over a month of continuous operation. For anyone building similar streaming analytics systems, I'd recommend starting with official APIs and falling back to intelligent scraping when quota limits become prohibitive (if a paid plan is not an option).
The complete notebook code is available at this GitHub link, and I'd love to see what variations you build. Whether you're tracking crypto livestreams, monitoring gaming tournaments, or analyzing educational content, the core pattern (extract, stream, analyze, visualize) remains mostly consistent.
Is this a complete representation of the total TV audience? Maybe not: YouTube live viewers represent only a small subset, but it gives important hints. The real value lies in how this data acts as a real-time barometer, often signaling shifts in public sentiment or news consumption trends well before they appear in traditional ratings.
Spikes in live viewership can indicate increased public interest during major events such as elections, social unrest, or breaking news. By analyzing patterns over specific time periods, we can uncover early signs of audience movement and changing channel preferences—offering a window into what’s happening across society in real time.
Connect with me to discuss real-time analytics, Microsoft Fabric implementations, or media technology innovations. I'd love to hear about your own projects in this space !
Have you implemented similar real-time analytics solutions? What challenges did you encounter, and how did you solve them? Share your experiences in the comments below.
As always, thanks for reading - BIDiaries 😊!!!
Written by Nalaka Wanniarachchi
Nalaka Wanniarachchi is an accomplished data analytics and data engineering professional with over 18 years of experience. As a CIMA(ACMA/CGMA) UK qualified ex-banker with strong analytical skills, he transitioned into building robust data solutions. Nalaka specializes in Microsoft Fabric and Power BI, delivering advanced analytics and engineering solutions. He holds a Microsoft certification as a Fabric Analytic Engineer and Power BI Professional, combining technical expertise with a deep understanding of financial and business analytics.