Exploring Passive and Active Monitoring for LLMs: A Weekend Adventure! 🚀
This past weekend, I dove into Lesson 5: Passive and Active Monitoring from the fantastic "Quality and Safety for LLM Applications" course by DeepLearning.AI on Coursera. And let me tell you—learning about monitoring language models was an absolute blast! 🔥
With AI applications becoming increasingly powerful, ensuring their quality and safety is not optional—it’s essential. This lesson introduced passive and active monitoring techniques to track model outputs, detect issues like refusals and toxic responses, and build robust validation systems.
Let's break it all down and explore the magic behind the code! 🧙‍♂️
🔧 Setting Up the Monitoring Environment
First, we need to set up our environment. We’ll be using pandas, whylogs, and helpers for logging and monitoring data, along with langkit for additional LLM metrics.
import pandas as pd
import whylogs as why
import helpers
from langkit import llm_metrics

# Register langkit's LLM metrics (toxicity, refusal similarity, and more) as whylogs UDFs
llm_metrics.init()
What’s happening here?
- `pandas` helps us manage and analyze tabular data.
- `whylogs` is a powerful logging library designed for data and AI monitoring.
- `helpers` is a utility module that provides helper functions for our tasks.
- `langkit.llm_metrics.init()` initializes our language model monitoring toolkit.
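Once langkit is initialized, whylogs knows how to compute LLM-specific metrics on plain text columns. As a quick illustration (a minimal sketch with made-up data, not from the course notebook), you can log a tiny DataFrame and see columns such as prompt.toxicity and response.refusal_similarity show up in the profile:

# Minimal sketch: log a hypothetical prompt/response pair and inspect the profile
import pandas as pd
import whylogs as why
from langkit import llm_metrics

schema = llm_metrics.init()  # returns a whylogs schema carrying langkit's LLM metrics
chats = pd.DataFrame({
    "prompt": ["How do I bake sourdough bread?"],
    "response": ["Mix flour, water, salt and starter, let it rise, then bake."]
})
profile_view = why.log(chats, schema=schema).profile().view()
print(profile_view.to_pandas().index)  # includes prompt.toxicity, response.refusal_similarity, ...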
Now, let’s register our dataset schema for logging:
from whylogs.experimental.core.udf_schema import register_dataset_udf
from whylogs.experimental.core.udf_schema import udf_schema
llm_schema = udf_schema()
This sets up a schema that will be used to process and log our dataset features efficiently.
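The register_dataset_udf import is how you add your own columns to that schema. As a quick illustration (a hypothetical metric, not part of the lesson), you could track response length alongside langkit's built-in metrics:

# Hypothetical custom metric: character length of each response
@register_dataset_udf(["response"], "response.length")
def response_length(text):
    return [len(r) for r in text["response"]]

# Re-create the schema so it picks up the newly registered UDF
llm_schema = udf_schema()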
📊 Passive Monitoring: Logging LLM Requests and Responses
Passive monitoring means observing LLM outputs over time without directly interfering. We set up a logger that tracks requests and responses:
# A simple logger using the LLM-aware schema
llm_logger = why.logger(schema=udf_schema())

# Reconfigure it as a rolling logger that writes a new profile every hour
llm_logger = why.logger(
    mode = "rolling",
    interval = 1,
    when = "H",
    schema = udf_schema()
)
Breaking it down:
- We initialize `llm_logger` using whylogs.
- We configure it in rolling mode, meaning profiles are written out and rotated periodically.
- The `interval = 1` and `when = "H"` parameters mean a new profile is written every hour.
You can check logs in WhyLabs, a powerful observability platform for machine learning monitoring! 🌐
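To actually see the profiles there, you can have the rolling logger upload each one to your WhyLabs project. Here's a rough sketch (assuming a WhyLabs account and placeholder credentials; this isn't part of the lesson notebook):

import os

# Placeholder credentials: substitute your own org ID, dataset ID, and API key
os.environ["WHYLABS_DEFAULT_ORG_ID"] = "org-xxxxxx"
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = "model-1"
os.environ["WHYLABS_API_KEY"] = "your-api-key"

llm_logger.append_writer("whylabs")  # each hourly profile is now uploaded automatically
llm_logger.log({"prompt": "Hi there!", "response": "Hello! How can I help?"})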
🚀 Building Active Monitoring Guardrails
Active monitoring is a step beyond passive monitoring—it doesn’t just observe; it reacts in real-time to prevent harmful outputs. This is where things get really exciting!
First, we need to connect our application to OpenAI's API:
import openai

# Point the openai client at our key via the course's helper module
openai.api_key = helpers.get_openai_key()
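A quick note: this post (following the course) uses the older pre-1.0 openai SDK, which is why the calls below go through openai.ChatCompletion. If you're on openai 1.x, the equivalent request looks roughly like this (a sketch of the newer client interface, not code from the lesson):

from openai import OpenAI

client = OpenAI(api_key = helpers.get_openai_key())
prompt = "Please give me a short recipe for creating pancakes in up to 6 steps."
response = client.chat.completions.create(
    model = "gpt-3.5-turbo",
    messages = [{"role": "system", "content": prompt}]
).choices[0].message.content
print(response)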
Now, let’s create a request logging function:
def user_request():
    # Ask the user what they want a recipe for
    request = input("\nEnter your desired item to make a recipe (or 'quit'): ")
    if request.lower() == "quit":
        raise KeyboardInterrupt()
    # Log the raw request with the active logger (created in the validation section below)
    active_llm_logger.log({"request": request})
    return request
Here, we:
- Take user input for a recipe request.
- Log the request using `active_llm_logger`.
- Return the request for further processing.
Note that `active_llm_logger` itself is created later, in the validation section, so it can carry our guardrail validators.
Now, we send the request to OpenAI's GPT-3.5-turbo and log the response:
def prompt_llm(request):
    # Build the prompt and log it before calling the model
    prompt = f"""Please give me a short recipe for creating {request} in up to 6 steps."""
    active_llm_logger.log({"prompt": prompt})
    response = openai.ChatCompletion.create(
        model = "gpt-3.5-turbo",
        messages = [{"role": "system", "content": prompt}]
    )["choices"][0]["message"]["content"]
    # Log the model's answer as well
    active_llm_logger.log({"response": response})
    return response
If everything works, we print and log the AI’s response:
def user_reply_success(request, response):
    reply = f"\nSuccess! Here is the recipe for {request}:\n{response}"
    print(reply)
    active_llm_logger.log({"reply": reply})
If the LLM fails to generate a response, we log an error:
def user_reply_failure(request = "your request"):
    reply = "\nUnfortunately, we are not able to provide a recipe at this time."
    print(reply)
    active_llm_logger.log({"reply": reply})
Let’s wrap it all in a loop so we can repeatedly ask for recipes:
while True:
    try:
        request = user_request()
        response = prompt_llm(request)
        user_reply_success(request, response)
    except KeyboardInterrupt:
        break
    except LLMApplicationValidationError:
        user_reply_failure(request)
        break
This loop continuously:
- Takes user input
- Generates a recipe
- Logs and displays the response
- Handles failures gracefully (LLMApplicationValidationError is the custom exception our guardrails raise; we define it in the next section)
🚨 Validation: Catching Toxicity and Refusals
To ensure safety, we need validators that detect toxic responses and refusals:
from whylogs.core.relations import Predicate
from whylogs.core.metrics.condition_count_metric import Condition
from whylogs.core.validators import ConditionValidator
Next, we define the custom exception our guardrails will raise (a plain exception subclass is enough), along with a function that raises it whenever a condition isn't met:

class LLMApplicationValidationError(ValueError):
    """Raised when a logged value violates one of our validators."""

def raise_error(validator_name, condition_name, value):
    raise LLMApplicationValidationError(
        f"Failed {validator_name} with value {value}."
    )
Now, we define a condition (the value must stay below 0.3) and attach it to two validators:

low_condition = {"<0.3": Condition(Predicate().less_than(0.3))}

toxicity_validator = ConditionValidator(
    name = "Toxic",
    conditions = low_condition,
    actions = [raise_error]
)

refusal_validator = ConditionValidator(
    name = "Refusal",
    conditions = low_condition,
    actions = [raise_error]
)
We attach these validators to our active logger:
llm_validators = {
    "prompt.toxicity": [toxicity_validator],
    "response.refusal_similarity": [refusal_validator]
}

# Rolling logger that writes a profile every 5 minutes and runs the validators on every log call
active_llm_logger = why.logger(
    mode = "rolling",
    interval = 5,
    when = "M",
    base_name = "active_llm",
    schema = udf_schema(validators = llm_validators)
)
Finally, we test our safety net:
active_llm_logger.log({"response":"I'm sorry, but I can't answer that."})
If the response matches refusal patterns, an error is raised! ⚠️
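Since that response scores high on refusal similarity, the "<0.3" condition fails and the validator fires. You can watch the guardrail in action by catching the exception yourself (a small sketch reusing the pieces defined above):

try:
    active_llm_logger.log({"response": "I'm sorry, but I can't answer that."})
except LLMApplicationValidationError as error:
    print(f"Guardrail triggered: {error}")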
🎯 Final Thoughts
This lesson was an absolute game-changer! 🎮 We learned how to:
✅ Log LLM interactions (passive monitoring)
✅ Create safety guardrails (active monitoring)
✅ Validate outputs using toxicity and refusal detectors
Big shoutout to DeepLearning.AI and Coursera for this incredible course! If you’re building LLM applications, monitoring is not optional—it’s a necessity! 🚀
Happy coding! 😃💡