How to Trace/Monitor an AI application


Introduction

We need AI tracing because AI systems are often "black boxes": they take inputs and produce outputs, but we don't always understand how they got there. Tracing helps open that black box so we can debug failures, audit behavior, and optimize performance.

How to Trace/Monitor an AI application

There are many tools for tracing AI applications. In this blog, we will look at two of them:

  • LangSmith (by LangChain):

    • Traces end-to-end RAG pipelines (retrieval, generation, tool usage).

    • Logs inputs, outputs, latency, and errors.

    • Visualizes chain/agent execution steps.

    • Website: https://smith.langchain.com

  • Langfuse (Open Source):

    • Open Source: Self-hostable with a free tier (community edition).

    • Developer-Friendly: Easy setup via Python/TypeScript SDKs or REST API.

    • Unified Dashboard: Centralizes logs, traces, and metrics for AI workflows.

    • Website: https://langfuse.com

1. LangSmith (by LangChain)

LangSmith (by LangChain) is a unified platform for developing, monitoring, and debugging LLM applications (e.g., RAG, agents, chains).

  • Features: Traces LLM calls, tool usage, and retrieval steps; debugs hallucination/errors; monitors latency, token usage, and costs; tests prompts/chains via versioning and A/B testing.

  • Integrations: Native support for LangChain, OpenAI, Anthropic, Hugging Face, and custom models/vector DBs.

  • Use Cases: Debugging RAG pipelines, optimizing agent workflows, auditing model outputs, and collaborating on LLM projects.

  • Access: Cloud-hosted with a free tier for experimentation and team plans for scalability.

LangSmith is developed by LangChain and is widely used to trace (monitor) AI applications. It is free for one admin seat; adding team members requires a paid plan. It runs primarily in the cloud, but a self-hosted version is available for enterprises. (Reference: LangSmith docs)

Set up in Python

pip install -U langsmith

Next, make sure you have signed up for a LangSmith account, then create and set your API key. You will also need an OpenAI API key to run the code in this tutorial.

Without LangChain

Add the following to your .env file:

LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGSMITH_API_KEY="<your-api-key>"
LANGSMITH_PROJECT="pr-kindly-dollop-75"
OPENAI_API_KEY="<your-openai-api-key>"

Trace/Monitor an LLM call

from openai import OpenAI
from langsmith.wrappers import wrap_openai

openai_client = wrap_openai(OpenAI())

# This is the retriever we will use in RAG
# This is mocked out, but it could be anything we want
def retriever(query: str):
    results = ["Harrison worked at Kensho"]
    return results

# This is the end-to-end RAG chain.
# It does a retrieval step then calls OpenAI
def rag(question):
    docs = retriever(question)
    system_message = """Answer the users question using only the provided information below:

    {docs}""".format(docs="\n".join(docs))

    return openai_client.chat.completions.create(
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": question},
        ],
        model="gpt-4o-mini",
    )
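
Calling the chain sends the wrapped OpenAI call to LangSmith as a trace. A quick usage example:

# The wrapped client logs this call (inputs, outputs, latency, tokens) to LangSmith
response = rag("Where did Harrison work?")
print(response.choices[0].message.content)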

Trace the entire application (including tools/functions)

from openai import OpenAI
from langsmith import traceable
from langsmith.wrappers import wrap_openai

openai_client = wrap_openai(OpenAI())

def retriever(query: str):
    results = ["Harrison worked at Kensho"]
    return results

@traceable
def rag(question):
    docs = retriever(question)
    system_message = """Answer the users question using only the provided information below:

    {docs}""".format(docs="\n".join(docs))

    return openai_client.chat.completions.create(
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": question},
        ],
        model="gpt-4o-mini",
    )
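
With only @traceable on rag, the trace shows the function and the OpenAI call. If you also want the retrieval step to appear as its own child run, decorate the retriever too (a small variation on the snippet above):

@traceable
def retriever(query: str):
    results = ["Harrison worked at Kensho"]
    return results

# Each call now produces one trace with the retriever step and the LLM call nested inside
rag("Where did Harrison work?")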

If you already use LangChain in your application, the setup is even simpler: just install LangChain and you're good to go.

pip install -U langchain langchain-openai

env variables

LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGSMITH_API_KEY="<your-api-key>"
LANGSMITH_PROJECT="pr-kindly-dollop-75"
OPENAI_API_KEY="<your-openai-api-key>"

Trace/Monitor an LLM call

from langchain_openai import ChatOpenAI

llm = ChatOpenAI()
llm.invoke("Hello, world!")

See the LangSmith docs for more details.

2. Langfuse (Open Source)

Langfuse is an open-source LLM observability platform designed to streamline the development and monitoring of AI applications like RAG systems, chatbots, and agents.

  • Features: Traces end-to-end workflows (retrieval, generation, API calls), monitors metrics (latency, cost, token usage), evaluates outputs for accuracy/relevance, and enables prompt versioning/A/B testing.

  • Integrations: Seamlessly works with LangChain, LlamaIndex, OpenAI, Anthropic, Pinecone, and major cloud platforms.

  • Flexibility: Offers self-hosting options, a free tier for small projects, and scalable Pro plans for teams.

  • Use Cases: Debugging hallucination in LLMs, optimizing RAG pipelines, auditing compliance, and reducing operational costs.

There are two ways to use Langfuse:

  1. Cloud version

    Runs on Langfuse's cloud; good for small companies, with a generous free tier. If you are a large company or need full control over your data, use the self-hosted version instead.

Basic Setup

pip install langfuse

env variables

LANGFUSE_SECRET_KEY="sk-lf-..."
LANGFUSE_PUBLIC_KEY="pk-lf-..."
# 🇪🇺 EU region
LANGFUSE_HOST="https://cloud.langfuse.com"
# 🇺🇸 US region
# LANGFUSE_HOST="https://us.cloud.langfuse.com"

Trace/Monitor an LLM call

import openai                        # ❌ standard import is not traced
from langfuse.openai import openai   # ✔️ drop-in replacement with tracing

# Alternative imports:
# from langfuse.openai import OpenAI, AsyncOpenAI, AzureOpenAI, AsyncAzureOpenAI

completion = openai.chat.completions.create(
  model="gpt-4o",
  messages=[
      {"role": "system", "content": "You are a very accurate calculator."},
      {"role": "user", "content": "1 + 1 = "}],
)
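
Beyond the drop-in OpenAI wrapper, the Langfuse Python SDK also offers an @observe decorator for tracing whole functions, analogous to LangSmith's @traceable. A minimal sketch (the import path below is from the v2 SDK and may differ in other versions):

from langfuse.decorators import observe
from langfuse.openai import openai  # traced drop-in replacement

@observe()
def rag(question: str):
    # The decorator groups this function and the nested OpenAI call into one trace
    docs = ["Harrison worked at Kensho"]
    return openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only:\n" + "\n".join(docs)},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content

rag("Where did Harrison work?")
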
  2. Self-hosted

    For self-hosting Langfuse, there are several options:

    • Local: Run Langfuse on your own machine in five minutes using Docker Compose (see the quickstart below).

    • VM: Run Langfuse on a single VM using Docker Compose.

    • Docker: Run Langfuse as Docker containers.

    • Kubernetes (Helm): Run Langfuse on a Kubernetes cluster using Helm.

    • Planned: Cloud-specific deployment guides for AWS, Google Cloud, and Azure (upvote and comment on the corresponding threads).

Clicking any option (Local, VM, Docker, Kubernetes, etc.) redirects to the Langfuse website, which has detailed documentation for every hosting type.
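
For example, the local option comes down to a few commands (as described in the Langfuse self-hosting docs):

# Clone the Langfuse repository and start all services with Docker Compose
git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up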

Conclusion

Tracing and monitoring AI applications are essential to demystify the "black box" nature of AI systems, enabling developers to debug, optimize, and audit workflows effectively. This blog explored two powerful tools for achieving this: LangSmith and Langfuse.

  • LangSmith (by LangChain) excels in end-to-end tracing for LLM applications like RAG pipelines and agent workflows. Its native integration with LangChain, cloud-hosted platform, and features like latency monitoring and prompt versioning make it ideal for teams already embedded in the LangChain ecosystem. While primarily cloud-based, its enterprise self-hosted option caters to larger organizations.

  • Langfuse, as an open-source alternative, offers unparalleled flexibility with self-hosting capabilities, unified observability, and compatibility across frameworks (LangChain, LlamaIndex, etc.). Its focus on cost monitoring, output evaluation, and scalable deployment options positions it as a strong choice for companies prioritizing data control or budget-conscious experimentation.

Choosing the right tool depends on your needs:

  • Opt for LangSmith if you seek seamless LangChain integration, cloud convenience, and advanced debugging features.

  • Choose Langfuse for open-source flexibility, self-hosting, or multi-framework compatibility.

Both tools empower developers to build transparent, efficient, and reliable AI systems, whether you’re auditing model outputs, optimizing RAG pipelines, or reducing operational costs. By leveraging these platforms, teams can transform opaque AI processes into observable, actionable workflows, ensuring robust and trustworthy applications.
