Building Better AI Applications with LLM Tracing using Opik


Introduction

If you're building AI applications with Large Language Models, you've likely encountered some common frustrations: unexpected outputs, high API costs, and the challenge of figuring out why your application isn't performing as expected. These issues become even more complex when your application moves towards agentic workflows, where the AI is expected to interact with other AI systems or humans in a more dynamic and context-aware manner. One such more complex example is demonstrated in our blog series about building advanced agents from scratch.

Opik is an open-source debugging tool created by Comet that helps developers tackle these practical challenges. At its core, it's a monitoring and evaluation platform that gives you visibility into how your LLM applications actually behave, both during development and in production.

Opik Dashboard

Common Development Challenges

When building LLM applications, developers typically face these issues:

  • Not knowing why certain prompts fail while others succeed
  • Difficulty tracking token usage and associated costs
  • Uncertainty about whether retrieved context (in RAG systems) is actually relevant
  • No clear way to measure application performance over time
  • Limited visibility into production behavior

How Opik Helps

Opik provides practical solutions through three main features:

  1. Tracing: Track every interaction with your LLM, including prompts, responses, and metadata
  2. Evaluation: Test your application systematically with automated checks
  3. Monitoring: Watch your application's behavior in production through easy-to-read dashboards

The tool works with common frameworks and SDKs like LangChain, OpenAI, and LlamaIndex, integrating directly into your existing development workflow.

In this guide, we'll walk through the concrete steps to use Opik for debugging and improving your LLM applications, with real examples you can implement today.

What is LLM Tracing?

Before we get our hands dirty, let's clarify what we mean by "tracing" in the context of LLM applications, as this is a heavily-used term in the industry. LLM tracing is essentially a way to record and analyze every interaction between your application and a language model. Think of it like logging for traditional applications, but specifically designed for LLM-based systems.

When your AI application isn't working as expected, you need to know:

  • What prompt was sent to the model
  • What response came back
  • What context or history influenced the interaction
  • How long the request took
  • How many tokens were used (and the associated cost)

Without tracing, debugging these issues becomes guesswork. A trace - a log of each interaction - might look as follows:

# Example trace data
{
  "timestamp": "2024-03-15T10:30:45Z",
  "model": "gpt-3.5-turbo",
  "prompt": "Summarize this article about...",
  "completion": "The article discusses...",
  "tokens_used": 147,
  "latency_ms": 850,
  "metadata": {
    "temperature": 0.7,
    "context_length": 2048
  }
}

So, in summary, LLM tracing is nothing more than fancy wording for good old application logging.

Setting up Opik for LLM Tracing

Okay, let's dive into the practical steps to set up Opik for your LLM applications. There are two ways to use Opik:

  1. Use the fully managed Comet LLM evaluation platform. It's a fully fledged platform for all your LLM logging and observability needs and provides a managed version of Opik.

  2. Use the open source, self-hosted, Apache-2 licensed Opik project. This is the version we'll focus on in this guide.

Installing Opik

There are basically two ways to install Opik: either with Docker Compose or on Kubernetes.

Make sure you have Docker installed. If you want to use Kubernetes, you'll need a running Kubernetes cluster as well as Helm and kubectl.

Running Opik using Docker Compose

  1. Get the Opik Docker Compose file. The easiest way is to clone their repo:

    git clone https://github.com/comet-ml/opik.git
  2. Change into the opik/deployment/docker-compose directory and start the services:

    cd opik/deployment/docker-compose
    docker-compose up -d

Opik should now be running on http://localhost:5173. Either use the tool on the same server as your application or expose the port to the outside world using a reverse proxy like nginx or Caddy.

Note: You might want to have a look at the docker-compose.yml file to adjust the configuration to your needs. Especially take note of the volume mounts for ClickHouse, MySQL and Redis. These are Docker volumes; depending on your needs, you might want to change them to folder (bind) mounts.

Running Opik using Kubernetes

  1. Add the Opik Helm chart repository and install the chart as follows:

    helm repo add opik https://comet-ml.github.io/opik/ && helm repo update

    VERSION=latest; helm upgrade --install opik -n opik --create-namespace opik/opik \
      --set component.backend.image.tag=$VERSION --set component.frontend.image.tag=$VERSION

The installation provides a service svc/opik-frontend. If you need to forward it to port 5173 on your local machine, you can use the following command:

kubectl port-forward svc/opik-frontend 5173:80 -n opik

Note: No matter which method you choose, Opik itself does not provide any means of authentication - the application is freely accessible. Make sure to put some form of authentication provider in front of the application, or run it in a secured network.

If everything went well, you should now be able to access the Opik frontend on http://localhost:5173 and be greeted by this screen:

Opik welcome screen

Now that we have the platform up and running, we need to install the Opik python SDK to start tracing our LLM applications (this needs to be installed in the context of the application you want to trace).

pip install opik

Using Opik in Development

Enough preparation, let's see how we can use Opik in our development workflow to trace our LLM applications. Start by initializing the Opik SDK on your development machine:

# Select option 'self-hosted'
opik configure
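
If you prefer to configure the SDK from code instead of the interactive CLI, the Python SDK also exposes a configure function. A minimal sketch for pointing it at a self-hosted instance (assuming Opik runs locally with the default settings):

import opik

# Tell the SDK to talk to a local, self-hosted Opik instance
# instead of the managed Comet platform.
opik.configure(use_local=True)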

Tracing your LLM application with Opik

The simplest use case for Opik is to trace your LLM applications, meaning we want to see every interaction with an LLM, including the prompt, the response, and any metadata associated with the interaction.

For this guide we'll assume we use an OpenAI LLM. Adding Opik tracing in this case is as simple as wrapping the OpenAI client as follows:

from opik.integrations.openai import track_openai
from openai import OpenAI

client = OpenAI()
client = track_openai(client)

That's it. From now on, every call to the OpenAI client will be traced by Opik. If we for example run this code:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "How are you?"}
    ]
)

We get this lovely first trace in Opik:

Our first Opik trace

Basically, we could end this blog article here, as this is and will remain the most important application of any tracing endeavor. We see the question, the answer, the model used, the tokens used, the latency, and some metadata.

Furthermore, most frameworks allow you to pass a custom OpenAI client as a parameter. This means you can use Opik tracing with any framework that uses the OpenAI client.

For example, if you want to use Opik tracing with the new PydanticAI agent framework, simply use their custom OpenAI client support and pass the wrapped client to the model:

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from opik.integrations.openai import track_openai
from openai import OpenAI

client = OpenAI()
client = track_openai(client)

model = OpenAIModel('gpt-4o', openai_client=client)
agent = Agent(model)
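
To actually produce a trace, you then run the agent as usual. A minimal usage sketch - run_sync is PydanticAI's synchronous entry point, and the question is just an arbitrary example:

# Every OpenAI call the agent makes goes through the wrapped client,
# so it shows up as a trace in Opik.
result = agent.run_sync("How are you?")
print(result)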

If you have multiple function calls (like tools) that interact with the LLM, you can use the opik.track decorator to trace these calls.

import opik
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from opik.integrations.openai import track_openai
from openai import OpenAI

client = OpenAI()
client = track_openai(client)

model = OpenAIModel('gpt-4o', openai_client=client)
agent = Agent(model)

@opik.track
def search_wikipedia_tool(query):
    # Some code to search Wikipedia
    return response

@opik.track
def llm_call(message):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": message}]
    )
    return response.choices[0].message.content

This will log the function name as well as model details and function timings.

Opik Integrations

However, Opik makes this even more convenient: it provides a ton of native integrations for many popular tools and frameworks.

For reference, let's see how we can integrate Opik with LangChain:

from langchain.chains import LLMChain
from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate
from opik.integrations.langchain import OpikTracer

# That's the only additional line we need.
opik_tracer = OpikTracer()

# Create the LLM chain using LangChain
llm = OpenAI(temperature=0)
prompt_template = PromptTemplate(
    input_variables=["name"],
    template="Hi, I'm {name}. How are you today?"
)
llm_chain = LLMChain(llm=llm, prompt=prompt_template)
response = llm_chain.run("Andreas", callbacks=[opik_tracer])

As you can see, all we need is one additional line. We create an OpikTracer and pass it as a callback to the run method of the LLMChain. This will automatically trace the interaction with the LLM model via LangChain.

For a full list of integrations, please visit their outstanding documentation.

Manual tracing if none of the integrations fit

Let's say you have a framework that is not supported by Opik and that does not allow you to pass a custom LLM client - which might happen with all the new agent frameworks popping up. In this case you can always trace the interactions manually.

Let's consider a simple agent application, similar to the one from our previous blog post, built with an agent framework that does not allow passing a custom LLM client. In this case we can add the opik.track function decorator to any methods that interact with the LLM.

import opik

@opik.track
def search_wikipedia_tool(query):
    # Some code to search Wikipedia
    return response

@opik.track
def query_database_tool(query):
    # Some code to query a database
    return response

@opik.track
def chat_with_llm_tool(message):
    # Some code to chat with an LLM
    return response

While this will not log the exact prompts and responses, nor the costs incurred by the LLM, we at least get an idea about tool usage and latency. Note that in this case Opik traces the inputs, outputs and timings of the functions.
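
To make this concrete, here is a minimal, self-contained sketch of how such tracked functions nest (the tool bodies and the agent_step function are hypothetical placeholders): when a tracked function calls other tracked functions, Opik records the inner calls as nested spans of a single trace, so tool usage and per-call latency become visible.

import opik

@opik.track
def search_wikipedia_tool(query):
    # Placeholder: a real implementation would call the Wikipedia API here.
    return f"Wikipedia results for '{query}'"

@opik.track
def query_database_tool(query):
    # Placeholder: a real implementation would run a database query here.
    return f"Database rows matching '{query}'"

@opik.track
def agent_step(user_question):
    # Tracked functions called from another tracked function are logged
    # as nested spans, showing which tools ran and how long each took.
    context = search_wikipedia_tool(user_question)
    facts = query_database_tool(user_question)
    return f"Answer based on: {context} | {facts}"

agent_step("Who invented the transistor?")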

Annotations

Another great feature of Opik is the ability to annotate LLM traces with feedback scores. In the UI, navigate to the trace you want to annotate and hit the Annotate button at the top-right corner.

If it's the first time you're annotating a trace, you'll be asked to create a Feedback definition. Fill out the feedback definition form and click Create feedback definition.

Creating a feedback definition

Now you can annotate the trace with the feedback category you just created. Simply select the category from the dropdown.

Annotating a trace

Tracking LLM costs

For OpenAI and Gemini models, Opik automatically tracks the costs of each LLM call and provides a breakdown of the costs in the trace view as well as the project metrics overview.

Estimated costs overview

This is especially useful when you're working with a budget or need to optimize your LLM usage.

Note that while this cost tracking is not available for all LLM models, token usage and latency are always tracked.

Setting the project to use

In all our examples so far, we've been using the default project. However, you will most probably create multiple projects. Opik allows you to set the project you want to trace to as a simple function parameter, both on client creation and on the track decorator.

client = OpenAI()
client = track_openai(client, project_name="opiktest")

@opik.track(project_name="opiktest")
def some_function(input):
    return input

If you want to change the project for all traces, you can set the project name as environment variable:

os.environ["OPIK_PROJECT_NAME"] = "opiktest"

LLM Monitoring and Tracing in Production with Opik

Now that we have hopefully created a well-performing LLM application, we need to make sure it also behaves well in production.

Trace Logging

Nothing new here. Opik is designed to be highly scalable, so simply use the same setup as you did for development.

Monitoring Dashboards

All trace metrics (number of traces, tokens and feedback scores) are available in a dashboard, giving you a high-level overview of your application. You can spot whether your LLM costs are running away or your application is not performing as expected.

Opik Dashboard (kindly taken from https://www.comet.com/docs/opik/production/production_monitoring)

Online Evaluation of LLM Performance

Talking about feedback scores: in production we need to create them for each trace. This can't be done manually, as you most probably have far too many traces. So Opik provides a way to automatically create feedback scores using an LLM itself (this is called 'LLM as a judge').

Let's assume you built a RAG system using Azure AI Search. Then you would want to know whether your LLM generates relevant answers - and that it doesn't hallucinate.

Opik provides three predefined LLM-as-a-judge validation prompts:

  • Hallucination: This metric checks if the LLM output contains any hallucinated information.
  • Moderation: This metric checks if the LLM output contains any offensive content.
  • Answer Relevance: This metric checks if the LLM output is relevant to the given context.

Alternatively you can also create your own validation prompts.

  1. First, we need to configure an LLM provider (which is used as judge). Navigate to the AI Provider configuration and add a new AI provider.

  2. Then navigate to your project, select the Rules tab and create a new rule.

    Create rule screenCreate rule screen

  3. Fill out the form by selecting the LLM judge model, selecting the prompt you want to use and - most importantly - defining the variable mappings. Text wrapped in double curly braces ({{ }}) will be replaced by the actual values from the trace.

    Note: You can use any value in your trace as a variable. Feel free to experiment with the metadata parameter of the track decorator to add additional information to your traces.

    Last but not least add a score definition.

    Click on Create rule and you're done. From now on every trace will be annotated with the feedback score.

Note: This automatic scoring feature is one of Opik's most powerful. It allows you to automatically validate your LLM output in production and detect potential issues early on.
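
By the way, the same LLM-as-a-judge metrics are also available programmatically in the Python SDK (under opik.evaluation.metrics), which is handy for trying them out during development before turning them into production rules. A minimal sketch, assuming an OpenAI API key is configured for the judge model; the input, output and context values are made-up examples:

from opik.evaluation.metrics import Hallucination

# The metric asks a judge LLM whether the output is grounded in the
# provided context and returns a score object with a value and a reason.
metric = Hallucination()
result = metric.score(
    input="What is the capital of France?",
    output="The capital of France is Berlin.",
    context=["Paris is the capital and largest city of France."],
)
print(result.value, result.reason)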


Interested in how to train your very own Large Language Model?

We prepared a well-researched guide on how to use the latest advancements in open source technology to fine-tune your own LLM. This has many advantages, like:

  • Cost control
  • Data privacy
  • Excellent performance - adjusted specifically for your intended use

Further reading

More information on our managed RAG solution?
To Pondhouse AI
More tips and tricks on how to work with AI?
To our Blog