  1  ---
  2  title: "Tracing"
  3  id: tracing
  4  slug: "/tracing"
  5  description: "This page explains how to use tracing in Haystack. It describes how to set up a tracing backend with OpenTelemetry, Datadog, or your own solution. This can help you monitor your app's performance and optimize it."
  6  ---
  7  
  8  import ClickableImage from "@site/src/components/ClickableImage";
  9  
 10  # Tracing
 11  
 12  This page explains how to use tracing in Haystack. It describes how to set up a tracing backend with OpenTelemetry, Datadog, or your own solution. This can help you monitor your app's performance and optimize it.
 13  
 14  Traces document the flow of requests through your application and are vital for monitoring applications in production. They help you understand the execution order of your pipeline components and analyze where your pipeline spends the most time.
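Conceptually, a trace is a tree of timed spans, one per operation, with component spans nested inside the pipeline span. The idea can be sketched in plain Python (an illustration only, not Haystack's tracing API; the names are made up for this example):

```python
import contextlib
import time

@contextlib.contextmanager
def span(name, finished):
    """Time one operation and record it once it completes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        finished.append((name, time.perf_counter() - start))

finished = []
with span("pipeline.run", finished):        # outermost span
    with span("retriever.run", finished):   # child span
        time.sleep(0.01)
    with span("generator.run", finished):   # child span
        time.sleep(0.02)

# Child spans complete before their parent, so they are recorded first.
print([name for name, _ in finished])
# ['retriever.run', 'generator.run', 'pipeline.run']
```

A real tracing backend additionally records parent/child links and tags on each span, which is what lets a UI show where the pipeline spends the most time.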
 15  
 16  ## Configuring a Tracing Backend
 17  
 18  Instrumented applications typically send traces to a trace collector or a tracing backend. Haystack provides out-of-the-box support for [OpenTelemetry](https://opentelemetry.io/) and [Datadog](https://app.datadoghq.eu/dashboard/lists). You can also quickly implement support for additional providers of your choosing.
 19  
 20  ### OpenTelemetry
 21  
 22  To use OpenTelemetry as your tracing backend, follow these steps:
 23  
 24  1. Install the [OpenTelemetry SDK](https://opentelemetry.io/docs/languages/python/):
 25  
 26     ```shell
 27     pip install opentelemetry-sdk
 28     pip install opentelemetry-exporter-otlp
 29     ```
 30  2. To add traces to even deeper levels of your pipelines, we recommend you check out [OpenTelemetry integrations](https://opentelemetry.io/ecosystem/registry/?s=python), such as:
 31     - [`urllib3` instrumentation](https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/instrumentation/opentelemetry-instrumentation-urllib3) for tracing HTTP requests in your pipeline,
 32     - [OpenAI instrumentation](https://github.com/traceloop/openllmetry/tree/main/packages/opentelemetry-instrumentation-openai) for tracing OpenAI requests.
 33  3. There are two ways to connect Haystack to the OpenTelemetry SDK.
 34  
 35     - Run your Haystack applications using OpenTelemetry’s [automated instrumentation](https://opentelemetry.io/docs/languages/python/getting-started/#instrumentation). Haystack will automatically detect the configured tracing backend and use it to send traces.
 36  
 37      First, install the OpenTelemetry distro, which includes the `opentelemetry-instrument` tool:
 38  
 39       ```shell
 40       pip install opentelemetry-distro
 41       ```
 42  
 43       Then, run your Haystack application using the OpenTelemetry SDK:
 44  
 45       ```shell
 46       opentelemetry-instrument \
 47           --traces_exporter console \
 48           --metrics_exporter console \
 49           --logs_exporter console \
 50           --service_name my-haystack-app \
 51           <command to run your Haystack pipeline>
 52       ```
 53  
 54     — or —
 55  
 56     - Configure the tracing backend in your Python code:
 57  
 58       ```python
 59       from haystack import tracing
 60  
 61       from opentelemetry import trace
 62       from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
 63       from opentelemetry.sdk.trace import TracerProvider
 64       from opentelemetry.sdk.trace.export import BatchSpanProcessor
 65       from opentelemetry.sdk.resources import Resource
 66       from opentelemetry.semconv.resource import ResourceAttributes
 67  
 68       # Service name is required for most backends
 69       resource = Resource(attributes={
 70           ResourceAttributes.SERVICE_NAME: "haystack"
 71       })
 72  
 73       tracer_provider = TracerProvider(resource=resource)
 74       processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
 75       tracer_provider.add_span_processor(processor)
 76       trace.set_tracer_provider(tracer_provider)
 77  
 78       # Option 1: let Haystack auto-detect the configured tracer
 79       import haystack.tracing
 80       haystack.tracing.auto_enable_tracing()
 81  
 82       # Option 2 (instead of option 1): explicitly tell Haystack to use your tracer
 83       from haystack.tracing import OpenTelemetryTracer
 84  
 85       tracer = tracer_provider.get_tracer("my_application")
 86       tracing.enable_tracing(OpenTelemetryTracer(tracer))
 87       ```
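To enable one of the instrumentation integrations mentioned in step 2, a single call per library is typically enough. For example, for `urllib3` (a sketch; it assumes `opentelemetry-instrumentation-urllib3` is installed):

```python
from opentelemetry.instrumentation.urllib3 import URLLib3Instrumentor

# Patch urllib3 so HTTP requests made by pipeline components are
# recorded as child spans of the surrounding pipeline spans.
URLLib3Instrumentor().instrument()
```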
 88  
 89  ### Datadog
 90  
 91  To use Datadog as your tracing backend, follow these steps:
 92  
 93  1. Install [Datadog’s tracing library ddtrace](https://ddtrace.readthedocs.io/en/stable/#).
 94  
 95     ```shell
 96     pip install ddtrace
 97     ```
 98  2. There are two ways to connect Haystack to `ddtrace`.
 99  
100     - Run your Haystack application using the `ddtrace-run` wrapper:
101       ```shell
102       ddtrace-run <command to run your Haystack pipeline>
103       ```
104  
105     — or —
106  
107     - Configure the Datadog tracing backend in your Python code:
108  
109       ```python
110       from haystack.tracing import DatadogTracer
111       from haystack import tracing
112       import ddtrace
113  
114       tracer = ddtrace.tracer
115       tracing.enable_tracing(DatadogTracer(tracer))
116       ```
117  
118  ### Langfuse
119  
120  The `LangfuseConnector` component lets you trace your Haystack pipelines and inspect the traces in the Langfuse UI.
121  
122  Simply install the component with `pip install langfuse-haystack`, then add it to your pipeline.
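A minimal wiring sketch follows; treat the import path and the constructor argument as assumptions to verify against the integration's documentation. Langfuse credentials are read from the `LANGFUSE_SECRET_KEY` and `LANGFUSE_PUBLIC_KEY` environment variables, and content tracing must be enabled before Haystack components are imported:

```python
import os

# Must be set before importing Haystack components (assumption: see the
# Content Tracing section below for this environment variable).
os.environ["HAYSTACK_CONTENT_TRACING_ENABLED"] = "true"

from haystack import Pipeline
from haystack_integrations.components.connectors.langfuse import LangfuseConnector

pipeline = Pipeline()
# The connector only needs to be added to the pipeline;
# it is not connected to any other component.
pipeline.add_component("tracer", LangfuseConnector(name="My Pipeline"))
```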
123  
124  :::info
125  Check out the component's [documentation page](../pipeline-components/connectors/langfuseconnector.mdx) for more details and example usage, or our [blog post](https://haystack.deepset.ai/blog/langfuse-integration) for the complete walkthrough.
126  :::
127  <ClickableImage src="/img/11cec4f-langfuse-generation-span.png" alt="Langfuse trace detail view showing generation span with input prompt, output, metadata, latency, and cost information for a language model call" />
128  
129  ### MLflow
130  
131  [MLflow](https://mlflow.org/) is an open-source platform for managing the end-to-end machine learning and AI lifecycle. MLflow provides native tracing support for Haystack. Simply install MLflow and enable automatic tracing with a single line of code.
132  
133  ```shell
134  pip install mlflow
135  ```
136  
137  ```python
138  import mlflow
139  
140  mlflow.haystack.autolog()
141  # Optionally set an experiment name
142  mlflow.set_experiment("Haystack")
143  ```
144  
145  This automatically captures traces from all Haystack pipelines and components, including latencies, token usage, cost, and any exceptions.
146  
147  :::info
148  Check out the [MLflow Haystack integration guide](https://haystack.deepset.ai/integrations/mlflow) for a full walkthrough with examples.
149  :::
150  
151  ### Weights & Biases Weave
152  
153  The `WeaveConnector` component allows you to trace and visualize your pipeline execution in the [Weights & Biases](https://wandb.ai/site/) Weave framework.
154  
155  You first need to create a free account on the Weights & Biases website and get your API key, then install the integration with `pip install weights_biases-haystack`.
156  
157  :::info
158  Check out the component's [documentation page](../pipeline-components/connectors/weaveconnector.mdx) for more details and example usage.
159  :::
160  
161  ### Custom Tracing Backend
162  
163  To use your custom tracing backend with Haystack, follow these steps:
164  
165  1. Implement the `Tracer` interface. The following code snippet provides an example using the OpenTelemetry package:
166  
167     ```python
168     import contextlib
169     from typing import Optional, Dict, Any, Iterator
170  
171     from opentelemetry import trace
172     from opentelemetry.trace import NonRecordingSpan
173  
174     from haystack.tracing import Tracer, Span
175     from haystack.tracing import utils as tracing_utils
176     import opentelemetry.trace
177  
178     class OpenTelemetrySpan(Span):
179        def __init__(self, span: opentelemetry.trace.Span) -> None:
180            self._span = span
181  
182        def set_tag(self, key: str, value: Any) -> None:
 183            # Tracing backends usually don't support arbitrary tag values.
 184            # `coerce_tag_value` coerces the value to a Python primitive
 185            # (int, float, bool, str) or falls back to dumping it as a string.
186            coerced_value = tracing_utils.coerce_tag_value(value)
187            self._span.set_attribute(key, coerced_value)
188  
189     class OpenTelemetryTracer(Tracer):
190        def __init__(self, tracer: opentelemetry.trace.Tracer) -> None:
191            self._tracer = tracer
192  
193        @contextlib.contextmanager
194        def trace(self, operation_name: str, tags: Optional[Dict[str, Any]] = None) -> Iterator[Span]:
195            with self._tracer.start_as_current_span(operation_name) as span:
196                span = OpenTelemetrySpan(span)
197                if tags:
198                    span.set_tags(tags)
199  
200                yield span
201  
202        def current_span(self) -> Optional[Span]:
203            current_span = trace.get_current_span()
204            if isinstance(current_span, NonRecordingSpan):
205                return None
206  
207            return OpenTelemetrySpan(current_span)
208     ```
209  
210  2. Tell Haystack to use your custom tracer:
211  
212     ```python
213     from haystack import tracing
214  
215     haystack_tracer = OpenTelemetryTracer(tracer)
216     tracing.enable_tracing(haystack_tracer)
217     ```
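The `coerce_tag_value` step exists because most backends accept only primitive attribute values. Its behavior can be illustrated in plain Python (an illustrative re-implementation for this example, not Haystack's actual code):

```python
import json

def coerce_tag_value(value):
    """Pass primitives through unchanged; serialize everything else to a string."""
    if isinstance(value, (bool, int, float, str)):
        return value
    try:
        return json.dumps(value)
    except TypeError:
        # Fall back to str() for objects that are not JSON-serializable.
        return str(value)

print(coerce_tag_value(42))             # 42
print(coerce_tag_value({"k": [1, 2]}))  # {"k": [1, 2]}
```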
218  
219  ## Disabling Auto Tracing
220  
221  Haystack automatically detects and enables tracing under the following circumstances:
222  
223  - OpenTelemetry: if `opentelemetry-sdk` is installed and configured.
224  - Datadog: if `ddtrace` is installed.
225  
226  To disable this behavior, there are two options:
227  
228  - Set the environment variable `HAYSTACK_AUTO_TRACE_ENABLED` to `false` when running your Haystack application
229  
230  — or —
231  
232  - Disable tracing in Python:
233  
234    ```python
235    from haystack.tracing import disable_tracing
236  
237    disable_tracing()
238    ```
239  
240  ## Content Tracing
241  
242  Haystack also allows you to trace your pipeline components' input and output values. This is useful for investigating your pipeline execution step by step.
243  
244  By default, this behavior is disabled to prevent sensitive user information from being sent to your tracing backend.
245  
246  To enable content tracing, there are two options:
247  
248  - Set the environment variable `HAYSTACK_CONTENT_TRACING_ENABLED` to `true` when running your Haystack application
249  
250  — or —
251  
252  - Explicitly enable content tracing in Python:
253  
254    ```python
255    from haystack import tracing
256  
257    tracing.tracer.is_content_tracing_enabled = True
258    ```
259  
260  ## Visualizing Traces During Development
261  
262  Use [Jaeger](https://www.jaegertracing.io/docs/1.6/getting-started/) as a lightweight tracing backend for local pipeline development. This allows you to experiment with tracing without the need for a complex tracing backend.
263  <ClickableImage src="/img/dd906d7-Screenshot_2024-02-22_at_16.51.01.png" alt="Jaeger UI trace timeline displaying haystack pipeline execution with component spans showing duration and nesting of operations" />
264  
265  1. Run the Jaeger container. This creates a tracing backend as well as a UI to visualize the traces:
266  
267     ```shell
268     docker run --rm -d --name jaeger \
269       -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
270       -p 6831:6831/udp \
271       -p 6832:6832/udp \
272       -p 5778:5778 \
273       -p 16686:16686 \
274       -p 4317:4317 \
275       -p 4318:4318 \
276       -p 14250:14250 \
277       -p 14268:14268 \
278       -p 14269:14269 \
279       -p 9411:9411 \
280       jaegertracing/all-in-one:latest
281     ```
282  2. Install the OpenTelemetry SDK:
283  
284     ```shell
285     pip install opentelemetry-sdk
286     pip install opentelemetry-exporter-otlp
287     ```
288  3. Configure OpenTelemetry to use the Jaeger backend:
289  
290     ```python
291     from opentelemetry.sdk.resources import Resource
292     from opentelemetry.semconv.resource import ResourceAttributes
293  
294     from opentelemetry import trace
295     from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
296     from opentelemetry.sdk.trace import TracerProvider
297     from opentelemetry.sdk.trace.export import BatchSpanProcessor
298  
299     # Service name is required for most backends
300     resource = Resource(attributes={
301         ResourceAttributes.SERVICE_NAME: "haystack"
302     })
303  
304     tracer_provider = TracerProvider(resource=resource)
305     processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
306     tracer_provider.add_span_processor(processor)
307     trace.set_tracer_provider(tracer_provider)
308     ```
309  4. Tell Haystack to use OpenTelemetry for tracing:
310  
311     ```python
312     import haystack.tracing
313  
314     haystack.tracing.auto_enable_tracing()
315     ```
316  5. Run your pipeline:
317  
318     ```python
319     ...
320     pipeline.run(...)
321     ...
322     ```
323  6. Inspect the traces in the UI provided by Jaeger at [http://localhost:16686](http://localhost:16686/search).
324  
325  ## Real-Time Pipeline Logging
326  
327  Use Haystack's [`LoggingTracer`](https://github.com/deepset-ai/haystack/blob/main/haystack/tracing/logging_tracer.py) to inspect the data flowing through your pipeline in real time.
328  
329  This feature is particularly helpful during experimentation and prototyping, as you don’t need to set up any tracing backend beforehand.
330  
331  Here’s how to enable this tracer. In this example, we also add optional color tags to highlight the components’ names and inputs:
332  
333  ```python
334  import logging
335  from haystack import tracing
336  from haystack.tracing.logging_tracer import LoggingTracer
337  
338  logging.basicConfig(
339      format="%(levelname)s - %(name)s -  %(message)s",
340      level=logging.WARNING,
341  )
342  logging.getLogger("haystack").setLevel(logging.DEBUG)
343  
344  tracing.tracer.is_content_tracing_enabled = (
345      True  # to enable tracing/logging content (inputs/outputs)
346  )
347  tracing.enable_tracing(
348      LoggingTracer(
349          tags_color_strings={
350              "haystack.component.input": "\x1b[1;31m",
351              "haystack.component.name": "\x1b[1;34m",
352          },
353      ),
354  )
355  ```
356  
357  Here’s what the resulting log would look like when a pipeline is run:
358  <ClickableImage src="/img/55c3d5c84282d726c95fb3350ec36be49a354edca8a6164f5dffdab7121cec58-image_2.png" alt="Console output showing Haystack pipeline execution with DEBUG level tracing logs including component names, types, and input/output specifications" />