tracing.mdx
  1  ---
  2  title: "Tracing"
  3  id: tracing
  4  slug: "/tracing"
  5  description: "This page explains how to use tracing in Haystack. It describes how to set up a tracing backend with OpenTelemetry, Datadog, or your own solution. This can help you monitor your app's performance and optimize it."
  6  ---
  7  
  8  import ClickableImage from "@site/src/components/ClickableImage";
  9  
 10  # Tracing
 11  
 12  This page explains how to use tracing in Haystack. It describes how to set up a tracing backend with OpenTelemetry, Datadog, or your own solution. This can help you monitor your app's performance and optimize it.
 13  
 14  Traces document the flow of requests through your application and are vital for monitoring applications in production. They help you understand the execution order of your pipeline components and analyze where your pipeline spends the most time.
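To make the idea concrete, here is a minimal, backend-free sketch of what a trace records: nested spans capturing each step's name, parent, and duration. The component names and the `span` helper below are purely illustrative, not Haystack APIs; real backends such as OpenTelemetry or Datadog provide their own span implementations.

```python
import contextlib
import time

finished_spans = []  # completed spans, in finish order
_stack = []          # names of currently open spans

@contextlib.contextmanager
def span(name):
    """Record one span: its name, its parent span, and how long it took."""
    parent = _stack[-1] if _stack else None
    _stack.append(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        _stack.pop()
        finished_spans.append(
            {"name": name, "parent": parent, "duration_s": time.perf_counter() - start}
        )

# A hypothetical two-component pipeline run:
with span("pipeline.run"):
    with span("retriever.run"):
        time.sleep(0.01)
    with span("generator.run"):
        time.sleep(0.02)

for s in finished_spans:
    print(f"{s['name']} (parent: {s['parent']}, {s['duration_s']:.3f}s)")
```

Inner spans finish first, so the outermost `pipeline.run` span is recorded last and accounts for the total time; this is exactly the kind of tree a tracing backend visualizes.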
 15  
 16  ## Configuring a Tracing Backend
 17  
 18  Instrumented applications typically send traces to a trace collector or a tracing backend. Haystack provides out-of-the-box support for [OpenTelemetry](https://opentelemetry.io/) and [Datadog](https://app.datadoghq.eu/dashboard/lists). You can also quickly implement support for additional providers of your choosing.
 19  
 20  ### OpenTelemetry
 21  
 22  To use OpenTelemetry as your tracing backend, follow these steps:
 23  
 24  1. Install the [OpenTelemetry SDK](https://opentelemetry.io/docs/languages/python/):
 25  
 26     ```shell
 27     pip install opentelemetry-sdk
 28     pip install opentelemetry-exporter-otlp
 29     ```
 30  2. To trace even deeper levels of your pipelines, check out [OpenTelemetry integrations](https://opentelemetry.io/ecosystem/registry/?s=python), such as:
 31     - [`urllib3` instrumentation](https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/instrumentation/opentelemetry-instrumentation-urllib3) for tracing HTTP requests in your pipeline,
 32     - [OpenAI instrumentation](https://github.com/traceloop/openllmetry/tree/main/packages/opentelemetry-instrumentation-openai) for tracing OpenAI requests.
 33  3. There are two options for connecting Haystack to the OpenTelemetry SDK.
 34  
 35     - Run your Haystack applications using OpenTelemetry’s [automated instrumentation](https://opentelemetry.io/docs/languages/python/getting-started/#instrumentation). Haystack will automatically detect the configured tracing backend and use it to send traces.
 36  
  37       First, install `opentelemetry-distro`, which provides the `opentelemetry-instrument` CLI:
 38  
 39       ```shell
 40       pip install opentelemetry-distro
 41       ```
 42  
 43       Then, run your Haystack application using the OpenTelemetry SDK:
 44  
 45       ```shell
 46       opentelemetry-instrument \
 47           --traces_exporter console \
 48           --metrics_exporter console \
 49           --logs_exporter console \
 50           --service_name my-haystack-app \
 51           <command to run your Haystack pipeline>
 52       ```
 53  
 54     — or —
 55  
 56     - Configure the tracing backend in your Python code:
 57  
 58       ```python
 59       from haystack import tracing
 60  
 61       from opentelemetry import trace
 62       from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
 63       from opentelemetry.sdk.trace import TracerProvider
 64       from opentelemetry.sdk.trace.export import BatchSpanProcessor
 65       from opentelemetry.sdk.resources import Resource
 66       from opentelemetry.semconv.resource import ResourceAttributes
 67  
 68       # Service name is required for most backends
 69       resource = Resource(attributes={
  70           ResourceAttributes.SERVICE_NAME: "haystack"
 71       })
 72  
 73       tracer_provider = TracerProvider(resource=resource)
 74       processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
 75       tracer_provider.add_span_processor(processor)
 76       trace.set_tracer_provider(tracer_provider)
 77  
  78       # Option 1: tell Haystack to auto-detect the configured tracer
  79       import haystack.tracing
  80       haystack.tracing.auto_enable_tracing()
  81  
  82       # Option 2: explicitly tell Haystack to use your tracer
  83       from haystack.tracing import OpenTelemetryTracer
  84  
  85       tracer = tracer_provider.get_tracer("my_application")
  86       tracing.enable_tracing(OpenTelemetryTracer(tracer))
 87       ```
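The instrumentation integrations mentioned in step 2 each ship as a separate package that you install and then enable once at application startup. As a sketch, for `urllib3`:

```shell
pip install opentelemetry-instrumentation-urllib3
```

After installing, either call `URLLib3Instrumentor().instrument()` (imported from `opentelemetry.instrumentation.urllib3`) before running your pipeline, or rely on `opentelemetry-instrument` to enable installed instrumentations automatically; see each integration's own documentation for details.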
 88  
 89  ### Datadog
 90  
 91  To use Datadog as your tracing backend, follow these steps:
 92  
 93  1. Install [Datadog’s tracing library ddtrace](https://ddtrace.readthedocs.io/en/stable/#).
 94  
 95     ```shell
 96     pip install ddtrace
 97     ```
 98  2. There are two options for connecting Haystack to `ddtrace`.
 99  
100     - Run your Haystack application using `ddtrace-run`:
101       ```shell
102       ddtrace-run <command to run your Haystack pipeline>
103       ```
104  
105     — or —
106  
107     - Configure the Datadog tracing backend in your Python code:
108  
109       ```python
110       from haystack.tracing import DatadogTracer
111       from haystack import tracing
112       import ddtrace
113  
114       tracer = ddtrace.tracer
115       tracing.enable_tracing(DatadogTracer(tracer))
116       ```
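When running under `ddtrace`, Datadog's standard unified-service-tagging environment variables control how your traces are labeled in the Datadog UI. A sketch, with hypothetical service and environment names and a hypothetical `app.py` entry point:

```shell
DD_SERVICE=my-haystack-app DD_ENV=dev ddtrace-run python app.py
```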
117  
118  ### Langfuse
119  
120  The `LangfuseConnector` component lets you trace your Haystack pipelines and inspect the results in the Langfuse UI.
121  
122  Install the integration with `pip install langfuse-haystack`, then add the component to your pipeline.
123  
124  :::note
125  Check out the component's [documentation page](../pipeline-components/connectors/langfuseconnector.mdx) for more details and example usage, or our [blog post](https://haystack.deepset.ai/blog/langfuse-integration) for the complete walkthrough.
126  
127  :::
128  
129  <ClickableImage src="/img/11cec4f-langfuse-generation-span.png" alt="Langfuse trace detail view showing generation span with input prompt, output, metadata, latency, and cost information for a language model call" />
130  
131  ### Weights & Biases Weave
132  
133  The `WeaveConnector` component allows you to trace and visualize your pipeline execution in the [Weights & Biases](https://wandb.ai/site/) Weave framework.
134  
135  You first need to create a free account on the Weights & Biases website and get your API key, then install the integration with `pip install weights_biases-haystack`.
136  
137  :::note
138  Check out the component's [documentation page](../pipeline-components/connectors/weaveconnector.mdx) for more details and example usage.
139  
140  :::
141  
142  ### Custom Tracing Backend
143  
144  To use your custom tracing backend with Haystack, follow these steps:
145  
146  1. Implement the `Tracer` interface. The following code snippet provides an example using the OpenTelemetry package:
147  
148     ```python
149     import contextlib
150     from typing import Optional, Dict, Any, Iterator
151  
152     from opentelemetry import trace
153     from opentelemetry.trace import NonRecordingSpan
154  
155     from haystack.tracing import Tracer, Span
156     from haystack.tracing import utils as tracing_utils
157     import opentelemetry.trace
158  
 159     class OpenTelemetrySpan(Span):
 160         def __init__(self, span: opentelemetry.trace.Span) -> None:
 161             self._span = span
 162  
 163         def set_tag(self, key: str, value: Any) -> None:
 164             # Tracing backends usually don't support arbitrary tag value types.
 165             # `coerce_tag_value` coerces the value to a Python primitive
 166             # (int, float, bool, str) or falls back to dumping it as a string.
 167             coerced_value = tracing_utils.coerce_tag_value(value)
 168             self._span.set_attribute(key, coerced_value)
 169  
 170     class OpenTelemetryTracer(Tracer):
 171         def __init__(self, tracer: opentelemetry.trace.Tracer) -> None:
 172             self._tracer = tracer
 173  
 174         @contextlib.contextmanager
 175         def trace(self, operation_name: str, tags: Optional[Dict[str, Any]] = None) -> Iterator[Span]:
 176             with self._tracer.start_as_current_span(operation_name) as raw_span:
 177                 span = OpenTelemetrySpan(raw_span)
 178                 if tags:
 179                     span.set_tags(tags)
 180  
 181                 yield span
 182  
 183         def current_span(self) -> Optional[Span]:
 184             current_span = trace.get_current_span()
 185             if isinstance(current_span, NonRecordingSpan):
 186                 return None
 187  
 188             return OpenTelemetrySpan(current_span)
 189     ```
190  
191  2. Tell Haystack to use your custom tracer:
192  
193     ```python
194     from haystack import tracing
195  
196     haystack_tracer = OpenTelemetryTracer(tracer)
197     tracing.enable_tracing(haystack_tracer)
198     ```
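Because `enable_tracing` accepts any `Tracer` implementation, the same interface can also back a lightweight in-memory tracer, which is handy in unit tests when you want to assert on spans without a real backend. The sketch below is illustrative: to stay self-contained it defines minimal stand-ins for Haystack's `Span`/`Tracer` types instead of importing them, and the operation and tag names are only examples.

```python
import contextlib
from typing import Any, Dict, Iterator, List, Optional

# Stand-in for haystack.tracing.Span so this sketch runs standalone;
# in a real project, subclass the actual Haystack classes instead.
class Span:
    def set_tag(self, key: str, value: Any) -> None: ...

    def set_tags(self, tags: Dict[str, Any]) -> None:
        for key, value in tags.items():
            self.set_tag(key, value)

class InMemorySpan(Span):
    def __init__(self, operation_name: str) -> None:
        self.operation_name = operation_name
        self.tags: Dict[str, Any] = {}

    def set_tag(self, key: str, value: Any) -> None:
        self.tags[key] = value

class InMemoryTracer:
    """Collects finished spans in a list instead of exporting them."""

    def __init__(self) -> None:
        self.spans: List[InMemorySpan] = []
        self._active: List[InMemorySpan] = []

    @contextlib.contextmanager
    def trace(self, operation_name: str, tags: Optional[Dict[str, Any]] = None) -> Iterator[Span]:
        span = InMemorySpan(operation_name)
        if tags:
            span.set_tags(tags)
        self._active.append(span)
        try:
            yield span
        finally:
            self._active.pop()
            self.spans.append(span)

    def current_span(self) -> Optional[Span]:
        return self._active[-1] if self._active else None

tracer = InMemoryTracer()
with tracer.trace("haystack.pipeline.run", tags={"haystack.pipeline.input_data": "query"}):
    with tracer.trace("haystack.component.run", tags={"haystack.component.name": "retriever"}):
        pass

print([span.operation_name for span in tracer.spans])
# With Haystack installed, you would register it via:
# from haystack import tracing; tracing.enable_tracing(tracer)
```

Spans are appended as they finish, so the inner component span comes first; a test can then assert on operation names and tags directly.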
199  
200  ## Disabling Auto Tracing
201  
202  Haystack automatically detects and enables tracing under the following circumstances:
203  
204  - If `opentelemetry-sdk` is installed and configured for OpenTelemetry.
205  - If `ddtrace` is installed for Datadog.
206  
207  To disable this behavior, there are two options:
208  
209  - Set the environment variable `HAYSTACK_AUTO_TRACE_ENABLED` to `false` when running your Haystack application
210  
211  — or —
212  
213  - Disable tracing in Python:
214  
215    ```python
216    from haystack.tracing import disable_tracing
217  
218    disable_tracing()
219    ```
220  
221  ## Content Tracing
222  
223  Haystack also allows you to trace your pipeline components' input and output values. This is useful for investigating your pipeline execution step by step.
224  
225  By default, this behavior is disabled to prevent sensitive user information from being sent to your tracing backend.
226  
227  To enable content tracing, there are two options:
228  
229  - Set the environment variable `HAYSTACK_CONTENT_TRACING_ENABLED` to `true` when running your Haystack application
230  
231  — or —
232  
233  - Explicitly enable content tracing in Python:
234  
235    ```python
236    from haystack import tracing
237  
238    tracing.tracer.is_content_tracing_enabled = True
239    ```
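For example, with a hypothetical `app.py` entry point, the environment-variable option looks like this:

```shell
HAYSTACK_CONTENT_TRACING_ENABLED=true python app.py
```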
240  
241  ## Visualizing Traces During Development
242  
243  Use [Jaeger](https://www.jaegertracing.io/docs/1.6/getting-started/) as a lightweight tracing backend for local pipeline development. This allows you to experiment with tracing without the need for a complex tracing backend.
244  
245  <ClickableImage src="/img/dd906d7-Screenshot_2024-02-22_at_16.51.01.png" alt="Jaeger UI trace timeline displaying haystack pipeline execution with component spans showing duration and nesting of operations" />
246  
247  1. Run the Jaeger container. This creates a tracing backend as well as a UI to visualize the traces:
248  
249     ```shell
250     docker run --rm -d --name jaeger \
251       -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
252       -p 6831:6831/udp \
253       -p 6832:6832/udp \
254       -p 5778:5778 \
255       -p 16686:16686 \
256       -p 4317:4317 \
257       -p 4318:4318 \
258       -p 14250:14250 \
259       -p 14268:14268 \
260       -p 14269:14269 \
261       -p 9411:9411 \
262       jaegertracing/all-in-one:latest
263     ```
264  2. Install the OpenTelemetry SDK:
265  
266     ```shell
267     pip install opentelemetry-sdk
268     pip install opentelemetry-exporter-otlp
269     ```
270  3. Configure OpenTelemetry to export traces to the Jaeger backend:
271  
272     ```python
273     from opentelemetry.sdk.resources import Resource
274     from opentelemetry.semconv.resource import ResourceAttributes
275  
276     from opentelemetry import trace
277     from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
278     from opentelemetry.sdk.trace import TracerProvider
279     from opentelemetry.sdk.trace.export import BatchSpanProcessor
280  
281     # Service name is required for most backends
282     resource = Resource(attributes={
283         ResourceAttributes.SERVICE_NAME: "haystack"
284     })
285  
286     tracer_provider = TracerProvider(resource=resource)
287     processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
288     tracer_provider.add_span_processor(processor)
289     trace.set_tracer_provider(tracer_provider)
290     ```
291  4. Tell Haystack to use OpenTelemetry for tracing:
292  
293     ```python
294     import haystack.tracing
295  
296     haystack.tracing.auto_enable_tracing()
297     ```
298  5. Run your pipeline:
299  
300     ```python
301     ...
302     pipeline.run(...)
303     ...
304     ```
305  6. Inspect the traces in the UI provided by Jaeger at [http://localhost:16686](http://localhost:16686/search).
306  
307  ## Real-Time Pipeline Logging
308  
309  Use Haystack's [`LoggingTracer`](https://github.com/deepset-ai/haystack/blob/main/haystack/tracing/logging_tracer.py) to inspect the data flowing through your pipeline in real time.
310  
311  This feature is particularly helpful during experimentation and prototyping, as you don’t need to set up any tracing backend beforehand.
312  
313  Here’s how to enable this tracer. In this example, we also add optional color tags to highlight the components’ names and inputs:
314  
315  ```python
316  import logging
317  from haystack import tracing
318  from haystack.tracing.logging_tracer import LoggingTracer
319  
320  logging.basicConfig(
321      format="%(levelname)s - %(name)s -  %(message)s",
322      level=logging.WARNING,
323  )
324  logging.getLogger("haystack").setLevel(logging.DEBUG)
325  
326  tracing.tracer.is_content_tracing_enabled = (
327      True  # to enable tracing/logging content (inputs/outputs)
328  )
329  tracing.enable_tracing(
330      LoggingTracer(
331          tags_color_strings={
332              "haystack.component.input": "\x1b[1;31m",
333              "haystack.component.name": "\x1b[1;34m",
334          },
335      ),
336  )
337  ```
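The values passed in `tags_color_strings` are standard ANSI escape sequences rather than a Haystack-specific format. A quick stdlib illustration of the two codes used above (the component name and query are hypothetical):

```python
BOLD_RED = "\x1b[1;31m"   # ANSI: bold red, used above for component inputs
BOLD_BLUE = "\x1b[1;34m"  # ANSI: bold blue, used above for component names
RESET = "\x1b[0m"         # ANSI: reset to the terminal's default style

# The tracer wraps tag values in these codes; an ANSI-capable terminal
# then renders the text in color.
message = f"{BOLD_BLUE}retriever{RESET} input: {BOLD_RED}query='What is Haystack?'{RESET}"
print(message)
```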
338  
339  Here’s what the resulting log would look like when a pipeline is run:
340  
341  <ClickableImage src="/img/55c3d5c84282d726c95fb3350ec36be49a354edca8a6164f5dffdab7121cec58-image_2.png" alt="Console output showing Haystack pipeline execution with DEBUG level tracing logs including component names, types, and input/output specifications" />