---
title: "Tracing"
id: tracing
slug: "/tracing"
description: "This page explains how to use tracing in Haystack. It describes how to set up a tracing backend with OpenTelemetry, Datadog, or your own solution. This can help you monitor your app's performance and optimize it."
---

import ClickableImage from "@site/src/components/ClickableImage";

# Tracing

This page explains how to use tracing in Haystack. It describes how to set up a tracing backend with OpenTelemetry, Datadog, or your own solution. This can help you monitor your app's performance and optimize it.

Traces document the flow of requests through your application and are vital for monitoring applications in production. They help you understand the execution order of your pipeline components and analyze where your pipeline spends the most time.

## Configuring a Tracing Backend

Instrumented applications typically send traces to a trace collector or a tracing backend. Haystack provides out-of-the-box support for [OpenTelemetry](https://opentelemetry.io/) and [Datadog](https://app.datadoghq.eu/dashboard/lists). You can also quickly implement support for additional providers of your choice.

### OpenTelemetry

To use OpenTelemetry as your tracing backend, follow these steps:

1. Install the [OpenTelemetry SDK](https://opentelemetry.io/docs/languages/python/):

   ```shell
   pip install opentelemetry-sdk
   pip install opentelemetry-exporter-otlp
   ```

2. To add traces to even deeper levels of your pipelines, we recommend checking out [OpenTelemetry integrations](https://opentelemetry.io/ecosystem/registry/?s=python), such as:

   - [`urllib3` instrumentation](https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/instrumentation/opentelemetry-instrumentation-urllib3) for tracing HTTP requests in your pipeline,
   - [OpenAI instrumentation](https://github.com/traceloop/openllmetry/tree/main/packages/opentelemetry-instrumentation-openai) for tracing OpenAI requests.

3. There are two options for hooking Haystack up to the OpenTelemetry SDK.

   - Run your Haystack application using OpenTelemetry's [automated instrumentation](https://opentelemetry.io/docs/languages/python/getting-started/#instrumentation). Haystack automatically detects the configured tracing backend and uses it to send traces.

     First, install the OpenTelemetry CLI:

     ```shell
     pip install opentelemetry-distro
     ```

     Then, run your Haystack application using the OpenTelemetry SDK:

     ```shell
     opentelemetry-instrument \
         --traces_exporter console \
         --metrics_exporter console \
         --logs_exporter console \
         --service_name my-haystack-app \
         <command to run your Haystack pipeline>
     ```

   — or —

   - Configure the tracing backend in your Python code:

     ```python
     from haystack import tracing
     from haystack.tracing import OpenTelemetryTracer

     from opentelemetry import trace
     from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
     from opentelemetry.sdk.trace import TracerProvider
     from opentelemetry.sdk.trace.export import BatchSpanProcessor
     from opentelemetry.sdk.resources import Resource
     from opentelemetry.semconv.resource import ResourceAttributes

     # A service name is required by most backends
     resource = Resource(attributes={
         ResourceAttributes.SERVICE_NAME: "haystack"
     })

     tracer_provider = TracerProvider(resource=resource)
     processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
     tracer_provider.add_span_processor(processor)
     trace.set_tracer_provider(tracer_provider)

     # Option 1: Tell Haystack to auto-detect the configured tracer
     import haystack.tracing
     haystack.tracing.auto_enable_tracing()

     # Option 2: Explicitly tell Haystack to use your tracer
     tracer = tracer_provider.get_tracer("my_application")
     tracing.enable_tracing(OpenTelemetryTracer(tracer))
     ```

### Datadog

To use Datadog as your tracing backend, follow these steps:

1. Install [Datadog's tracing library `ddtrace`](https://ddtrace.readthedocs.io/en/stable/#):

   ```shell
   pip install ddtrace
   ```

2. There are two options for hooking Haystack up to `ddtrace`.

   - Run your Haystack application using the `ddtrace-run` wrapper:

     ```shell
     ddtrace-run <command to run your Haystack pipeline>
     ```

   — or —

   - Configure the Datadog tracing backend in your Python code:

     ```python
     from haystack import tracing
     from haystack.tracing import DatadogTracer

     import ddtrace

     tracer = ddtrace.tracer
     tracing.enable_tracing(DatadogTracer(tracer))
     ```

### Langfuse

The `LangfuseConnector` component allows you to easily trace your Haystack pipelines in the Langfuse UI.

Install the component with `pip install langfuse-haystack`, then add it to your pipeline.

:::info
Check out the component's [documentation page](../pipeline-components/connectors/langfuseconnector.mdx) for more details and example usage, or our [blog post](https://haystack.deepset.ai/blog/langfuse-integration) for the complete walkthrough.
:::

<ClickableImage src="/img/11cec4f-langfuse-generation-span.png" alt="Langfuse trace detail view showing generation span with input prompt, output, metadata, latency, and cost information for a language model call" />

### Weights & Biases Weave

The `WeaveConnector` component allows you to trace and visualize your pipeline execution in the [Weights & Biases](https://wandb.ai/site/) framework.

First, create a free account on the Weights & Biases website and get your API key, then install the integration with `pip install weights_biases-haystack`.

:::info
Check out the component's [documentation page](../pipeline-components/connectors/weaveconnector.mdx) for more details and example usage.
:::

### Custom Tracing Backend

To use your custom tracing backend with Haystack, follow these steps:

1. Implement the `Tracer` interface. The following code snippet provides an example using the OpenTelemetry package:

   ```python
   import contextlib
   from typing import Optional, Dict, Any, Iterator

   import opentelemetry.trace
   from opentelemetry import trace
   from opentelemetry.trace import NonRecordingSpan

   from haystack.tracing import Tracer, Span
   from haystack.tracing import utils as tracing_utils


   class OpenTelemetrySpan(Span):
       def __init__(self, span: opentelemetry.trace.Span) -> None:
           self._span = span

       def set_tag(self, key: str, value: Any) -> None:
           # Tracing backends usually don't support arbitrary tag values.
           # `coerce_tag_value` converts the value to a Python primitive
           # (int, float, bool, str) or falls back to dumping it as a string.
           coerced_value = tracing_utils.coerce_tag_value(value)
           self._span.set_attribute(key, coerced_value)


   class OpenTelemetryTracer(Tracer):
       def __init__(self, tracer: opentelemetry.trace.Tracer) -> None:
           self._tracer = tracer

       @contextlib.contextmanager
       def trace(self, operation_name: str, tags: Optional[Dict[str, Any]] = None) -> Iterator[Span]:
           with self._tracer.start_as_current_span(operation_name) as raw_span:
               span = OpenTelemetrySpan(raw_span)
               if tags:
                   span.set_tags(tags)

               yield span

       def current_span(self) -> Optional[Span]:
           current_span = trace.get_current_span()
           if isinstance(current_span, NonRecordingSpan):
               return None

           return OpenTelemetrySpan(current_span)
   ```

2. Tell Haystack to use your custom tracer:

   ```python
   from haystack import tracing

   haystack_tracer = OpenTelemetryTracer(tracer)
   tracing.enable_tracing(haystack_tracer)
   ```

## Disabling Auto Tracing

Haystack automatically detects and enables tracing under the following circumstances:

- If `opentelemetry-sdk` is installed and configured for OpenTelemetry.
- If `ddtrace` is installed for Datadog.

To disable this behavior, there are two options:

- Set the environment variable `HAYSTACK_AUTO_TRACE_ENABLED` to `false` when running your Haystack application

— or —

- Disable tracing in Python:

  ```python
  from haystack.tracing import disable_tracing

  disable_tracing()
  ```

## Content Tracing

Haystack also allows you to trace your pipeline components' input and output values. This is useful for investigating your pipeline execution step by step.

By default, this behavior is disabled to prevent sensitive user information from being sent to your tracing backend.
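The switch is controlled by the `HAYSTACK_CONTENT_TRACING_ENABLED` environment variable. As a rough, stdlib-only sketch of how such a boolean environment toggle is commonly parsed (the `env_flag` helper is illustrative, not Haystack's actual implementation):

```python
import os

def env_flag(name: str, default: bool) -> bool:
    """Read a boolean switch from the environment.

    Treats "true", "1", and "yes" (case-insensitive) as enabled;
    falls back to `default` when the variable is unset.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("true", "1", "yes")

# Content tracing stays off unless explicitly switched on.
enabled = env_flag("HAYSTACK_CONTENT_TRACING_ENABLED", default=False)
```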
To enable content tracing, there are two options:

- Set the environment variable `HAYSTACK_CONTENT_TRACING_ENABLED` to `true` when running your Haystack application

— or —

- Explicitly enable content tracing in Python:

  ```python
  from haystack import tracing

  tracing.tracer.is_content_tracing_enabled = True
  ```

## Visualizing Traces During Development

Use [Jaeger](https://www.jaegertracing.io/docs/1.6/getting-started/) as a lightweight tracing backend for local pipeline development. This allows you to experiment with tracing without setting up a complex tracing backend.

<ClickableImage src="/img/dd906d7-Screenshot_2024-02-22_at_16.51.01.png" alt="Jaeger UI trace timeline displaying haystack pipeline execution with component spans showing duration and nesting of operations" />

1. Run the Jaeger container. This creates a tracing backend as well as a UI to visualize the traces:

   ```shell
   docker run --rm -d --name jaeger \
     -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
     -p 6831:6831/udp \
     -p 6832:6832/udp \
     -p 5778:5778 \
     -p 16686:16686 \
     -p 4317:4317 \
     -p 4318:4318 \
     -p 14250:14250 \
     -p 14268:14268 \
     -p 14269:14269 \
     -p 9411:9411 \
     jaegertracing/all-in-one:latest
   ```

2. Install the OpenTelemetry SDK:

   ```shell
   pip install opentelemetry-sdk
   pip install opentelemetry-exporter-otlp
   ```

3. Configure OpenTelemetry to use the Jaeger backend:

   ```python
   from opentelemetry import trace
   from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
   from opentelemetry.sdk.trace import TracerProvider
   from opentelemetry.sdk.trace.export import BatchSpanProcessor
   from opentelemetry.sdk.resources import Resource
   from opentelemetry.semconv.resource import ResourceAttributes

   # A service name is required by most backends
   resource = Resource(attributes={
       ResourceAttributes.SERVICE_NAME: "haystack"
   })

   tracer_provider = TracerProvider(resource=resource)
   processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
   tracer_provider.add_span_processor(processor)
   trace.set_tracer_provider(tracer_provider)
   ```

4. Tell Haystack to use OpenTelemetry for tracing:

   ```python
   import haystack.tracing

   haystack.tracing.auto_enable_tracing()
   ```

5. Run your pipeline:

   ```python
   ...
   pipeline.run(...)
   ...
   ```

6. Inspect the traces in the Jaeger UI at [http://localhost:16686](http://localhost:16686/search).

## Real-Time Pipeline Logging

Use Haystack's [`LoggingTracer`](https://github.com/deepset-ai/haystack/blob/main/haystack/tracing/logging_tracer.py) to inspect the data flowing through your pipeline in real time.

This feature is particularly helpful during experimentation and prototyping, as you don't need to set up a tracing backend beforehand.

Here's how to enable this tracer. In this example, we add color tags (optional) to highlight the components' names and inputs:

```python
import logging

from haystack import tracing
from haystack.tracing.logging_tracer import LoggingTracer

logging.basicConfig(
    format="%(levelname)s - %(name)s - %(message)s",
    level=logging.WARNING,
)
logging.getLogger("haystack").setLevel(logging.DEBUG)

# Enable tracing/logging of content (component inputs and outputs)
tracing.tracer.is_content_tracing_enabled = True

tracing.enable_tracing(
    LoggingTracer(
        tags_color_strings={
            "haystack.component.input": "\x1b[1;31m",
            "haystack.component.name": "\x1b[1;34m",
        },
    ),
)
```

Here's what the resulting log looks like when a pipeline is run:

<ClickableImage src="/img/55c3d5c84282d726c95fb3350ec36be49a354edca8a6164f5dffdab7121cec58-image_2.png" alt="Console output showing Haystack pipeline execution with DEBUG level tracing logs including component names, types, and input/output specifications" />
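The values in `tags_color_strings` are raw ANSI escape prefixes: roughly, the tracer prepends the prefix to the matching tag's value and resets the terminal color afterwards. A stdlib-only sketch of that coloring mechanism (the `format_tag` helper is illustrative, not part of Haystack):

```python
RESET = "\x1b[0m"  # ANSI reset sequence

# Tag name -> ANSI prefix, mirroring the tags_color_strings mapping above.
TAG_COLORS = {
    "haystack.component.input": "\x1b[1;31m",  # bold red
    "haystack.component.name": "\x1b[1;34m",   # bold blue
}

def format_tag(key: str, value: object) -> str:
    """Wrap a tag's value in its configured ANSI color, if any."""
    color = TAG_COLORS.get(key)
    if color is None:
        return f"{key}={value}"
    return f"{key}={color}{value}{RESET}"

# Prints the component name in bold blue on ANSI-capable terminals.
print(format_tag("haystack.component.name", "retriever"))
```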