---
title: "Tracing"
id: tracing
slug: "/tracing"
description: "This page explains how to use tracing in Haystack. It describes how to set up a tracing backend with OpenTelemetry, Datadog, or your own solution. This can help you monitor your app's performance and optimize it."
---

import ClickableImage from "@site/src/components/ClickableImage";

# Tracing

This page explains how to use tracing in Haystack. It describes how to set up a tracing backend with OpenTelemetry, Datadog, or your own solution. This can help you monitor your app's performance and optimize it.

Traces document the flow of requests through your application and are vital for monitoring applications in production. They help you understand the execution order of your pipeline components and analyze where your pipeline spends the most time.

## Configuring a Tracing Backend

Instrumented applications typically send traces to a trace collector or a tracing backend. Haystack provides out-of-the-box support for [OpenTelemetry](https://opentelemetry.io/) and [Datadog](https://app.datadoghq.eu/dashboard/lists). You can also quickly implement support for additional providers of your choosing.

### OpenTelemetry

To use OpenTelemetry as your tracing backend, follow these steps:

1. Install the [OpenTelemetry SDK](https://opentelemetry.io/docs/languages/python/):

   ```shell
   pip install opentelemetry-sdk
   pip install opentelemetry-exporter-otlp
   ```

2. To add traces to even deeper levels of your pipelines, we recommend you check out [OpenTelemetry integrations](https://opentelemetry.io/ecosystem/registry/?s=python), such as:

   - [`urllib3` instrumentation](https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/instrumentation/opentelemetry-instrumentation-urllib3) for tracing HTTP requests in your pipeline,
   - [OpenAI instrumentation](https://github.com/traceloop/openllmetry/tree/main/packages/opentelemetry-instrumentation-openai) for tracing OpenAI requests.

3. There are two ways to hook Haystack up to the OpenTelemetry SDK.

   - Run your Haystack application using OpenTelemetry's [automated instrumentation](https://opentelemetry.io/docs/languages/python/getting-started/#instrumentation). Haystack will automatically detect the configured tracing backend and use it to send traces.

     First, install the OpenTelemetry CLI:

     ```shell
     pip install opentelemetry-distro
     ```

     Then, run your Haystack application using the OpenTelemetry SDK:

     ```shell
     opentelemetry-instrument \
         --traces_exporter console \
         --metrics_exporter console \
         --logs_exporter console \
         --service_name my-haystack-app \
         <command to run your Haystack pipeline>
     ```

   — or —

   - Configure the tracing backend in your Python code:

     ```python
     from haystack import tracing
     from haystack.tracing import OpenTelemetryTracer

     from opentelemetry import trace
     from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
     from opentelemetry.sdk.trace import TracerProvider
     from opentelemetry.sdk.trace.export import BatchSpanProcessor
     from opentelemetry.sdk.resources import Resource
     from opentelemetry.semconv.resource import ResourceAttributes

     # Service name is required for most backends
     resource = Resource(attributes={
         ResourceAttributes.SERVICE_NAME: "haystack"
     })

     tracer_provider = TracerProvider(resource=resource)
     processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
     tracer_provider.add_span_processor(processor)
     trace.set_tracer_provider(tracer_provider)

     # Either tell Haystack to auto-detect the configured tracer
     import haystack.tracing
     haystack.tracing.auto_enable_tracing()

     # or explicitly tell Haystack to use your tracer
     tracer = tracer_provider.get_tracer("my_application")
     tracing.enable_tracing(OpenTelemetryTracer(tracer))
     ```

### Datadog

To use Datadog as your tracing backend, follow these steps:

1. Install [Datadog's tracing library `ddtrace`](https://ddtrace.readthedocs.io/en/stable/#):

   ```shell
   pip install ddtrace
   ```

2. There are two ways to hook Haystack up to `ddtrace`.

   - Run your Haystack application using `ddtrace-run`:

     ```shell
     ddtrace-run <command to run your Haystack pipeline>
     ```

   — or —

   - Configure the Datadog tracing backend in your Python code:

     ```python
     from haystack.tracing import DatadogTracer
     from haystack import tracing
     import ddtrace

     tracer = ddtrace.tracer
     tracing.enable_tracing(DatadogTracer(tracer))
     ```

### Langfuse

The `LangfuseConnector` component allows you to easily trace your Haystack pipelines with the Langfuse UI.

Simply install the component with `pip install langfuse-haystack`, then add it to your pipeline.

:::info
Check out the component's [documentation page](../pipeline-components/connectors/langfuseconnector.mdx) for more details and example usage, or our [blog post](https://haystack.deepset.ai/blog/langfuse-integration) for the complete walkthrough.
:::

<ClickableImage src="/img/11cec4f-langfuse-generation-span.png" alt="Langfuse trace detail view showing generation span with input prompt, output, metadata, latency, and cost information for a language model call" />

### MLflow

[MLflow](https://mlflow.org/) is an open-source platform for managing the end-to-end machine learning and AI lifecycle. MLflow provides native tracing support for Haystack. Simply install MLflow and enable automatic tracing with a single line of code.

```shell
pip install mlflow
```

```python
import mlflow

mlflow.haystack.autolog()
# Optionally set an experiment name
mlflow.set_experiment("Haystack")
```

This automatically captures traces from all Haystack pipelines and components, including latencies, token usage, cost, and any exceptions.

:::info
Check out the [MLflow Haystack integration guide](https://haystack.deepset.ai/integrations/mlflow) for a full walkthrough with examples.
:::

### Weights & Biases Weave

The `WeaveConnector` component allows you to trace and visualize your pipeline execution in the [Weights & Biases](https://wandb.ai/site/) framework.

You will first need to create a free account on the Weights & Biases website and get your API key, as well as install the integration with `pip install weights_biases-haystack`.

:::info
Check out the component's [documentation page](../pipeline-components/connectors/weaveconnector.mdx) for more details and example usage.
:::

### Custom Tracing Backend

To use your custom tracing backend with Haystack, follow these steps:

1. Implement the `Tracer` interface.
   The following code snippet provides an example using the OpenTelemetry package:

   ```python
   import contextlib
   from typing import Any, Dict, Iterator, Optional

   import opentelemetry.trace
   from opentelemetry import trace
   from opentelemetry.trace import NonRecordingSpan

   from haystack.tracing import Span, Tracer
   from haystack.tracing import utils as tracing_utils


   class OpenTelemetrySpan(Span):
       def __init__(self, span: opentelemetry.trace.Span) -> None:
           self._span = span

       def set_tag(self, key: str, value: Any) -> None:
           # Tracing backends usually don't support arbitrary tag values.
           # `coerce_tag_value` forces the value to either be a Python
           # primitive (int, float, bool, str) or tries to dump it as a string.
           coerced_value = tracing_utils.coerce_tag_value(value)
           self._span.set_attribute(key, coerced_value)


   class OpenTelemetryTracer(Tracer):
       def __init__(self, tracer: opentelemetry.trace.Tracer) -> None:
           self._tracer = tracer

       @contextlib.contextmanager
       def trace(self, operation_name: str, tags: Optional[Dict[str, Any]] = None) -> Iterator[Span]:
           with self._tracer.start_as_current_span(operation_name) as raw_span:
               span = OpenTelemetrySpan(raw_span)
               if tags:
                   span.set_tags(tags)
               yield span

       def current_span(self) -> Optional[Span]:
           current_span = trace.get_current_span()
           if isinstance(current_span, NonRecordingSpan):
               return None
           return OpenTelemetrySpan(current_span)
   ```

2. Tell Haystack to use your custom tracer:

   ```python
   from haystack import tracing

   # `tracer` is an opentelemetry.trace.Tracer instance, as in the previous step
   haystack_tracer = OpenTelemetryTracer(tracer)
   tracing.enable_tracing(haystack_tracer)
   ```

## Disabling Auto Tracing

Haystack automatically detects and enables tracing under the following circumstances:

- If `opentelemetry-sdk` is installed and configured for OpenTelemetry.
- If `ddtrace` is installed for Datadog.
To disable this behavior, there are two options:

- Set the environment variable `HAYSTACK_AUTO_TRACE_ENABLED` to `false` when running your Haystack application

— or —

- Disable tracing in Python:

  ```python
  from haystack.tracing import disable_tracing

  disable_tracing()
  ```

## Content Tracing

Haystack also allows you to trace your pipeline components' input and output values. This is useful for investigating your pipeline execution step by step.

By default, this behavior is disabled to prevent sensitive user information from being sent to your tracing backend.

To enable content tracing, there are two options:

- Set the environment variable `HAYSTACK_CONTENT_TRACING_ENABLED` to `true` when running your Haystack application

— or —

- Explicitly enable content tracing in Python:

  ```python
  from haystack import tracing

  tracing.tracer.is_content_tracing_enabled = True
  ```

## Visualizing Traces During Development

Use [Jaeger](https://www.jaegertracing.io/docs/1.6/getting-started/) as a lightweight tracing backend for local pipeline development. This allows you to experiment with tracing without the need for a complex tracing backend.

<ClickableImage src="/img/dd906d7-Screenshot_2024-02-22_at_16.51.01.png" alt="Jaeger UI trace timeline displaying haystack pipeline execution with component spans showing duration and nesting of operations" />

1. Run the Jaeger container. This creates a tracing backend as well as a UI to visualize the traces:

   ```shell
   docker run --rm -d --name jaeger \
     -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
     -p 6831:6831/udp \
     -p 6832:6832/udp \
     -p 5778:5778 \
     -p 16686:16686 \
     -p 4317:4317 \
     -p 4318:4318 \
     -p 14250:14250 \
     -p 14268:14268 \
     -p 14269:14269 \
     -p 9411:9411 \
     jaegertracing/all-in-one:latest
   ```

2. Install the OpenTelemetry SDK:

   ```shell
   pip install opentelemetry-sdk
   pip install opentelemetry-exporter-otlp
   ```

3. Configure OpenTelemetry to use the Jaeger backend:

   ```python
   from opentelemetry import trace
   from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
   from opentelemetry.sdk.resources import Resource
   from opentelemetry.sdk.trace import TracerProvider
   from opentelemetry.sdk.trace.export import BatchSpanProcessor
   from opentelemetry.semconv.resource import ResourceAttributes

   # Service name is required for most backends
   resource = Resource(attributes={
       ResourceAttributes.SERVICE_NAME: "haystack"
   })

   tracer_provider = TracerProvider(resource=resource)
   processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
   tracer_provider.add_span_processor(processor)
   trace.set_tracer_provider(tracer_provider)
   ```

4. Tell Haystack to use OpenTelemetry for tracing:

   ```python
   import haystack.tracing

   haystack.tracing.auto_enable_tracing()
   ```

5. Run your pipeline:

   ```python
   ...
   pipeline.run(...)
   ...
   ```

6. Inspect the traces in the UI provided by Jaeger at [http://localhost:16686](http://localhost:16686/search).

## Real-Time Pipeline Logging

Use Haystack's [`LoggingTracer`](https://github.com/deepset-ai/haystack/blob/main/haystack/tracing/logging_tracer.py) to inspect the data flowing through your pipeline in real time.

This feature is particularly helpful during experimentation and prototyping, as you don't need to set up any tracing backend beforehand.

Here's how you can enable this tracer.
In this example, we are adding color tags (this is optional) to highlight the components' names and inputs:

```python
import logging

from haystack import tracing
from haystack.tracing.logging_tracer import LoggingTracer

logging.basicConfig(
    format="%(levelname)s - %(name)s - %(message)s",
    level=logging.WARNING,
)
logging.getLogger("haystack").setLevel(logging.DEBUG)

# enable tracing/logging of content (inputs/outputs)
tracing.tracer.is_content_tracing_enabled = True

tracing.enable_tracing(
    LoggingTracer(
        tags_color_strings={
            "haystack.component.input": "\x1b[1;31m",
            "haystack.component.name": "\x1b[1;34m",
        },
    ),
)
```

Here's what the resulting log looks like when a pipeline is run:

<ClickableImage src="/img/55c3d5c84282d726c95fb3350ec36be49a354edca8a6164f5dffdab7121cec58-image_2.png" alt="Console output showing Haystack pipeline execution with DEBUG level tracing logs including component names, types, and input/output specifications" />
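The values passed to `tags_color_strings` are plain ANSI terminal escape sequences, not Haystack-specific identifiers. A minimal, Haystack-free sketch of how such a sequence colors output (the reset code `\x1b[0m` is standard ANSI; the `colorize` helper is purely illustrative):

```python
# ANSI escape sequences: "\x1b[1;31m" = bold red, "\x1b[1;34m" = bold blue
RED = "\x1b[1;31m"
BLUE = "\x1b[1;34m"
RESET = "\x1b[0m"  # restores the terminal's default style


def colorize(text: str, color: str) -> str:
    """Wrap text in an ANSI color sequence and reset the style afterwards."""
    return f"{color}{text}{RESET}"


# In a terminal, these print in blue and red respectively
print(colorize("haystack.component.name", BLUE))
print(colorize("haystack.component.input", RED))
```

Any other ANSI style code can be substituted for the two used above to change how the `LoggingTracer` highlights a given tag.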