---
title: "Tracing"
id: tracing
slug: "/tracing"
description: "This page explains how to use tracing in Haystack. It describes how to set up a tracing backend with OpenTelemetry, Datadog, or your own solution. This can help you monitor your app's performance and optimize it."
---

import ClickableImage from "@site/src/components/ClickableImage";

# Tracing

This page explains how to use tracing in Haystack. It describes how to set up a tracing backend with OpenTelemetry, Datadog, or your own solution. This can help you monitor your app's performance and optimize it.

Traces document the flow of requests through your application and are vital for monitoring applications in production. They help you understand the execution order of your pipeline components and analyze where your pipeline spends the most time.

## Configuring a Tracing Backend

Instrumented applications typically send traces to a trace collector or a tracing backend. Haystack provides out-of-the-box support for [OpenTelemetry](https://opentelemetry.io/) and [Datadog](https://app.datadoghq.eu/dashboard/lists). You can also quickly implement support for additional providers of your choice.

### OpenTelemetry

To use OpenTelemetry as your tracing backend, follow these steps:

1. Install the [OpenTelemetry SDK](https://opentelemetry.io/docs/languages/python/):

   ```shell
   pip install opentelemetry-sdk
   pip install opentelemetry-exporter-otlp
   ```

2. To add traces to even deeper levels of your pipelines, we recommend checking out [OpenTelemetry integrations](https://opentelemetry.io/ecosystem/registry/?s=python), such as:

   - [`urllib3` instrumentation](https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/instrumentation/opentelemetry-instrumentation-urllib3) for tracing HTTP requests in your pipeline,
   - [OpenAI instrumentation](https://github.com/traceloop/openllmetry/tree/main/packages/opentelemetry-instrumentation-openai) for tracing OpenAI requests.

3. There are two options for hooking Haystack up to the OpenTelemetry SDK.

   - Run your Haystack application using OpenTelemetry's [automated instrumentation](https://opentelemetry.io/docs/languages/python/getting-started/#instrumentation). Haystack automatically detects the configured tracing backend and uses it to send traces.

     First, install the OpenTelemetry CLI:

     ```shell
     pip install opentelemetry-distro
     ```

     Then, run your Haystack application using the OpenTelemetry SDK:

     ```shell
     opentelemetry-instrument \
         --traces_exporter console \
         --metrics_exporter console \
         --logs_exporter console \
         --service_name my-haystack-app \
         <command to run your Haystack pipeline>
     ```

   — or —

   - Configure the tracing backend in your Python code:

     ```python
     from haystack import tracing
     from haystack.tracing import OpenTelemetryTracer

     from opentelemetry import trace
     from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
     from opentelemetry.sdk.trace import TracerProvider
     from opentelemetry.sdk.trace.export import BatchSpanProcessor
     from opentelemetry.sdk.resources import Resource
     from opentelemetry.semconv.resource import ResourceAttributes

     # A service name is required by most backends
     resource = Resource(attributes={
         ResourceAttributes.SERVICE_NAME: "haystack"
     })

     tracer_provider = TracerProvider(resource=resource)
     processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
     tracer_provider.add_span_processor(processor)
     trace.set_tracer_provider(tracer_provider)

     # Option 1: Tell Haystack to auto-detect the configured tracer
     import haystack.tracing
     haystack.tracing.auto_enable_tracing()

     # Option 2: Explicitly tell Haystack to use your tracer
     tracer = tracer_provider.get_tracer("my_application")
     tracing.enable_tracing(OpenTelemetryTracer(tracer))
     ```

### Datadog

To use Datadog as your tracing backend, follow these steps:

1. Install [Datadog's tracing library `ddtrace`](https://ddtrace.readthedocs.io/en/stable/#):

   ```shell
   pip install ddtrace
   ```

2. There are two options for hooking Haystack up to `ddtrace`.

   - Run your Haystack application using the `ddtrace-run` wrapper:

     ```shell
     ddtrace-run <command to run your Haystack pipeline>
     ```

   — or —

   - Configure the Datadog tracing backend in your Python code:

     ```python
     from haystack import tracing
     from haystack.tracing import DatadogTracer

     import ddtrace

     tracer = ddtrace.tracer
     tracing.enable_tracing(DatadogTracer(tracer))
     ```

### Langfuse

The `LangfuseConnector` component allows you to easily trace your Haystack pipelines in the Langfuse UI.

Install the component with `pip install langfuse-haystack`, then add it to your pipeline.

:::info
Check out the component's [documentation page](../pipeline-components/connectors/langfuseconnector.mdx) for more details and example usage, or our [blog post](https://haystack.deepset.ai/blog/langfuse-integration) for the complete walkthrough.
:::

<ClickableImage src="/img/11cec4f-langfuse-generation-span.png" alt="Langfuse trace detail view showing generation span with input prompt, output, metadata, latency, and cost information for a language model call" />

### Weights & Biases Weave

The `WeaveConnector` component allows you to trace and visualize your pipeline execution in the [Weights & Biases](https://wandb.ai/site/) framework.

First, create a free account on the Weights & Biases website and get your API key, then install the integration with `pip install weights_biases-haystack`.

:::info
Check out the component's [documentation page](../pipeline-components/connectors/weaveconnector.mdx) for more details and example usage.
:::

### Custom Tracing Backend

To use your custom tracing backend with Haystack, follow these steps:

1. Implement the `Tracer` interface. The following code snippet provides an example using the OpenTelemetry package:

   ```python
   import contextlib
   from typing import Optional, Dict, Any, Iterator

   import opentelemetry.trace
   from opentelemetry import trace
   from opentelemetry.trace import NonRecordingSpan

   from haystack.tracing import Tracer, Span
   from haystack.tracing import utils as tracing_utils


   class OpenTelemetrySpan(Span):
       def __init__(self, span: opentelemetry.trace.Span) -> None:
           self._span = span

       def set_tag(self, key: str, value: Any) -> None:
           # Tracing backends usually don't support arbitrary tag values.
           # `coerce_tag_value` converts the value to a Python primitive
           # (int, float, bool, str) or falls back to dumping it as a string.
           coerced_value = tracing_utils.coerce_tag_value(value)
           self._span.set_attribute(key, coerced_value)


   class OpenTelemetryTracer(Tracer):
       def __init__(self, tracer: opentelemetry.trace.Tracer) -> None:
           self._tracer = tracer

       @contextlib.contextmanager
       def trace(self, operation_name: str, tags: Optional[Dict[str, Any]] = None) -> Iterator[Span]:
           with self._tracer.start_as_current_span(operation_name) as raw_span:
               span = OpenTelemetrySpan(raw_span)
               if tags:
                   span.set_tags(tags)

               yield span

       def current_span(self) -> Optional[Span]:
           current_span = trace.get_current_span()
           if isinstance(current_span, NonRecordingSpan):
               return None

           return OpenTelemetrySpan(current_span)
   ```

2. Tell Haystack to use your custom tracer:

   ```python
   from haystack import tracing

   haystack_tracer = OpenTelemetryTracer(tracer)
   tracing.enable_tracing(haystack_tracer)
   ```

## Disabling Auto Tracing

Haystack automatically detects and enables tracing under the following circumstances:

- If `opentelemetry-sdk` is installed and configured for OpenTelemetry.
- If `ddtrace` is installed for Datadog.

To disable this behavior, there are two options:

- Set the environment variable `HAYSTACK_AUTO_TRACE_ENABLED` to `false` when running your Haystack application

— or —

- Disable tracing in Python:

  ```python
  from haystack.tracing import disable_tracing

  disable_tracing()
  ```

## Content Tracing

Haystack also allows you to trace your pipeline components' input and output values. This is useful for investigating your pipeline execution step by step.

By default, this behavior is disabled to prevent sensitive user information from being sent to your tracing backend.
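The switch is controlled by the `HAYSTACK_CONTENT_TRACING_ENABLED` environment variable. As a rough, stdlib-only sketch of how such a boolean environment toggle is commonly parsed (the `env_flag` helper is illustrative, not Haystack's actual implementation):

```python
import os

def env_flag(name: str, default: bool) -> bool:
    """Read a boolean switch from the environment.

    Treats "true", "1", and "yes" (case-insensitive) as enabled;
    falls back to `default` when the variable is unset.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("true", "1", "yes")

# Content tracing stays off unless explicitly switched on.
enabled = env_flag("HAYSTACK_CONTENT_TRACING_ENABLED", default=False)
```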
To enable content tracing, there are two options:

- Set the environment variable `HAYSTACK_CONTENT_TRACING_ENABLED` to `true` when running your Haystack application

— or —

- Explicitly enable content tracing in Python:

  ```python
  from haystack import tracing

  tracing.tracer.is_content_tracing_enabled = True
  ```

## Visualizing Traces During Development

Use [Jaeger](https://www.jaegertracing.io/docs/1.6/getting-started/) as a lightweight tracing backend for local pipeline development. This allows you to experiment with tracing without setting up a complex tracing backend.

<ClickableImage src="/img/dd906d7-Screenshot_2024-02-22_at_16.51.01.png" alt="Jaeger UI trace timeline displaying haystack pipeline execution with component spans showing duration and nesting of operations" />

1. Run the Jaeger container. This creates a tracing backend as well as a UI to visualize the traces:

   ```shell
   docker run --rm -d --name jaeger \
     -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
     -p 6831:6831/udp \
     -p 6832:6832/udp \
     -p 5778:5778 \
     -p 16686:16686 \
     -p 4317:4317 \
     -p 4318:4318 \
     -p 14250:14250 \
     -p 14268:14268 \
     -p 14269:14269 \
     -p 9411:9411 \
     jaegertracing/all-in-one:latest
   ```

2. Install the OpenTelemetry SDK:

   ```shell
   pip install opentelemetry-sdk
   pip install opentelemetry-exporter-otlp
   ```

3. Configure OpenTelemetry to use the Jaeger backend:

   ```python
   from opentelemetry import trace
   from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
   from opentelemetry.sdk.trace import TracerProvider
   from opentelemetry.sdk.trace.export import BatchSpanProcessor
   from opentelemetry.sdk.resources import Resource
   from opentelemetry.semconv.resource import ResourceAttributes

   # A service name is required by most backends
   resource = Resource(attributes={
       ResourceAttributes.SERVICE_NAME: "haystack"
   })

   tracer_provider = TracerProvider(resource=resource)
   processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
   tracer_provider.add_span_processor(processor)
   trace.set_tracer_provider(tracer_provider)
   ```

4. Tell Haystack to use OpenTelemetry for tracing:

   ```python
   import haystack.tracing

   haystack.tracing.auto_enable_tracing()
   ```

5. Run your pipeline:

   ```python
   ...
   pipeline.run(...)
   ...
   ```

6. Inspect the traces in the Jaeger UI at [http://localhost:16686](http://localhost:16686/search).

## Real-Time Pipeline Logging

Use Haystack's [`LoggingTracer`](https://github.com/deepset-ai/haystack/blob/main/haystack/tracing/logging_tracer.py) to inspect the data flowing through your pipeline in real time.

This feature is particularly helpful during experimentation and prototyping, as you don't need to set up a tracing backend beforehand.

Here's how to enable this tracer. In this example, we add color tags (optional) to highlight the components' names and inputs:

```python
import logging

from haystack import tracing
from haystack.tracing.logging_tracer import LoggingTracer

logging.basicConfig(
    format="%(levelname)s - %(name)s - %(message)s",
    level=logging.WARNING,
)
logging.getLogger("haystack").setLevel(logging.DEBUG)

# Enable tracing/logging of content (component inputs and outputs)
tracing.tracer.is_content_tracing_enabled = True

tracing.enable_tracing(
    LoggingTracer(
        tags_color_strings={
            "haystack.component.input": "\x1b[1;31m",
            "haystack.component.name": "\x1b[1;34m",
        },
    ),
)
```

Here's what the resulting log looks like when a pipeline is run:

<ClickableImage src="/img/55c3d5c84282d726c95fb3350ec36be49a354edca8a6164f5dffdab7121cec58-image_2.png" alt="Console output showing Haystack pipeline execution with DEBUG level tracing logs including component names, types, and input/output specifications" />
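The values in `tags_color_strings` are raw ANSI escape prefixes: roughly, the tracer prepends the prefix to the matching tag's value and resets the terminal color afterwards. A stdlib-only sketch of that coloring mechanism (the `format_tag` helper is illustrative, not part of Haystack):

```python
RESET = "\x1b[0m"  # ANSI reset sequence

# Tag name -> ANSI prefix, mirroring the tags_color_strings mapping above.
TAG_COLORS = {
    "haystack.component.input": "\x1b[1;31m",  # bold red
    "haystack.component.name": "\x1b[1;34m",   # bold blue
}

def format_tag(key: str, value: object) -> str:
    """Wrap a tag's value in its configured ANSI color, if any."""
    color = TAG_COLORS.get(key)
    if color is None:
        return f"{key}={value}"
    return f"{key}={color}{value}{RESET}"

# Prints the component name in bold blue on ANSI-capable terminals.
print(format_tag("haystack.component.name", "retriever"))
```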