---
title: "Tracing"
id: tracing
slug: "/tracing"
description: "This page explains how to use tracing in Haystack. It describes how to set up a tracing backend with OpenTelemetry, Datadog, or your own solution. This can help you monitor your app's performance and optimize it."
---

import ClickableImage from "@site/src/components/ClickableImage";

# Tracing

This page explains how to use tracing in Haystack. It describes how to set up a tracing backend with OpenTelemetry, Datadog, or your own solution. This can help you monitor your app's performance and optimize it.

Traces document the flow of requests through your application and are vital for monitoring applications in production. They help you understand the execution order of your pipeline components and analyze where your pipeline spends the most time.

## Configuring a Tracing Backend

Instrumented applications typically send traces to a trace collector or a tracing backend. Haystack provides out-of-the-box support for [OpenTelemetry](https://opentelemetry.io/) and [Datadog](https://app.datadoghq.eu/dashboard/lists). You can also quickly implement support for additional providers of your choosing.

### OpenTelemetry

To use OpenTelemetry as your tracing backend, follow these steps:

1. Install the [OpenTelemetry SDK](https://opentelemetry.io/docs/languages/python/):

   ```shell
   pip install opentelemetry-sdk
   pip install opentelemetry-exporter-otlp
   ```

2. To add traces to even deeper levels of your pipelines, we recommend you check out [OpenTelemetry integrations](https://opentelemetry.io/ecosystem/registry/?s=python), such as:

   - [`urllib3` instrumentation](https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/instrumentation/opentelemetry-instrumentation-urllib3) for tracing HTTP requests in your pipeline,
   - [OpenAI instrumentation](https://github.com/traceloop/openllmetry/tree/main/packages/opentelemetry-instrumentation-openai) for tracing OpenAI requests.

3. There are two options for how to hook Haystack up to the OpenTelemetry SDK.

   - Run your Haystack application using OpenTelemetry's [automated instrumentation](https://opentelemetry.io/docs/languages/python/getting-started/#instrumentation). Haystack will automatically detect the configured tracing backend and use it to send traces.

     First, install the OpenTelemetry CLI:

     ```shell
     pip install opentelemetry-distro
     ```

     Then, run your Haystack application using the OpenTelemetry SDK:

     ```shell
     opentelemetry-instrument \
         --traces_exporter console \
         --metrics_exporter console \
         --logs_exporter console \
         --service_name my-haystack-app \
         <command to run your Haystack pipeline>
     ```

   — or —

   - Configure the tracing backend in your Python code:

     ```python
     from haystack import tracing
     from haystack.tracing import OpenTelemetryTracer

     from opentelemetry import trace
     from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
     from opentelemetry.sdk.trace import TracerProvider
     from opentelemetry.sdk.trace.export import BatchSpanProcessor
     from opentelemetry.sdk.resources import Resource
     from opentelemetry.semconv.resource import ResourceAttributes

     # Service name is required for most backends
     resource = Resource(attributes={
         ResourceAttributes.SERVICE_NAME: "haystack"
     })

     tracer_provider = TracerProvider(resource=resource)
     processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
     tracer_provider.add_span_processor(processor)
     trace.set_tracer_provider(tracer_provider)

     # Option 1: tell Haystack to auto-detect the configured tracer
     import haystack.tracing
     haystack.tracing.auto_enable_tracing()

     # Option 2: explicitly tell Haystack to use your tracer
     tracer = tracer_provider.get_tracer("my_application")
     tracing.enable_tracing(OpenTelemetryTracer(tracer))
     ```

### Datadog

To use Datadog as your tracing backend, follow these steps:

1. Install [Datadog's tracing library ddtrace](https://ddtrace.readthedocs.io/en/stable/#):

   ```shell
   pip install ddtrace
   ```

2. There are two options for how to hook Haystack up to ddtrace.

   - Run your Haystack application using `ddtrace-run`:

     ```shell
     ddtrace-run <command to run your Haystack pipeline>
     ```

   — or —

   - Configure the Datadog tracing backend in your Python code:

     ```python
     from haystack.tracing import DatadogTracer
     from haystack import tracing
     import ddtrace

     tracer = ddtrace.tracer
     tracing.enable_tracing(DatadogTracer(tracer))
     ```

### Langfuse

The `LangfuseConnector` component allows you to easily trace your Haystack pipelines with the Langfuse UI.

Simply install the component with `pip install langfuse-haystack`, then add it to your pipeline.

:::note
Check out the component's [documentation page](../pipeline-components/connectors/langfuseconnector.mdx) for more details and example usage, or our [blog post](https://haystack.deepset.ai/blog/langfuse-integration) for the complete walkthrough.
:::

<ClickableImage src="/img/11cec4f-langfuse-generation-span.png" alt="Langfuse trace detail view showing generation span with input prompt, output, metadata, latency, and cost information for a language model call" />

### Weights & Biases Weave

The `WeaveConnector` component allows you to trace and visualize your pipeline execution in [Weights & Biases](https://wandb.ai/site/).

You will first need to create a free account on the Weights & Biases website and get your API key, then install the integration with `pip install weights_biases-haystack`.

:::note
Check out the component's [documentation page](../pipeline-components/connectors/weaveconnector.mdx) for more details and example usage.
:::

### Custom Tracing Backend

To use your custom tracing backend with Haystack, follow these steps:

1. Implement the `Tracer` interface. The following code snippet provides an example using the OpenTelemetry package:

   ```python
   import contextlib
   from typing import Optional, Dict, Any, Iterator

   from opentelemetry import trace
   from opentelemetry.trace import NonRecordingSpan
   import opentelemetry.trace

   from haystack.tracing import Tracer, Span
   from haystack.tracing import utils as tracing_utils


   class OpenTelemetrySpan(Span):
       def __init__(self, span: opentelemetry.trace.Span) -> None:
           self._span = span

       def set_tag(self, key: str, value: Any) -> None:
           # Tracing backends usually don't support arbitrary tag values.
           # `coerce_tag_value` forces the value to either be a Python
           # primitive (int, float, bool, str) or tries to dump it as a string.
           coerced_value = tracing_utils.coerce_tag_value(value)
           self._span.set_attribute(key, coerced_value)


   class OpenTelemetryTracer(Tracer):
       def __init__(self, tracer: opentelemetry.trace.Tracer) -> None:
           self._tracer = tracer

       @contextlib.contextmanager
       def trace(self, operation_name: str, tags: Optional[Dict[str, Any]] = None) -> Iterator[Span]:
           with self._tracer.start_as_current_span(operation_name) as raw_span:
               span = OpenTelemetrySpan(raw_span)
               if tags:
                   span.set_tags(tags)
               yield span

       def current_span(self) -> Optional[Span]:
           current_span = trace.get_current_span()
           if isinstance(current_span, NonRecordingSpan):
               return None
           return OpenTelemetrySpan(current_span)
   ```

2. Tell Haystack to use your custom tracer:

   ```python
   from haystack import tracing

   haystack_tracer = OpenTelemetryTracer(tracer)
   tracing.enable_tracing(haystack_tracer)
   ```

## Disabling Auto Tracing

Haystack automatically detects and enables tracing under the following circumstances:

- If `opentelemetry-sdk` is installed and configured for OpenTelemetry.
- If `ddtrace` is installed for Datadog.

To disable this behavior, there are two options:

- Set the environment variable `HAYSTACK_AUTO_TRACE_ENABLED` to `false` when running your Haystack application

— or —

- Disable tracing in Python:

  ```python
  from haystack.tracing import disable_tracing

  disable_tracing()
  ```

## Content Tracing

Haystack also allows you to trace your pipeline components' input and output values. This is useful for investigating your pipeline execution step by step.

By default, this behavior is disabled to prevent sensitive user information from being sent to your tracing backend.
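Because content tracing can expose user data, you may want to turn it on only in non-production environments. Here is a minimal sketch of such a guard; the `APP_ENV` variable and its values are assumptions for this example, not anything Haystack defines:

```python
import os

# Hypothetical environment flag (not defined by Haystack): enable content
# tracing only in development, so user data never reaches the production backend.
enable_content = os.environ.get("APP_ENV", "production") == "development"

# With Haystack installed, you would then apply the flag like this:
# from haystack import tracing
# tracing.tracer.is_content_tracing_enabled = enable_content
```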
To enable content tracing, there are two options:

- Set the environment variable `HAYSTACK_CONTENT_TRACING_ENABLED` to `true` when running your Haystack application

— or —

- Explicitly enable content tracing in Python:

  ```python
  from haystack import tracing

  tracing.tracer.is_content_tracing_enabled = True
  ```

## Visualizing Traces During Development

Use [Jaeger](https://www.jaegertracing.io/docs/1.6/getting-started/) as a lightweight tracing backend for local pipeline development. This lets you experiment with tracing without setting up a complex tracing backend.

<ClickableImage src="/img/dd906d7-Screenshot_2024-02-22_at_16.51.01.png" alt="Jaeger UI trace timeline displaying haystack pipeline execution with component spans showing duration and nesting of operations" />

1. Run the Jaeger container. This creates a tracing backend as well as a UI to visualize the traces:

   ```shell
   docker run --rm -d --name jaeger \
     -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
     -p 6831:6831/udp \
     -p 6832:6832/udp \
     -p 5778:5778 \
     -p 16686:16686 \
     -p 4317:4317 \
     -p 4318:4318 \
     -p 14250:14250 \
     -p 14268:14268 \
     -p 14269:14269 \
     -p 9411:9411 \
     jaegertracing/all-in-one:latest
   ```

2. Install the OpenTelemetry SDK:

   ```shell
   pip install opentelemetry-sdk
   pip install opentelemetry-exporter-otlp
   ```

3. Configure OpenTelemetry to use the Jaeger backend:

   ```python
   from opentelemetry import trace
   from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
   from opentelemetry.sdk.trace import TracerProvider
   from opentelemetry.sdk.trace.export import BatchSpanProcessor
   from opentelemetry.sdk.resources import Resource
   from opentelemetry.semconv.resource import ResourceAttributes

   # Service name is required for most backends
   resource = Resource(attributes={
       ResourceAttributes.SERVICE_NAME: "haystack"
   })

   tracer_provider = TracerProvider(resource=resource)
   processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
   tracer_provider.add_span_processor(processor)
   trace.set_tracer_provider(tracer_provider)
   ```

4. Tell Haystack to use OpenTelemetry for tracing:

   ```python
   import haystack.tracing

   haystack.tracing.auto_enable_tracing()
   ```

5. Run your pipeline:

   ```python
   ...
   pipeline.run(...)
   ...
   ```

6. Inspect the traces in the UI provided by Jaeger at [http://localhost:16686](http://localhost:16686/search).

## Real-Time Pipeline Logging

Use Haystack's [`LoggingTracer`](https://github.com/deepset-ai/haystack/blob/main/haystack/tracing/logging_tracer.py) to inspect the data that's flowing through your pipeline in real time.

This feature is particularly helpful during experimentation and prototyping, as you don't need to set up any tracing backend beforehand.

Here's how you can enable this tracer. In this example, we are adding color tags (this is optional) to highlight the components' names and inputs:

```python
import logging

from haystack import tracing
from haystack.tracing.logging_tracer import LoggingTracer

logging.basicConfig(
    format="%(levelname)s - %(name)s - %(message)s",
    level=logging.WARNING,
)
logging.getLogger("haystack").setLevel(logging.DEBUG)

tracing.tracer.is_content_tracing_enabled = (
    True  # to enable tracing/logging content (inputs/outputs)
)
tracing.enable_tracing(
    LoggingTracer(
        tags_color_strings={
            "haystack.component.input": "\x1b[1;31m",
            "haystack.component.name": "\x1b[1;34m",
        },
    ),
)
```

Here's what the resulting log looks like when a pipeline is run:

<ClickableImage src="/img/55c3d5c84282d726c95fb3350ec36be49a354edca8a6164f5dffdab7121cec58-image_2.png" alt="Console output showing Haystack pipeline execution with DEBUG level tracing logs including component names, types, and input/output specifications" />
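If you want a feel for what data a tracer receives without wiring up any backend, the `Tracer`/`Span` contract from the custom-backend section above can be mimicked in a few lines of plain Python. The sketch below is a toy stand-in for experimentation only, not Haystack's actual implementation, and the class and tag names are illustrative:

```python
import contextlib
from typing import Any, Dict, Iterator, List, Optional


class ToySpan:
    """A minimal span: an operation name plus a dict of tags."""

    def __init__(self, operation_name: str) -> None:
        self.operation_name = operation_name
        self.tags: Dict[str, Any] = {}

    def set_tag(self, key: str, value: Any) -> None:
        self.tags[key] = value


class ToyTracer:
    """Collects finished spans and logs them, mirroring the Tracer contract."""

    def __init__(self) -> None:
        self.finished: List[ToySpan] = []

    @contextlib.contextmanager
    def trace(self, operation_name: str, tags: Optional[Dict[str, Any]] = None) -> Iterator[ToySpan]:
        span = ToySpan(operation_name)
        for key, value in (tags or {}).items():
            span.set_tag(key, value)
        try:
            yield span
        finally:
            # A real backend would export the span here; we just log it.
            print(f"span finished: {span.operation_name} tags={span.tags}")
            self.finished.append(span)


tracer = ToyTracer()
with tracer.trace("haystack.component.run", tags={"haystack.component.name": "retriever"}) as span:
    span.set_tag("docs_returned", 3)
```

Swapping `ToySpan`/`ToyTracer` for subclasses of Haystack's `Span` and `Tracer` and calling `tracing.enable_tracing(...)` is exactly the path described in the custom tracing backend section.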