---
title: "Data Classes"
id: data-classes
slug: "/data-classes"
description: "In Haystack, there are a handful of core classes that are regularly used in many different places. These are classes that carry data through the system and you are likely to interact with them as either the input or output of your pipeline."
---

# Data Classes

In Haystack, there are a handful of core classes that are regularly used in many different places. These are classes that carry data through the system and you are likely to interact with them as either the input or output of your pipeline.

Haystack uses data classes to help components communicate with each other in a simple and modular way, so that data moves through Haystack pipelines in a consistent form. This page covers the data classes available in Haystack: `ByteStream`, `Answer` (along with its variants `ExtractedAnswer` and `GeneratedAnswer`), `ChatMessage`, `Document`, and `StreamingChunk`, as well as the related `ToolCallDelta`, `ComponentInfo`, `SparseEmbedding`, and `Tool` classes.

You can check out the detailed parameters in our [Data Classes](/reference/data-classes-api) API reference.

### Answer

#### Overview

The `Answer` class serves as the base for responses generated within Haystack, containing the answer's data, the originating query, and additional metadata.

#### Key Features

- Adaptable data handling, accommodating any data type (`data`).
- Query tracking for contextual relevance (`query`).
- Metadata support for describing the answer in more detail (`meta`).

#### Attributes

```python
@dataclass
class Answer:
    data: Any
    query: str
    meta: Dict[str, Any]
```

### ExtractedAnswer

#### Overview

`ExtractedAnswer` is an `Answer` variant for answers extracted from Documents, offering more detailed attributes.

#### Key Features

- Includes a reference to the originating `Document` (`document`).
- Score attribute to quantify the answer's confidence level (`score`).
- Optional start and end indices for pinpointing the answer's location within the source (`document_offset`, `context_offset`).

#### Attributes

```python
@dataclass
class ExtractedAnswer:
    query: str
    score: float
    data: Optional[str] = None
    document: Optional[Document] = None
    context: Optional[str] = None
    document_offset: Optional["Span"] = None
    context_offset: Optional["Span"] = None
    meta: Dict[str, Any] = field(default_factory=dict)
```
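
The offsets can be read as character positions in the source text. The sketch below illustrates that idea with a stand-in `Span` dataclass that has `start` and `end` fields. The stand-in, the `text_at` helper, and the exclusive end index are assumptions made for illustration so that the snippet runs without Haystack installed.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Stand-in for the span type referenced by ExtractedAnswer (assumed fields)."""

    start: int
    end: int  # assumed to be exclusive, like a Python slice


def text_at(content: str, span: Span) -> str:
    """Return the part of `content` covered by `span` (hypothetical helper)."""
    return content[span.start : span.end]


content = "Berlin is the capital of Germany."
offset = Span(start=0, end=6)
print(text_at(content, offset))  # Berlin
```

Storing offsets rather than only the answer string lets a UI highlight the answer inside the original document or context.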

### GeneratedAnswer

#### Overview

`GeneratedAnswer` is an `Answer` variant for answers generated from multiple Documents.

#### Key Features

- Handles string-type data (`data`).
- Links to a list of `Document` objects, making the answer traceable to its sources (`documents`).

#### Attributes

```python
@dataclass
class GeneratedAnswer:
    data: str
    query: str
    documents: List[Document]
    meta: Dict[str, Any] = field(default_factory=dict)
```

### ByteStream

#### Overview

`ByteStream` is Haystack's abstraction for binary objects and is used to handle various binary data formats.

#### Key Features

- Holds binary data and associated metadata.
- Optional MIME type specification for flexibility.
- Helper methods for creating and saving streams (`to_file`, `from_file_path`, `from_string`).

#### Attributes

```python
@dataclass(repr=False)
class ByteStream:
    data: bytes
    meta: Dict[str, Any] = field(default_factory=dict, hash=False)
    mime_type: Optional[str] = field(default=None)
```

#### Example

```python
from haystack.dataclasses.byte_stream import ByteStream

image = ByteStream.from_file_path("dog.jpg")
```

### ChatMessage

`ChatMessage` is the central abstraction to represent a message for an LLM. It contains a role, metadata, and several types of content, including text, tool calls, and tool call results.

Read the detailed documentation for the `ChatMessage` data class on a dedicated [ChatMessage](data-classes/chatmessage.mdx) page.

### Document

#### Overview

`Document` is a central data abstraction in Haystack, capable of holding text and binary data.

#### Key Features

- Unique ID for each document.
- Multiple content types are supported: text (`content`) and binary (`blob`).
- Custom metadata and scoring for advanced document management.
- Optional dense (`embedding`) and sparse (`sparse_embedding`) embeddings for AI-based applications.

#### Attributes

```python
@dataclass
class Document(metaclass=_BackwardCompatible):
    id: str = field(default="")
    content: Optional[str] = field(default=None)
    blob: Optional[ByteStream] = field(default=None)
    meta: Dict[str, Any] = field(default_factory=dict)
    score: Optional[float] = field(default=None)
    embedding: Optional[List[float]] = field(default=None)
    sparse_embedding: Optional[SparseEmbedding] = field(default=None)
```

#### Example

```python
from haystack import Document

document = Document(
    content="Here are the contents of your document",
    embedding=[0.1] * 768,
)
```

### StreamingChunk

#### Overview

`StreamingChunk` represents a partially streamed LLM response, enabling real-time LLM response processing. It encapsulates a segment of streamed content along with associated metadata and provides comprehensive information about the streaming state.

#### Key Features

- String-based content representation for text chunks.
- Support for tool calls and tool call results.
- Component tracking and metadata management.
- Streaming state indicators (start, finish reason).
- Content block indexing for multi-part responses.

#### Attributes

```python
@dataclass
class StreamingChunk:
    content: str
    meta: dict[str, Any] = field(default_factory=dict, hash=False)
    component_info: Optional[ComponentInfo] = field(default=None)
    index: Optional[int] = field(default=None)
    tool_calls: Optional[list[ToolCallDelta]] = field(default=None)
    tool_call_result: Optional[ToolCallResult] = field(default=None)
    start: bool = field(default=False)
    finish_reason: Optional[FinishReason] = field(default=None)
    reasoning: Optional[ReasoningContent] = field(default=None)
```

#### Example

```python
from haystack.dataclasses import StreamingChunk, ToolCallDelta, ReasoningContent

# Basic text chunk
chunk = StreamingChunk(
    content="Hello world",
    start=True,
    meta={"model": "gpt-5-mini"},
)

# Tool call chunk
tool_chunk = StreamingChunk(
    content="",
    tool_calls=[
        ToolCallDelta(
            index=0,
            tool_name="calculator",
            arguments='{"operation": "add", "a": 2, "b": 3}',
        ),
    ],
    index=0,
    start=False,
    finish_reason="tool_calls",
)

# Reasoning chunk
reasoning_chunk = StreamingChunk(
    content="",
    reasoning=ReasoningContent(
        reasoning_text="Thinking step by step about the answer.",
    ),
    index=0,
    start=True,
    meta={"model": "gpt-4.1-mini"},
)
```
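
Chunks like these are usually consumed one at a time by a callback, for example one passed as a generator's `streaming_callback`. The sketch below shows one way such a callback might accumulate streamed text. To keep it runnable without Haystack installed, it uses a simplified stand-in `Chunk` dataclass that mirrors only the `content`, `start`, and `finish_reason` fields. `Chunk` and `TextCollector` are hypothetical names, not part of Haystack.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chunk:
    """Simplified stand-in mirroring a few StreamingChunk fields (assumption)."""

    content: str
    start: bool = False
    finish_reason: Optional[str] = None


@dataclass
class TextCollector:
    """Callable that accumulates streamed text chunk by chunk."""

    parts: List[str] = field(default_factory=list)
    finish_reason: Optional[str] = None

    def __call__(self, chunk: Chunk) -> None:
        # Append the text fragment; empty fragments are harmless.
        self.parts.append(chunk.content)
        # Remember why the stream ended once the final chunk reports it.
        if chunk.finish_reason is not None:
            self.finish_reason = chunk.finish_reason

    @property
    def text(self) -> str:
        return "".join(self.parts)


collector = TextCollector()
for chunk in [
    Chunk("Hello", start=True),
    Chunk(", "),
    Chunk("world", finish_reason="stop"),
]:
    collector(chunk)

print(collector.text)  # Hello, world
```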

### ToolCallDelta

#### Overview

`ToolCallDelta` represents a tool call prepared by the model, usually contained in an assistant message during streaming.

#### Attributes

```python
@dataclass
class ToolCallDelta:
    index: int
    tool_name: Optional[str] = field(default=None)
    arguments: Optional[str] = field(default=None)
    id: Optional[str] = field(default=None)
    extra: Optional[Dict[str, Any]] = field(default=None)
```
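
Because `arguments` is a string, a single tool call may arrive as several deltas that share the same `index`. The sketch below shows one possible way to merge them. It assumes that argument fragments concatenate into valid JSON and that `tool_name` and `id` appear in at least one delta. `Delta` is a stand-in dataclass with the same core fields, and `merge_deltas` is a hypothetical helper, not a Haystack function.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Delta:
    """Stand-in with the same core fields as ToolCallDelta (assumption)."""

    index: int
    tool_name: Optional[str] = None
    arguments: Optional[str] = None
    id: Optional[str] = None


def merge_deltas(deltas: List[Delta]) -> Dict[int, Dict[str, Any]]:
    """Group deltas by index and join their argument fragments."""
    calls: Dict[int, Dict[str, Any]] = {}
    for delta in deltas:
        call = calls.setdefault(
            delta.index, {"id": None, "tool_name": None, "arguments": ""}
        )
        if delta.id is not None:
            call["id"] = delta.id
        if delta.tool_name is not None:
            call["tool_name"] = delta.tool_name
        if delta.arguments:
            call["arguments"] += delta.arguments
    # Parse the joined fragments only after every delta has been seen.
    for call in calls.values():
        call["arguments"] = json.loads(call["arguments"]) if call["arguments"] else {}
    return calls


deltas = [
    Delta(index=0, tool_name="calculator", id="call_1"),
    Delta(index=0, arguments='{"operation": "add", '),
    Delta(index=0, arguments='"a": 2, "b": 3}'),
]
merged = merge_deltas(deltas)
print(merged[0]["tool_name"], merged[0]["arguments"])
```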

### ComponentInfo

#### Overview

The `ComponentInfo` class represents information about a component within a Haystack pipeline. It is used to track the type and name of components that generate or process data, aiding in debugging, tracing, and metadata management throughout the pipeline.

#### Key Features

- Stores the type of the component (including module and class name).
- Optionally stores the name assigned to the component in the pipeline.
- Provides a convenient class method to create a `ComponentInfo` instance from a `Component` object.

#### Attributes

```python
@dataclass
class ComponentInfo:
    type: str
    name: Optional[str] = field(default=None)

    @classmethod
    def from_component(cls, component: Component) -> "ComponentInfo": ...
```

#### Example

```python
from haystack.dataclasses.streaming_chunk import ComponentInfo
from haystack.core.component import Component


class MyComponent(Component): ...


component = MyComponent()
info = ComponentInfo.from_component(component)
print(info.type)  # e.g., 'my_module.MyComponent'
print(info.name)  # Name assigned in the pipeline, if any
```

### SparseEmbedding

#### Overview

The `SparseEmbedding` class represents a sparse embedding: a vector where most values are zeros.

#### Attributes

- `indices`: List of indices of non-zero elements in the embedding.
- `values`: List of values of non-zero elements in the embedding.
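
To make the `indices`/`values` layout concrete, the sketch below expands a sparse vector into its dense form and computes a dot product that touches only the non-zero positions. It works on plain Python lists rather than the `SparseEmbedding` class itself. The helper names `to_dense` and `sparse_dot` and the vector size are assumptions made for illustration.

```python
from typing import List


def to_dense(indices: List[int], values: List[float], size: int) -> List[float]:
    """Expand parallel index/value lists into a dense vector of length `size`."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


def sparse_dot(
    indices_a: List[int],
    values_a: List[float],
    indices_b: List[int],
    values_b: List[float],
) -> float:
    """Dot product that only visits positions present in both vectors."""
    lookup = dict(zip(indices_b, values_b))
    return sum(v * lookup[i] for i, v in zip(indices_a, values_a) if i in lookup)


indices, values = [1, 4], [0.5, 2.0]
print(to_dense(indices, values, size=6))  # [0.0, 0.5, 0.0, 0.0, 2.0, 0.0]
print(sparse_dot(indices, values, [4, 5], [3.0, 1.0]))  # 6.0
```

Storing only the non-zero entries keeps memory and comparison cost proportional to the number of non-zero elements rather than to the full vector size.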

### Tool

`Tool` is a data class representing a tool that Language Models can prepare a call for.

Read the detailed documentation for the `Tool` data class on a dedicated [Tool](../tools/tool.mdx) page.