---
title: "Data Classes"
id: data-classes
slug: "/data-classes"
description: "In Haystack, there are a handful of core classes that are regularly used in many different places. These are classes that carry data through the system and you are likely to interact with these as either the input or output of your pipeline."
---

# Data Classes

Haystack has a handful of core classes that are used throughout the framework. These classes carry data through the system, and you are likely to interact with them as the input or output of your pipeline.

Haystack uses data classes to help components communicate with each other in a simple and modular way, so that data flows through pipelines in a consistent format. This page covers the data classes available in Haystack: `Answer` (along with its variants `ExtractedAnswer` and `GeneratedAnswer`), `ByteStream`, `ChatMessage`, `Document`, `StreamingChunk`, `ToolCallDelta`, `ComponentInfo`, `SparseEmbedding`, and `Tool`, and explains the role each one plays.

You can check out the detailed parameters in our [Data Classes](/reference/data-classes-api) API reference.

### Answer

#### Overview

The `Answer` class serves as the base for responses generated within Haystack, containing the answer's data, the originating query, and additional metadata.

#### Key Features

- Adaptable data handling, accommodating any data type (`data`).
- Query tracking for contextual relevance (`query`).
- Extensive metadata support for detailed answer description.

#### Attributes

```python
@dataclass(frozen=True)
class Answer:
    data: Any
    query: str
    meta: Dict[str, Any]
```
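
The listing above declares the class with `frozen=True`, which means instances cannot be modified after creation. The following standalone sketch illustrates that behavior using only the standard library. `AnswerSketch` is a hypothetical stand-in that copies the three fields shown above. It is not Haystack's own class.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Any, Dict


# Hypothetical stand-in that mirrors the fields listed above; not Haystack's class.
@dataclass(frozen=True)
class AnswerSketch:
    data: Any
    query: str
    meta: Dict[str, Any]


answer = AnswerSketch(
    data="Paris",
    query="What is the capital of France?",
    meta={"source": "demo"},
)

# Assigning to a field of a frozen dataclass raises FrozenInstanceError.
try:
    answer.data = "Lyon"
    mutated = True
except FrozenInstanceError:
    mutated = False

# dataclasses.replace builds a modified copy and leaves the original untouched.
updated = replace(answer, data="Lyon")
```

To change a field on a frozen instance, create a copy with `dataclasses.replace` rather than assigning to the attribute.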

### ExtractedAnswer

#### Overview

`ExtractedAnswer` is a subclass of `Answer` that deals explicitly with answers derived from Documents, offering more detailed attributes.

#### Key Features

- Includes reference to the originating `Document`.
- Score attribute to quantify the answer's confidence level.
- Optional start and end indices for pinpointing answer location within the source.

#### Attributes

```python
@dataclass
class ExtractedAnswer:
    query: str
    score: float
    data: Optional[str] = None
    document: Optional[Document] = None
    context: Optional[str] = None
    document_offset: Optional["Span"] = None
    context_offset: Optional["Span"] = None
    meta: Dict[str, Any] = field(default_factory=dict)
```
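
The offset fields locate the answer inside a larger text. The sketch below shows one way such offsets can be used. It assumes a span holds `start` and `end` character indices with an exclusive end, which the listing above does not confirm. `SpanSketch` and `slice_by_span` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


# Hypothetical span type: assumes character offsets with an exclusive end index.
@dataclass(frozen=True)
class SpanSketch:
    start: int
    end: int


def slice_by_span(text: str, span: SpanSketch) -> str:
    """Return the substring of `text` that the span points at."""
    return text[span.start : span.end]


context = "Paris is the capital of France."
answer_text = "Paris"

# Locate the answer inside the context and record its offsets.
start = context.find(answer_text)
context_offset = SpanSketch(start=start, end=start + len(answer_text))

# Slicing the context with the recorded offsets recovers the answer text.
recovered = slice_by_span(context, context_offset)
```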

### GeneratedAnswer

#### Overview

`GeneratedAnswer` extends the `Answer` class to accommodate answers generated from multiple Documents.

#### Key Features

- Handles string-type data.
- Links to a list of `Document` objects, enhancing answer traceability.

#### Attributes

```python
@dataclass
class GeneratedAnswer:
    data: str
    query: str
    documents: List[Document]
    meta: Dict[str, Any] = field(default_factory=dict)
```

### ByteStream

#### Overview

`ByteStream` is Haystack's abstraction for binary objects and is used to handle various binary data formats.

#### Key Features

- Holds binary data and associated metadata.
- Optional MIME type specification for flexibility.
- Helper methods (`to_file`, `from_file_path`, `from_string`) for reading and writing data.

#### Attributes

```python
@dataclass(frozen=True)
class ByteStream:
    data: bytes
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)
    mime_type: Optional[str] = field(default=None)
```

#### Example

```python
from haystack.dataclasses.byte_stream import ByteStream

image = ByteStream.from_file_path("dog.jpg")
```
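
To make the role of the three helper methods more concrete, here is a standalone sketch of a round trip from a string to a file and back. `ByteStreamSketch` is a hypothetical stand-in built with the standard library. Its method signatures (including the `encoding` and `mime_type` parameters) are assumptions chosen for illustration and may differ from Haystack's.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Optional


# Hypothetical stand-in that mirrors the attributes above; not Haystack's class.
@dataclass(frozen=True)
class ByteStreamSketch:
    data: bytes
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)
    mime_type: Optional[str] = None

    @classmethod
    def from_string(
        cls, text: str, encoding: str = "utf-8", mime_type: Optional[str] = None
    ) -> "ByteStreamSketch":
        # Encode the text so the payload is always stored as bytes.
        return cls(data=text.encode(encoding), mime_type=mime_type)

    @classmethod
    def from_file_path(
        cls, path: Path, mime_type: Optional[str] = None
    ) -> "ByteStreamSketch":
        # Read the raw bytes of the file without interpreting them.
        return cls(data=path.read_bytes(), mime_type=mime_type)

    def to_file(self, path: Path) -> None:
        # Write the payload back to disk unchanged.
        path.write_bytes(self.data)


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "note.txt"
    original = ByteStreamSketch.from_string("hello", mime_type="text/plain")
    original.to_file(target)
    restored = ByteStreamSketch.from_file_path(target, mime_type="text/plain")
```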

### ChatMessage

`ChatMessage` is the central abstraction to represent a message for an LLM. It contains a role, metadata, and several types of content, including text, tool calls, and tool call results.

Read the detailed documentation for the `ChatMessage` data class on a dedicated [ChatMessage](data-classes/chatmessage.mdx) page.

### Document

#### Overview

`Document` represents a central data abstraction in Haystack, capable of holding text and binary data.

#### Key Features

- Unique ID for each document.
- Multiple content types are supported: text, binary (`blob`).
- Custom metadata and scoring for advanced document management.
- Optional embedding for AI-based applications.

#### Attributes

```python
@dataclass
class Document(metaclass=_BackwardCompatible):
    id: str = field(default="")
    content: Optional[str] = field(default=None)
    blob: Optional[ByteStream] = field(default=None)
    meta: Dict[str, Any] = field(default_factory=dict)
    score: Optional[float] = field(default=None)
    embedding: Optional[List[float]] = field(default=None)
    sparse_embedding: Optional[SparseEmbedding] = field(default=None)
```

#### Example

```python
from haystack import Document

document = Document(
    content="Here are the contents of your document",
    embedding=[0.1] * 768,
)
```

### StreamingChunk

#### Overview

`StreamingChunk` represents a partially streamed LLM response, enabling real-time LLM response processing. It encapsulates a segment of streamed content along with associated metadata and provides information about the streaming state.

#### Key Features

- String-based content representation for text chunks.
- Support for tool calls and tool call results.
- Component tracking and metadata management.
- Streaming state indicators (start, finish reason).
- Content block indexing for multi-part responses.

#### Attributes

```python
@dataclass
class StreamingChunk:
    content: str
    meta: dict[str, Any] = field(default_factory=dict, hash=False)
    component_info: Optional[ComponentInfo] = field(default=None)
    index: Optional[int] = field(default=None)
    tool_calls: Optional[list[ToolCallDelta]] = field(default=None)
    tool_call_result: Optional[ToolCallResult] = field(default=None)
    start: bool = field(default=False)
    finish_reason: Optional[FinishReason] = field(default=None)
```

#### Example

```python
from haystack.dataclasses.streaming_chunk import StreamingChunk, ToolCallDelta

# Basic text chunk
chunk = StreamingChunk(
    content="Hello world",
    start=True,
    meta={"model": "gpt-3.5-turbo"},
)

# Tool call chunk: `content` is a required field, so pass an empty string
tool_chunk = StreamingChunk(
    content="",
    tool_calls=[
        ToolCallDelta(
            index=0,
            tool_name="calculator",
            arguments='{"operation": "add", "a": 2, "b": 3}',
        ),
    ],
    index=0,
    start=False,
    finish_reason="tool_calls",
)
```
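
Streamed chunks are typically consumed one at a time and assembled into the full response. The standalone sketch below shows that accumulation pattern. `ChunkSketch` is a hypothetical stand-in with a subset of the fields listed above, and `ChunkCollector` is an illustrative helper. Neither is part of Haystack, and how a real callback is wired into a generator is not shown here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Hypothetical stand-in with a subset of the fields listed above.
@dataclass
class ChunkSketch:
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)
    start: bool = False
    finish_reason: Optional[str] = None


class ChunkCollector:
    """Callable that accumulates streamed text and records the finish reason."""

    def __init__(self) -> None:
        self.parts: List[str] = []
        self.finish_reason: Optional[str] = None

    def __call__(self, chunk: ChunkSketch) -> None:
        # Append each text fragment in arrival order.
        self.parts.append(chunk.content)
        # Only the final chunk is expected to carry a finish reason.
        if chunk.finish_reason is not None:
            self.finish_reason = chunk.finish_reason

    @property
    def text(self) -> str:
        return "".join(self.parts)


collector = ChunkCollector()
for piece in (
    ChunkSketch(content="Hello", start=True),
    ChunkSketch(content=" world"),
    ChunkSketch(content="!", finish_reason="stop"),
):
    collector(piece)
```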

### ToolCallDelta

#### Overview

`ToolCallDelta` represents a tool call prepared by the model, usually contained in an assistant message during streaming.

#### Attributes

```python
@dataclass
class ToolCallDelta:
    index: int
    tool_name: Optional[str] = field(default=None)
    arguments: Optional[str] = field(default=None)
    id: Optional[str] = field(default=None)
```
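
Because `arguments` is a string and every field except `index` is optional, a single tool call may arrive as several deltas. The sketch below assumes that deltas sharing the same `index` belong to one call and that their `arguments` strings are fragments of one JSON document. That is an assumption about streaming behavior, not something the listing above states. `ToolCallDeltaSketch` and `merge_deltas` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


# Hypothetical stand-in that mirrors the attributes above; not Haystack's class.
@dataclass
class ToolCallDeltaSketch:
    index: int
    tool_name: Optional[str] = None
    arguments: Optional[str] = None
    id: Optional[str] = None


def merge_deltas(deltas: List[ToolCallDeltaSketch]) -> Dict[int, Dict[str, Any]]:
    """Group deltas by index and concatenate their argument fragments."""
    merged: Dict[int, Dict[str, Any]] = {}
    for delta in deltas:
        call = merged.setdefault(
            delta.index, {"id": None, "tool_name": None, "arguments": ""}
        )
        if delta.id is not None:
            call["id"] = delta.id
        if delta.tool_name is not None:
            call["tool_name"] = delta.tool_name
        if delta.arguments:
            call["arguments"] += delta.arguments
    # Parse the argument string only after every fragment has been appended.
    for call in merged.values():
        call["arguments"] = json.loads(call["arguments"]) if call["arguments"] else {}
    return merged


deltas = [
    ToolCallDeltaSketch(index=0, tool_name="calculator", id="call_1"),
    ToolCallDeltaSketch(index=0, arguments='{"operation": "add", '),
    ToolCallDeltaSketch(index=0, arguments='"a": 2, "b": 3}'),
]
calls = merge_deltas(deltas)
```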

### ComponentInfo

#### Overview

The `ComponentInfo` class represents information about a component within a Haystack pipeline. It is used to track the type and name of components that generate or process data, aiding in debugging, tracing, and metadata management throughout the pipeline.

#### Key Features

- Stores the type of the component (including module and class name).
- Optionally stores the name assigned to the component in the pipeline.
- Provides a convenient class method to create a `ComponentInfo` instance from a `Component` object.

#### Attributes

```python
@dataclass
class ComponentInfo:
    type: str
    name: Optional[str] = field(default=None)

    @classmethod
    def from_component(cls, component: Component) -> "ComponentInfo": ...
```

#### Example

```python
from haystack.dataclasses.streaming_chunk import ComponentInfo
from haystack.core.component import Component


class MyComponent(Component): ...


component = MyComponent()
info = ComponentInfo.from_component(component)
print(info.type)  # e.g., 'my_module.MyComponent'
print(info.name)  # Name assigned in the pipeline, if any
```

### SparseEmbedding

#### Overview

The `SparseEmbedding` class represents a sparse embedding: a vector where most values are zeros.

#### Attributes

- `indices`: List of indices of non-zero elements in the embedding.
- `values`: List of values of non-zero elements in the embedding.
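
The two attributes above are enough to reconstruct the full vector, provided its length is known. The standalone sketch below converts a dense list into the `indices`/`values` form and back. `SparseEmbeddingSketch`, `to_sparse`, and `to_dense` are hypothetical names used for illustration, not Haystack APIs.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-in with the two attributes described above.
@dataclass
class SparseEmbeddingSketch:
    indices: List[int]
    values: List[float]


def to_sparse(dense: List[float]) -> SparseEmbeddingSketch:
    """Keep only the positions and values of the non-zero entries."""
    pairs = [(i, v) for i, v in enumerate(dense) if v != 0.0]
    return SparseEmbeddingSketch(
        indices=[i for i, _ in pairs],
        values=[v for _, v in pairs],
    )


def to_dense(sparse: SparseEmbeddingSketch, size: int) -> List[float]:
    """Rebuild the full vector, filling unlisted positions with zeros."""
    dense = [0.0] * size
    for i, v in zip(sparse.indices, sparse.values):
        dense[i] = v
    return dense


dense_vector = [0.0, 0.0, 0.7, 0.0, 0.2, 0.0]
sparse = to_sparse(dense_vector)
```

Storing only the non-zero entries keeps the representation small when the vector is very long, as is common for term-based embeddings.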

### Tool

`Tool` is a data class representing a tool that language models can prepare a call for.

Read the detailed documentation for the `Tool` data class on a dedicated [Tool](../tools/tool.mdx) page.