---
title: "Data Classes"
id: data-classes
slug: "/data-classes"
description: "In Haystack, there are a handful of core classes that are regularly used in many different places. These are classes that carry data through the system and you are likely to interact with these as either the input or output of your pipeline."
---

# Data Classes

In Haystack, a handful of core classes are used in many different places. These classes carry data through the system, and you are likely to interact with them as either the input or output of your pipeline.

Haystack uses data classes to help components communicate with each other in a simple and modular way, so that data flows consistently through Haystack pipelines. This page covers the data classes available in Haystack: ByteStream, Answer (along with its variants ExtractedAnswer and GeneratedAnswer), ChatMessage, Document, and StreamingChunk, together with the supporting classes ToolCallDelta, ComponentInfo, SparseEmbedding, and Tool.

You can check out the detailed parameters in our [Data Classes](/reference/data-classes-api) API reference.

### Answer

#### Overview

The `Answer` class serves as the base for responses generated within Haystack. It contains the answer's data, the originating query, and additional metadata.

#### Key Features

- Adaptable data handling, accommodating any data type (`data`).
- Query tracking for contextual relevance (`query`).
- Metadata support for describing the answer in detail (`meta`).

#### Attributes

```python
@dataclass
class Answer:
    data: Any
    query: str
    meta: Dict[str, Any]
```

### ExtractedAnswer

#### Overview

`ExtractedAnswer` is a subclass of `Answer` that deals explicitly with answers derived from Documents, offering more detailed attributes.

#### Key Features

- Includes a reference to the originating `Document`.
- Score attribute to quantify the answer's confidence level.
- Optional offsets (`document_offset`, `context_offset`) for pinpointing the answer's location within the source.

#### Attributes

```python
@dataclass
class ExtractedAnswer:
    query: str
    score: float
    data: Optional[str] = None
    document: Optional[Document] = None
    context: Optional[str] = None
    document_offset: Optional["Span"] = None
    context_offset: Optional["Span"] = None
    meta: Dict[str, Any] = field(default_factory=dict)
```

### GeneratedAnswer

#### Overview

`GeneratedAnswer` extends the `Answer` class to accommodate answers generated from multiple Documents.

#### Key Features

- Handles string-type data.
- Links to a list of `Document` objects, enhancing answer traceability.

#### Attributes

```python
@dataclass
class GeneratedAnswer:
    data: str
    query: str
    documents: List[Document]
    meta: Dict[str, Any] = field(default_factory=dict)
```

### ByteStream

#### Overview

`ByteStream` represents a binary object abstraction in the Haystack framework and is used to handle various binary data formats.

#### Key Features

- Holds binary data and associated metadata.
- Optional MIME type specification for flexibility.
- File and string helper methods (`to_file`, `from_file_path`, `from_string`) for easy data manipulation.

#### Attributes

```python
@dataclass(repr=False)
class ByteStream:
    data: bytes
    meta: Dict[str, Any] = field(default_factory=dict, hash=False)
    mime_type: Optional[str] = field(default=None)
```

#### Example

```python
from haystack.dataclasses.byte_stream import ByteStream

# Read the file's bytes into a ByteStream
image = ByteStream.from_file_path("dog.jpg")
```

### ChatMessage

`ChatMessage` is the central abstraction to represent a message for an LLM.
It contains a role, metadata, and several types of content, including text, tool calls, and tool call results.

Read the detailed documentation for the `ChatMessage` data class on the dedicated [ChatMessage](data-classes/chatmessage.mdx) page.

### Document

#### Overview

`Document` represents a central data abstraction in Haystack, capable of holding text and binary data.

#### Key Features

- Unique ID for each document.
- Multiple content types are supported: text (`content`) and binary (`blob`).
- Custom metadata and scoring for advanced document management.
- Optional embedding for AI-based applications.

#### Attributes

```python
@dataclass
class Document(metaclass=_BackwardCompatible):
    id: str = field(default="")
    content: Optional[str] = field(default=None)
    blob: Optional[ByteStream] = field(default=None)
    meta: Dict[str, Any] = field(default_factory=dict)
    score: Optional[float] = field(default=None)
    embedding: Optional[List[float]] = field(default=None)
    sparse_embedding: Optional[SparseEmbedding] = field(default=None)
```

#### Example

```python
from haystack import Document

# A single document with text content and a 768-dimensional embedding
document = Document(
    content="Here are the contents of your document",
    embedding=[0.1] * 768,
)
```

### StreamingChunk

#### Overview

`StreamingChunk` represents a partially streamed LLM response, enabling real-time processing of the response. It encapsulates a segment of streamed content along with associated metadata and provides information about the streaming state.
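
To make the idea of a partially streamed response concrete, the sketch below collects chunks as they arrive and joins their text into the full reply. `Chunk` and `ChunkCollector` are hypothetical stand-ins, assumed here to carry only `content` and `finish_reason`, so that the snippet runs without Haystack installed. They are not part of the Haystack API, though the same pattern should carry over to `StreamingChunk` objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for a streamed chunk (not the Haystack class)."""

    content: str
    finish_reason: Optional[str] = None


@dataclass
class ChunkCollector:
    """Hypothetical callback object that accumulates streamed text."""

    parts: List[str] = field(default_factory=list)
    finished: bool = False

    def __call__(self, chunk: Chunk) -> None:
        # Store each text fragment in arrival order.
        self.parts.append(chunk.content)
        # Assumption: a non-empty finish reason marks the last chunk.
        if chunk.finish_reason is not None:
            self.finished = True

    @property
    def text(self) -> str:
        # Join all fragments received so far into one string.
        return "".join(self.parts)


collector = ChunkCollector()
for piece in [Chunk("Hello"), Chunk(", "), Chunk("world", finish_reason="stop")]:
    collector(piece)

print(collector.text)  # Hello, world
print(collector.finished)  # True
```

Keeping the fragments in a list and joining them on demand avoids rebuilding the string on every chunk.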

#### Key Features

- String-based content representation for text chunks.
- Support for tool calls and tool call results.
- Component tracking and metadata management.
- Streaming state indicators (start, finish reason).
- Content block indexing for multi-part responses.

#### Attributes

```python
@dataclass
class StreamingChunk:
    content: str
    meta: dict[str, Any] = field(default_factory=dict, hash=False)
    component_info: Optional[ComponentInfo] = field(default=None)
    index: Optional[int] = field(default=None)
    tool_calls: Optional[list[ToolCallDelta]] = field(default=None)
    tool_call_result: Optional[ToolCallResult] = field(default=None)
    start: bool = field(default=False)
    finish_reason: Optional[FinishReason] = field(default=None)
    reasoning: Optional[ReasoningContent] = field(default=None)
```

#### Example

```python
from haystack.dataclasses import StreamingChunk, ToolCallDelta, ReasoningContent

# Basic text chunk
chunk = StreamingChunk(
    content="Hello world",
    start=True,
    meta={"model": "gpt-5-mini"},
)

# Tool call chunk
tool_chunk = StreamingChunk(
    content="",
    tool_calls=[
        ToolCallDelta(
            index=0,
            tool_name="calculator",
            arguments='{"operation": "add", "a": 2, "b": 3}',
        ),
    ],
    index=0,
    start=False,
    finish_reason="tool_calls",
)

# Reasoning chunk
reasoning_chunk = StreamingChunk(
    content="",
    reasoning=ReasoningContent(
        reasoning_text="Thinking step by step about the answer.",
    ),
    index=0,
    start=True,
    meta={"model": "gpt-4.1-mini"},
)
```

### ToolCallDelta

#### Overview

`ToolCallDelta` represents a tool call prepared by the model, usually contained in an assistant message during streaming.
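
During streaming, one tool call can arrive as several deltas that share the same `index`. The sketch below shows one way to merge them, assuming the tool name and ID arrive once and the `arguments` JSON string arrives in fragments. `Delta` and `merge_deltas` are hypothetical stand-ins that mirror the `index`, `tool_name`, `arguments`, and `id` fields of `ToolCallDelta` so that the snippet runs without Haystack installed. This is an illustration, not Haystack's own merging logic.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Delta:
    """Hypothetical stand-in mirroring some ToolCallDelta fields."""

    index: int
    tool_name: Optional[str] = None
    arguments: Optional[str] = None
    id: Optional[str] = None


def merge_deltas(deltas: List[Delta]) -> List[Dict[str, Any]]:
    """Group deltas by index and concatenate their argument fragments."""
    merged: Dict[int, Dict[str, Any]] = {}
    for delta in deltas:
        call = merged.setdefault(
            delta.index, {"id": None, "tool_name": None, "arguments": ""}
        )
        if delta.id is not None:
            call["id"] = delta.id
        if delta.tool_name is not None:
            call["tool_name"] = delta.tool_name
        if delta.arguments:
            call["arguments"] += delta.arguments

    calls = []
    for index in sorted(merged):
        call = merged[index]
        # Assumption: the concatenated fragments form one valid JSON object.
        call["arguments"] = json.loads(call["arguments"]) if call["arguments"] else {}
        calls.append(call)
    return calls


deltas = [
    Delta(index=0, tool_name="calculator", id="call_1"),
    Delta(index=0, arguments='{"operation": "add", '),
    Delta(index=0, arguments='"a": 2, "b": 3}'),
]
calls = merge_deltas(deltas)
print(calls[0]["tool_name"])  # calculator
print(calls[0]["arguments"])  # {'operation': 'add', 'a': 2, 'b': 3}
```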

#### Attributes

```python
@dataclass
class ToolCallDelta:
    index: int
    tool_name: Optional[str] = field(default=None)
    arguments: Optional[str] = field(default=None)
    id: Optional[str] = field(default=None)
    extra: Optional[Dict[str, Any]] = field(default=None)
```

### ComponentInfo

#### Overview

The `ComponentInfo` class represents information about a component within a Haystack pipeline. It is used to track the type and name of components that generate or process data, aiding in debugging, tracing, and metadata management throughout the pipeline.

#### Key Features

- Stores the type of the component (including module and class name).
- Optionally stores the name assigned to the component in the pipeline.
- Provides a convenient class method to create a `ComponentInfo` instance from a `Component` object.

#### Attributes

```python
@dataclass
class ComponentInfo:
    type: str
    name: Optional[str] = field(default=None)

    @classmethod
    def from_component(cls, component: Component) -> "ComponentInfo": ...
```

#### Example

```python
from haystack.dataclasses.streaming_chunk import ComponentInfo
from haystack.core.component import Component


class MyComponent(Component): ...


component = MyComponent()
info = ComponentInfo.from_component(component)
print(info.type)  # e.g., 'my_module.MyComponent'
print(info.name)  # Name assigned in the pipeline, if any
```

### SparseEmbedding

#### Overview

The `SparseEmbedding` class represents a sparse embedding: a vector where most values are zeros.

#### Attributes

- `indices`: List of indices of non-zero elements in the embedding.
- `values`: List of values of non-zero elements in the embedding.
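
As a rough illustration of the `indices`/`values` layout, the sketch below converts a dense vector into that sparse form and back. `SparseVector`, `to_sparse`, and `to_dense` are hypothetical stand-ins defined here so that the snippet runs without Haystack installed. They are not part of the Haystack API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SparseVector:
    """Hypothetical stand-in with the two fields described above."""

    indices: List[int]
    values: List[float]


def to_sparse(dense: List[float]) -> SparseVector:
    """Keep only the non-zero entries of a dense vector."""
    pairs = [(i, v) for i, v in enumerate(dense) if v != 0.0]
    return SparseVector(
        indices=[i for i, _ in pairs],
        values=[v for _, v in pairs],
    )


def to_dense(sparse: SparseVector, size: int) -> List[float]:
    """Rebuild the dense vector; positions not in `indices` stay zero."""
    dense = [0.0] * size
    for i, v in zip(sparse.indices, sparse.values):
        dense[i] = v
    return dense


dense = [0.0, 0.5, 0.0, 0.0, 1.25, 0.0]
sparse = to_sparse(dense)
print(sparse.indices)  # [1, 4]
print(sparse.values)  # [0.5, 1.25]
```

Storing only the non-zero positions keeps the representation compact when the vector is long and mostly zeros.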

### Tool

`Tool` is a data class representing a tool that language models can prepare a call for.

Read the detailed documentation for the `Tool` data class on the dedicated [Tool](../tools/tool.mdx) page.