---
title: "Data Classes"
id: data-classes
slug: "/data-classes"
description: "In Haystack, there are a handful of core classes that are regularly used in many different places. These are classes that carry data through the system and you are likely to interact with these as either the input or output of your pipeline."
---

# Data Classes

In Haystack, there are a handful of core classes that are regularly used in many different places. These are classes that carry data through the system and you are likely to interact with these as either the input or output of your pipeline.

Haystack uses data classes to help components communicate with each other in a simple and modular way. By doing this, data flows seamlessly through the Haystack pipelines. This page goes over the available data classes in Haystack: ByteStream, Answer (along with its variants ExtractedAnswer and GeneratedAnswer), ChatMessage, Document, and StreamingChunk, explaining how they contribute to the Haystack ecosystem.

You can check out the detailed parameters in our [Data Classes](/reference/data-classes-api) API reference.

### Answer

#### Overview

The `Answer` class serves as the base for responses generated within Haystack, containing the answer's data, the originating query, and additional metadata.

#### Key Features

- Adaptable data handling, accommodating any data type (`data`).
- Query tracking for contextual relevance (`query`).
- Extensive metadata support for detailed answer description.

#### Attributes

```python
@dataclass(frozen=True)
class Answer:
    data: Any
    query: str
    meta: Dict[str, Any]
```

### ExtractedAnswer

#### Overview

`ExtractedAnswer` is a subclass of `Answer` that deals explicitly with answers derived from Documents, offering more detailed attributes.

#### Key Features

- Includes reference to the originating `Document`.
- Score attribute to quantify the answer's confidence level.
- Optional start and end indices for pinpointing answer location within the source.

#### Attributes

```python
@dataclass
class ExtractedAnswer:
    query: str
    score: float
    data: Optional[str] = None
    document: Optional[Document] = None
    context: Optional[str] = None
    document_offset: Optional["Span"] = None
    context_offset: Optional["Span"] = None
    meta: Dict[str, Any] = field(default_factory=dict)
```

### GeneratedAnswer

#### Overview

`GeneratedAnswer` extends the `Answer` class to accommodate answers generated from multiple Documents.

#### Key Features

- Handles string-type data.
- Links to a list of `Document` objects, enhancing answer traceability.

#### Attributes

```python
@dataclass
class GeneratedAnswer:
    data: str
    query: str
    documents: List[Document]
    meta: Dict[str, Any] = field(default_factory=dict)
```

### ByteStream

#### Overview

`ByteStream` is the binary object abstraction in the Haystack framework and is used for handling various binary data formats.

#### Key Features

- Holds binary data and associated metadata.
- Optional MIME type specification for flexibility.
- File interaction methods (`to_file`, `from_file_path`, `from_string`) for easy data manipulation.

#### Attributes

```python
@dataclass(frozen=True)
class ByteStream:
    data: bytes
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)
    mime_type: Optional[str] = field(default=None)
```

#### Example

```python
from haystack.dataclasses.byte_stream import ByteStream

# Read the file's bytes from disk into a ByteStream
image = ByteStream.from_file_path("dog.jpg")
```

### ChatMessage

`ChatMessage` is the central abstraction to represent a message for an LLM.
It contains a role, metadata, and several types of content, including text, tool calls, and tool call results.

Read the detailed documentation for the `ChatMessage` data class on a dedicated [ChatMessage](data-classes/chatmessage.mdx) page.

### Document

#### Overview

`Document` represents a central data abstraction in Haystack, capable of holding text, tables, and binary data.

#### Key Features

- Unique ID for each document.
- Multiple content types are supported: text (`content`) and binary (`blob`).
- Custom metadata and scoring for advanced document management.
- Optional embedding for AI-based applications.

#### Attributes

```python
@dataclass
class Document(metaclass=_BackwardCompatible):
    id: str = field(default="")
    content: Optional[str] = field(default=None)
    blob: Optional[ByteStream] = field(default=None)
    meta: Dict[str, Any] = field(default_factory=dict)
    score: Optional[float] = field(default=None)
    embedding: Optional[List[float]] = field(default=None)
    sparse_embedding: Optional[SparseEmbedding] = field(default=None)
```

#### Example

```python
from haystack import Document

# A single document with text content and a dense embedding
document = Document(
    content="Here are the contents of your document",
    embedding=[0.1] * 768,
)
```

### StreamingChunk

#### Overview

`StreamingChunk` represents a partially streamed LLM response, enabling real-time LLM response processing. It encapsulates a segment of streamed content along with associated metadata and provides information about the streaming state.
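
To make the idea of a partially streamed response concrete, here is a minimal sketch of how a consumer might stitch chunks back into a full reply. It deliberately avoids importing Haystack: `ChunkStandIn` and `assemble_reply` are hypothetical names, and the stand-in mirrors only the `content`, `start`, and `finish_reason` fields from this section's attribute listing, with `finish_reason` simplified to a plain string.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class ChunkStandIn:
    """Hypothetical stand-in mirroring three StreamingChunk fields."""

    content: str
    start: bool = False
    finish_reason: Optional[str] = None


def assemble_reply(chunks: Iterable[ChunkStandIn]) -> Tuple[str, Optional[str]]:
    """Concatenate chunk contents and return the last finish reason seen."""
    parts: List[str] = []
    finish_reason: Optional[str] = None
    for chunk in chunks:
        parts.append(chunk.content)
        if chunk.finish_reason is not None:
            # Typically only the final chunk carries a finish reason.
            finish_reason = chunk.finish_reason
    return "".join(parts), finish_reason


stream = [
    ChunkStandIn(content="Hello", start=True),
    ChunkStandIn(content=", "),
    ChunkStandIn(content="world", finish_reason="stop"),
]
reply, reason = assemble_reply(stream)
print(reply, reason)
```

In a real pipeline the chunks would arrive one at a time through a streaming callback rather than as a prebuilt list, so the same accumulation would happen incrementally.
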

#### Key Features

- String-based content representation for text chunks
- Support for tool calls and tool call results
- Component tracking and metadata management
- Streaming state indicators (start, finish reason)
- Content block indexing for multi-part responses

#### Attributes

```python
@dataclass
class StreamingChunk:
    content: str
    meta: dict[str, Any] = field(default_factory=dict, hash=False)
    component_info: Optional[ComponentInfo] = field(default=None)
    index: Optional[int] = field(default=None)
    tool_calls: Optional[list[ToolCallDelta]] = field(default=None)
    tool_call_result: Optional[ToolCallResult] = field(default=None)
    start: bool = field(default=False)
    finish_reason: Optional[FinishReason] = field(default=None)
```

#### Example

```python
from haystack.dataclasses.streaming_chunk import StreamingChunk, ToolCallDelta

# Basic text chunk
chunk = StreamingChunk(
    content="Hello world",
    start=True,
    meta={"model": "gpt-3.5-turbo"},
)

# Tool call chunk; `content` has no default, so pass an empty string
tool_chunk = StreamingChunk(
    content="",
    tool_calls=[
        ToolCallDelta(
            index=0,
            tool_name="calculator",
            arguments='{"operation": "add", "a": 2, "b": 3}',
        ),
    ],
    index=0,
    start=False,
    finish_reason="tool_calls",
)
```

### ToolCallDelta

#### Overview

`ToolCallDelta` represents a tool call prepared by the model, usually contained in an assistant message during streaming.

#### Attributes

```python
@dataclass
class ToolCallDelta:
    index: int
    tool_name: Optional[str] = field(default=None)
    arguments: Optional[str] = field(default=None)
    id: Optional[str] = field(default=None)
```

### ComponentInfo

#### Overview

The `ComponentInfo` class represents information about a component within a Haystack pipeline.
It is used to track the type and name of components that generate or process data, aiding in debugging, tracing, and metadata management throughout the pipeline.

#### Key Features

- Stores the type of the component (including module and class name).
- Optionally stores the name assigned to the component in the pipeline.
- Provides a convenient class method to create a `ComponentInfo` instance from a `Component` object.

#### Attributes

```python
@dataclass
class ComponentInfo:
    type: str
    name: Optional[str] = field(default=None)

    @classmethod
    def from_component(cls, component: Component) -> "ComponentInfo": ...
```

#### Example

```python
from haystack.dataclasses.streaming_chunk import ComponentInfo
from haystack.core.component import Component


class MyComponent(Component): ...


component = MyComponent()
info = ComponentInfo.from_component(component)
print(info.type)  # e.g., 'my_module.MyComponent'
print(info.name)  # Name assigned in the pipeline, if any
```

### SparseEmbedding

#### Overview

The `SparseEmbedding` class represents a sparse embedding: a vector where most values are zeros.

#### Attributes

- `indices`: List of indices of non-zero elements in the embedding.
- `values`: List of values of non-zero elements in the embedding.

### Tool

`Tool` is a data class representing a tool that Language Models can prepare a call for.

Read the detailed documentation for the `Tool` data class on a dedicated [Tool](../tools/tool.mdx) page.
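
Returning to the `SparseEmbedding` section above, the `indices`/`values` layout may be easier to picture with a small round trip between a dense and a sparse vector. This is only a sketch that runs without Haystack: `SparseVectorStandIn`, `to_sparse`, and `to_dense` are hypothetical names, and the stand-in assumes nothing beyond the two attributes listed for `SparseEmbedding`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SparseVectorStandIn:
    """Hypothetical stand-in with the two attributes listed for SparseEmbedding."""

    indices: List[int]
    values: List[float]


def to_sparse(dense: List[float]) -> SparseVectorStandIn:
    """Keep only the positions and values of the non-zero entries."""
    indices = [i for i, value in enumerate(dense) if value != 0.0]
    values = [dense[i] for i in indices]
    return SparseVectorStandIn(indices=indices, values=values)


def to_dense(sparse: SparseVectorStandIn, size: int) -> List[float]:
    """Rebuild a dense vector of the given size, filling gaps with zeros."""
    dense = [0.0] * size
    for i, value in zip(sparse.indices, sparse.values):
        dense[i] = value
    return dense


dense = [0.0, 0.0, 0.5, 0.0, 1.25, 0.0]
sparse = to_sparse(dense)
print(sparse.indices)  # [2, 4]
print(sparse.values)   # [0.5, 1.25]
```

Note that the sparse form does not record the vector's total length, which is why `to_dense` in this sketch takes `size` as an explicit argument.
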