---
title: "Image Converters"
id: image-converters-api
description: "Various converters to transform image data from one format to another."
slug: "/image-converters-api"
---

<a id="document_to_image"></a>

## Module document\_to\_image

<a id="document_to_image.DocumentToImageContent"></a>

### DocumentToImageContent

Converts documents sourced from PDF and image files into ImageContents.

This component processes a list of documents and extracts visual content from supported file formats, converting
them into ImageContents that can be used for multimodal AI tasks. It handles both direct image files and PDF
documents by extracting specific pages as images.

Documents are expected to have metadata containing:
- The `file_path_meta_field` key with a valid file path that exists when combined with `root_path`
- A supported image format (MIME type must be one of the supported image types)
- For PDF files, a `page_number` key specifying which page to extract

### Usage example
```python
from haystack import Document
from haystack.components.converters.image.document_to_image import DocumentToImageContent

converter = DocumentToImageContent(
    file_path_meta_field="file_path",
    root_path="/data/files",
    detail="high",
    size=(800, 600)
)

documents = [
    Document(content="Optional description of image.jpg", meta={"file_path": "image.jpg"}),
    Document(content="Text content of page 1 of doc.pdf", meta={"file_path": "doc.pdf", "page_number": 1})
]

result = converter.run(documents)
image_contents = result["image_contents"]
# [ImageContent(
#    base64_image='/9j/4A...', mime_type='image/jpeg', detail='high', meta={'file_path': 'image.jpg'}
#  ),
#  ImageContent(
#    base64_image='/9j/4A...', mime_type='image/jpeg', detail='high',
#    meta={'page_number': 1, 'file_path': 'doc.pdf'}
#  )]
```

<a id="document_to_image.DocumentToImageContent.__init__"></a>

#### DocumentToImageContent.\_\_init\_\_

```python
def __init__(*,
             file_path_meta_field: str = "file_path",
             root_path: str | None = None,
             detail: Literal["auto", "high", "low"] | None = None,
             size: tuple[int, int] | None = None)
```

Initialize the DocumentToImageContent component.

**Arguments**:

- `file_path_meta_field`: The metadata field in the Document that contains the file path to the image or PDF.
- `root_path`: The root directory path where document files are located. If provided, file paths in
document metadata will be resolved relative to this path. If None, file paths are treated as absolute paths.
- `detail`: Optional detail level of the image (only supported by OpenAI). Can be "auto", "high", or "low".
This will be passed to the created ImageContent objects.
- `size`: If provided, resizes the image to fit within the specified dimensions (width, height) while
maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
when working with models that have resolution constraints or when transmitting images to remote services.

<a id="document_to_image.DocumentToImageContent.run"></a>

#### DocumentToImageContent.run

```python
@component.output_types(image_contents=list[ImageContent | None])
def run(documents: list[Document]) -> dict[str, list[ImageContent | None]]
```

Convert documents with image or PDF sources into ImageContent objects.

This method processes the input documents, extracting images from supported file formats and converting them
into ImageContent objects.

**Arguments**:

- `documents`: A list of documents to process. Each document's metadata must contain at least the key
named by `file_path_meta_field`. PDF documents additionally require a `page_number` key to specify which
page to convert.
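
The metadata requirements listed above can be pictured with a small standalone sketch. This is not Haystack's implementation: `resolve_source` is a hypothetical helper, the on-disk existence check is omitted, and the MIME-type check is approximated with the standard-library `mimetypes` module.

```python
from __future__ import annotations

import mimetypes
from pathlib import Path
from typing import Any


def resolve_source(
    meta: dict[str, Any],
    file_path_meta_field: str = "file_path",
    root_path: str | None = None,
) -> tuple[Path, int | None]:
    """Hypothetical helper: return (resolved path, page number or None)."""
    if file_path_meta_field not in meta:
        raise ValueError(f"metadata is missing the '{file_path_meta_field}' key")

    # With a root_path, the stored path is resolved relative to it;
    # without one, the stored path is used as given.
    relative = meta[file_path_meta_field]
    path = Path(root_path, relative) if root_path else Path(relative)

    # Assumption: the MIME type is guessed from the file extension only.
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type == "application/pdf":
        if "page_number" not in meta:
            raise ValueError(f"PDF document '{path}' is missing the 'page_number' key")
        return path, int(meta["page_number"])
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"unsupported MIME type for '{path}': {mime_type}")
    return path, None
```

Under these assumptions, an image document resolves to a path with no page number, while a PDF document without `page_number` is rejected with a `ValueError`.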

**Raises**:

- `ValueError`: If any document is missing the required metadata keys, has an invalid file path, or has an unsupported
MIME type. The error message will specify which document and what information is missing or incorrect.

**Returns**:

Dictionary containing one key:
- "image_contents": ImageContents created from the processed documents. These contain base64-encoded image
data and metadata. The order corresponds to the order of the input documents.

<a id="file_to_document"></a>

## Module file\_to\_document

<a id="file_to_document.ImageFileToDocument"></a>

### ImageFileToDocument

Converts image file references into empty Document objects with associated metadata.

This component is useful in pipelines where image file paths need to be wrapped in `Document` objects to be
processed by downstream components such as the `SentenceTransformersImageDocumentEmbedder`.

It does **not** extract any content from the image files; instead, it creates `Document` objects with `None` as
their content and attaches metadata such as file path and any user-provided values.

### Usage example
```python
from haystack.components.converters.image import ImageFileToDocument

converter = ImageFileToDocument()

sources = ["image.jpg", "another_image.png"]

result = converter.run(sources=sources)
documents = result["documents"]

print(documents)

# [Document(id=..., meta: {'file_path': 'image.jpg'}),
# Document(id=..., meta: {'file_path': 'another_image.png'})]
```

<a id="file_to_document.ImageFileToDocument.__init__"></a>

#### ImageFileToDocument.\_\_init\_\_

```python
def __init__(*, store_full_path: bool = False)
```

Initialize the ImageFileToDocument component.

**Arguments**:

- `store_full_path`: If True, the full path of the file is stored in the metadata of the document.
If False, only the file name is stored.

<a id="file_to_document.ImageFileToDocument.run"></a>

#### ImageFileToDocument.run

```python
@component.output_types(documents=list[Document])
def run(
    *,
    sources: list[str | Path | ByteStream],
    meta: dict[str, Any] | list[dict[str, Any]] | None = None
) -> dict[str, list[Document]]
```

Convert image files into empty Document objects with metadata.

This method accepts image file references (as file paths or ByteStreams) and creates `Document` objects
without content. These documents are enriched with metadata derived from the input source and optional
user-provided metadata.

**Arguments**:

- `sources`: List of file paths or ByteStream objects to convert.
- `meta`: Optional metadata to attach to the documents.
This value can be a list of dictionaries or a single dictionary.
If it's a single dictionary, its content is added to the metadata of all produced documents.
If it's a list, its length must match the number of sources, as they are zipped together.
For ByteStream objects, their `meta` is added to the output documents.

**Returns**:

A dictionary containing:
- `documents`: A list of `Document` objects with empty content and associated metadata.

<a id="file_to_image"></a>

## Module file\_to\_image

<a id="file_to_image.ImageFileToImageContent"></a>

### ImageFileToImageContent

Converts image files to ImageContent objects.
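
As a rough picture of what such a conversion produces, the sketch below base64-encodes raw image bytes and guesses a MIME type from the file name, using only the standard library. `encode_image` is a hypothetical helper, not a Haystack function, and it returns a plain dict rather than an `ImageContent` object; the field names simply mirror those shown in the examples on this page.

```python
import base64
import mimetypes


def encode_image(file_name: str, data: bytes) -> dict:
    """Hypothetical helper: build a dict with ImageContent-like fields."""
    # Assumption: the MIME type is guessed from the file extension only.
    mime_type, _ = mimetypes.guess_type(file_name)
    return {
        "base64_image": base64.b64encode(data).decode("utf-8"),
        "mime_type": mime_type,
        "detail": None,
        "meta": {"file_path": file_name},
    }


# A JPEG stream starts with the bytes FF D8 FF, which encode to "/9j/",
# the prefix seen in base64-encoded JPEG payloads.
sample = encode_image("image.jpg", b"\xff\xd8\xff\xe0")
print(sample["base64_image"])  # prints: /9j/4A==
```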

### Usage example
```python
from haystack.components.converters.image import ImageFileToImageContent

converter = ImageFileToImageContent()

sources = ["image.jpg", "another_image.png"]

image_contents = converter.run(sources=sources)["image_contents"]
print(image_contents)

# [ImageContent(base64_image='...',
#               mime_type='image/jpeg',
#               detail=None,
#               meta={'file_path': 'image.jpg'}),
#  ...]
```

<a id="file_to_image.ImageFileToImageContent.__init__"></a>

#### ImageFileToImageContent.\_\_init\_\_

```python
def __init__(*,
             detail: Literal["auto", "high", "low"] | None = None,
             size: tuple[int, int] | None = None)
```

Create the ImageFileToImageContent component.

**Arguments**:

- `detail`: Optional detail level of the image (only supported by OpenAI). One of "auto", "high", or "low".
This will be passed to the created ImageContent objects.
- `size`: If provided, resizes the image to fit within the specified dimensions (width, height) while
maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
when working with models that have resolution constraints or when transmitting images to remote services.

<a id="file_to_image.ImageFileToImageContent.run"></a>

#### ImageFileToImageContent.run

```python
@component.output_types(image_contents=list[ImageContent])
def run(sources: list[str | Path | ByteStream],
        meta: dict[str, Any] | list[dict[str, Any]] | None = None,
        *,
        detail: Literal["auto", "high", "low"] | None = None,
        size: tuple[int, int] | None = None) -> dict[str, list[ImageContent]]
```

Converts files to ImageContent objects.

**Arguments**:

- `sources`: List of file paths or ByteStream objects to convert.
- `meta`: Optional metadata to attach to the ImageContent objects.
This value can be a list of dictionaries or a single dictionary.
If it's a single dictionary, its content is added to the metadata of all produced ImageContent objects.
If it's a list, its length must match the number of sources as they're zipped together.
For ByteStream objects, their `meta` is added to the output ImageContent objects.
- `detail`: Optional detail level of the image (only supported by OpenAI). One of "auto", "high", or "low".
This will be passed to the created ImageContent objects.
If not provided, the detail level will be the one set in the constructor.
- `size`: If provided, resizes the image to fit within the specified dimensions (width, height) while
maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
when working with models that have resolution constraints or when transmitting images to remote services.
If not provided, the size value will be the one set in the constructor.

**Returns**:

A dictionary with the following keys:
- `image_contents`: A list of ImageContent objects.

<a id="pdf_to_image"></a>

## Module pdf\_to\_image

<a id="pdf_to_image.PDFToImageContent"></a>

### PDFToImageContent

Converts PDF files to ImageContent objects.

### Usage example
```python
from haystack.components.converters.image import PDFToImageContent

converter = PDFToImageContent()

sources = ["file.pdf", "another_file.pdf"]

image_contents = converter.run(sources=sources)["image_contents"]
print(image_contents)

# [ImageContent(base64_image='...',
#               mime_type='image/jpeg',
#               detail=None,
#               meta={'file_path': 'file.pdf', 'page_number': 1}),
#  ...]
```

<a id="pdf_to_image.PDFToImageContent.__init__"></a>

#### PDFToImageContent.\_\_init\_\_

```python
def __init__(*,
             detail: Literal["auto", "high", "low"] | None = None,
             size: tuple[int, int] | None = None,
             page_range: list[str | int] | None = None)
```

Create the PDFToImageContent component.

**Arguments**:

- `detail`: Optional detail level of the image (only supported by OpenAI). One of "auto", "high", or "low".
This will be passed to the created ImageContent objects.
- `size`: If provided, resizes the image to fit within the specified dimensions (width, height) while
maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
when working with models that have resolution constraints or when transmitting images to remote services.
- `page_range`: List of page numbers and/or page ranges to convert to images. Page numbers start at 1.
If None, all pages in the PDF will be converted. Pages outside the valid range (1 to number of pages)
will be skipped with a warning. For example, page_range=[1, 3] will convert only the first and third
pages of the document. It also accepts printable range strings, e.g.: ['1-3', '5', '8', '10-12']
will convert pages 1, 2, 3, 5, 8, 10, 11, 12.

<a id="pdf_to_image.PDFToImageContent.run"></a>

#### PDFToImageContent.run

```python
@component.output_types(image_contents=list[ImageContent])
def run(
    sources: list[str | Path | ByteStream],
    meta: dict[str, Any] | list[dict[str, Any]] | None = None,
    *,
    detail: Literal["auto", "high", "low"] | None = None,
    size: tuple[int, int] | None = None,
    page_range: list[str | int] | None = None
) -> dict[str, list[ImageContent]]
```

Converts files to ImageContent objects.

**Arguments**:

- `sources`: List of file paths or ByteStream objects to convert.
- `meta`: Optional metadata to attach to the ImageContent objects.
This value can be a list of dictionaries or a single dictionary.
If it's a single dictionary, its content is added to the metadata of all produced ImageContent objects.
If it's a list, its length must match the number of sources as they're zipped together.
For ByteStream objects, their `meta` is added to the output ImageContent objects.
- `detail`: Optional detail level of the image (only supported by OpenAI). One of "auto", "high", or "low".
This will be passed to the created ImageContent objects.
If not provided, the detail level will be the one set in the constructor.
- `size`: If provided, resizes the image to fit within the specified dimensions (width, height) while
maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
when working with models that have resolution constraints or when transmitting images to remote services.
If not provided, the size value will be the one set in the constructor.
- `page_range`: List of page numbers and/or page ranges to convert to images. Page numbers start at 1.
If None, all pages in the PDF will be converted. Pages outside the valid range (1 to number of pages)
will be skipped with a warning. For example, page_range=[1, 3] will convert only the first and third
pages of the document. It also accepts printable range strings, e.g.: ['1-3', '5', '8', '10-12']
will convert pages 1, 2, 3, 5, 8, 10, 11, 12.
If not provided, the page_range value will be the one set in the constructor.

**Returns**:

A dictionary with the following keys:
- `image_contents`: A list of ImageContent objects.
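
The `page_range` semantics described above can be illustrated with a short standalone sketch. `expand_page_range` is a hypothetical helper written for this page, not the function Haystack uses internally; it only shows how integers and `'a-b'` strings might expand into 1-based page numbers, with out-of-range pages dropped.

```python
from __future__ import annotations


def expand_page_range(page_range: list[str | int], num_pages: int) -> list[int]:
    """Hypothetical helper: expand page numbers and 'a-b' strings into 1-based pages."""
    pages: list[int] = []
    for item in page_range:
        if isinstance(item, str) and "-" in item:
            # A range string such as '10-12' covers both endpoints.
            start, end = (int(part) for part in item.split("-"))
            candidates = list(range(start, end + 1))
        else:
            # A single page given as an int (3) or a string ('5').
            candidates = [int(item)]
        for page in candidates:
            # Pages outside 1..num_pages are skipped; the component
            # is documented to log a warning in that case.
            if 1 <= page <= num_pages:
                pages.append(page)
    return pages


print(expand_page_range(["1-3", "5", "8", "10-12"], num_pages=12))
# prints: [1, 2, 3, 5, 8, 10, 11, 12]
```

With a two-page PDF, `expand_page_range([1, 3], num_pages=2)` yields `[1]` in this sketch, since page 3 falls outside the valid range.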