---
title: "Image Converters"
id: image-converters-api
description: "Various converters to transform image data from one format to another."
slug: "/image-converters-api"
---


## document_to_image

### DocumentToImageContent

Converts documents sourced from PDF and image files into ImageContents.

This component processes a list of documents and extracts visual content from supported file formats, converting
them into ImageContents that can be used for multimodal AI tasks. It handles both direct image files and PDF
documents by extracting specific pages as images.

Documents are expected to have metadata containing:

- The key named by `file_path_meta_field`, holding a file path that exists when combined with `root_path`
- A supported image format (MIME type must be one of the supported image types)
- For PDF files, a `page_number` key specifying which page to extract

### Usage example

```python
from haystack import Document
from haystack.components.converters.image.document_to_image import DocumentToImageContent

converter = DocumentToImageContent(
    file_path_meta_field="file_path",
    root_path="/data/files",
    detail="high",
    size=(800, 600)
)

documents = [
    Document(content="Optional description of image.jpg", meta={"file_path": "image.jpg"}),
    Document(content="Text content of page 1 of doc.pdf", meta={"file_path": "doc.pdf", "page_number": 1})
]

result = converter.run(documents)
image_contents = result["image_contents"]
# [ImageContent(
#    base64_image='/9j/4A...', mime_type='image/jpeg', detail='high', meta={'file_path': 'image.jpg'}
#  ),
#  ImageContent(
#    base64_image='/9j/4A...', mime_type='image/jpeg', detail='high',
#    meta={'page_number': 1, 'file_path': 'doc.pdf'}
#  )]
```

#### __init__

```python
__init__(
    *,
    file_path_meta_field: str = "file_path",
    root_path: str | None = None,
    detail: Literal["auto", "high", "low"] | None = None,
    size: tuple[int, int] | None = None
)
```

Initialize the DocumentToImageContent component.

**Parameters:**

- **file_path_meta_field** (<code>str</code>) – The metadata field in the Document that contains the file path to the image or PDF.
- **root_path** (<code>str | None</code>) – The root directory path where document files are located. If provided, file paths in
  document metadata will be resolved relative to this path. If None, file paths are treated as absolute paths.
- **detail** (<code>Literal['auto', 'high', 'low'] | None</code>) – Optional detail level of the image (only supported by OpenAI). Can be "auto", "high", or "low".
  This will be passed to the created ImageContent objects.
- **size** (<code>tuple\[int, int\] | None</code>) – If provided, resizes the image to fit within the specified dimensions (width, height) while
  maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
  when working with models that have resolution constraints or when transmitting images to remote services.

#### run

```python
run(documents: list[Document]) -> dict[str, list[ImageContent | None]]
```

Convert documents with image or PDF sources into ImageContent objects.

This method processes the input documents, extracting images from supported file formats and converting them
into ImageContent objects.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – A list of documents to process. Each document should have metadata containing at minimum
  the key named by `file_path_meta_field`. PDF documents additionally require a `page_number` key to specify which
  page to convert.

**Returns:**

- <code>dict\[str, list\[ImageContent | None\]\]</code> – Dictionary containing one key:
  - "image_contents": ImageContents created from the processed documents. These contain base64-encoded image
    data and metadata. The order corresponds to the order of the input documents.

**Raises:**

- <code>ValueError</code> – If any document is missing the required metadata keys, has an invalid file path, or has an unsupported
  MIME type. The error message will specify which document and what information is missing or incorrect.

## file_to_document

### ImageFileToDocument

Converts image file references into empty Document objects with associated metadata.

This component is useful in pipelines where image file paths need to be wrapped in `Document` objects to be
processed by downstream components such as the `SentenceTransformersImageDocumentEmbedder`.

It does **not** extract any content from the image files. Instead, it creates `Document` objects with `None` as
their content and attaches metadata such as the file path and any user-provided values.

### Usage example

```python
from haystack.components.converters.image import ImageFileToDocument

converter = ImageFileToDocument()

sources = ["image.jpg", "another_image.png"]

result = converter.run(sources=sources)
documents = result["documents"]

print(documents)

# [Document(id=..., meta: {'file_path': 'image.jpg'}),
# Document(id=..., meta: {'file_path': 'another_image.png'})]
```

#### __init__

```python
__init__(*, store_full_path: bool = False)
```

Initialize the ImageFileToDocument component.

**Parameters:**

- **store_full_path** (<code>bool</code>) – If True, the full path of the file is stored in the metadata of the document.
  If False, only the file name is stored.
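
The effect of `store_full_path` on the stored `file_path` value can be pictured with a small standard-library sketch. The helper name `file_path_for_meta` is hypothetical and not part of Haystack. It only assumes that "file name" means the final path component, as returned by `os.path.basename`; the component's actual implementation may differ.

```python
import os


def file_path_for_meta(source: str, store_full_path: bool = False) -> str:
    """Hypothetical helper: the value a converter might store under `file_path`."""
    # Keep the path exactly as given when the full path is requested.
    if store_full_path:
        return source
    # Otherwise keep only the final path component (the file name).
    return os.path.basename(source)


print(file_path_for_meta("/data/files/image.jpg"))
# image.jpg
print(file_path_for_meta("/data/files/image.jpg", store_full_path=True))
# /data/files/image.jpg
```

Storing only the file name keeps metadata portable across machines, while the full path is useful when a later component has to reopen the file.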

#### run

```python
run(
    *,
    sources: list[str | Path | ByteStream],
    meta: dict[str, Any] | list[dict[str, Any]] | None = None
) -> dict[str, list[Document]]
```

Convert image files into empty Document objects with metadata.

This method accepts image file references (as file paths or ByteStreams) and creates `Document` objects
without content. These documents are enriched with metadata derived from the input source and optional
user-provided metadata.

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – List of file paths or ByteStream objects to convert.
- **meta** (<code>dict\[str, Any\] | list\[dict\[str, Any\]\] | None</code>) – Optional metadata to attach to the documents.
  This value can be a list of dictionaries or a single dictionary.
  If it's a single dictionary, its content is added to the metadata of all produced documents.
  If it's a list, its length must match the number of sources, as they are zipped together.
  For ByteStream objects, their `meta` is added to the output documents.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary containing:
  - `documents`: A list of `Document` objects with empty content and associated metadata.

## file_to_image

### ImageFileToImageContent

Converts image files to ImageContent objects.

### Usage example

```python
from haystack.components.converters.image import ImageFileToImageContent

converter = ImageFileToImageContent()

sources = ["image.jpg", "another_image.png"]

image_contents = converter.run(sources=sources)["image_contents"]
print(image_contents)

# [ImageContent(base64_image='...',
#               mime_type='image/jpeg',
#               detail=None,
#               meta={'file_path': 'image.jpg'}),
#  ...]
```

#### __init__

```python
__init__(
    *,
    detail: Literal["auto", "high", "low"] | None = None,
    size: tuple[int, int] | None = None
)
```

Create the ImageFileToImageContent component.

**Parameters:**

- **detail** (<code>Literal['auto', 'high', 'low'] | None</code>) – Optional detail level of the image (only supported by OpenAI). One of "auto", "high", or "low".
  This will be passed to the created ImageContent objects.
- **size** (<code>tuple\[int, int\] | None</code>) – If provided, resizes the image to fit within the specified dimensions (width, height) while
  maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
  when working with models that have resolution constraints or when transmitting images to remote services.

#### run

```python
run(
    sources: list[str | Path | ByteStream],
    meta: dict[str, Any] | list[dict[str, Any]] | None = None,
    *,
    detail: Literal["auto", "high", "low"] | None = None,
    size: tuple[int, int] | None = None
) -> dict[str, list[ImageContent]]
```

Converts files to ImageContent objects.

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – List of file paths or ByteStream objects to convert.
- **meta** (<code>dict\[str, Any\] | list\[dict\[str, Any\]\] | None</code>) – Optional metadata to attach to the ImageContent objects.
  This value can be a list of dictionaries or a single dictionary.
  If it's a single dictionary, its content is added to the metadata of all produced ImageContent objects.
  If it's a list, its length must match the number of sources as they're zipped together.
  For ByteStream objects, their `meta` is added to the output ImageContent objects.
- **detail** (<code>Literal['auto', 'high', 'low'] | None</code>) – Optional detail level of the image (only supported by OpenAI). One of "auto", "high", or "low".
  This will be passed to the created ImageContent objects.
  If not provided, the detail level will be the one set in the constructor.
- **size** (<code>tuple\[int, int\] | None</code>) – If provided, resizes the image to fit within the specified dimensions (width, height) while
  maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
  when working with models that have resolution constraints or when transmitting images to remote services.
  If not provided, the size value will be the one set in the constructor.

**Returns:**

- <code>dict\[str, list\[ImageContent\]\]</code> – A dictionary with the following keys:
  - `image_contents`: A list of ImageContent objects.

## pdf_to_image

### PDFToImageContent

Converts PDF files to ImageContent objects.

### Usage example

```python
from haystack.components.converters.image import PDFToImageContent

converter = PDFToImageContent()

sources = ["file.pdf", "another_file.pdf"]

image_contents = converter.run(sources=sources)["image_contents"]
print(image_contents)

# [ImageContent(base64_image='...',
#               mime_type='application/pdf',
#               detail=None,
#               meta={'file_path': 'file.pdf', 'page_number': 1}),
#  ...]
```

#### __init__

```python
__init__(
    *,
    detail: Literal["auto", "high", "low"] | None = None,
    size: tuple[int, int] | None = None,
    page_range: list[str | int] | None = None
)
```

Create the PDFToImageContent component.

**Parameters:**

- **detail** (<code>Literal['auto', 'high', 'low'] | None</code>) – Optional detail level of the image (only supported by OpenAI). One of "auto", "high", or "low".
  This will be passed to the created ImageContent objects.
- **size** (<code>tuple\[int, int\] | None</code>) – If provided, resizes the image to fit within the specified dimensions (width, height) while
  maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
  when working with models that have resolution constraints or when transmitting images to remote services.
- **page_range** (<code>list\[str | int\] | None</code>) – List of page numbers and/or page ranges to convert to images. Page numbers start at 1.
  If None, all pages in the PDF will be converted. Pages outside the valid range (1 to number of pages)
  will be skipped with a warning. For example, page_range=[1, 3] will convert only the first and third
  pages of the document. It also accepts printable range strings, e.g.: ['1-3', '5', '8', '10-12']
  will convert pages 1, 2, 3, 5, 8, 10, 11, 12.

#### run

```python
run(
    sources: list[str | Path | ByteStream],
    meta: dict[str, Any] | list[dict[str, Any]] | None = None,
    *,
    detail: Literal["auto", "high", "low"] | None = None,
    size: tuple[int, int] | None = None,
    page_range: list[str | int] | None = None
) -> dict[str, list[ImageContent]]
```

Converts files to ImageContent objects.

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – List of file paths or ByteStream objects to convert.
- **meta** (<code>dict\[str, Any\] | list\[dict\[str, Any\]\] | None</code>) – Optional metadata to attach to the ImageContent objects.
  This value can be a list of dictionaries or a single dictionary.
  If it's a single dictionary, its content is added to the metadata of all produced ImageContent objects.
  If it's a list, its length must match the number of sources as they're zipped together.
  For ByteStream objects, their `meta` is added to the output ImageContent objects.
- **detail** (<code>Literal['auto', 'high', 'low'] | None</code>) – Optional detail level of the image (only supported by OpenAI). One of "auto", "high", or "low".
  This will be passed to the created ImageContent objects.
  If not provided, the detail level will be the one set in the constructor.
- **size** (<code>tuple\[int, int\] | None</code>) – If provided, resizes the image to fit within the specified dimensions (width, height) while
  maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
  when working with models that have resolution constraints or when transmitting images to remote services.
  If not provided, the size value will be the one set in the constructor.
- **page_range** (<code>list\[str | int\] | None</code>) – List of page numbers and/or page ranges to convert to images. Page numbers start at 1.
  If None, all pages in the PDF will be converted. Pages outside the valid range (1 to number of pages)
  will be skipped with a warning. For example, page_range=[1, 3] will convert only the first and third
  pages of the document. It also accepts printable range strings, e.g.: ['1-3', '5', '8', '10-12']
  will convert pages 1, 2, 3, 5, 8, 10, 11, 12.
  If not provided, the page_range value will be the one set in the constructor.

**Returns:**

- <code>dict\[str, list\[ImageContent\]\]</code> – A dictionary with the following keys:
  - `image_contents`: A list of ImageContent objects.
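
To make the `page_range` semantics described above concrete, here is a minimal standard-library sketch of how a mixed list of page numbers and range strings could be expanded into 1-based page numbers, with out-of-range pages skipped with a warning. The function name `expand_pages` and its `n_pages` argument are hypothetical and are not the component's internals. The sketch assumes that ranges are written as `start-end` and that both ends are inclusive.

```python
from __future__ import annotations

import warnings


def expand_pages(page_range: list[str | int], n_pages: int) -> list[int]:
    """Hypothetical helper: expand page numbers and 'start-end' strings into page numbers."""
    pages: list[int] = []
    for item in page_range:
        text = str(item).strip()
        if "-" in text:
            # A range string such as "10-12" covers both ends.
            start, end = (int(part) for part in text.split("-", 1))
            candidates = list(range(start, end + 1))
        else:
            candidates = [int(text)]
        for page in candidates:
            if 1 <= page <= n_pages:
                pages.append(page)
            else:
                # Mirror the documented behavior: skip, but tell the user.
                warnings.warn(f"Page {page} is outside 1..{n_pages} and is skipped.")
    return pages


print(expand_pages(["1-3", "5", "8", "10-12"], n_pages=12))
# [1, 2, 3, 5, 8, 10, 11, 12]
print(expand_pages([1, 3], n_pages=2))
# [1]
```

The second call shows the skip-with-warning case: page 3 does not exist in a two-page document, so only page 1 is kept.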