---
title: "Image Converters"
id: image-converters-api
description: "Various converters to transform image data from one format to another."
slug: "/image-converters-api"
---


## document_to_image

### DocumentToImageContent

Converts documents sourced from PDF and image files into `ImageContent` objects.

This component processes a list of documents and extracts visual content from supported file formats, converting
it into `ImageContent` objects that can be used for multimodal AI tasks. It handles both direct image files and PDF
documents by extracting specific pages as images.

Documents are expected to have metadata containing:

- The key named by `file_path_meta_field` (`file_path` by default), holding a file path that exists once it is resolved against `root_path`
- A supported image format (the file's MIME type must be one of the supported image types)
- For PDF files, a `page_number` key specifying which page to extract

### Usage example

```python
from haystack import Document
from haystack.components.converters.image.document_to_image import DocumentToImageContent

converter = DocumentToImageContent(
    file_path_meta_field="file_path",
    root_path="/data/files",
    detail="high",
    size=(800, 600)
)

documents = [
    Document(content="Optional description of image.jpg", meta={"file_path": "image.jpg"}),
    Document(content="Text content of page 1 of doc.pdf", meta={"file_path": "doc.pdf", "page_number": 1})
]

result = converter.run(documents)
image_contents = result["image_contents"]
# [ImageContent(
#    base64_image='/9j/4A...', mime_type='image/jpeg', detail='high', meta={'file_path': 'image.jpg'}
#  ),
#  ImageContent(
#    base64_image='/9j/4A...', mime_type='image/jpeg', detail='high',
#    meta={'page_number': 1, 'file_path': 'doc.pdf'}
#  )]
```

#### __init__

```python
__init__(
    *,
    file_path_meta_field: str = "file_path",
    root_path: str | None = None,
    detail: Literal["auto", "high", "low"] | None = None,
    size: tuple[int, int] | None = None
)
```

Initialize the DocumentToImageContent component.

**Parameters:**

- **file_path_meta_field** (<code>str</code>) – The metadata field in the Document that contains the file path to the image or PDF.
- **root_path** (<code>str | None</code>) – The root directory path where document files are located. If provided, file paths in
  document metadata will be resolved relative to this path. If None, file paths are treated as absolute paths.
- **detail** (<code>Literal['auto', 'high', 'low'] | None</code>) – Optional detail level of the image (only supported by OpenAI). Can be "auto", "high", or "low".
  This will be passed to the created ImageContent objects.
- **size** (<code>tuple\[int, int\] | None</code>) – If provided, resizes the image to fit within the specified dimensions (width, height) while
  maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
  when working with models that have resolution constraints or when transmitting images to remote services.

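
To make the `size` rule concrete, the sketch below computes the dimensions an image would have after being fitted inside a `(width, height)` box while keeping its aspect ratio. It illustrates the arithmetic only and is not the component's implementation: `fit_within` is a hypothetical helper, and the choice never to upscale an image that already fits is an assumption.

```python
def fit_within(original: tuple[int, int], box: tuple[int, int]) -> tuple[int, int]:
    """Return (width, height) scaled to fit inside `box`, preserving aspect ratio."""
    width, height = original
    max_width, max_height = box
    # The more constrained side sets the scale; 1.0 means the image already fits (assumption: no upscaling).
    scale = min(max_width / width, max_height / height, 1.0)
    # Round to whole pixels and never go below one pixel per side.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within((4000, 2000), (800, 600)))  # (800, 400)
print(fit_within((1200, 1600), (800, 600)))  # (450, 600)
print(fit_within((640, 480), (800, 600)))    # (640, 480)
```

With `size=(800, 600)`, a wide 4000x2000 image is limited by its width, while a tall 1200x1600 image is limited by its height.
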
#### run

```python
run(documents: list[Document]) -> dict[str, list[ImageContent | None]]
```

Convert documents with image or PDF sources into ImageContent objects.

This method processes the input documents, extracting images from supported file formats and converting them
into ImageContent objects.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – A list of documents to process. Each document should have metadata containing at minimum
  the key named by `file_path_meta_field`. PDF documents additionally require a `page_number` key to specify which
  page to convert.

**Returns:**

- <code>dict\[str, list\[ImageContent | None\]\]</code> – Dictionary containing one key:
  - `image_contents`: ImageContent objects created from the processed documents. These contain base64-encoded image
    data and metadata. The order corresponds to the order of the input documents.

**Raises:**

- <code>ValueError</code> – If any document is missing the required metadata keys, has an invalid file path, or has an unsupported
  MIME type. The error message will specify which document and what information is missing or incorrect.

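
Because `run` raises `ValueError` when a document's metadata is incomplete, it can help to check metadata before calling it. The sketch below works on plain metadata dictionaries and mirrors the requirements listed above. `find_meta_problems` is a hypothetical helper, detecting PDFs with the standard-library `mimetypes` module is an assumption, and the check does not verify that files exist or that the image type is supported.

```python
import mimetypes


def find_meta_problems(metas: list[dict], file_path_meta_field: str = "file_path") -> list[str]:
    """Return one message per metadata dict that looks incomplete."""
    problems = []
    for index, meta in enumerate(metas):
        file_path = meta.get(file_path_meta_field)
        if not file_path:
            problems.append(f"document {index}: missing '{file_path_meta_field}'")
            continue
        # Guess the MIME type from the file extension only (no file access).
        mime_type, _ = mimetypes.guess_type(str(file_path))
        if mime_type == "application/pdf" and "page_number" not in meta:
            problems.append(f"document {index}: PDF without 'page_number'")
    return problems


metas = [
    {"file_path": "image.jpg"},
    {"file_path": "doc.pdf"},  # PDF, but no page_number
    {"page_number": 1},        # no file path at all
]
print(find_meta_problems(metas))
```
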
## file_to_document

### ImageFileToDocument

Converts image file references into empty Document objects with associated metadata.

This component is useful in pipelines where image file paths need to be wrapped in `Document` objects to be
processed by downstream components such as the `SentenceTransformersImageDocumentEmbedder`.

It does **not** extract any content from the image files. Instead, it creates `Document` objects with `None` as
their content and attaches metadata such as the file path and any user-provided values.

### Usage example

```python
from haystack.components.converters.image import ImageFileToDocument

converter = ImageFileToDocument()

sources = ["image.jpg", "another_image.png"]

result = converter.run(sources=sources)
documents = result["documents"]

print(documents)

# [Document(id=..., meta: {'file_path': 'image.jpg'}),
# Document(id=..., meta: {'file_path': 'another_image.png'})]
```

#### __init__

```python
__init__(*, store_full_path: bool = False)
```

Initialize the ImageFileToDocument component.

**Parameters:**

- **store_full_path** (<code>bool</code>) – If True, the full path of the file is stored in the metadata of the document.
  If False, only the file name is stored.

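
The effect of `store_full_path` can be pictured with a small standalone function. This is only a sketch of the rule as described: `stored_file_path` is a hypothetical helper, and using `PurePosixPath` to extract the file name is an assumption made to keep the example platform-independent.

```python
from pathlib import PurePosixPath


def stored_file_path(source: str, store_full_path: bool = False) -> str:
    """Mimic the documented rule: keep the full path, or only the file name."""
    return source if store_full_path else PurePosixPath(source).name


print(stored_file_path("/data/files/image.jpg"))                        # image.jpg
print(stored_file_path("/data/files/image.jpg", store_full_path=True))  # /data/files/image.jpg
```
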
#### run

```python
run(
    *,
    sources: list[str | Path | ByteStream],
    meta: dict[str, Any] | list[dict[str, Any]] | None = None
) -> dict[str, list[Document]]
```

Convert image files into empty Document objects with metadata.

This method accepts image file references (as file paths or ByteStreams) and creates `Document` objects
without content. These documents are enriched with metadata derived from the input source and optional
user-provided metadata.

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – List of file paths or ByteStream objects to convert.
- **meta** (<code>dict\[str, Any\] | list\[dict\[str, Any\]\] | None</code>) – Optional metadata to attach to the documents.
  This value can be a list of dictionaries or a single dictionary.
  If it's a single dictionary, its content is added to the metadata of all produced documents.
  If it's a list, its length must match the number of sources, as they are zipped together.
  For ByteStream objects, their `meta` is added to the output documents.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary containing:
  - `documents`: A list of `Document` objects with empty content and associated metadata.

## file_to_image

### ImageFileToImageContent

Converts image files to ImageContent objects.

### Usage example

```python
from haystack.components.converters.image import ImageFileToImageContent

converter = ImageFileToImageContent()

sources = ["image.jpg", "another_image.png"]

image_contents = converter.run(sources=sources)["image_contents"]
print(image_contents)

# [ImageContent(base64_image='...',
#               mime_type='image/jpeg',
#               detail=None,
#               meta={'file_path': 'image.jpg'}),
#  ...]
```

#### __init__

```python
__init__(
    *,
    detail: Literal["auto", "high", "low"] | None = None,
    size: tuple[int, int] | None = None
)
```

Create the ImageFileToImageContent component.

**Parameters:**

- **detail** (<code>Literal['auto', 'high', 'low'] | None</code>) – Optional detail level of the image (only supported by OpenAI). One of "auto", "high", or "low".
  This will be passed to the created ImageContent objects.
- **size** (<code>tuple\[int, int\] | None</code>) – If provided, resizes the image to fit within the specified dimensions (width, height) while
  maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
  when working with models that have resolution constraints or when transmitting images to remote services.

#### run

```python
run(
    sources: list[str | Path | ByteStream],
    meta: dict[str, Any] | list[dict[str, Any]] | None = None,
    *,
    detail: Literal["auto", "high", "low"] | None = None,
    size: tuple[int, int] | None = None
) -> dict[str, list[ImageContent]]
```

Converts files to ImageContent objects.

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – List of file paths or ByteStream objects to convert.
- **meta** (<code>dict\[str, Any\] | list\[dict\[str, Any\]\] | None</code>) – Optional metadata to attach to the ImageContent objects.
  This value can be a list of dictionaries or a single dictionary.
  If it's a single dictionary, its content is added to the metadata of all produced ImageContent objects.
  If it's a list, its length must match the number of sources as they're zipped together.
  For ByteStream objects, their `meta` is added to the output ImageContent objects.
- **detail** (<code>Literal['auto', 'high', 'low'] | None</code>) – Optional detail level of the image (only supported by OpenAI). One of "auto", "high", or "low".
  This will be passed to the created ImageContent objects.
  If not provided, the detail level will be the one set in the constructor.
- **size** (<code>tuple\[int, int\] | None</code>) – If provided, resizes the image to fit within the specified dimensions (width, height) while
  maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
  when working with models that have resolution constraints or when transmitting images to remote services.
  If not provided, the size value will be the one set in the constructor.

**Returns:**

- <code>dict\[str, list\[ImageContent\]\]</code> – A dictionary with the following keys:
  - `image_contents`: A list of ImageContent objects.

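
The relationship between constructor settings and the `detail` and `size` arguments of `run` can be sketched with a small stand-in class. `ConverterSettings` and its `resolve` method are hypothetical names used only for illustration; the sketch assumes that `None` at call time means "use the constructor value", as the parameter descriptions state.

```python
from typing import Literal, Optional

Detail = Literal["auto", "high", "low"]


class ConverterSettings:
    """Hypothetical stand-in: stores constructor defaults and resolves per-call overrides."""

    def __init__(self, detail: Optional[Detail] = None, size: Optional[tuple[int, int]] = None):
        self.detail = detail
        self.size = size

    def resolve(self, detail: Optional[Detail] = None, size: Optional[tuple[int, int]] = None):
        # A value passed at call time wins; None falls back to the constructor value.
        return (
            detail if detail is not None else self.detail,
            size if size is not None else self.size,
        )


settings = ConverterSettings(detail="high", size=(800, 600))
print(settings.resolve())                 # ('high', (800, 600))
print(settings.resolve(detail="low"))     # ('low', (800, 600))
print(settings.resolve(size=(256, 256)))  # ('high', (256, 256))
```

Under this rule, passing `None` to `run` cannot clear a value that was set in the constructor; a separate component instance would be needed for that.
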
## pdf_to_image

### PDFToImageContent

Converts PDF files to ImageContent objects.

### Usage example

```python
from haystack.components.converters.image import PDFToImageContent

converter = PDFToImageContent()

sources = ["file.pdf", "another_file.pdf"]

image_contents = converter.run(sources=sources)["image_contents"]
print(image_contents)

# [ImageContent(base64_image='...',
#               mime_type='image/jpeg',
#               detail=None,
#               meta={'file_path': 'file.pdf', 'page_number': 1}),
#  ...]
```

#### __init__

```python
__init__(
    *,
    detail: Literal["auto", "high", "low"] | None = None,
    size: tuple[int, int] | None = None,
    page_range: list[str | int] | None = None
)
```

Create the PDFToImageContent component.

**Parameters:**

- **detail** (<code>Literal['auto', 'high', 'low'] | None</code>) – Optional detail level of the image (only supported by OpenAI). One of "auto", "high", or "low".
  This will be passed to the created ImageContent objects.
- **size** (<code>tuple\[int, int\] | None</code>) – If provided, resizes the image to fit within the specified dimensions (width, height) while
  maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
  when working with models that have resolution constraints or when transmitting images to remote services.
- **page_range** (<code>list\[str | int\] | None</code>) – List of page numbers and/or page ranges to convert to images. Page numbers start at 1.
  If None, all pages in the PDF will be converted. Pages outside the valid range (1 to number of pages)
  will be skipped with a warning. For example, page_range=[1, 3] will convert only the first and third
  pages of the document. It also accepts printable range strings, e.g.: ['1-3', '5', '8', '10-12']
  will convert pages 1, 2, 3, 5, 8, 10, 11, 12.

#### run

```python
run(
    sources: list[str | Path | ByteStream],
    meta: dict[str, Any] | list[dict[str, Any]] | None = None,
    *,
    detail: Literal["auto", "high", "low"] | None = None,
    size: tuple[int, int] | None = None,
    page_range: list[str | int] | None = None
) -> dict[str, list[ImageContent]]
```

Converts files to ImageContent objects.

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – List of file paths or ByteStream objects to convert.
- **meta** (<code>dict\[str, Any\] | list\[dict\[str, Any\]\] | None</code>) – Optional metadata to attach to the ImageContent objects.
  This value can be a list of dictionaries or a single dictionary.
  If it's a single dictionary, its content is added to the metadata of all produced ImageContent objects.
  If it's a list, its length must match the number of sources as they're zipped together.
  For ByteStream objects, their `meta` is added to the output ImageContent objects.
- **detail** (<code>Literal['auto', 'high', 'low'] | None</code>) – Optional detail level of the image (only supported by OpenAI). One of "auto", "high", or "low".
  This will be passed to the created ImageContent objects.
  If not provided, the detail level will be the one set in the constructor.
- **size** (<code>tuple\[int, int\] | None</code>) – If provided, resizes the image to fit within the specified dimensions (width, height) while
  maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
  when working with models that have resolution constraints or when transmitting images to remote services.
  If not provided, the size value will be the one set in the constructor.
- **page_range** (<code>list\[str | int\] | None</code>) – List of page numbers and/or page ranges to convert to images. Page numbers start at 1.
  If None, all pages in the PDF will be converted. Pages outside the valid range (1 to number of pages)
  will be skipped with a warning. For example, page_range=[1, 3] will convert only the first and third
  pages of the document. It also accepts printable range strings, e.g.: ['1-3', '5', '8', '10-12']
  will convert pages 1, 2, 3, 5, 8, 10, 11, 12.
  If not provided, the page_range value will be the one set in the constructor.

**Returns:**

- <code>dict\[str, list\[ImageContent\]\]</code> – A dictionary with the following keys:
  - `image_contents`: A list of ImageContent objects.
350  **Returns:**
351  
352  - <code>dict\[str, list\[ImageContent\]\]</code> – A dictionary with the following keys:
353  - `image_contents`: A list of ImageContent objects.