---
title: "Image Converters"
id: image-converters-api
description: "Various converters to transform image data from one format to another."
slug: "/image-converters-api"
---

<a id="document_to_image"></a>

## Module document\_to\_image

<a id="document_to_image.DocumentToImageContent"></a>

### DocumentToImageContent

Converts documents sourced from PDF and image files into `ImageContent` objects.

This component processes a list of documents and extracts visual content from supported file formats, converting
them into `ImageContent` objects that can be used for multimodal AI tasks. It handles both direct image files and PDF
documents by extracting specific pages as images.

Documents are expected to have metadata containing:
- The key named by `file_path_meta_field`, holding a file path that exists when combined with `root_path`
- A supported image format (MIME type must be one of the supported image types)
- For PDF files, a `page_number` key specifying which page to extract

### Usage example
```python
from haystack import Document
from haystack.components.converters.image.document_to_image import DocumentToImageContent

converter = DocumentToImageContent(
    file_path_meta_field="file_path",
    root_path="/data/files",
    detail="high",
    size=(800, 600)
)

documents = [
    Document(content="Optional description of image.jpg", meta={"file_path": "image.jpg"}),
    Document(content="Text content of page 1 of doc.pdf", meta={"file_path": "doc.pdf", "page_number": 1})
]

result = converter.run(documents)
image_contents = result["image_contents"]
# [ImageContent(
#    base64_image='/9j/4A...', mime_type='image/jpeg', detail='high', meta={'file_path': 'image.jpg'}
#  ),
#  ImageContent(
#    base64_image='/9j/4A...', mime_type='image/jpeg', detail='high',
#    meta={'page_number': 1, 'file_path': 'doc.pdf'}
#  )]
```

<a id="document_to_image.DocumentToImageContent.__init__"></a>

#### DocumentToImageContent.\_\_init\_\_

```python
def __init__(*,
             file_path_meta_field: str = "file_path",
             root_path: Optional[str] = None,
             detail: Optional[Literal["auto", "high", "low"]] = None,
             size: Optional[tuple[int, int]] = None)
```

Initialize the DocumentToImageContent component.

**Arguments**:

- `file_path_meta_field`: The metadata field in the Document that contains the file path to the image or PDF.
- `root_path`: The root directory path where document files are located. If provided, file paths in
document metadata will be resolved relative to this path. If None, file paths are treated as absolute paths.
- `detail`: Optional detail level of the image (only supported by OpenAI). Can be "auto", "high", or "low".
This will be passed to the created ImageContent objects.
- `size`: If provided, resizes the image to fit within the specified dimensions (width, height) while
maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
when working with models that have resolution constraints or when transmitting images to remote services.

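As a rough illustration of what "fit within the specified dimensions while maintaining aspect ratio" means for `size`, the sketch below computes target dimensions in plain Python. The helper `fit_within` is hypothetical and not part of Haystack. It also assumes that images are only scaled down, never up, which may differ from the library's actual resizing.

```python
def fit_within(width: int, height: int, max_size: tuple[int, int]) -> tuple[int, int]:
    """Hypothetical helper: scale (width, height) to fit inside max_size, keeping the aspect ratio."""
    max_width, max_height = max_size
    # Pick the tighter of the two ratios; cap at 1.0 so smaller images stay unchanged (assumption).
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(1600, 900, (800, 600)))  # (800, 450)
print(fit_within(400, 300, (800, 600)))   # (400, 300)
```

Note that a 16:9 image does not fill the 800x600 box: the width limit is reached first, so the height ends up at 450.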
<a id="document_to_image.DocumentToImageContent.run"></a>

#### DocumentToImageContent.run

```python
@component.output_types(image_contents=list[Optional[ImageContent]])
def run(documents: list[Document]) -> dict[str, list[Optional[ImageContent]]]
```

Convert documents with image or PDF sources into ImageContent objects.

This method processes the input documents, extracting images from supported file formats and converting them
into ImageContent objects.

**Arguments**:

- `documents`: A list of documents to process. Each document's metadata must contain at minimum the key
named by `file_path_meta_field`. PDF documents additionally require a `page_number` key to specify which
page to convert.

**Raises**:

- `ValueError`: If any document is missing the required metadata keys, has an invalid file path, or has an unsupported
MIME type. The error message will specify which document and what information is missing or incorrect.

**Returns**:

Dictionary containing one key:
- "image_contents": ImageContent objects created from the processed documents. These contain base64-encoded image
data and metadata. Their order corresponds to the order of the input documents.

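The metadata requirements above can be checked before calling `run`. The following is a minimal sketch of such a pre-check, not the component's implementation: `resolve_and_check` and `ASSUMED_SUPPORTED_MIME_TYPES` are hypothetical names, the MIME set is an assumption chosen for illustration, and the check for whether the file actually exists is left out.

```python
import mimetypes
from pathlib import Path
from typing import Any, Optional

# Assumed set for illustration only; the library defines its own supported types.
ASSUMED_SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png", "application/pdf"}


def resolve_and_check(
    meta: dict[str, Any],
    file_path_meta_field: str = "file_path",
    root_path: Optional[str] = None,
) -> Path:
    """Hypothetical pre-check mirroring the documented metadata requirements."""
    if file_path_meta_field not in meta:
        raise ValueError(f"Missing metadata key '{file_path_meta_field}'")
    # Resolve the path relative to root_path when one is given.
    path = Path(root_path or "") / meta[file_path_meta_field]
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type not in ASSUMED_SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported MIME type for '{path}': {mime_type}")
    if mime_type == "application/pdf" and "page_number" not in meta:
        raise ValueError(f"PDF '{path}' needs a 'page_number' metadata key")
    return path


print(resolve_and_check({"file_path": "doc.pdf", "page_number": 1}, root_path="/data/files"))
```

Running a check like this up front lets a pipeline report every problematic document at once instead of stopping at the first `ValueError`.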
<a id="file_to_document"></a>

## Module file\_to\_document

<a id="file_to_document.ImageFileToDocument"></a>

### ImageFileToDocument

Converts image file references into empty Document objects with associated metadata.

This component is useful in pipelines where image file paths need to be wrapped in `Document` objects to be
processed by downstream components such as the `SentenceTransformersImageDocumentEmbedder`.

It does **not** extract any content from the image files. Instead, it creates `Document` objects with `None` as
their content and attaches metadata such as the file path and any user-provided values.

### Usage example
```python
from haystack.components.converters.image import ImageFileToDocument

converter = ImageFileToDocument()

sources = ["image.jpg", "another_image.png"]

result = converter.run(sources=sources)
documents = result["documents"]

print(documents)

# [Document(id=..., meta: {'file_path': 'image.jpg'}),
# Document(id=..., meta: {'file_path': 'another_image.png'})]
```

<a id="file_to_document.ImageFileToDocument.__init__"></a>

#### ImageFileToDocument.\_\_init\_\_

```python
def __init__(*, store_full_path: bool = False)
```

Initialize the ImageFileToDocument component.

**Arguments**:

- `store_full_path`: If True, the full path of the file is stored in the metadata of the document.
If False, only the file name is stored.

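To make the two outcomes of `store_full_path` concrete, here is a small sketch using only the standard library. The helper `stored_file_path` is hypothetical and only imitates the documented behavior; it does not call Haystack.

```python
from pathlib import PurePosixPath


def stored_file_path(source: str, store_full_path: bool = False) -> str:
    """Hypothetical helper showing the two documented outcomes for the stored path."""
    path = PurePosixPath(source)
    return str(path) if store_full_path else path.name


print(stored_file_path("/data/files/image.jpg", store_full_path=True))  # /data/files/image.jpg
print(stored_file_path("/data/files/image.jpg"))                        # image.jpg
```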
<a id="file_to_document.ImageFileToDocument.run"></a>

#### ImageFileToDocument.run

```python
@component.output_types(documents=list[Document])
def run(
    *,
    sources: list[Union[str, Path, ByteStream]],
    meta: Optional[Union[dict[str, Any], list[dict[str, Any]]]] = None
) -> dict[str, list[Document]]
```

Convert image files into empty Document objects with metadata.

This method accepts image file references (as file paths or ByteStreams) and creates `Document` objects
without content. These documents are enriched with metadata derived from the input source and optional
user-provided metadata.

**Arguments**:

- `sources`: List of file paths or ByteStream objects to convert.
- `meta`: Optional metadata to attach to the documents.
This value can be a list of dictionaries or a single dictionary.
If it's a single dictionary, its content is added to the metadata of all produced documents.
If it's a list, its length must match the number of sources, as they are zipped together.
For ByteStream objects, their `meta` is added to the output documents.

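The single-dictionary and list forms of `meta` described above can be pictured with the following sketch. `broadcast_meta` is a hypothetical helper written for this page, not the library's own function, and it leaves out the merging of `ByteStream` metadata.

```python
from typing import Any, Optional, Union

Meta = Union[dict[str, Any], list[dict[str, Any]]]


def broadcast_meta(meta: Optional[Meta], sources_count: int) -> list[dict[str, Any]]:
    """Hypothetical helper: return one metadata dict per source."""
    if meta is None:
        return [{} for _ in range(sources_count)]
    if isinstance(meta, dict):
        # A single dict is copied to every source.
        return [dict(meta) for _ in range(sources_count)]
    if len(meta) != sources_count:
        raise ValueError("The metadata list must have one entry per source.")
    return [dict(entry) for entry in meta]


sources = ["image.jpg", "another_image.png"]
for source, entry in zip(sources, broadcast_meta({"split": "train"}, len(sources))):
    print(source, entry)
# image.jpg {'split': 'train'}
# another_image.png {'split': 'train'}
```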
**Returns**:

A dictionary containing:
- `documents`: A list of `Document` objects with empty content and associated metadata.

<a id="file_to_image"></a>

## Module file\_to\_image

<a id="file_to_image.ImageFileToImageContent"></a>

### ImageFileToImageContent

Converts image files to ImageContent objects.

### Usage example
```python
from haystack.components.converters.image import ImageFileToImageContent

converter = ImageFileToImageContent()

sources = ["image.jpg", "another_image.png"]

image_contents = converter.run(sources=sources)["image_contents"]
print(image_contents)

# [ImageContent(base64_image='...',
#               mime_type='image/jpeg',
#               detail=None,
#               meta={'file_path': 'image.jpg'}),
#  ...]
```

<a id="file_to_image.ImageFileToImageContent.__init__"></a>

#### ImageFileToImageContent.\_\_init\_\_

```python
def __init__(*,
             detail: Optional[Literal["auto", "high", "low"]] = None,
             size: Optional[tuple[int, int]] = None)
```

Create the ImageFileToImageContent component.

**Arguments**:

- `detail`: Optional detail level of the image (only supported by OpenAI). One of "auto", "high", or "low".
This will be passed to the created ImageContent objects.
- `size`: If provided, resizes the image to fit within the specified dimensions (width, height) while
maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
when working with models that have resolution constraints or when transmitting images to remote services.

<a id="file_to_image.ImageFileToImageContent.run"></a>

#### ImageFileToImageContent.run

```python
@component.output_types(image_contents=list[ImageContent])
def run(
    sources: list[Union[str, Path, ByteStream]],
    meta: Optional[Union[dict[str, Any], list[dict[str, Any]]]] = None,
    *,
    detail: Optional[Literal["auto", "high", "low"]] = None,
    size: Optional[tuple[int, int]] = None
) -> dict[str, list[ImageContent]]
```

Converts files to ImageContent objects.

**Arguments**:

- `sources`: List of file paths or ByteStream objects to convert.
- `meta`: Optional metadata to attach to the ImageContent objects.
This value can be a list of dictionaries or a single dictionary.
If it's a single dictionary, its content is added to the metadata of all produced ImageContent objects.
If it's a list, its length must match the number of sources as they're zipped together.
For ByteStream objects, their `meta` is added to the output ImageContent objects.
- `detail`: Optional detail level of the image (only supported by OpenAI). One of "auto", "high", or "low".
This will be passed to the created ImageContent objects.
If not provided, the detail level will be the one set in the constructor.
- `size`: If provided, resizes the image to fit within the specified dimensions (width, height) while
maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
when working with models that have resolution constraints or when transmitting images to remote services.
If not provided, the size value will be the one set in the constructor.

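The "if not provided, the value set in the constructor is used" rule for `detail` and `size` follows a common fallback pattern, sketched below. `ConverterSketch` is a hypothetical stand-in class, and the `is not None` check is an assumption about how "not provided" is detected; the library may implement the fallback differently.

```python
from typing import Any, Optional


class ConverterSketch:
    """Hypothetical stand-in showing run-time arguments overriding constructor defaults."""

    def __init__(self, *, detail: Optional[str] = None, size: Optional[tuple[int, int]] = None):
        self.detail = detail
        self.size = size

    def run(self, *, detail: Optional[str] = None, size: Optional[tuple[int, int]] = None) -> dict[str, Any]:
        # Fall back to the constructor value when the run-time value is not given.
        effective_detail = detail if detail is not None else self.detail
        effective_size = size if size is not None else self.size
        return {"detail": effective_detail, "size": effective_size}


converter = ConverterSketch(detail="low", size=(512, 512))
print(converter.run())               # {'detail': 'low', 'size': (512, 512)}
print(converter.run(detail="high"))  # {'detail': 'high', 'size': (512, 512)}
```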
**Returns**:

A dictionary with the following keys:
- `image_contents`: A list of ImageContent objects.

<a id="pdf_to_image"></a>

## Module pdf\_to\_image

<a id="pdf_to_image.PDFToImageContent"></a>

### PDFToImageContent

Converts PDF files to ImageContent objects.

### Usage example
```python
from haystack.components.converters.image import PDFToImageContent

converter = PDFToImageContent()

sources = ["file.pdf", "another_file.pdf"]

image_contents = converter.run(sources=sources)["image_contents"]
print(image_contents)

# [ImageContent(base64_image='...',
#               mime_type='image/jpeg',
#               detail=None,
#               meta={'file_path': 'file.pdf', 'page_number': 1}),
#  ...]
```

<a id="pdf_to_image.PDFToImageContent.__init__"></a>

#### PDFToImageContent.\_\_init\_\_

```python
def __init__(*,
             detail: Optional[Literal["auto", "high", "low"]] = None,
             size: Optional[tuple[int, int]] = None,
             page_range: Optional[list[Union[str, int]]] = None)
```

Create the PDFToImageContent component.

**Arguments**:

- `detail`: Optional detail level of the image (only supported by OpenAI). One of "auto", "high", or "low".
This will be passed to the created ImageContent objects.
- `size`: If provided, resizes the image to fit within the specified dimensions (width, height) while
maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
when working with models that have resolution constraints or when transmitting images to remote services.
- `page_range`: List of page numbers and/or page ranges to convert to images. Page numbers start at 1.
If None, all pages in the PDF will be converted. Pages outside the valid range (1 to number of pages)
will be skipped with a warning. For example, `page_range=[1, 3]` will convert only the first and third
pages of the document. It also accepts printable range strings: for example, `['1-3', '5', '8', '10-12']`
will convert pages 1, 2, 3, 5, 8, 10, 11, and 12.

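The `page_range` format can be illustrated with a short sketch that expands a mixed list into explicit page numbers and then drops pages outside the document. Both helpers, `expand_pages` and `keep_valid_pages`, are hypothetical names written for this example and are not the library's own functions. The sketch drops out-of-range pages silently, whereas the component logs a warning.

```python
from typing import Union


def expand_pages(page_range: list[Union[str, int]]) -> list[int]:
    """Hypothetical helper: expand entries like 3, '5', or '10-12' into page numbers."""
    pages: list[int] = []
    for item in page_range:
        if isinstance(item, int):
            pages.append(item)
        elif "-" in item:
            start, end = item.split("-")
            pages.extend(range(int(start), int(end) + 1))  # ranges are inclusive
        else:
            pages.append(int(item))
    return pages


def keep_valid_pages(pages: list[int], num_pages: int) -> list[int]:
    """Hypothetical helper: keep only pages between 1 and num_pages."""
    return [page for page in pages if 1 <= page <= num_pages]


pages = expand_pages(["1-3", "5", "8", "10-12"])
print(pages)                       # [1, 2, 3, 5, 8, 10, 11, 12]
print(keep_valid_pages(pages, 9))  # [1, 2, 3, 5, 8]
```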
<a id="pdf_to_image.PDFToImageContent.run"></a>

#### PDFToImageContent.run

```python
@component.output_types(image_contents=list[ImageContent])
def run(
    sources: list[Union[str, Path, ByteStream]],
    meta: Optional[Union[dict[str, Any], list[dict[str, Any]]]] = None,
    *,
    detail: Optional[Literal["auto", "high", "low"]] = None,
    size: Optional[tuple[int, int]] = None,
    page_range: Optional[list[Union[str, int]]] = None
) -> dict[str, list[ImageContent]]
```

Converts files to ImageContent objects.

**Arguments**:

- `sources`: List of file paths or ByteStream objects to convert.
- `meta`: Optional metadata to attach to the ImageContent objects.
This value can be a list of dictionaries or a single dictionary.
If it's a single dictionary, its content is added to the metadata of all produced ImageContent objects.
If it's a list, its length must match the number of sources as they're zipped together.
For ByteStream objects, their `meta` is added to the output ImageContent objects.
- `detail`: Optional detail level of the image (only supported by OpenAI). One of "auto", "high", or "low".
This will be passed to the created ImageContent objects.
If not provided, the detail level will be the one set in the constructor.
- `size`: If provided, resizes the image to fit within the specified dimensions (width, height) while
maintaining aspect ratio. This reduces file size, memory usage, and processing time, which is beneficial
when working with models that have resolution constraints or when transmitting images to remote services.
If not provided, the size value will be the one set in the constructor.
- `page_range`: List of page numbers and/or page ranges to convert to images. Page numbers start at 1.
If None, all pages in the PDF will be converted. Pages outside the valid range (1 to number of pages)
will be skipped with a warning. For example, `page_range=[1, 3]` will convert only the first and third
pages of the document. It also accepts printable range strings: for example, `['1-3', '5', '8', '10-12']`
will convert pages 1, 2, 3, 5, 8, 10, 11, and 12.
If not provided, the page_range value will be the one set in the constructor.

**Returns**:

A dictionary with the following keys:
- `image_contents`: A list of ImageContent objects.