---
title: "Markitdown"
id: integrations-markitdown
description: "Markitdown integration for Haystack"
slug: "/integrations-markitdown"
---


## haystack_integrations.components.converters.markitdown.markitdown_converter

### MarkItDownConverter

Converts files to Haystack Documents using [MarkItDown](https://github.com/microsoft/markitdown).

MarkItDown is a Microsoft library that converts many file formats to Markdown,
including PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx), HTML, images,
audio, and more. All processing is performed locally.

### Usage example

```python
from haystack_integrations.components.converters.markitdown import MarkItDownConverter

converter = MarkItDownConverter()
result = converter.run(sources=["document.pdf", "report.docx"])
documents = result["documents"]
```

#### __init__

```python
__init__(store_full_path: bool = False) -> None
```

Initializes the MarkItDownConverter.

**Parameters:**

- **store_full_path** (<code>bool</code>) – If `True`, the full file path is stored in the Document metadata.
  If `False`, only the file name is stored. Defaults to `False`.

#### run

```python
run(
    sources: list[str | Path | ByteStream],
    meta: dict[str, Any] | list[dict[str, Any]] | None = None,
) -> dict[str, list[Document]]
```

Converts files to Documents using MarkItDown.

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – List of file paths or ByteStream objects to convert.
- **meta** (<code>dict\[str, Any\] | list\[dict\[str, Any\]\] | None</code>) – Optional metadata to attach to the Documents. Can be a single dict
  applied to all Documents, or a list of dicts aligned with `sources`.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with key `documents` containing the converted Documents.
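The `meta` and `store_full_path` rules described above can be illustrated with a small, stdlib-only sketch. The helpers `align_meta` and `path_for_meta` are hypothetical names for illustration only; they are not part of the integration's API and do not reproduce its internals. The sketch raises `ValueError` when a `meta` list does not match the number of sources; the exact error the real converter raises in that case is an assumption.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def align_meta(
    meta: dict[str, Any] | list[dict[str, Any]] | None, num_sources: int
) -> list[dict[str, Any]]:
    """Return one metadata dict per source (hypothetical helper, not the integration's API)."""
    if meta is None:
        # No metadata given: every source gets an empty dict.
        return [{} for _ in range(num_sources)]
    if isinstance(meta, dict):
        # A single dict applies to every source; copy it so the entries stay independent.
        return [dict(meta) for _ in range(num_sources)]
    if len(meta) != num_sources:
        # A list must line up one-to-one with `sources` (ValueError is an assumption here).
        raise ValueError(f"meta has {len(meta)} entries but there are {num_sources} sources")
    return [dict(entry) for entry in meta]


def path_for_meta(path: str | Path, store_full_path: bool) -> str:
    """Mimic the documented store_full_path switch (hypothetical helper)."""
    # True keeps the full path as given; False keeps only the file name.
    return str(path) if store_full_path else Path(path).name


sources = ["document.pdf", "report.docx"]
print(align_meta({"project": "alpha"}, len(sources)))
# [{'project': 'alpha'}, {'project': 'alpha'}]
print(align_meta([{"lang": "en"}, {"lang": "de"}], len(sources)))
# [{'lang': 'en'}, {'lang': 'de'}]
print(path_for_meta("/data/report.docx", store_full_path=False))
# report.docx
```

With the real component, the list form would be passed as `converter.run(sources=["document.pdf", "report.docx"], meta=[{"lang": "en"}, {"lang": "de"}])`, so the Document produced from each source should carry the entries of the dict at the same position.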