---
title: "XLSXToDocument"
id: xlsxtodocument
slug: "/xlsxtodocument"
description: "Converts Excel files into documents."
---

# XLSXToDocument

Converts Excel files into documents.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | Before [PreProcessors](../preprocessors.mdx) or right at the beginning of an indexing pipeline |
| **Mandatory run variables** | `sources`: File paths or [`ByteStream`](../../concepts/data-classes.mdx#bytestream) objects |
| **Output variables** | `documents`: A list of documents |
| **API reference** | [Converters](/reference/converters-api) |
| **GitHub link** | https://github.com/deepset-ai/haystack/blob/main/haystack/components/converters/xlsx.py |

</div>

## Overview

The `XLSXToDocument` component converts XLSX files into Haystack Documents, formatting their content as CSV (default) or Markdown. It takes a list of file paths or [`ByteStream`](../../concepts/data-classes.mdx#bytestream) objects as input and outputs the converted result as a list of documents. Optionally, you can attach metadata to the documents through the `meta` input parameter.

For the additional parameters you can set when initializing the component, see the [API Reference](/reference/converters-api#xlsxtodocument).
## Usage

First, install the pandas, openpyxl, and tabulate packages to start using this converter:

```shell
pip install pandas openpyxl
pip install tabulate
```

### On its own

```python
from datetime import datetime

from haystack.components.converters import XLSXToDocument

converter = XLSXToDocument()
results = converter.run(
    sources=["sample.xlsx"],
    # Metadata attached to every resulting document
    meta={"date_added": datetime.now().isoformat()},
)
documents = results["documents"]
print(documents[0].content)
# ",A,B\n1,col_a,col_b\n2,1.5,test\n"
```

### In a pipeline

```python
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.converters import XLSXToDocument
from haystack.components.preprocessors import DocumentCleaner
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter

document_store = InMemoryDocumentStore()

pipeline = Pipeline()
pipeline.add_component("converter", XLSXToDocument())
pipeline.add_component("cleaner", DocumentCleaner())
pipeline.add_component(
    "splitter",
    DocumentSplitter(split_by="sentence", split_length=5),
)
pipeline.add_component("writer", DocumentWriter(document_store=document_store))
pipeline.connect("converter", "cleaner")
pipeline.connect("cleaner", "splitter")
pipeline.connect("splitter", "writer")

# Paths of the Excel files to index
file_names = ["sample.xlsx"]
pipeline.run({"converter": {"sources": file_names}})
```