---
title: "FileTypeRouter"
id: filetyperouter
slug: "/filetyperouter"
description: "Use this Router in indexing pipelines to route file paths or byte streams based on their type to different outputs for further processing."
---

# FileTypeRouter

Use this Router in indexing pipelines to route file paths or byte streams based on their type to different outputs for further processing.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | As the first component preprocessing data, followed by [Converters](../converters.mdx) |
| **Mandatory init variables** | `mime_types`: A list of MIME types or regex patterns for classification |
| **Mandatory run variables** | `sources`: A list of file paths or byte streams to categorize |
| **Output variables** | `unclassified`: A list of uncategorized file paths or [byte streams](../../concepts/data-classes.mdx#bytestream) <br /> <br />`mime_types`: One output per configured MIME type (for example, "text/plain", "text/html", "application/pdf", "text/markdown", "audio/x-wav", "image/jpeg"), each holding a list of the file paths or byte streams categorized under that type |
| **API reference** | [Routers](/reference/routers-api) |
| **GitHub link** | https://github.com/deepset-ai/haystack/blob/main/haystack/components/routers/file_type_router.py |

</div>

## Overview

`FileTypeRouter` routes file paths or byte streams based on their type, for example, plain text, JPEG image, or WAV audio. For file paths, it infers MIME types from their extensions, while for byte streams, it determines MIME types based on the provided metadata.

When initializing the component, you specify the set of MIME types to route to separate outputs. To do this, set the `mime_types` parameter to a list of types, for example: `["text/plain", "audio/x-wav", "image/jpeg"]`. Types that are not listed are routed to an output named `unclassified`.
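
To make the routing idea concrete, here is a minimal sketch that uses only the Python standard library. It is not the `FileTypeRouter` implementation: `route_by_mime_type` is a hypothetical helper, it handles file paths only (not byte streams), and it assumes each entry in `mime_types` is matched against the guessed MIME type as a full regex match.

```python
import mimetypes
import re
from collections import defaultdict


def route_by_mime_type(sources, mime_types):
    """Group file paths by MIME type (hypothetical helper, not Haystack code)."""
    # Assumption: every entry is treated as a regex that must match the whole type.
    patterns = [re.compile(p) for p in mime_types]
    routed = defaultdict(list)
    for source in sources:
        # Infer the MIME type from the file extension; None if it is unknown.
        guessed, _ = mimetypes.guess_type(str(source))
        match = None
        if guessed is not None:
            match = next(
                (p.pattern for p in patterns if p.fullmatch(guessed)), None
            )
        # Anything without a matching pattern lands in "unclassified".
        routed[match if match is not None else "unclassified"].append(source)
    return dict(routed)


routed = route_by_mime_type(
    ["notes.txt", "report.pdf", "photo.jpg", "archive.unknownext"],
    mime_types=["text/plain", r"image/.*"],
)
print(routed)
```

In this sketch, the PDF and the file with an unrecognized extension both end up under `unclassified`, because neither matches a listed type or pattern.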

## Usage

### On its own

Below is an example that uses the `FileTypeRouter` to route two file paths by type:

```python
from haystack.components.routers import FileTypeRouter

# Only plain text gets its own output; every other type goes to "unclassified".
router = FileTypeRouter(mime_types=["text/plain"])
result = router.run(sources=["text-file-will-be-added.txt", "pdf-will-not-be-added.pdf"])
print(result)
```

### In a pipeline

Below is an example of a pipeline that uses a `FileTypeRouter` to forward only plain text files to a `TextFileToDocument` converter, then a `DocumentSplitter`, and finally a `DocumentWriter`. Only the content of plain text files is added to the `InMemoryDocumentStore`; files of any other type are left out. As an alternative, you could add a `PyPDFToDocument` converter to the pipeline and use the `FileTypeRouter` to route PDFs to it so that it converts them to documents.

```python
from haystack import Pipeline
from haystack.components.routers import FileTypeRouter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.converters import TextFileToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter

document_store = InMemoryDocumentStore()
p = Pipeline()
p.add_component(
    instance=FileTypeRouter(mime_types=["text/plain"]),
    name="file_type_router",
)
p.add_component(instance=TextFileToDocument(), name="text_file_converter")
p.add_component(instance=DocumentSplitter(), name="splitter")
p.add_component(instance=DocumentWriter(document_store=document_store), name="writer")

# Only the "text/plain" output is connected, so unclassified sources are dropped.
p.connect("file_type_router.text/plain", "text_file_converter.sources")
p.connect("text_file_converter.documents", "splitter.documents")
p.connect("splitter.documents", "writer.documents")

p.run(
    {
        "file_type_router": {
            "sources": ["text-file-will-be-added.txt", "pdf-will-not-be-added.pdf"],
        },
    },
)
```