---
title: "FileTypeRouter"
id: filetyperouter
slug: "/filetyperouter"
description: "Use this Router in indexing pipelines to route file paths or byte streams based on their type to different outputs for further processing."
---

# FileTypeRouter

Use this Router in indexing pipelines to route file paths or byte streams based on their type to different outputs for further processing.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | As the first component in a preprocessing pipeline, followed by [Converters](../converters.mdx) |
| **Mandatory init variables** | `mime_types`: A list of MIME types or regex patterns for classification |
| **Mandatory run variables** | `sources`: A list of file paths or byte streams to categorize |
| **Output variables** | `unclassified`: A list of uncategorized file paths or [byte streams](../../concepts/data-classes.mdx#bytestream)  <br /> <br />One output per configured MIME type, for example `text/plain`, `text/html`, `application/pdf`, `text/markdown`, `audio/x-wav`, or `image/jpeg`: A list of the file paths or byte streams of that type |
| **API reference** | [Routers](/reference/routers-api) |
| **GitHub link** | https://github.com/deepset-ai/haystack/blob/main/haystack/components/routers/file_type_router.py |

</div>

## Overview

`FileTypeRouter` routes file paths or byte streams based on their type, for example, plain text, JPEG image, or WAV audio. For file paths, it infers the MIME type from the file extension. For byte streams, it reads the MIME type from the provided metadata.

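As a rough illustration of extension-based inference, the sketch below uses Python's standard `mimetypes` module. That `FileTypeRouter` behaves the same way is an assumption made for illustration only; the component may apply additional or different mappings. The file names are made up.

```python
import mimetypes

# Hypothetical file names; only the extension matters for the lookup.
file_names = ["notes.txt", "photo.jpg", "report.pdf", "data.unknownext"]

# guess_type returns a (type, encoding) tuple; the type is None when the
# extension is not recognized.
guessed = {name: mimetypes.guess_type(name)[0] for name in file_names}

for name, mime_type in guessed.items():
    print(f"{name} -> {mime_type}")
```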
When initializing the component, you specify the set of MIME types to route to separate outputs. To do this, set the `mime_types` parameter to a list of types, for example: `["text/plain", "audio/x-wav", "image/jpeg"]`. Sources whose type is not listed are routed to an output named `unclassified`.

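The routing rule described above can be sketched in plain Python. This is a simplified model, not the component's implementation: `route_by_mime_type` is a hypothetical helper, it handles file names only, and the choice to match each configured entry as a full regex against the guessed type is an assumption. Outputs are keyed by the configured entry, and anything that matches no entry lands in `unclassified`.

```python
import mimetypes
import re


def route_by_mime_type(sources, mime_types):
    """Group sources by MIME type; hypothetical helper, not Haystack code."""
    patterns = [re.compile(pattern) for pattern in mime_types]
    routed = {}
    for source in sources:
        guessed, _ = mimetypes.guess_type(source)
        # The first configured pattern that fully matches the guessed type wins.
        match = next(
            (p.pattern for p in patterns if guessed and p.fullmatch(guessed)),
            None,
        )
        # Sources with no matching pattern go to the "unclassified" bucket.
        routed.setdefault(match or "unclassified", []).append(source)
    return routed


routed = route_by_mime_type(
    ["notes.txt", "photo.jpg", "report.pdf"],
    mime_types=["text/plain", r"image/.*"],
)
print(routed)
```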
## Usage

### On its own

Below is an example that uses the `FileTypeRouter` to route two file paths by type. The `.txt` path goes to the `text/plain` output and the `.pdf` path goes to `unclassified`:

```python
from haystack.components.routers import FileTypeRouter

# Only plain text gets its own output; every other type ends up in "unclassified".
router = FileTypeRouter(mime_types=["text/plain"])
router.run(sources=["text-file-will-be-added.txt", "pdf-will-not-be-added.pdf"])
```

### In a pipeline

Below is an example of a pipeline that uses a `FileTypeRouter` to forward only plain text files to a `TextFileToDocument` converter, then a `DocumentSplitter`, and finally a `DocumentWriter`. Only the content of plain text files gets added to the `InMemoryDocumentStore`; files of any other type are not processed. As an alternative, you could add a `PyPDFToDocument` converter to the pipeline and use the `FileTypeRouter` to route PDFs to it so that it converts them to documents.

```python
from haystack import Pipeline
from haystack.components.routers import FileTypeRouter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.converters import TextFileToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter

document_store = InMemoryDocumentStore()
p = Pipeline()
p.add_component(
    instance=FileTypeRouter(mime_types=["text/plain"]),
    name="file_type_router",
)
p.add_component(instance=TextFileToDocument(), name="text_file_converter")
p.add_component(instance=DocumentSplitter(), name="splitter")
p.add_component(instance=DocumentWriter(document_store=document_store), name="writer")

# The router exposes one output per configured MIME type, named after that type.
p.connect("file_type_router.text/plain", "text_file_converter.sources")
p.connect("text_file_converter.documents", "splitter.documents")
p.connect("splitter.documents", "writer.documents")

# The PDF path is routed to "unclassified", which is not connected to anything.
p.run(
    {
        "file_type_router": {
            "sources": ["text-file-will-be-added.txt", "pdf-will-not-be-added.pdf"],
        },
    },
)
```