routers_api.md
---
title: "Routers"
id: routers-api
description: "Routers is a group of components that route queries or Documents to other components that can handle them best."
slug: "/routers-api"
---

<a id="conditional_router"></a>

## Module conditional\_router

<a id="conditional_router.NoRouteSelectedException"></a>

### NoRouteSelectedException

Exception raised when no route is selected in ConditionalRouter.

<a id="conditional_router.RouteConditionException"></a>

### RouteConditionException

Exception raised when there is an error parsing or evaluating the condition expression in ConditionalRouter.

<a id="conditional_router.ConditionalRouter"></a>

### ConditionalRouter

Routes data based on specific conditions.

You define these conditions in a list of dictionaries called `routes`.
Each dictionary in this list represents a single route. Each route has these four elements:
- `condition`: A Jinja2 string expression that determines if the route is selected.
- `output`: A Jinja2 expression defining the route's output value.
- `output_type`: The type of the output data (for example, `str`, `list[int]`).
- `output_name`: The name you want to use to publish `output`. This name is used to connect
  the router to other components in the pipeline.

### Usage example

```python
from haystack.components.routers import ConditionalRouter

routes = [
    {
        "condition": "{{streams|length > 2}}",
        "output": "{{streams}}",
        "output_name": "enough_streams",
        "output_type": list[int],
    },
    {
        "condition": "{{streams|length <= 2}}",
        "output": "{{streams}}",
        "output_name": "insufficient_streams",
        "output_type": list[int],
    },
]
router = ConditionalRouter(routes)
# When 'streams' has more than 2 items, 'enough_streams' output will activate, emitting the list [1, 2, 3]
kwargs = {"streams": [1, 2, 3], "query": "Haystack"}
result = router.run(**kwargs)
assert result == {"enough_streams": [1, 2, 3]}
```

In this example, we configure two routes. The first route sends the 'streams' value to 'enough_streams' if the
stream count exceeds two. The second route directs 'streams' to 'insufficient_streams' if there
are two or fewer streams.

In the pipeline setup, the Router connects to other components using the output names. For example,
'enough_streams' might connect to a component that processes streams, while
'insufficient_streams' might connect to a component that fetches more streams.


Here is a pipeline that uses `ConditionalRouter` and routes the fetched `ByteStreams` to
different components depending on the number of streams fetched:

```python
from haystack import Pipeline
from haystack.dataclasses import ByteStream
from haystack.components.routers import ConditionalRouter

routes = [
    {
        "condition": "{{streams|length > 2}}",
        "output": "{{streams}}",
        "output_name": "enough_streams",
        "output_type": list[ByteStream],
    },
    {
        "condition": "{{streams|length <= 2}}",
        "output": "{{streams}}",
        "output_name": "insufficient_streams",
        "output_type": list[ByteStream],
    },
]
# Build the router from the ByteStream routes defined above
router = ConditionalRouter(routes)

pipe = Pipeline()
pipe.add_component("router", router)
...
pipe.connect("router.enough_streams", "some_component_a.streams")
pipe.connect("router.insufficient_streams", "some_component_b.streams_or_some_other_input")
...
```

<a id="conditional_router.ConditionalRouter.__init__"></a>

#### ConditionalRouter.\_\_init\_\_

```python
def __init__(routes: list[Route],
             custom_filters: Optional[dict[str, Callable]] = None,
             unsafe: bool = False,
             validate_output_type: bool = False,
             optional_variables: Optional[list[str]] = None)
```

Initializes the `ConditionalRouter` with a list of routes detailing the conditions for routing.

**Arguments**:

- `routes`: A list of dictionaries, each defining a route.
  Each route has these four elements:
  - `condition`: A Jinja2 string expression that determines if the route is selected.
  - `output`: A Jinja2 expression defining the route's output value.
  - `output_type`: The type of the output data (for example, `str`, `list[int]`).
  - `output_name`: The name you want to use to publish `output`. This name is used to connect
    the router to other components in the pipeline.
- `custom_filters`: A dictionary of custom Jinja2 filters used in the condition expressions.
  For example, passing `{"my_filter": my_filter_fcn}` where:
  - `my_filter` is the name of the custom filter.
  - `my_filter_fcn` is a callable that takes `my_var:str` and returns `my_var[:3]`.

  `{{ my_var|my_filter }}` can then be used inside a route condition expression:
  `"condition": "{{ my_var|my_filter == 'foo' }}"`.
- `unsafe`: Enable execution of arbitrary code in the Jinja template.
  This should only be used if you trust the source of the template, as it can lead to remote code execution.
- `validate_output_type`: Enable validation of routes' output.
  If a route's output doesn't match the declared type, a `ValueError` is raised at run time.
- `optional_variables`: A list of variable names that are optional in your route conditions and outputs.
  If these variables are not provided at runtime, they will be set to `None`.
  This allows you to write routes that can handle missing inputs gracefully without raising errors.

Example usage with a default fallback route in a Pipeline:
```python
from haystack import Pipeline
from haystack.components.routers import ConditionalRouter

routes = [
    {
        "condition": '{{ path == "rag" }}',
        "output": "{{ question }}",
        "output_name": "rag_route",
        "output_type": str
    },
    {
        "condition": "{{ True }}",  # fallback route
        "output": "{{ question }}",
        "output_name": "default_route",
        "output_type": str
    }
]

router = ConditionalRouter(routes, optional_variables=["path"])
pipe = Pipeline()
pipe.add_component("router", router)

# When 'path' is provided in the pipeline:
result = pipe.run(data={"router": {"question": "What?", "path": "rag"}})
assert result["router"] == {"rag_route": "What?"}

# When 'path' is not provided, fallback route is taken:
result = pipe.run(data={"router": {"question": "What?"}})
assert result["router"] == {"default_route": "What?"}
```

This pattern is particularly useful when:
- You want to provide default/fallback behavior when certain inputs are missing
- Some variables are only needed for specific routing conditions
- You're building flexible pipelines where not all inputs are guaranteed to be present

<a id="conditional_router.ConditionalRouter.to_dict"></a>

#### ConditionalRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="conditional_router.ConditionalRouter.from_dict"></a>

#### ConditionalRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "ConditionalRouter"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.

<a id="conditional_router.ConditionalRouter.run"></a>

#### ConditionalRouter.run

```python
def run(**kwargs)
```

Executes the routing logic.

Executes the routing logic by evaluating the specified boolean condition expressions for each route in the
order they are listed. The method directs the flow of data to the output specified in the first route whose
`condition` is True.

**Arguments**:

- `kwargs`: All variables used in the `condition` expressed in the routes. When the component is used in a
  pipeline, these variables are passed from the previous component's output.

**Raises**:

- `NoRouteSelectedException`: If no `condition` in the routes is `True`.
- `RouteConditionException`: If there is an error parsing or evaluating the `condition` expression in the routes.
- `ValueError`: If type validation is enabled and route type doesn't match actual value type.

**Returns**:

A dictionary where the key is the `output_name` of the selected route and the value is the `output`
of the selected route.

<a id="document_length_router"></a>

## Module document\_length\_router

<a id="document_length_router.DocumentLengthRouter"></a>

### DocumentLengthRouter

Categorizes documents based on the length of the `content` field and routes them to the appropriate output.

A common use case for DocumentLengthRouter is handling documents obtained from PDFs that contain non-text
content, such as scanned pages or images.
This component can detect empty or low-content documents and route them to
components that perform OCR, generate captions, or compute image embeddings.

### Usage example

```python
from haystack.components.routers import DocumentLengthRouter
from haystack.dataclasses import Document

docs = [
    Document(content="Short"),
    Document(content="Long document "*20),
]

router = DocumentLengthRouter(threshold=10)

result = router.run(documents=docs)
print(result)

# {
#     "short_documents": [Document(content="Short", ...)],
#     "long_documents": [Document(content="Long document ...", ...)],
# }
```

<a id="document_length_router.DocumentLengthRouter.__init__"></a>

#### DocumentLengthRouter.\_\_init\_\_

```python
def __init__(*, threshold: int = 10) -> None
```

Initialize the DocumentLengthRouter component.

**Arguments**:

- `threshold`: The threshold for the number of characters in the document `content` field. Documents where `content` is
  None or whose character count is less than or equal to the threshold will be routed to the `short_documents`
  output. Otherwise, they will be routed to the `long_documents` output.
  To route only documents with None content to `short_documents`, set the threshold to a negative number.

<a id="document_length_router.DocumentLengthRouter.run"></a>

#### DocumentLengthRouter.run

```python
@component.output_types(short_documents=list[Document],
                        long_documents=list[Document])
def run(documents: list[Document]) -> dict[str, list[Document]]
```

Categorize input documents into groups based on the length of the `content` field.

**Arguments**:

- `documents`: A list of documents to be categorized.

**Returns**:

A dictionary with the following keys:
- `short_documents`: A list of documents where `content` is None or the length of `content` is less than or
  equal to the threshold.
- `long_documents`: A list of documents where the length of `content` is greater than the threshold.

<a id="document_type_router"></a>

## Module document\_type\_router

<a id="document_type_router.DocumentTypeRouter"></a>

### DocumentTypeRouter

Routes documents by their MIME types.

DocumentTypeRouter is used to dynamically route documents within a pipeline based on their MIME types.
It supports exact MIME type matches and regex patterns.

MIME types can be extracted directly from document metadata or inferred from file paths using standard or
user-supplied MIME type mappings.

### Usage example

```python
from haystack.components.routers import DocumentTypeRouter
from haystack.dataclasses import Document

docs = [
    Document(content="Example text", meta={"file_path": "example.txt"}),
    Document(content="Another document", meta={"mime_type": "application/pdf"}),
    Document(content="Unknown type")
]

router = DocumentTypeRouter(
    mime_type_meta_field="mime_type",
    file_path_meta_field="file_path",
    mime_types=["text/plain", "application/pdf"]
)

result = router.run(documents=docs)
print(result)
```

Expected output:
```python
{
    "text/plain": [Document(...)],
    "application/pdf": [Document(...)],
    "unclassified": [Document(...)]
}
```

<a id="document_type_router.DocumentTypeRouter.__init__"></a>

#### DocumentTypeRouter.\_\_init\_\_

```python
def __init__(*,
             mime_types: list[str],
             mime_type_meta_field: Optional[str] = None,
             file_path_meta_field: Optional[str] = None,
             additional_mimetypes: Optional[dict[str, str]] = None) -> None
```

Initialize the DocumentTypeRouter component.

**Arguments**:

- `mime_types`: A list of MIME types or regex patterns to classify the input documents
  (for example: `["text/plain", "audio/x-wav", "image/jpeg"]`).
- `mime_type_meta_field`: Optional name of the metadata field that holds the MIME type.
- `file_path_meta_field`: Optional name of the metadata field that holds the file path. Used to infer the MIME type if
  `mime_type_meta_field` is not provided or missing in a document.
- `additional_mimetypes`: Optional dictionary mapping MIME types to file extensions to enhance or override the standard
  `mimetypes` module. Useful when working with uncommon or custom file types.
  For example: `{"application/vnd.custom-type": ".custom"}`.

**Raises**:

- `ValueError`: If `mime_types` is empty or if neither `mime_type_meta_field` nor `file_path_meta_field` is
  provided.

<a id="document_type_router.DocumentTypeRouter.run"></a>

#### DocumentTypeRouter.run

```python
def run(documents: list[Document]) -> dict[str, list[Document]]
```

Categorize input documents into groups based on their MIME type.

MIME types can either be directly available in document metadata or derived from file paths using the
standard Python `mimetypes` module and custom mappings.

**Arguments**:

- `documents`: A list of documents to be categorized.

**Returns**:

A dictionary where the keys are MIME types (or `"unclassified"`) and the values are lists of documents.

<a id="file_type_router"></a>

## Module file\_type\_router

<a id="file_type_router.FileTypeRouter"></a>

### FileTypeRouter

Categorizes files or byte streams by their MIME types, helping in context-based routing.

FileTypeRouter supports both exact MIME type matching and regex patterns.
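
The difference between exact and regex matching can be sketched with the standard library alone. The snippet below is a simplified illustration, not FileTypeRouter's actual implementation: the helper `classify_by_mime_type` is hypothetical, and it assumes each pattern must match the whole guessed MIME type (`re.fullmatch`), so a plain MIME type only matches itself.

```python
import mimetypes
import re


def classify_by_mime_type(paths: list[str], mime_types: list[str]) -> dict[str, list[str]]:
    """Group file paths under the first pattern that fully matches their guessed MIME type."""
    # Hypothetical helper for illustration; not part of Haystack.
    patterns = [(raw, re.compile(raw)) for raw in mime_types]
    groups: dict[str, list[str]] = {}
    for path in paths:
        # Guess the MIME type from the file extension; None if the extension is unknown.
        guessed, _ = mimetypes.guess_type(path)
        key = "unclassified"
        for raw, compiled in patterns:
            # Assumption: the whole MIME type must match, so "text/plain" is an
            # exact match while "audio/.*" covers every audio subtype.
            if guessed is not None and compiled.fullmatch(guessed):
                key = raw
                break
        groups.setdefault(key, []).append(path)
    return groups


sources = ["file.txt", "document.pdf", "song.mp3"]
exact = classify_by_mime_type(sources, ["text/plain", "application/pdf"])
regex = classify_by_mime_type(sources, [r"audio/.*", r"text/plain"])
```

With the exact list, `song.mp3` falls under `"unclassified"`. With the regex list, it is grouped under the `"audio/.*"` key and `document.pdf` becomes unclassified instead.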

For file paths, MIME types come from extensions, while byte streams use metadata.
You can use regex patterns in the `mime_types` parameter to set broad categories
(such as `audio/.*` or `text/.*`) or specific types.
MIME types without regex patterns are treated as exact matches.

### Usage example

```python
from haystack.components.routers import FileTypeRouter
from pathlib import Path

# For exact MIME type matching
router = FileTypeRouter(mime_types=["text/plain", "application/pdf"])

# For flexible matching using regex, to handle all audio types
router_with_regex = FileTypeRouter(mime_types=[r"audio/.*", r"text/plain"])

sources = [Path("file.txt"), Path("document.pdf"), Path("song.mp3")]
print(router.run(sources=sources))
print(router_with_regex.run(sources=sources))

# Expected output:
# {'text/plain': [
#   PosixPath('file.txt')], 'application/pdf': [PosixPath('document.pdf')], 'unclassified': [PosixPath('song.mp3')
# ]}
# {'audio/.*': [
#   PosixPath('song.mp3')], 'text/plain': [PosixPath('file.txt')], 'unclassified': [PosixPath('document.pdf')
# ]}
```

<a id="file_type_router.FileTypeRouter.__init__"></a>

#### FileTypeRouter.\_\_init\_\_

```python
def __init__(mime_types: list[str],
             additional_mimetypes: Optional[dict[str, str]] = None,
             raise_on_failure: bool = False)
```

Initialize the FileTypeRouter component.

**Arguments**:

- `mime_types`: A list of MIME types or regex patterns to classify the input files or byte streams
  (for example: `["text/plain", "audio/x-wav", "image/jpeg"]`).
- `additional_mimetypes`: A dictionary mapping MIME types to file extensions. The entries are added to the `mimetypes`
  package so that file types it doesn't recognize natively aren't left unclassified
  (for example: `{"application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx"}`).
- `raise_on_failure`: If True, raises FileNotFoundError when a file path doesn't exist.
  If False (default), only emits a warning when a file path doesn't exist.

<a id="file_type_router.FileTypeRouter.to_dict"></a>

#### FileTypeRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="file_type_router.FileTypeRouter.from_dict"></a>

#### FileTypeRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "FileTypeRouter"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.

<a id="file_type_router.FileTypeRouter.run"></a>

#### FileTypeRouter.run

```python
def run(
    sources: list[Union[str, Path, ByteStream]],
    meta: Optional[Union[dict[str, Any], list[dict[str, Any]]]] = None
) -> dict[str, list[Union[ByteStream, Path]]]
```

Categorize files or byte streams according to their MIME types.

**Arguments**:

- `sources`: A list of file paths or byte streams to categorize.
- `meta`: Optional metadata to attach to the sources.
  When provided, the sources are internally converted to ByteStream objects and the metadata is added.
  This value can be a list of dictionaries or a single dictionary.
  If it's a single dictionary, its content is added to the metadata of all ByteStream objects.
  If it's a list, its length must match the number of sources, as they are zipped together.

**Returns**:

A dictionary where the keys are MIME types and the values are lists of data sources.
Two extra keys may be returned: `"unclassified"` when a source's MIME type doesn't match any pattern
and `"failed"` when a source cannot be processed (for example, a file path that doesn't exist).

<a id="llm_messages_router"></a>

## Module llm\_messages\_router

<a id="llm_messages_router.LLMMessagesRouter"></a>

### LLMMessagesRouter

Routes Chat Messages to different connections using a generative Language Model to perform classification.

This component can be used with general-purpose LLMs and with specialized LLMs for moderation like Llama Guard.

### Usage example
```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.components.routers.llm_messages_router import LLMMessagesRouter
from haystack.dataclasses import ChatMessage

# initialize a Chat Generator with a generative model for moderation
chat_generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "meta-llama/Llama-Guard-4-12B", "provider": "groq"},
)

router = LLMMessagesRouter(chat_generator=chat_generator,
                           output_names=["unsafe", "safe"],
                           output_patterns=["unsafe", "safe"])


print(router.run([ChatMessage.from_user("How to rob a bank?")]))

# {
#     'chat_generator_text': 'unsafe\nS2',
#     'unsafe': [
#         ChatMessage(
#             _role=<ChatRole.USER: 'user'>,
#             _content=[TextContent(text='How to rob a bank?')],
#             _name=None,
#             _meta={}
#         )
#     ]
# }
```

<a id="llm_messages_router.LLMMessagesRouter.__init__"></a>

#### LLMMessagesRouter.\_\_init\_\_

```python
def __init__(chat_generator: ChatGenerator,
             output_names: list[str],
             output_patterns: list[str],
             system_prompt: Optional[str] = None)
```

Initialize the LLMMessagesRouter component.

**Arguments**:

- `chat_generator`: A ChatGenerator instance which represents the LLM.
- `output_names`: A list of output connection names. These can be used to connect the router to other
  components.
- `output_patterns`: A list of regular expressions to be matched against the output of the LLM. Each pattern
  corresponds to an output name. Patterns are evaluated in order.
  When using moderation models, refer to the model card to understand the expected outputs.
- `system_prompt`: An optional system prompt to customize the behavior of the LLM.
  For moderation models, refer to the model card for supported customization options.

**Raises**:

- `ValueError`: If `output_names` and `output_patterns` are not non-empty lists of the same length.

<a id="llm_messages_router.LLMMessagesRouter.warm_up"></a>

#### LLMMessagesRouter.warm\_up

```python
def warm_up()
```

Warm up the underlying LLM.

<a id="llm_messages_router.LLMMessagesRouter.run"></a>

#### LLMMessagesRouter.run

```python
def run(messages: list[ChatMessage]
        ) -> dict[str, Union[str, list[ChatMessage]]]
```

Classify the messages based on LLM output and route them to the appropriate output connection.

**Arguments**:

- `messages`: A list of ChatMessages to be routed. Only user and assistant messages are supported.

**Raises**:

- `ValueError`: If `messages` is an empty list or contains messages with unsupported roles.

**Returns**:

A dictionary with the following keys:
- `"chat_generator_text"`: The text output of the LLM, useful for debugging.
- Each name in `output_names`: The list of messages that matched the corresponding pattern.
- `"unmatched"`: The messages that did not match any of the output patterns.
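
Because patterns are evaluated in order, the order of `output_names` and `output_patterns` matters when one pattern is a substring of another (`"safe"` is contained in `"unsafe"`). The following is a minimal sketch of that behavior, not the component's actual implementation: `route_by_patterns` is a hypothetical helper, plain strings stand in for `ChatMessage` objects, and the use of `re.search` is an assumption.

```python
import re


def route_by_patterns(
    llm_text: str,
    output_names: list[str],
    output_patterns: list[str],
    messages: list[str],
) -> dict:
    """Return the messages under the first output name whose pattern is found in llm_text."""
    # Hypothetical helper for illustration; not part of Haystack.
    result: dict = {"chat_generator_text": llm_text}
    for name, pattern in zip(output_names, output_patterns):
        # Patterns are tried in order; the first hit wins (assumes search semantics).
        if re.search(pattern, llm_text):
            result[name] = messages
            return result
    # No pattern matched the LLM output.
    result["unmatched"] = messages
    return result


messages = ["How to rob a bank?"]
routed = route_by_patterns("unsafe\nS2", ["unsafe", "safe"], ["unsafe", "safe"], messages)
reordered = route_by_patterns("unsafe\nS2", ["safe", "unsafe"], ["safe", "unsafe"], messages)
fallback = route_by_patterns("no verdict", ["unsafe", "safe"], ["unsafe", "safe"], messages)
```

In this sketch, `routed` places the messages under `"unsafe"`, while `reordered` places them under `"safe"` because that pattern is tried first and also occurs inside `"unsafe"`. Listing the more specific pattern first, or anchoring patterns (for example `^safe`), avoids that ambiguity.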

<a id="llm_messages_router.LLMMessagesRouter.to_dict"></a>

#### LLMMessagesRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="llm_messages_router.LLMMessagesRouter.from_dict"></a>

#### LLMMessagesRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "LLMMessagesRouter"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="metadata_router"></a>

## Module metadata\_router

<a id="metadata_router.MetadataRouter"></a>

### MetadataRouter

Routes documents or byte streams to different connections based on their metadata fields.

Specify the routing rules in the `__init__` method.
If a document or byte stream does not match any of the rules, it's routed to a connection named "unmatched".


### Usage examples

**Routing Documents by metadata:**
```python
from haystack import Document
from haystack.components.routers import MetadataRouter

docs = [Document(content="Paris is the capital of France.", meta={"language": "en"}),
        Document(content="Berlin ist die Hauptstadt von Deutschland.", meta={"language": "de"})]

router = MetadataRouter(rules={"en": {"field": "meta.language", "operator": "==", "value": "en"}})

print(router.run(documents=docs))
# {'en': [Document(id=..., content: 'Paris is the capital of France.', meta: {'language': 'en'})],
# 'unmatched': [Document(id=..., content: 'Berlin ist die Hauptstadt von Deutschland.', meta: {'language': 'de'})]}
```

**Routing ByteStreams by metadata:**
```python
from haystack.dataclasses import ByteStream
from haystack.components.routers import MetadataRouter

streams = [
    ByteStream.from_string("Hello world", meta={"language": "en"}),
    ByteStream.from_string("Bonjour le monde", meta={"language": "fr"})
]

router = MetadataRouter(
    rules={"english": {"field": "meta.language", "operator": "==", "value": "en"}},
    output_type=list[ByteStream]
)

result = router.run(documents=streams)
# {'english': [ByteStream(...)], 'unmatched': [ByteStream(...)]}
```

<a id="metadata_router.MetadataRouter.__init__"></a>

#### MetadataRouter.\_\_init\_\_

```python
def __init__(rules: dict[str, dict],
             output_type: type = list[Document]) -> None
```

Initializes the MetadataRouter component.

**Arguments**:

- `rules`: A dictionary defining how to route documents or byte streams to output connections based on their
  metadata. Keys are output connection names, and values are dictionaries of
  [filtering expressions](https://docs.haystack.deepset.ai/docs/metadata-filtering) in Haystack.
For example:
```python
{
    "edge_1": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2023-04-01"},
        ],
    },
    "edge_2": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-04-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2023-07-01"},
        ],
    },
    "edge_3": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-07-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2023-10-01"},
        ],
    },
    "edge_4": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-10-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2024-01-01"},
        ],
    },
}
```
- `output_type`: The type of the output produced. Lists of Documents or ByteStreams can be specified.

<a id="metadata_router.MetadataRouter.run"></a>

#### MetadataRouter.run

```python
def run(documents: Union[list[Document], list[ByteStream]])
```

Routes documents or byte streams to different connections based on their metadata fields.

If a document or byte stream does not match any of the rules, it's routed to a connection named "unmatched".

**Arguments**:

- `documents`: A list of `Document` or `ByteStream` objects to be routed based on their metadata.

**Returns**:

A dictionary where the keys are the names of the output connections (including `"unmatched"`)
and the values are lists of `Document` or `ByteStream` objects that matched the corresponding rules.
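
To make the rule format concrete, here is a rough standard-library sketch of how such rules could be evaluated. It is an illustration only, not MetadataRouter's implementation: the helpers `matches` and `route_by_metadata` are hypothetical, plain dictionaries stand in for `Document` objects, only comparison operators plus `AND`/`OR` are handled, and placing an item under every rule it satisfies is an assumption.

```python
import operator

# Comparison operators supported by this sketch (a subset, chosen for illustration).
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(rule: dict, meta: dict) -> bool:
    """Evaluate a single comparison or an AND/OR group against a metadata dict."""
    if "conditions" in rule:
        results = [matches(condition, meta) for condition in rule["conditions"]]
        # This sketch treats any logical operator other than AND as OR.
        return all(results) if rule["operator"] == "AND" else any(results)
    # Fields are written as "meta.<name>"; strip the prefix to look up the value.
    value = meta.get(rule["field"].removeprefix("meta."))
    return value is not None and COMPARATORS[rule["operator"]](value, rule["value"])


def route_by_metadata(rules: dict[str, dict], items: list[dict]) -> dict[str, list[dict]]:
    """Place each item under every rule it satisfies, or under "unmatched" if none apply."""
    routed: dict[str, list[dict]] = {name: [] for name in rules}
    routed["unmatched"] = []
    for item in items:
        hits = [name for name, rule in rules.items() if matches(rule, item["meta"])]
        for name in hits:
            routed[name].append(item)
        if not hits:
            routed["unmatched"].append(item)
    return routed


rules = {
    "q1": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2023-04-01"},
        ],
    },
    "en": {"field": "meta.language", "operator": "==", "value": "en"},
}
items = [
    {"content": "a", "meta": {"created_at": "2023-02-15", "language": "en"}},
    {"content": "b", "meta": {"created_at": "2023-05-01", "language": "de"}},
]
routed = route_by_metadata(rules, items)
```

The date comparison works here because ISO 8601 date strings sort lexicographically in chronological order. In this sketch the first item lands under both `q1` and `en`, and the second under `unmatched`.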

<a id="metadata_router.MetadataRouter.to_dict"></a>

#### MetadataRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="metadata_router.MetadataRouter.from_dict"></a>

#### MetadataRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "MetadataRouter"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="text_language_router"></a>

## Module text\_language\_router

<a id="text_language_router.TextLanguageRouter"></a>

### TextLanguageRouter

Routes text strings to different output connections based on their language.

Provide a list of languages during initialization. If the text doesn't match any of the
specified languages, it is routed to the "unmatched" output.
For routing documents based on their language, use the DocumentLanguageClassifier component,
followed by the MetadataRouter.

### Usage example

```python
from haystack import Pipeline, Document
from haystack.components.routers import TextLanguageRouter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

document_store = InMemoryDocumentStore()
document_store.write_documents([Document(content="Elvis Presley was an American singer and actor.")])

p = Pipeline()
p.add_component(instance=TextLanguageRouter(languages=["en"]), name="text_language_router")
p.add_component(instance=InMemoryBM25Retriever(document_store=document_store), name="retriever")
p.connect("text_language_router.en", "retriever.query")

result = p.run({"text_language_router": {"text": "Who was Elvis Presley?"}})
assert result["retriever"]["documents"][0].content == "Elvis Presley was an American singer and actor."

result = p.run({"text_language_router": {"text": "ένα ελληνικό κείμενο"}})
assert result["text_language_router"]["unmatched"] == "ένα ελληνικό κείμενο"
```

<a id="text_language_router.TextLanguageRouter.__init__"></a>

#### TextLanguageRouter.\_\_init\_\_

```python
def __init__(languages: Optional[list[str]] = None)
```

Initialize the TextLanguageRouter component.

**Arguments**:

- `languages`: A list of ISO language codes.
  See the supported languages in [`langdetect` documentation](https://github.com/Mimino666/langdetect#languages).
  If not specified, defaults to ["en"].

<a id="text_language_router.TextLanguageRouter.run"></a>

#### TextLanguageRouter.run

```python
def run(text: str) -> dict[str, str]
```

Routes the text strings to different output connections based on their language.

If the text doesn't match any of the specified languages, it is routed to the "unmatched" output.

**Arguments**:

- `text`: A text string to route.

**Raises**:

- `TypeError`: If the input is not a string.

**Returns**:

A dictionary in which the key is the language (or `"unmatched"`),
and the value is the text.

<a id="transformers_text_router"></a>

## Module transformers\_text\_router

<a id="transformers_text_router.TransformersTextRouter"></a>

### TransformersTextRouter

Routes the text strings to different connections based on a category label.

The labels are specific to each model and can be found in its description on Hugging Face.

### Usage example

```python
from haystack.core.pipeline import Pipeline
from haystack.components.routers import TransformersTextRouter
from haystack.components.builders import PromptBuilder
from haystack.components.generators import HuggingFaceLocalGenerator

p = Pipeline()
p.add_component(
    instance=TransformersTextRouter(model="papluca/xlm-roberta-base-language-detection"),
    name="text_router"
)
p.add_component(
    instance=PromptBuilder(template="Answer the question: {{query}}\nAnswer:"),
    name="english_prompt_builder"
)
p.add_component(
    instance=PromptBuilder(template="Beantworte die Frage: {{query}}\nAntwort:"),
    name="german_prompt_builder"
)

p.add_component(
    instance=HuggingFaceLocalGenerator(model="DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1"),
    name="german_llm"
)
p.add_component(
    instance=HuggingFaceLocalGenerator(model="microsoft/Phi-3-mini-4k-instruct"),
    name="english_llm"
)

p.connect("text_router.en", "english_prompt_builder.query")
p.connect("text_router.de", "german_prompt_builder.query")
p.connect("english_prompt_builder.prompt", "english_llm.prompt")
p.connect("german_prompt_builder.prompt", "german_llm.prompt")

# English Example
print(p.run({"text_router": {"text": "What is the capital of Germany?"}}))

# German Example
print(p.run({"text_router": {"text": "Was ist die Hauptstadt von Deutschland?"}}))
```

<a id="transformers_text_router.TransformersTextRouter.__init__"></a>

#### TransformersTextRouter.\_\_init\_\_

```python
def __init__(model: str,
             labels: Optional[list[str]] = None,
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             huggingface_pipeline_kwargs: Optional[dict[str, Any]] = None)
```

Initializes the TransformersTextRouter component.

**Arguments**:

- `model`: The name or path of a Hugging Face model for text classification.
- `labels`: The list of labels. If not provided, the component fetches the labels
  from the model configuration file hosted on the Hugging Face Hub using
  `transformers.AutoConfig.from_pretrained`.
- `device`: The device for loading the model. If `None`, automatically selects the default device.
  If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- `token`: The API token used to download private models from Hugging Face.
  If `True`, uses either `HF_API_TOKEN` or `HF_TOKEN` environment variables.
  To generate these tokens, run `transformers-cli login`.
- `huggingface_pipeline_kwargs`: A dictionary of keyword arguments for initializing the Hugging Face
  text classification pipeline.

<a id="transformers_text_router.TransformersTextRouter.warm_up"></a>

#### TransformersTextRouter.warm\_up

```python
def warm_up()
```

Initializes the component.

<a id="transformers_text_router.TransformersTextRouter.to_dict"></a>

#### TransformersTextRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="transformers_text_router.TransformersTextRouter.from_dict"></a>

#### TransformersTextRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "TransformersTextRouter"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Deserialized component.

<a id="transformers_text_router.TransformersTextRouter.run"></a>

#### TransformersTextRouter.run

```python
def run(text: str) -> dict[str, str]
```

Routes the text strings to different connections based on a category label.

**Arguments**:

- `text`: A string of text to route.

**Raises**:

- `TypeError`: If the input is not a str.
- `RuntimeError`: If the pipeline has not been loaded because `warm_up()` was not called before.

**Returns**:

A dictionary with the label as key and the text as value.

<a id="zero_shot_text_router"></a>

## Module zero\_shot\_text\_router

<a id="zero_shot_text_router.TransformersZeroShotTextRouter"></a>

### TransformersZeroShotTextRouter

Routes the text strings to different connections based on a category label.

Specify the set of labels for categorization when initializing the component.

### Usage example

```python
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.core.pipeline import Pipeline
from haystack.components.routers import TransformersZeroShotTextRouter
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack.components.retrievers import InMemoryEmbeddingRetriever

document_store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(model="intfloat/e5-base-v2")
doc_embedder.warm_up()
docs = [
    Document(
        content="Germany, officially the Federal Republic of Germany, is a country in the western region of "
        "Central Europe. The nation's capital and most populous city is Berlin and its main financial centre "
        "is Frankfurt; the largest urban area is the Ruhr."
    ),
    Document(
        content="France, officially the French Republic, is a country located primarily in Western Europe. "
        "France is a unitary semi-presidential republic with its capital in Paris, the country's largest city "
        "and main cultural and commercial centre; other major urban areas include Marseille, Lyon, Toulouse, "
        "Lille, Bordeaux, Strasbourg, Nantes and Nice."
    )
]
docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])

p = Pipeline()
p.add_component(instance=TransformersZeroShotTextRouter(labels=["passage", "query"]), name="text_router")
p.add_component(
    instance=SentenceTransformersTextEmbedder(model="intfloat/e5-base-v2", prefix="passage: "),
    name="passage_embedder"
)
p.add_component(
    instance=SentenceTransformersTextEmbedder(model="intfloat/e5-base-v2", prefix="query: "),
    name="query_embedder"
)
p.add_component(
    instance=InMemoryEmbeddingRetriever(document_store=document_store),
    name="query_retriever"
)
p.add_component(
    instance=InMemoryEmbeddingRetriever(document_store=document_store),
    name="passage_retriever"
)

p.connect("text_router.passage", "passage_embedder.text")
p.connect("passage_embedder.embedding", "passage_retriever.query_embedding")
p.connect("text_router.query", "query_embedder.text")
p.connect("query_embedder.embedding", "query_retriever.query_embedding")

# Query Example
p.run({"text_router": {"text": "What is the capital of Germany?"}})

# Passage Example
p.run({
    "text_router": {
        "text": "The United Kingdom of Great Britain and Northern Ireland, commonly known as the "
        "United Kingdom (UK) or Britain, is a country in Northwestern Europe, off the north-western coast of "
        "the continental mainland."
    }
})
```

<a id="zero_shot_text_router.TransformersZeroShotTextRouter.__init__"></a>

#### TransformersZeroShotTextRouter.\_\_init\_\_

```python
def __init__(labels: list[str],
             multi_label: bool = False,
             model: str = "MoritzLaurer/deberta-v3-base-zeroshot-v1.1-all-33",
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             huggingface_pipeline_kwargs: Optional[dict[str, Any]] = None)
```

Initializes the TransformersZeroShotTextRouter component.

**Arguments**:

- `labels`: The set of labels to use for classification. Can be a single label,
a string of comma-separated labels, or a list of labels.
- `multi_label`: Indicates if multiple labels can be true.
If `False`, label scores are normalized so their sum equals 1 for each sequence.
If `True`, the labels are considered independent and probabilities are normalized for each candidate by
doing a softmax of the entailment score vs. the contradiction score.
- `model`: The name or path of a Hugging Face model for zero-shot text classification.
- `device`: The device for loading the model. If `None`, automatically selects the default device.
If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- `token`: The API token used to download private models from Hugging Face.
If `True`, uses either `HF_API_TOKEN` or `HF_TOKEN` environment variables.
To generate these tokens, run `transformers-cli login`.
- `huggingface_pipeline_kwargs`: A dictionary of keyword arguments for initializing the Hugging Face
zero-shot text classification pipeline.
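
The two normalization modes described for `multi_label` can be illustrated with a small, dependency-free sketch. It does not call Haystack or Transformers; the logits in `nli_logits` and the helper names `single_label_scores` and `multi_label_scores` are hypothetical and exist only to show the arithmetic.

```python
import math


def softmax(values: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical (contradiction, entailment) logits for each candidate label.
nli_logits = {"passage": (1.5, 0.2), "query": (-0.5, 2.0)}


def single_label_scores(logits: dict[str, tuple[float, float]]) -> dict[str, float]:
    """multi_label=False: softmax over the entailment logits across all labels."""
    labels = list(logits)
    entailment = [logits[label][1] for label in labels]
    return dict(zip(labels, softmax(entailment)))


def multi_label_scores(logits: dict[str, tuple[float, float]]) -> dict[str, float]:
    """multi_label=True: per label, softmax of entailment vs. contradiction."""
    return {
        label: softmax([contradiction, entailment])[1]
        for label, (contradiction, entailment) in logits.items()
    }


single = single_label_scores(nli_logits)
multi = multi_label_scores(nli_logits)
print(single)
print(multi)
```

In the single-label case the scores compete with each other and sum to 1. In the multi-label case each score is an independent probability, so the total can be above or below 1.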

<a id="zero_shot_text_router.TransformersZeroShotTextRouter.warm_up"></a>

#### TransformersZeroShotTextRouter.warm\_up

```python
def warm_up()
```

Initializes the component.

<a id="zero_shot_text_router.TransformersZeroShotTextRouter.to_dict"></a>

#### TransformersZeroShotTextRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="zero_shot_text_router.TransformersZeroShotTextRouter.from_dict"></a>

#### TransformersZeroShotTextRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "TransformersZeroShotTextRouter"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Deserialized component.

<a id="zero_shot_text_router.TransformersZeroShotTextRouter.run"></a>

#### TransformersZeroShotTextRouter.run

```python
def run(text: str) -> dict[str, str]
```

Routes the text strings to different connections based on a category label.

**Arguments**:

- `text`: A string of text to route.

**Raises**:

- `TypeError`: If the input is not a str.
- `RuntimeError`: If the pipeline has not been loaded because `warm_up()` was not called before.

**Returns**:

A dictionary with the label as key and the text as value.
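
The `run` contract described above (a single-entry dictionary keyed by the winning label, a `TypeError` for non-string input, and a `RuntimeError` when `warm_up()` was skipped) can be sketched without downloading a model. The class `SketchTextRouter` and the function `toy_classifier` below are hypothetical stand-ins written for illustration; they are not Haystack's implementation, and the real component may perform its checks in a different order.

```python
from typing import Callable, Optional

Classifier = Callable[[str], dict[str, float]]


class SketchTextRouter:
    """Hypothetical stand-in that mimics the documented run() contract."""

    def __init__(self, labels: list[str], classify: Classifier):
        self.labels = labels
        self._classify_fn = classify
        self._classifier: Optional[Classifier] = None  # "loaded" by warm_up()

    def warm_up(self) -> None:
        # The real component loads a Hugging Face pipeline here.
        self._classifier = self._classify_fn

    def run(self, text: str) -> dict[str, str]:
        if self._classifier is None:
            raise RuntimeError("The component was not warmed up. Call warm_up() first.")
        if not isinstance(text, str):
            raise TypeError("text must be a str")
        scores = self._classifier(text)
        # Route the text to the output named after the highest-scoring label.
        best_label = max(self.labels, key=lambda label: scores[label])
        return {best_label: text}


def toy_classifier(text: str) -> dict[str, float]:
    """Assumed scoring rule for the sketch: questions score as 'query'."""
    if text.strip().endswith("?"):
        return {"query": 0.9, "passage": 0.1}
    return {"query": 0.2, "passage": 0.8}


router = SketchTextRouter(labels=["passage", "query"], classify=toy_classifier)
router.warm_up()
result = router.run("What is the capital of Germany?")
print(result)
```

Because only the winning label appears as a key, a pipeline connection such as `text_router.query` receives data only when that label is selected.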