---
title: "Routers"
id: routers-api
description: "Routers are a group of components that route queries or Documents to other components that can handle them best."
slug: "/routers-api"
---

<a id="conditional_router"></a>

## Module conditional\_router

<a id="conditional_router.NoRouteSelectedException"></a>

### NoRouteSelectedException

Exception raised when no route is selected in ConditionalRouter.

<a id="conditional_router.RouteConditionException"></a>

### RouteConditionException

Exception raised when there is an error parsing or evaluating the condition expression in ConditionalRouter.

<a id="conditional_router.ConditionalRouter"></a>

### ConditionalRouter

Routes data based on specific conditions.

You define these conditions in a list of dictionaries called `routes`.
Each dictionary in this list represents a single route. Each route has these four elements:
- `condition`: A Jinja2 string expression that determines if the route is selected.
- `output`: A Jinja2 expression defining the route's output value.
- `output_type`: The type of the output data (for example, `str`, `list[int]`).
- `output_name`: The name you want to use to publish `output`. This name is used to connect
  the router to other components in the pipeline.

### Usage example

```python
from haystack.components.routers import ConditionalRouter

routes = [
    {
        "condition": "{{streams|length > 2}}",
        "output": "{{streams}}",
        "output_name": "enough_streams",
        "output_type": list[int],
    },
    {
        "condition": "{{streams|length <= 2}}",
        "output": "{{streams}}",
        "output_name": "insufficient_streams",
        "output_type": list[int],
    },
]
router = ConditionalRouter(routes)
# When 'streams' has more than 2 items, 'enough_streams' output will activate, emitting the list [1, 2, 3]
kwargs = {"streams": [1, 2, 3], "query": "Haystack"}
result = router.run(**kwargs)
assert result == {"enough_streams": [1, 2, 3]}
```

In this example, we configure two routes. The first route sends the 'streams' value to 'enough_streams' if the
stream count exceeds two. The second route directs 'streams' to 'insufficient_streams' if there
are two or fewer streams.

In the pipeline setup, the Router connects to other components using the output names. For example,
'enough_streams' might connect to a component that processes streams, while
'insufficient_streams' might connect to a component that fetches more streams.
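
ConditionalRouter evaluates routes in the order they are listed and selects the first one whose `condition` is true. The stdlib-only sketch below mimics that selection rule so the ordering is easy to see. It is not Haystack's implementation: plain Python callables stand in for the Jinja2 `condition` and `output` templates, and `select_route` and `NoRouteSelected` are hypothetical names.

```python
from typing import Any, Callable

# Hypothetical route shape: callables replace the Jinja2 template strings.
Route = dict[str, Any]


class NoRouteSelected(Exception):
    """Hypothetical stand-in for Haystack's NoRouteSelectedException."""


def select_route(routes: list[Route], **kwargs: Any) -> dict[str, Any]:
    # Check routes in list order; the first condition that holds wins.
    for route in routes:
        condition: Callable[..., bool] = route["condition"]
        if condition(**kwargs):
            return {route["output_name"]: route["output"](**kwargs)}
    raise NoRouteSelected("No route condition evaluated to True")


routes = [
    {
        "condition": lambda streams, **_: len(streams) > 2,
        "output": lambda streams, **_: streams,
        "output_name": "enough_streams",
    },
    {
        "condition": lambda streams, **_: len(streams) <= 2,
        "output": lambda streams, **_: streams,
        "output_name": "insufficient_streams",
    },
]

print(select_route(routes, streams=[1, 2, 3], query="Haystack"))
# {'enough_streams': [1, 2, 3]}
print(select_route(routes, streams=[1]))
# {'insufficient_streams': [1]}
```

Because the first match wins, a catch-all route such as `{{ True }}` only makes sense as the last entry in `routes`.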

Here is a pipeline that uses `ConditionalRouter` to send a message to different outputs depending on
the value of `count`:

```python
from haystack import Pipeline
from haystack.components.routers import ConditionalRouter

routes = [
    {
        "condition": "{{count > 5}}",
        "output": "Processing many items",
        "output_name": "many_items",
        "output_type": str,
    },
    {
        "condition": "{{count <= 5}}",
        "output": "Processing few items",
        "output_name": "few_items",
        "output_type": str,
    },
]

pipe = Pipeline()
pipe.add_component("router", ConditionalRouter(routes))

# Run with count > 5
result = pipe.run({"router": {"count": 10}})
print(result)
# >> {'router': {'many_items': 'Processing many items'}}

# Run with count <= 5
result = pipe.run({"router": {"count": 3}})
print(result)
# >> {'router': {'few_items': 'Processing few items'}}
```

<a id="conditional_router.ConditionalRouter.__init__"></a>

#### ConditionalRouter.\_\_init\_\_

```python
def __init__(routes: list[Route],
             custom_filters: dict[str, Callable] | None = None,
             unsafe: bool = False,
             validate_output_type: bool = False,
             optional_variables: list[str] | None = None)
```

Initializes the `ConditionalRouter` with a list of routes detailing the conditions for routing.

**Arguments**:

- `routes`: A list of dictionaries, each defining a route.
  Each route has these four elements:
  - `condition`: A Jinja2 string expression that determines if the route is selected.
  - `output`: A Jinja2 expression defining the route's output value.
  - `output_type`: The type of the output data (for example, `str`, `list[int]`).
  - `output_name`: The name you want to use to publish `output`. This name is used to connect
    the router to other components in the pipeline.
- `custom_filters`: A dictionary of custom Jinja2 filters used in the condition expressions.
  For example, passing `{"my_filter": my_filter_fcn}` where:
  - `my_filter` is the name of the custom filter.
  - `my_filter_fcn` is a callable that takes `my_var:str` and returns `my_var[:3]`.
  `{{ my_var|my_filter }}` can then be used inside a route condition expression:
  `"condition": "{{ my_var|my_filter == 'foo' }}"`.
- `unsafe`: Enable execution of arbitrary code in the Jinja template.
  This should only be used if you trust the source of the template, as it can lead to remote code execution.
- `validate_output_type`: Enable validation of routes' output.
  If a route output doesn't match the declared type, a `ValueError` is raised at runtime.
- `optional_variables`: A list of variable names that are optional in your route conditions and outputs.
  If these variables are not provided at runtime, they will be set to `None`.
  This allows you to write routes that can handle missing inputs gracefully without raising errors.

Example usage with a default fallback route in a Pipeline:
```python
from haystack import Pipeline
from haystack.components.routers import ConditionalRouter

routes = [
    {
        "condition": '{{ path == "rag" }}',
        "output": "{{ question }}",
        "output_name": "rag_route",
        "output_type": str
    },
    {
        "condition": "{{ True }}",  # fallback route
        "output": "{{ question }}",
        "output_name": "default_route",
        "output_type": str
    }
]

router = ConditionalRouter(routes, optional_variables=["path"])
pipe = Pipeline()
pipe.add_component("router", router)

# When 'path' is provided in the pipeline:
result = pipe.run(data={"router": {"question": "What?", "path": "rag"}})
assert result["router"] == {"rag_route": "What?"}

# When 'path' is not provided, fallback route is taken:
result = pipe.run(data={"router": {"question": "What?"}})
assert result["router"] == {"default_route": "What?"}
```

This pattern is particularly useful when:
- You want to provide default/fallback behavior when certain inputs are missing
- Some variables are only needed for specific routing conditions
- You're building flexible pipelines where not all inputs are guaranteed to be present

<a id="conditional_router.ConditionalRouter.to_dict"></a>

#### ConditionalRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="conditional_router.ConditionalRouter.from_dict"></a>

#### ConditionalRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "ConditionalRouter"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.

<a id="conditional_router.ConditionalRouter.run"></a>

#### ConditionalRouter.run

```python
def run(**kwargs)
```

Executes the routing logic.

Executes the routing logic by evaluating the specified boolean condition expressions for each route in the
order they are listed. The method directs the flow of data to the output specified in the first route whose
`condition` is True.

**Arguments**:

- `kwargs`: All variables used in the `condition` expressions of the routes. When the component is used in a
  pipeline, these variables are passed from the previous component's output.

**Raises**:

- `NoRouteSelectedException`: If no `condition` in the routes is `True`.
- `RouteConditionException`: If there is an error parsing or evaluating the `condition` expression in the routes.
- `ValueError`: If type validation is enabled and route type doesn't match actual value type.

**Returns**:

A dictionary where the key is the `output_name` of the selected route and the value is the `output`
of the selected route.

<a id="document_length_router"></a>

## Module document\_length\_router

<a id="document_length_router.DocumentLengthRouter"></a>

### DocumentLengthRouter

Categorizes documents based on the length of the `content` field and routes them to the appropriate output.

A common use case for DocumentLengthRouter is handling documents obtained from PDFs that contain non-text
content, such as scanned pages or images. This component can detect empty or low-content documents and route them to
components that perform OCR, generate captions, or compute image embeddings.
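
The length rule is simple enough to restate in plain Python. This sketch follows the behavior documented for `threshold`: `content` that is `None` or has at most `threshold` characters counts as short, everything else as long. `Doc` and `route_by_length` are hypothetical stand-ins, not Haystack's `Document` class or the component's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    # Hypothetical minimal stand-in for haystack.dataclasses.Document
    content: Optional[str]


def route_by_length(docs: list[Doc], threshold: int = 10) -> dict[str, list[Doc]]:
    short_docs: list[Doc] = []
    long_docs: list[Doc] = []
    for doc in docs:
        # None content is always short; otherwise compare the character count.
        if doc.content is None or len(doc.content) <= threshold:
            short_docs.append(doc)
        else:
            long_docs.append(doc)
    return {"short_documents": short_docs, "long_documents": long_docs}


docs = [Doc(None), Doc(""), Doc("Short"), Doc("Long document " * 20)]

default = route_by_length(docs, threshold=10)
print(len(default["short_documents"]), len(default["long_documents"]))  # 3 1

# A negative threshold sends only None-content documents to short_documents.
negative = route_by_length(docs, threshold=-1)
print(len(negative["short_documents"]), len(negative["long_documents"]))  # 1 3
```

Note that an empty string is not `None`: with a negative threshold it lands in `long_documents`, which is why the negative setting isolates only documents with no content at all.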

### Usage example

```python
from haystack.components.routers import DocumentLengthRouter
from haystack.dataclasses import Document

docs = [
    Document(content="Short"),
    Document(content="Long document "*20),
]

router = DocumentLengthRouter(threshold=10)

result = router.run(documents=docs)
print(result)

# {
#     "short_documents": [Document(content="Short", ...)],
#     "long_documents": [Document(content="Long document ...", ...)],
# }
```

<a id="document_length_router.DocumentLengthRouter.__init__"></a>

#### DocumentLengthRouter.\_\_init\_\_

```python
def __init__(*, threshold: int = 10) -> None
```

Initialize the DocumentLengthRouter component.

**Arguments**:

- `threshold`: The threshold for the number of characters in the document `content` field. Documents where `content` is
  None or whose character count is less than or equal to the threshold will be routed to the `short_documents`
  output. Otherwise, they will be routed to the `long_documents` output.
  To route only documents with None content to `short_documents`, set the threshold to a negative number.

<a id="document_length_router.DocumentLengthRouter.run"></a>

#### DocumentLengthRouter.run

```python
@component.output_types(short_documents=list[Document],
                        long_documents=list[Document])
def run(documents: list[Document]) -> dict[str, list[Document]]
```

Categorize input documents into groups based on the length of the `content` field.

**Arguments**:

- `documents`: A list of documents to be categorized.

**Returns**:

A dictionary with the following keys:
- `short_documents`: A list of documents where `content` is None or the length of `content` is less than or
  equal to the threshold.
- `long_documents`: A list of documents where the length of `content` is greater than the threshold.

<a id="document_type_router"></a>

## Module document\_type\_router

<a id="document_type_router.DocumentTypeRouter"></a>

### DocumentTypeRouter

Routes documents by their MIME types.

DocumentTypeRouter is used to dynamically route documents within a pipeline based on their MIME types.
It supports exact MIME type matches and regex patterns.

MIME types can be extracted directly from document metadata or inferred from file paths using standard or
user-supplied MIME type mappings.

### Usage example

```python
from haystack.components.routers import DocumentTypeRouter
from haystack.dataclasses import Document

docs = [
    Document(content="Example text", meta={"file_path": "example.txt"}),
    Document(content="Another document", meta={"mime_type": "application/pdf"}),
    Document(content="Unknown type")
]

router = DocumentTypeRouter(
    mime_type_meta_field="mime_type",
    file_path_meta_field="file_path",
    mime_types=["text/plain", "application/pdf"]
)

result = router.run(documents=docs)
print(result)
```

Expected output:
```python
{
    "text/plain": [Document(...)],
    "application/pdf": [Document(...)],
    "unclassified": [Document(...)]
}
```

<a id="document_type_router.DocumentTypeRouter.__init__"></a>

#### DocumentTypeRouter.\_\_init\_\_

```python
def __init__(*,
             mime_types: list[str],
             mime_type_meta_field: str | None = None,
             file_path_meta_field: str | None = None,
             additional_mimetypes: dict[str, str] | None = None) -> None
```

Initialize the DocumentTypeRouter component.

**Arguments**:

- `mime_types`: A list of MIME types or regex patterns to classify the input documents.
  (for example: `["text/plain", "audio/x-wav", "image/jpeg"]`).
- `mime_type_meta_field`: Optional name of the metadata field that holds the MIME type.
- `file_path_meta_field`: Optional name of the metadata field that holds the file path. Used to infer the MIME type if
  `mime_type_meta_field` is not provided or missing in a document.
- `additional_mimetypes`: Optional dictionary mapping MIME types to file extensions to enhance or override the standard
  `mimetypes` module. Useful when working with uncommon or custom file types.
  For example: `{"application/vnd.custom-type": ".custom"}`.

**Raises**:

- `ValueError`: If `mime_types` is empty or if both `mime_type_meta_field` and `file_path_meta_field` are
  not provided.

<a id="document_type_router.DocumentTypeRouter.run"></a>

#### DocumentTypeRouter.run

```python
def run(documents: list[Document]) -> dict[str, list[Document]]
```

Categorize input documents into groups based on their MIME type.

MIME types can either be directly available in document metadata or derived from file paths using the
standard Python `mimetypes` module and custom mappings.

**Arguments**:

- `documents`: A list of documents to be categorized.

**Returns**:

A dictionary where the keys are MIME types (or `"unclassified"`) and the values are lists of documents.

<a id="file_type_router"></a>

## Module file\_type\_router

<a id="file_type_router.FileTypeRouter"></a>

### FileTypeRouter

Categorizes files or byte streams by their MIME types, helping in context-based routing.

FileTypeRouter supports both exact MIME type matching and regex patterns.

For file paths, MIME types come from extensions, while byte streams use metadata.
You can use regex patterns in the `mime_types` parameter to set broad categories
(such as `audio/.*` or `text/.*`) or specific types.
MIME types without regex patterns are treated as exact matches.

### Usage example

```python
from haystack.components.routers import FileTypeRouter
from pathlib import Path

# For exact MIME type matching
router = FileTypeRouter(mime_types=["text/plain", "application/pdf"])

# For flexible matching using regex, to handle all audio types
router_with_regex = FileTypeRouter(mime_types=[r"audio/.*", r"text/plain"])

sources = [Path("file.txt"), Path("document.pdf"), Path("song.mp3")]
print(router.run(sources=sources))
print(router_with_regex.run(sources=sources))

# Expected output (assuming the files exist on disk):
# {'text/plain': [PosixPath('file.txt')], 'application/pdf': [PosixPath('document.pdf')],
#  'unclassified': [PosixPath('song.mp3')]}
# {'audio/.*': [PosixPath('song.mp3')], 'text/plain': [PosixPath('file.txt')],
#  'unclassified': [PosixPath('document.pdf')]}
```

<a id="file_type_router.FileTypeRouter.__init__"></a>

#### FileTypeRouter.\_\_init\_\_

```python
def __init__(mime_types: list[str],
             additional_mimetypes: dict[str, str] | None = None,
             raise_on_failure: bool = False)
```

Initialize the FileTypeRouter component.

**Arguments**:

- `mime_types`: A list of MIME types or regex patterns to classify the input files or byte streams.
  (for example: `["text/plain", "audio/x-wav", "image/jpeg"]`).
- `additional_mimetypes`: A dictionary mapping MIME types to file extensions. These mappings are added to the
  `mimetypes` module so that files with unsupported or non-native types are not left unclassified.
  (for example: `{"application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx"}`).
- `raise_on_failure`: If True, raises FileNotFoundError when a file path doesn't exist.
  If False (default), only emits a warning when a file path doesn't exist.
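
The following stdlib-only sketch shows how extension-based detection, `additional_mimetypes`, and regex patterns in `mime_types` could fit together, using Python's `mimetypes.guess_type` and `mimetypes.add_type`. `classify_paths` is a hypothetical helper, not FileTypeRouter's implementation. It assumes a pattern must match the whole MIME type (`re.fullmatch`) and that the first matching pattern wins; unlike the component, it never checks whether a file exists.

```python
import mimetypes
import re
from pathlib import Path
from typing import Optional


def classify_paths(
    paths: list[Path],
    mime_types: list[str],
    additional_mimetypes: Optional[dict[str, str]] = None,
) -> dict[str, list[Path]]:
    # Register extra "MIME type -> extension" mappings in the process-wide registry.
    for mime_type, extension in (additional_mimetypes or {}).items():
        mimetypes.add_type(mime_type, extension)

    compiled = [(pattern, re.compile(pattern)) for pattern in mime_types]
    result: dict[str, list[Path]] = {}
    for path in paths:
        # Guess the MIME type from the extension only; the file is never opened.
        guessed, _encoding = mimetypes.guess_type(str(path))
        key = "unclassified"
        if guessed is not None:
            # Assumption: the first pattern matching the whole MIME type wins.
            for pattern, regex in compiled:
                if regex.fullmatch(guessed):
                    key = pattern
                    break
        result.setdefault(key, []).append(path)
    return result


sources = [Path("file.txt"), Path("document.pdf"), Path("song.mp3"), Path("data.custom")]

# Regex patterns: any audio type plus exact text/plain.
print(classify_paths(sources, [r"audio/.*", r"text/plain"]))

# ".custom" has no standard MIME type; registering a mapping makes it classifiable.
print(classify_paths(
    sources,
    [r"application/vnd\.custom-type"],
    additional_mimetypes={"application/vnd.custom-type": ".custom"},
))
```

Because `mimetypes.add_type` changes a process-wide registry, registering custom mappings once at startup is the simpler design; the sketch does it per call only to stay short.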

<a id="file_type_router.FileTypeRouter.to_dict"></a>

#### FileTypeRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="file_type_router.FileTypeRouter.from_dict"></a>

#### FileTypeRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "FileTypeRouter"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.

<a id="file_type_router.FileTypeRouter.run"></a>

#### FileTypeRouter.run

```python
def run(
    sources: list[str | Path | ByteStream],
    meta: dict[str, Any] | list[dict[str, Any]] | None = None
) -> dict[str, list[ByteStream | Path]]
```

Categorize files or byte streams according to their MIME types.

**Arguments**:

- `sources`: A list of file paths or byte streams to categorize.
- `meta`: Optional metadata to attach to the sources.
  When provided, the sources are internally converted to ByteStream objects and the metadata is added.
  This value can be a list of dictionaries or a single dictionary.
  If it's a single dictionary, its content is added to the metadata of all ByteStream objects.
  If it's a list, its length must match the number of sources, as they are zipped together.

**Returns**:

A dictionary where the keys are MIME types and the values are lists of data sources.
Two extra keys may be returned: `"unclassified"` when a source's MIME type doesn't match any pattern
and `"failed"` when a source cannot be processed (for example, a file path that doesn't exist).
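
The two accepted shapes of `meta` (one dictionary shared by every source, or a list zipped one-to-one with the sources) can be sketched as a small normalization step. `normalize_meta` is a hypothetical helper written for illustration; it does not convert anything to `ByteStream`, and raising `ValueError` on a length mismatch is an assumption of this sketch.

```python
from typing import Any, Optional, Union

MetaArg = Optional[Union[dict[str, Any], list[dict[str, Any]]]]


def normalize_meta(sources: list[Any], meta: MetaArg) -> list[dict[str, Any]]:
    # No metadata: every source gets an empty dict.
    if meta is None:
        return [{} for _ in sources]
    # A single dict is copied onto every source.
    if isinstance(meta, dict):
        return [dict(meta) for _ in sources]
    # A list must line up one-to-one with the sources, since they are zipped.
    if len(meta) != len(sources):
        raise ValueError(
            f"meta has {len(meta)} entries but there are {len(sources)} sources"
        )
    return [dict(m) for m in meta]


sources = ["a.txt", "b.pdf"]
print(list(zip(sources, normalize_meta(sources, {"origin": "upload"}))))
print(list(zip(sources, normalize_meta(sources, [{"page": 1}, {"page": 2}]))))
```

Copying each dictionary keeps the sources independent, so editing one source's metadata later does not silently change the others.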

<a id="llm_messages_router"></a>

## Module llm\_messages\_router

<a id="llm_messages_router.LLMMessagesRouter"></a>

### LLMMessagesRouter

Routes Chat Messages to different connections using a generative Language Model to perform classification.

This component can be used with general-purpose LLMs and with specialized LLMs for moderation like Llama Guard.

### Usage example
```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.components.routers.llm_messages_router import LLMMessagesRouter
from haystack.dataclasses import ChatMessage

# initialize a Chat Generator with a generative model for moderation
chat_generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "meta-llama/Llama-Guard-4-12B", "provider": "groq"},
)

router = LLMMessagesRouter(chat_generator=chat_generator,
                           output_names=["unsafe", "safe"],
                           output_patterns=["unsafe", "safe"])


print(router.run([ChatMessage.from_user("How to rob a bank?")]))

# {
#     'chat_generator_text': 'unsafe\nS2',
#     'unsafe': [
#         ChatMessage(
#             _role=<ChatRole.USER: 'user'>,
#             _content=[TextContent(text='How to rob a bank?')],
#             _name=None,
#             _meta={}
#         )
#     ]
# }
```

<a id="llm_messages_router.LLMMessagesRouter.__init__"></a>

#### LLMMessagesRouter.\_\_init\_\_

```python
def __init__(chat_generator: ChatGenerator,
             output_names: list[str],
             output_patterns: list[str],
             system_prompt: str | None = None)
```

Initialize the LLMMessagesRouter component.

**Arguments**:

- `chat_generator`: A ChatGenerator instance which represents the LLM.
- `output_names`: A list of output connection names. These can be used to connect the router to other
  components.
- `output_patterns`: A list of regular expressions to be matched against the output of the LLM. Each pattern
  corresponds to an output name. Patterns are evaluated in order.
  When using moderation models, refer to the model card to understand the expected outputs.
- `system_prompt`: An optional system prompt to customize the behavior of the LLM.
  For moderation models, refer to the model card for supported customization options.

**Raises**:

- `ValueError`: If `output_names` and `output_patterns` are not non-empty lists of the same length.

<a id="llm_messages_router.LLMMessagesRouter.warm_up"></a>

#### LLMMessagesRouter.warm\_up

```python
def warm_up()
```

Warm up the underlying LLM.

<a id="llm_messages_router.LLMMessagesRouter.run"></a>

#### LLMMessagesRouter.run

```python
def run(messages: list[ChatMessage]) -> dict[str, str | list[ChatMessage]]
```

Classify the messages based on LLM output and route them to the appropriate output connection.

**Arguments**:

- `messages`: A list of ChatMessages to be routed. Only user and assistant messages are supported.

**Raises**:

- `ValueError`: If `messages` is an empty list or contains messages with unsupported roles.

**Returns**:

A dictionary with the following keys:
- `"chat_generator_text"`: The text output of the LLM, useful for debugging.
- One key per name in `output_names`: Each contains the list of messages that matched the corresponding pattern.
- `"unmatched"`: The messages that did not match any of the output patterns.

<a id="llm_messages_router.LLMMessagesRouter.to_dict"></a>

#### LLMMessagesRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.
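
Why does the LLMMessagesRouter usage example list `"unsafe"` before `"safe"`? Patterns are evaluated in order, and the text `"safe"` also occurs inside `"unsafe"`. The stdlib sketch below assumes each pattern is tested with `re.search` against the LLM's text and that the first match decides the output connection; `pick_output` is a hypothetical helper, not the component's code.

```python
import re
from typing import Optional


def pick_output(
    llm_text: str, output_names: list[str], output_patterns: list[str]
) -> Optional[str]:
    # Try the patterns in order; the first one found anywhere in the text wins.
    for name, pattern in zip(output_names, output_patterns):
        if re.search(pattern, llm_text):
            return name
    return None  # such messages would end up under "unmatched"


llm_text = "unsafe\nS2"  # the moderation output shown in the usage example

print(pick_output(llm_text, ["unsafe", "safe"], ["unsafe", "safe"]))  # unsafe
# Reversing the order misroutes the message: "safe" is a substring of "unsafe".
print(pick_output(llm_text, ["safe", "unsafe"], ["safe", "unsafe"]))  # safe
# Anchoring the patterns removes the dependence on order.
print(pick_output(llm_text, ["safe", "unsafe"], [r"^safe", r"^unsafe"]))  # unsafe
```

Anchored patterns such as `^safe` are a defensive choice when a model's labels overlap; check the model card for the exact output format before relying on them.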

<a id="llm_messages_router.LLMMessagesRouter.from_dict"></a>

#### LLMMessagesRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "LLMMessagesRouter"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="metadata_router"></a>

## Module metadata\_router

<a id="metadata_router.MetadataRouter"></a>

### MetadataRouter

Routes documents or byte streams to different connections based on their metadata fields.

Specify the routing rules in the `__init__` method.
If a document or byte stream does not match any of the rules, it's routed to a connection named "unmatched".

### Usage examples

**Routing Documents by metadata:**
```python
from haystack import Document
from haystack.components.routers import MetadataRouter

docs = [Document(content="Paris is the capital of France.", meta={"language": "en"}),
        Document(content="Berlin ist die Hauptstadt von Deutschland.", meta={"language": "de"})]

router = MetadataRouter(rules={"en": {"field": "meta.language", "operator": "==", "value": "en"}})

print(router.run(documents=docs))
# {'en': [Document(id=..., content: 'Paris is the capital of France.', meta: {'language': 'en'})],
#  'unmatched': [Document(id=..., content: 'Berlin ist die Hauptstadt von Deutschland.', meta: {'language': 'de'})]}
```

**Routing ByteStreams by metadata:**
```python
from haystack.dataclasses import ByteStream
from haystack.components.routers import MetadataRouter

streams = [
    ByteStream.from_string("Hello world", meta={"language": "en"}),
    ByteStream.from_string("Bonjour le monde", meta={"language": "fr"})
]

router = MetadataRouter(
    rules={"english": {"field":
"meta.language", "operator": "==", "value": "en"}}, 729 output_type=list[ByteStream] 730 ) 731 732 result = router.run(documents=streams) 733 # {'english': [ByteStream(...)], 'unmatched': [ByteStream(...)]} 734 ``` 735 736 <a id="metadata_router.MetadataRouter.__init__"></a> 737 738 #### MetadataRouter.\_\_init\_\_ 739 740 ```python 741 def __init__(rules: dict[str, dict], 742 output_type: type = list[Document]) -> None 743 ``` 744 745 Initializes the MetadataRouter component. 746 747 **Arguments**: 748 749 - `rules`: A dictionary defining how to route documents or byte streams to output connections based on their 750 metadata. Keys are output connection names, and values are dictionaries of 751 [filtering expressions](https://docs.haystack.deepset.ai/docs/metadata-filtering) in Haystack. 752 For example: 753 ```python 754 { 755 "edge_1": { 756 "operator": "AND", 757 "conditions": [ 758 {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"}, 759 {"field": "meta.created_at", "operator": "<", "value": "2023-04-01"}, 760 ], 761 }, 762 "edge_2": { 763 "operator": "AND", 764 "conditions": [ 765 {"field": "meta.created_at", "operator": ">=", "value": "2023-04-01"}, 766 {"field": "meta.created_at", "operator": "<", "value": "2023-07-01"}, 767 ], 768 }, 769 "edge_3": { 770 "operator": "AND", 771 "conditions": [ 772 {"field": "meta.created_at", "operator": ">=", "value": "2023-07-01"}, 773 {"field": "meta.created_at", "operator": "<", "value": "2023-10-01"}, 774 ], 775 }, 776 "edge_4": { 777 "operator": "AND", 778 "conditions": [ 779 {"field": "meta.created_at", "operator": ">=", "value": "2023-10-01"}, 780 {"field": "meta.created_at", "operator": "<", "value": "2024-01-01"}, 781 ], 782 }, 783 } 784 ``` 785 :param output_type: The type of the output produced. Lists of Documents or ByteStreams can be specified. 

<a id="metadata_router.MetadataRouter.run"></a>

#### MetadataRouter.run

```python
def run(documents: list[Document] | list[ByteStream])
```

Routes documents or byte streams to different connections based on their metadata fields.

If a document or byte stream does not match any of the rules, it's routed to a connection named "unmatched".

**Arguments**:

- `documents`: A list of `Document` or `ByteStream` objects to be routed based on their metadata.

**Returns**:

A dictionary where the keys are the names of the output connections (including `"unmatched"`)
and the values are lists of `Document` or `ByteStream` objects that matched the corresponding rules.

<a id="metadata_router.MetadataRouter.to_dict"></a>

#### MetadataRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="metadata_router.MetadataRouter.from_dict"></a>

#### MetadataRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "MetadataRouter"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="text_language_router"></a>

## Module text\_language\_router

<a id="text_language_router.TextLanguageRouter"></a>

### TextLanguageRouter

Routes text strings to different output connections based on their language.

Provide a list of languages during initialization. If the text doesn't match any of the
specified languages, it's routed to the `"unmatched"` output.
For routing documents based on their language, use the DocumentLanguageClassifier component,
followed by the MetadataRouter.

### Usage example

```python
from haystack import Pipeline, Document
from haystack.components.routers import TextLanguageRouter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

document_store = InMemoryDocumentStore()
document_store.write_documents([Document(content="Elvis Presley was an American singer and actor.")])

p = Pipeline()
p.add_component(instance=TextLanguageRouter(languages=["en"]), name="text_language_router")
p.add_component(instance=InMemoryBM25Retriever(document_store=document_store), name="retriever")
p.connect("text_language_router.en", "retriever.query")

result = p.run({"text_language_router": {"text": "Who was Elvis Presley?"}})
assert result["retriever"]["documents"][0].content == "Elvis Presley was an American singer and actor."

result = p.run({"text_language_router": {"text": "ένα ελληνικό κείμενο"}})
assert result["text_language_router"]["unmatched"] == "ένα ελληνικό κείμενο"
```

<a id="text_language_router.TextLanguageRouter.__init__"></a>

#### TextLanguageRouter.\_\_init\_\_

```python
def __init__(languages: list[str] | None = None)
```

Initialize the TextLanguageRouter component.

**Arguments**:

- `languages`: A list of ISO language codes.
  See the supported languages in [`langdetect` documentation](https://github.com/Mimino666/langdetect#languages).
  If not specified, defaults to `["en"]`.

<a id="text_language_router.TextLanguageRouter.run"></a>

#### TextLanguageRouter.run

```python
def run(text: str) -> dict[str, str]
```

Routes the text strings to different output connections based on their language.

If the text doesn't match any of the specified languages, it's routed to the `"unmatched"` output.

**Arguments**:

- `text`: A text string to route.

**Raises**:

- `TypeError`: If the input is not a string.

**Returns**:

A dictionary in which the key is the language (or `"unmatched"`),
and the value is the text.

<a id="transformers_text_router"></a>

## Module transformers\_text\_router

<a id="transformers_text_router.TransformersTextRouter"></a>

### TransformersTextRouter

Routes the text strings to different connections based on a category label.

The labels are specific to each model and can be found in its description on Hugging Face.

### Usage example

```python
from haystack.core.pipeline import Pipeline
from haystack.components.routers import TransformersTextRouter
from haystack.components.builders import PromptBuilder
from haystack.components.generators import HuggingFaceLocalGenerator

p = Pipeline()
p.add_component(
    instance=TransformersTextRouter(model="papluca/xlm-roberta-base-language-detection"),
    name="text_router"
)
p.add_component(
    instance=PromptBuilder(template="Answer the question: {{query}}\nAnswer:"),
    name="english_prompt_builder"
)
p.add_component(
    instance=PromptBuilder(template="Beantworte die Frage: {{query}}\nAntwort:"),
    name="german_prompt_builder"
)

p.add_component(
    instance=HuggingFaceLocalGenerator(model="DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1"),
    name="german_llm"
)
p.add_component(
    instance=HuggingFaceLocalGenerator(model="microsoft/Phi-3-mini-4k-instruct"),
    name="english_llm"
)

p.connect("text_router.en", "english_prompt_builder.query")
p.connect("text_router.de", "german_prompt_builder.query")
p.connect("english_prompt_builder.prompt", "english_llm.prompt")
p.connect("german_prompt_builder.prompt", "german_llm.prompt") 967 968 # English Example 969 print(p.run({"text_router": {"text": "What is the capital of Germany?"}})) 970 971 # German Example 972 print(p.run({"text_router": {"text": "Was ist die Hauptstadt von Deutschland?"}})) 973 ``` 974 975 <a id="transformers_text_router.TransformersTextRouter.__init__"></a> 976 977 #### TransformersTextRouter.\_\_init\_\_ 978 979 ```python 980 def __init__(model: str, 981 labels: list[str] | None = None, 982 device: ComponentDevice | None = None, 983 token: Secret | None = Secret.from_env_var( 984 ["HF_API_TOKEN", "HF_TOKEN"], strict=False), 985 huggingface_pipeline_kwargs: dict[str, Any] | None = None) 986 ``` 987 988 Initializes the TransformersTextRouter component. 989 990 **Arguments**: 991 992 - `model`: The name or path of a Hugging Face model for text classification. 993 - `labels`: The list of labels. If not provided, the component fetches the labels 994 from the model configuration file hosted on the Hugging Face Hub using 995 `transformers.AutoConfig.from_pretrained`. 996 - `device`: The device for loading the model. If `None`, automatically selects the default device. 997 If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter. 998 - `token`: The API token used to download private models from Hugging Face. 999 If `True`, uses either `HF_API_TOKEN` or `HF_TOKEN` environment variables. 1000 To generate these tokens, run `transformers-cli login`. 1001 - `huggingface_pipeline_kwargs`: A dictionary of keyword arguments for initializing the Hugging Face 1002 text classification pipeline. 1003 1004 <a id="transformers_text_router.TransformersTextRouter.warm_up"></a> 1005 1006 #### TransformersTextRouter.warm\_up 1007 1008 ```python 1009 def warm_up() 1010 ``` 1011 1012 Initializes the component. 

<a id="transformers_text_router.TransformersTextRouter.to_dict"></a>

#### TransformersTextRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="transformers_text_router.TransformersTextRouter.from_dict"></a>

#### TransformersTextRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "TransformersTextRouter"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Deserialized component.

<a id="transformers_text_router.TransformersTextRouter.run"></a>

#### TransformersTextRouter.run

```python
def run(text: str) -> dict[str, str]
```

Routes the text strings to different connections based on a category label.

**Arguments**:

- `text`: A string of text to route.

**Raises**:

- `TypeError`: If the input is not a str.

**Returns**:

A dictionary with the label as key and the text as value.

<a id="zero_shot_text_router"></a>

## Module zero\_shot\_text\_router

<a id="zero_shot_text_router.TransformersZeroShotTextRouter"></a>

### TransformersZeroShotTextRouter

Routes the text strings to different connections based on a category label.

Specify the set of labels for categorization when initializing the component.

### Usage example

```python
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.core.pipeline import Pipeline
from haystack.components.routers import TransformersZeroShotTextRouter
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack.components.retrievers import InMemoryEmbeddingRetriever

document_store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(model="intfloat/e5-base-v2")
doc_embedder.warm_up()
docs = [
    Document(
        content="Germany, officially the Federal Republic of Germany, is a country in the western region of "
        "Central Europe. The nation's capital and most populous city is Berlin and its main financial centre "
        "is Frankfurt; the largest urban area is the Ruhr."
    ),
    Document(
        content="France, officially the French Republic, is a country located primarily in Western Europe. "
        "France is a unitary semi-presidential republic with its capital in Paris, the country's largest city "
        "and main cultural and commercial centre; other major urban areas include Marseille, Lyon, Toulouse, "
        "Lille, Bordeaux, Strasbourg, Nantes and Nice."
    )
]
docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])

p = Pipeline()
p.add_component(instance=TransformersZeroShotTextRouter(labels=["passage", "query"]), name="text_router")
p.add_component(
    instance=SentenceTransformersTextEmbedder(model="intfloat/e5-base-v2", prefix="passage: "),
    name="passage_embedder"
)
p.add_component(
    instance=SentenceTransformersTextEmbedder(model="intfloat/e5-base-v2", prefix="query: "),
    name="query_embedder"
)
p.add_component(
    instance=InMemoryEmbeddingRetriever(document_store=document_store),
    name="query_retriever"
)
p.add_component(
    instance=InMemoryEmbeddingRetriever(document_store=document_store),
    name="passage_retriever"
)

p.connect("text_router.passage", "passage_embedder.text")
p.connect("passage_embedder.embedding", "passage_retriever.query_embedding")
p.connect("text_router.query", "query_embedder.text")
p.connect("query_embedder.embedding", "query_retriever.query_embedding")

# Query Example
p.run({"text_router": {"text": "What is the capital of Germany?"}})

# Passage Example
p.run({
    "text_router": {
        "text": "The United Kingdom of Great Britain and Northern Ireland, commonly known as the "
        "United Kingdom (UK) or Britain, is a country in Northwestern Europe, off the north-western coast of "
        "the continental mainland."
    }
})
```

<a id="zero_shot_text_router.TransformersZeroShotTextRouter.__init__"></a>

#### TransformersZeroShotTextRouter.\_\_init\_\_

```python
def __init__(labels: list[str],
             multi_label: bool = False,
             model: str = "MoritzLaurer/deberta-v3-base-zeroshot-v1.1-all-33",
             device: ComponentDevice | None = None,
             token: Secret | None = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             huggingface_pipeline_kwargs: dict[str, Any] | None = None)
```

Initializes the TransformersZeroShotTextRouter component.

**Arguments**:

- `labels`: The set of labels to use for classification. Can be a single label,
a string of comma-separated labels, or a list of labels.
- `multi_label`: Indicates if multiple labels can be true.
If `False`, label scores are normalized so their sum equals 1 for each sequence.
If `True`, the labels are considered independent and probabilities are normalized for each candidate by
doing a softmax of the entailment score vs. the contradiction score.
- `model`: The name or path of a Hugging Face model for zero-shot text classification.
- `device`: The device for loading the model. If `None`, automatically selects the default device.
If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- `token`: The API token used to download private models from Hugging Face.
By default, the token is read from the `HF_API_TOKEN` or `HF_TOKEN` environment variable.
To generate these tokens, run `transformers-cli login`.
- `huggingface_pipeline_kwargs`: A dictionary of keyword arguments for initializing the Hugging Face
zero-shot text classification pipeline.
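
The two `multi_label` modes differ only in which logits go through the softmax. The sketch below illustrates both with made-up logits for two labels. It assumes an NLI-style model that emits one entailment and one contradiction logit per (text, label) pair, as Hugging Face zero-shot classification models do, and it ignores the neutral class. `single_label_scores` and `multi_label_scores` are hypothetical helpers.

```python
import math


def single_label_scores(entailment: dict[str, float]) -> dict[str, float]:
    """multi_label=False: softmax of the entailment logits across all labels (scores sum to 1)."""
    peak = max(entailment.values())  # subtract the max for numerical stability
    exps = {label: math.exp(logit - peak) for label, logit in entailment.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def multi_label_scores(logits: dict[str, tuple[float, float]]) -> dict[str, float]:
    """multi_label=True: per label, softmax of (entailment, contradiction); keep P(entailment)."""
    scores = {}
    for label, (entail, contra) in logits.items():
        # A two-way softmax reduces to the logistic function of the logit difference.
        scores[label] = 1.0 / (1.0 + math.exp(contra - entail))
    return scores


# Made-up (entailment, contradiction) logits, for illustration only.
logits = {"passage": (2.0, -1.0), "query": (1.0, 0.5)}

single = single_label_scores({label: pair[0] for label, pair in logits.items()})
multi = multi_label_scores(logits)
print({label: round(s, 3) for label, s in single.items()})  # {'passage': 0.731, 'query': 0.269}
print({label: round(s, 3) for label, s in multi.items()})  # {'passage': 0.953, 'query': 0.622}
```

In single-label mode the scores compete and always sum to 1; in multi-label mode each label is scored on its own, so several labels can score high at once and the scores can sum to more than 1.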

<a id="zero_shot_text_router.TransformersZeroShotTextRouter.warm_up"></a>

#### TransformersZeroShotTextRouter.warm\_up

```python
def warm_up()
```

Initializes the component.

<a id="zero_shot_text_router.TransformersZeroShotTextRouter.to_dict"></a>

#### TransformersZeroShotTextRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="zero_shot_text_router.TransformersZeroShotTextRouter.from_dict"></a>

#### TransformersZeroShotTextRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "TransformersZeroShotTextRouter"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Deserialized component.

<a id="zero_shot_text_router.TransformersZeroShotTextRouter.run"></a>

#### TransformersZeroShotTextRouter.run

```python
def run(text: str) -> dict[str, str]
```

Routes the text strings to different connections based on a category label.

**Arguments**:

- `text`: A string of text to route.

**Raises**:

- `TypeError`: If the input is not a str.

**Returns**:

A dictionary with the label as key and the text as value.

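The returned dictionary holds a single entry keyed by the winning label. Assuming a prediction shaped like the output of the Hugging Face zero-shot classification pipeline (a dict with parallel `labels` and `scores` lists), choosing that label can be sketched as follows. `route_by_top_label` and the scores are hypothetical, and the component's own selection code may differ.

```python
def route_by_top_label(text: str, prediction: dict) -> dict[str, str]:
    """Hypothetical helper: publish the text under its highest-scoring label."""
    if not isinstance(text, str):
        raise TypeError("Expected a string input.")
    # Pair each label with its score and keep the label with the largest score.
    top_label, _ = max(zip(prediction["labels"], prediction["scores"]), key=lambda pair: pair[1])
    return {top_label: text}


# Made-up prediction, for illustration only.
prediction = {
    "sequence": "What is the capital of Germany?",
    "labels": ["query", "passage"],
    "scores": [0.91, 0.09],
}
print(route_by_top_label(prediction["sequence"], prediction))
# {'query': 'What is the capital of Germany?'}
```

Taking the maximum explicitly, rather than relying on the labels arriving in descending score order, keeps the sketch correct even if that ordering is not guaranteed.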