routers_api.md
---
title: Routers
id: routers-api
description: Routers is a group of components that route queries or Documents to other components that can handle them best.
slug: "/routers-api"
---

<a id="conditional_router"></a>

# Module conditional\_router

<a id="conditional_router.NoRouteSelectedException"></a>

## NoRouteSelectedException

Exception raised when no route is selected in ConditionalRouter.

<a id="conditional_router.RouteConditionException"></a>

## RouteConditionException

Exception raised when there is an error parsing or evaluating the condition expression in ConditionalRouter.

<a id="conditional_router.ConditionalRouter"></a>

## ConditionalRouter

Routes data based on specific conditions.

You define these conditions in a list of dictionaries called `routes`.
Each dictionary in this list represents a single route. Each route has these four elements:
- `condition`: A Jinja2 string expression that determines if the route is selected.
- `output`: A Jinja2 expression defining the route's output value.
- `output_type`: The type of the output data (for example, `str`, `list[int]`).
- `output_name`: The name you want to use to publish `output`. This name is used to connect
the router to other components in the pipeline.

### Usage example

```python
from haystack.components.routers import ConditionalRouter

routes = [
    {
        "condition": "{{streams|length > 2}}",
        "output": "{{streams}}",
        "output_name": "enough_streams",
        "output_type": list[int],
    },
    {
        "condition": "{{streams|length <= 2}}",
        "output": "{{streams}}",
        "output_name": "insufficient_streams",
        "output_type": list[int],
    },
]
router = ConditionalRouter(routes)
# When 'streams' has more than 2 items, 'enough_streams' output will activate, emitting the list [1, 2, 3]
kwargs = {"streams": [1, 2, 3], "query": "Haystack"}
result = router.run(**kwargs)
assert result == {"enough_streams": [1, 2, 3]}
```

In this example, we configure two routes. The first route sends the 'streams' value to 'enough_streams' if the
stream count exceeds two. The second route directs 'streams' to 'insufficient_streams' if there
are two or fewer streams.

In the pipeline setup, the router connects to other components using the output names. For example,
'enough_streams' might connect to a component that processes streams, while
'insufficient_streams' might connect to a component that fetches more streams.


Here is a pipeline that uses `ConditionalRouter` and routes the fetched `ByteStreams` to
different components depending on the number of streams fetched:

```python
from haystack import Pipeline
from haystack.dataclasses import ByteStream
from haystack.components.routers import ConditionalRouter

routes = [
    {
        "condition": "{{streams|length > 2}}",
        "output": "{{streams}}",
        "output_name": "enough_streams",
        "output_type": list[ByteStream],
    },
    {
        "condition": "{{streams|length <= 2}}",
        "output": "{{streams}}",
        "output_name": "insufficient_streams",
        "output_type": list[ByteStream],
    },
]
# Build the router from the routes before adding it to the pipeline
router = ConditionalRouter(routes)

pipe = Pipeline()
pipe.add_component("router", router)
...
pipe.connect("router.enough_streams", "some_component_a.streams")
pipe.connect("router.insufficient_streams", "some_component_b.streams_or_some_other_input")
...
```

<a id="conditional_router.ConditionalRouter.__init__"></a>

#### ConditionalRouter.\_\_init\_\_

```python
def __init__(routes: list[Route],
             custom_filters: Optional[dict[str, Callable]] = None,
             unsafe: bool = False,
             validate_output_type: bool = False,
             optional_variables: Optional[list[str]] = None)
```

Initializes the `ConditionalRouter` with a list of routes detailing the conditions for routing.

**Arguments**:

- `routes`: A list of dictionaries, each defining a route.
  Each route has these four elements:
  - `condition`: A Jinja2 string expression that determines if the route is selected.
  - `output`: A Jinja2 expression defining the route's output value.
  - `output_type`: The type of the output data (for example, `str`, `list[int]`).
  - `output_name`: The name you want to use to publish `output`. This name is used to connect
    the router to other components in the pipeline.
- `custom_filters`: A dictionary of custom Jinja2 filters used in the condition expressions.
  For example, passing `{"my_filter": my_filter_fcn}` where:
  - `my_filter` is the name of the custom filter.
  - `my_filter_fcn` is a callable that takes `my_var:str` and returns `my_var[:3]`.

  `{{ my_var|my_filter }}` can then be used inside a route condition expression:
  `"condition": "{{ my_var|my_filter == 'foo' }}"`.
- `unsafe`: Enable execution of arbitrary code in the Jinja template.
  Use this only if you trust the source of the template, as it can lead to remote code execution.
- `validate_output_type`: Enable validation of routes' output.
  If a route's output doesn't match the declared type, a `ValueError` is raised at run time.
- `optional_variables`: A list of variable names that are optional in your route conditions and outputs.
If these variables are not provided at runtime, they will be set to `None`.
This allows you to write routes that can handle missing inputs gracefully without raising errors.

Example usage with a default fallback route in a Pipeline:
```python
from haystack import Pipeline
from haystack.components.routers import ConditionalRouter

routes = [
    {
        "condition": '{{ path == "rag" }}',
        "output": "{{ question }}",
        "output_name": "rag_route",
        "output_type": str
    },
    {
        "condition": "{{ True }}",  # fallback route
        "output": "{{ question }}",
        "output_name": "default_route",
        "output_type": str
    }
]

router = ConditionalRouter(routes, optional_variables=["path"])
pipe = Pipeline()
pipe.add_component("router", router)

# When 'path' is provided in the pipeline:
result = pipe.run(data={"router": {"question": "What?", "path": "rag"}})
assert result["router"] == {"rag_route": "What?"}

# When 'path' is not provided, fallback route is taken:
result = pipe.run(data={"router": {"question": "What?"}})
assert result["router"] == {"default_route": "What?"}
```

This pattern is particularly useful when:
- You want to provide default/fallback behavior when certain inputs are missing
- Some variables are only needed for specific routing conditions
- You're building flexible pipelines where not all inputs are guaranteed to be present

<a id="conditional_router.ConditionalRouter.to_dict"></a>

#### ConditionalRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="conditional_router.ConditionalRouter.from_dict"></a>

#### ConditionalRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "ConditionalRouter"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.

<a id="conditional_router.ConditionalRouter.run"></a>

#### ConditionalRouter.run

```python
def run(**kwargs)
```

Executes the routing logic.

Executes the routing logic by evaluating the specified boolean condition expressions for each route in the
order they are listed. The method directs the flow of data to the output specified in the first route whose
`condition` is True.

**Arguments**:

- `kwargs`: All variables used in the `condition` expressions of the routes. When the component is used in a
pipeline, these variables are passed from the previous component's output.

**Raises**:

- `NoRouteSelectedException`: If no `condition` in the routes is `True`.
- `RouteConditionException`: If there is an error parsing or evaluating the `condition` expression in the routes.
- `ValueError`: If type validation is enabled and the route's declared type doesn't match the actual value type.

**Returns**:

A dictionary where the key is the `output_name` of the selected route and the value is the `output`
of the selected route.

<a id="document_length_router"></a>

# Module document\_length\_router

<a id="document_length_router.DocumentLengthRouter"></a>

## DocumentLengthRouter

Categorizes documents based on the length of the `content` field and routes them to the appropriate output.

A common use case for DocumentLengthRouter is handling documents obtained from PDFs that contain non-text
content, such as scanned pages or images.
This component can detect empty or low-content documents and route them to
components that perform OCR, generate captions, or compute image embeddings.

### Usage example

```python
from haystack.components.routers import DocumentLengthRouter
from haystack.dataclasses import Document

docs = [
    Document(content="Short"),
    Document(content="Long document "*20),
]

router = DocumentLengthRouter(threshold=10)

result = router.run(documents=docs)
print(result)

# {
#     "short_documents": [Document(content="Short", ...)],
#     "long_documents": [Document(content="Long document ...", ...)],
# }
```

<a id="document_length_router.DocumentLengthRouter.__init__"></a>

#### DocumentLengthRouter.\_\_init\_\_

```python
def __init__(*, threshold: int = 10) -> None
```

Initialize the DocumentLengthRouter component.

**Arguments**:

- `threshold`: The threshold for the number of characters in the document `content` field. Documents where `content` is
None or whose character count is less than or equal to the threshold will be routed to the `short_documents`
output. Otherwise, they will be routed to the `long_documents` output.
To route only documents with None content to `short_documents`, set the threshold to a negative number.

<a id="document_length_router.DocumentLengthRouter.run"></a>

#### DocumentLengthRouter.run

```python
@component.output_types(short_documents=list[Document],
                        long_documents=list[Document])
def run(documents: list[Document]) -> dict[str, list[Document]]
```

Categorize input documents into groups based on the length of the `content` field.

**Arguments**:

- `documents`: A list of documents to be categorized.
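
The threshold rule described for `threshold` can be illustrated with a small plain-Python sketch. It works on raw content strings instead of `Document` objects, and the helper name `split_by_length` is hypothetical (it is not part of Haystack); it only mirrors the documented behavior, where `None` or a length less than or equal to the threshold counts as short.

```python
from typing import Optional


def split_by_length(contents: list[Optional[str]], threshold: int = 10) -> dict[str, list]:
    """Hypothetical helper: group content strings the way the threshold rule is documented."""
    short, long = [], []
    for content in contents:
        # None, or a character count at or below the threshold, counts as "short"
        if content is None or len(content) <= threshold:
            short.append(content)
        else:
            long.append(content)
    return {"short_documents": short, "long_documents": long}


result = split_by_length([None, "", "Short", "Long document " * 20], threshold=10)

# With a negative threshold, only None content ends up in the short group,
# because even an empty string has length 0, which is greater than -1.
only_none = split_by_length([None, "", "Short"], threshold=-1)
```

In this sketch, `only_none["short_documents"]` holds just the `None` entry, which matches the note about negative thresholds.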

**Returns**:

A dictionary with the following keys:
- `short_documents`: A list of documents where `content` is None or the length of `content` is less than or
equal to the threshold.
- `long_documents`: A list of documents where the length of `content` is greater than the threshold.

<a id="document_type_router"></a>

# Module document\_type\_router

<a id="document_type_router.DocumentTypeRouter"></a>

## DocumentTypeRouter

Routes documents by their MIME types.

DocumentTypeRouter is used to dynamically route documents within a pipeline based on their MIME types.
It supports exact MIME type matches and regex patterns.

MIME types can be extracted directly from document metadata or inferred from file paths using standard or
user-supplied MIME type mappings.

### Usage example

```python
from haystack.components.routers import DocumentTypeRouter
from haystack.dataclasses import Document

docs = [
    Document(content="Example text", meta={"file_path": "example.txt"}),
    Document(content="Another document", meta={"mime_type": "application/pdf"}),
    Document(content="Unknown type")
]

router = DocumentTypeRouter(
    mime_type_meta_field="mime_type",
    file_path_meta_field="file_path",
    mime_types=["text/plain", "application/pdf"]
)

result = router.run(documents=docs)
print(result)
```

Expected output:
```python
{
    "text/plain": [Document(...)],
    "application/pdf": [Document(...)],
    "unclassified": [Document(...)]
}
```

<a id="document_type_router.DocumentTypeRouter.__init__"></a>

#### DocumentTypeRouter.\_\_init\_\_

```python
def __init__(*,
             mime_types: list[str],
             mime_type_meta_field: Optional[str] = None,
             file_path_meta_field: Optional[str] = None,
             additional_mimetypes: Optional[dict[str, str]] = None) -> None
```

Initialize
the DocumentTypeRouter component.

**Arguments**:

- `mime_types`: A list of MIME types or regex patterns to classify the input documents.
(for example: `["text/plain", "audio/x-wav", "image/jpeg"]`).
- `mime_type_meta_field`: Optional name of the metadata field that holds the MIME type.
- `file_path_meta_field`: Optional name of the metadata field that holds the file path. Used to infer the MIME type if
`mime_type_meta_field` is not provided or missing in a document.
- `additional_mimetypes`: Optional dictionary mapping MIME types to file extensions to enhance or override the standard
`mimetypes` module. Useful when working with uncommon or custom file types.
For example: `{"application/vnd.custom-type": ".custom"}`.

**Raises**:

- `ValueError`: If `mime_types` is empty or if both `mime_type_meta_field` and `file_path_meta_field` are
not provided.

<a id="document_type_router.DocumentTypeRouter.run"></a>

#### DocumentTypeRouter.run

```python
def run(documents: list[Document]) -> dict[str, list[Document]]
```

Categorize input documents into groups based on their MIME type.

MIME types can either be directly available in document metadata or derived from file paths using the
standard Python `mimetypes` module and custom mappings.

**Arguments**:

- `documents`: A list of documents to be categorized.

**Returns**:

A dictionary where the keys are MIME types (or `"unclassified"`) and the values are lists of documents.

<a id="file_type_router"></a>

# Module file\_type\_router

<a id="file_type_router.FileTypeRouter"></a>

## FileTypeRouter

Categorizes files or byte streams by their MIME types, helping in context-based routing.

FileTypeRouter supports both exact MIME type matching and regex patterns.

For file paths, MIME types come from extensions, while byte streams use metadata.
You can use regex patterns in the `mime_types` parameter to set broad categories
(such as `audio/.*` or `text/.*`) or specific types.
MIME types without regex patterns are treated as exact matches.

### Usage example

```python
from haystack.components.routers import FileTypeRouter
from pathlib import Path

# For exact MIME type matching
router = FileTypeRouter(mime_types=["text/plain", "application/pdf"])

# For flexible matching using regex, to handle all audio types
router_with_regex = FileTypeRouter(mime_types=[r"audio/.*", r"text/plain"])

sources = [Path("file.txt"), Path("document.pdf"), Path("song.mp3")]
print(router.run(sources=sources))
print(router_with_regex.run(sources=sources))

# Expected output:
# {'text/plain': [PosixPath('file.txt')],
#  'application/pdf': [PosixPath('document.pdf')],
#  'unclassified': [PosixPath('song.mp3')]}
# {'audio/.*': [PosixPath('song.mp3')],
#  'text/plain': [PosixPath('file.txt')],
#  'unclassified': [PosixPath('document.pdf')]}
```

<a id="file_type_router.FileTypeRouter.__init__"></a>

#### FileTypeRouter.\_\_init\_\_

```python
def __init__(mime_types: list[str],
             additional_mimetypes: Optional[dict[str, str]] = None,
             raise_on_failure: bool = False)
```

Initialize the FileTypeRouter component.

**Arguments**:

- `mime_types`: A list of MIME types or regex patterns to classify the input files or byte streams.
(for example: `["text/plain", "audio/x-wav", "image/jpeg"]`).
- `additional_mimetypes`: A dictionary mapping MIME types to file extensions. The entries are added to the
`mimetypes` package so that file types it does not support natively are not left unclassified.
(for example: `{"application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx"}`).
- `raise_on_failure`: If True, raises FileNotFoundError when a file path doesn't exist.
If False (default), only emits a warning when a file path doesn't exist.

<a id="file_type_router.FileTypeRouter.to_dict"></a>

#### FileTypeRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="file_type_router.FileTypeRouter.from_dict"></a>

#### FileTypeRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "FileTypeRouter"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.

<a id="file_type_router.FileTypeRouter.run"></a>

#### FileTypeRouter.run

```python
def run(
    sources: list[Union[str, Path, ByteStream]],
    meta: Optional[Union[dict[str, Any], list[dict[str, Any]]]] = None
) -> dict[str, list[Union[ByteStream, Path]]]
```

Categorize files or byte streams according to their MIME types.

**Arguments**:

- `sources`: A list of file paths or byte streams to categorize.
- `meta`: Optional metadata to attach to the sources.
When provided, the sources are internally converted to ByteStream objects and the metadata is added.
This value can be a list of dictionaries or a single dictionary.
If it's a single dictionary, its content is added to the metadata of all ByteStream objects.
If it's a list, its length must match the number of sources, as they are zipped together.

**Returns**:

A dictionary where the keys are MIME types and the values are lists of data sources.
Two extra keys may be returned: `"unclassified"` when a source's MIME type doesn't match any pattern
and `"failed"` when a source cannot be processed (for example, a file path that doesn't exist).
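
To make the grouping into pattern keys and `"unclassified"` more concrete, here is a minimal standard-library sketch. It is not the FileTypeRouter implementation: the helper name `classify_by_mime_type` is hypothetical, it handles file paths only, and it assumes that a pattern must match the whole MIME type string and that the first matching pattern wins.

```python
import mimetypes
import re
from collections import defaultdict
from pathlib import Path


def classify_by_mime_type(sources: list[Path], mime_types: list[str]) -> dict[str, list[Path]]:
    """Hypothetical helper: group paths under the first pattern matching their guessed MIME type."""
    patterns = [(pattern, re.compile(pattern)) for pattern in mime_types]
    groups = defaultdict(list)
    for source in sources:
        # The MIME type is guessed from the file extension only; the file need not exist
        guessed, _ = mimetypes.guess_type(str(source))
        for name, regex in patterns:
            if guessed is not None and regex.fullmatch(guessed):
                groups[name].append(source)
                break
        else:
            # No pattern matched (or the extension is unknown)
            groups["unclassified"].append(source)
    return dict(groups)


sources = [Path("file.txt"), Path("document.pdf"), Path("song.mp3")]
grouped = classify_by_mime_type(sources, [r"audio/.*", r"text/plain"])
```

With these patterns, `song.mp3` lands under `audio/.*`, `file.txt` under `text/plain`, and `document.pdf` under `unclassified`, mirroring the second router in the usage example above.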

<a id="llm_messages_router"></a>

# Module llm\_messages\_router

<a id="llm_messages_router.LLMMessagesRouter"></a>

## LLMMessagesRouter

Routes Chat Messages to different connections using a generative Language Model to perform classification.

This component can be used with general-purpose LLMs and with specialized LLMs for moderation like Llama Guard.

### Usage example
```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.components.routers.llm_messages_router import LLMMessagesRouter
from haystack.dataclasses import ChatMessage

# initialize a Chat Generator with a generative model for moderation
chat_generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "meta-llama/Llama-Guard-4-12B", "provider": "groq"},
)

router = LLMMessagesRouter(chat_generator=chat_generator,
                           output_names=["unsafe", "safe"],
                           output_patterns=["unsafe", "safe"])


print(router.run([ChatMessage.from_user("How to rob a bank?")]))

# {
#     'chat_generator_text': 'unsafe\nS2',
#     'unsafe': [
#         ChatMessage(
#             _role=<ChatRole.USER: 'user'>,
#             _content=[TextContent(text='How to rob a bank?')],
#             _name=None,
#             _meta={}
#         )
#     ]
# }
```

<a id="llm_messages_router.LLMMessagesRouter.__init__"></a>

#### LLMMessagesRouter.\_\_init\_\_

```python
def __init__(chat_generator: ChatGenerator,
             output_names: list[str],
             output_patterns: list[str],
             system_prompt: Optional[str] = None)
```

Initialize the LLMMessagesRouter component.

**Arguments**:

- `chat_generator`: A ChatGenerator instance which represents the LLM.
- `output_names`: A list of output connection names. These can be used to connect the router to other
components.
- `output_patterns`: A list of regular expressions to be matched against the output of the LLM. Each pattern
corresponds to an output name. Patterns are evaluated in order.
When using moderation models, refer to the model card to understand the expected outputs.
- `system_prompt`: An optional system prompt to customize the behavior of the LLM.
For moderation models, refer to the model card for supported customization options.

**Raises**:

- `ValueError`: If `output_names` and `output_patterns` are not non-empty lists of the same length.

<a id="llm_messages_router.LLMMessagesRouter.warm_up"></a>

#### LLMMessagesRouter.warm\_up

```python
def warm_up()
```

Warm up the underlying LLM.

<a id="llm_messages_router.LLMMessagesRouter.run"></a>

#### LLMMessagesRouter.run

```python
def run(messages: list[ChatMessage]
        ) -> dict[str, Union[str, list[ChatMessage]]]
```

Classify the messages based on LLM output and route them to the appropriate output connection.

**Arguments**:

- `messages`: A list of ChatMessages to be routed. Only user and assistant messages are supported.

**Raises**:

- `ValueError`: If `messages` is an empty list or contains messages with unsupported roles.
- `RuntimeError`: If the component is not warmed up and the ChatGenerator has a `warm_up` method.

**Returns**:

A dictionary with the following keys:
- "chat_generator_text": The text output of the LLM, useful for debugging.
- One key per name in `output_names`: The list of messages that matched the corresponding pattern.
- "unmatched": The messages that did not match any of the output patterns.

<a id="llm_messages_router.LLMMessagesRouter.to_dict"></a>

#### LLMMessagesRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="llm_messages_router.LLMMessagesRouter.from_dict"></a>

#### LLMMessagesRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "LLMMessagesRouter"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="metadata_router"></a>

# Module metadata\_router

<a id="metadata_router.MetadataRouter"></a>

## MetadataRouter

Routes documents or byte streams to different connections based on their metadata fields.

Specify the routing rules in the `init` method.
If a document or byte stream does not match any of the rules, it's routed to a connection named "unmatched".


### Usage examples

**Routing Documents by metadata:**
```python
from haystack import Document
from haystack.components.routers import MetadataRouter

docs = [Document(content="Paris is the capital of France.", meta={"language": "en"}),
        Document(content="Berlin ist die Hauptstadt von Deutschland.", meta={"language": "de"})]

router = MetadataRouter(rules={"en": {"field": "meta.language", "operator": "==", "value": "en"}})

print(router.run(documents=docs))
# {'en': [Document(id=..., content: 'Paris is the capital of France.', meta: {'language': 'en'})],
# 'unmatched': [Document(id=..., content: 'Berlin ist die Hauptstadt von Deutschland.', meta: {'language': 'de'})]}
```

**Routing ByteStreams by metadata:**
```python
from haystack.dataclasses import ByteStream
from haystack.components.routers import MetadataRouter

streams = [
    ByteStream.from_string("Hello world", meta={"language": "en"}),
    ByteStream.from_string("Bonjour le monde", meta={"language": "fr"})
]

router = MetadataRouter(
    rules={"english": {"field": "meta.language", "operator": "==", "value": "en"}},
    output_type=list[ByteStream]
)

result = router.run(documents=streams)
# {'english': [ByteStream(...)], 'unmatched': [ByteStream(...)]}
```

<a id="metadata_router.MetadataRouter.__init__"></a>

#### MetadataRouter.\_\_init\_\_

```python
def __init__(rules: dict[str, dict],
             output_type: type = list[Document]) -> None
```

Initializes the MetadataRouter component.

**Arguments**:

- `rules`: A dictionary defining how to route documents or byte streams to output connections based on their
metadata. Keys are output connection names, and values are dictionaries of
[filtering expressions](https://docs.haystack.deepset.ai/docs/metadata-filtering) in Haystack.
For example:
```python
{
    "edge_1": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2023-04-01"},
        ],
    },
    "edge_2": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-04-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2023-07-01"},
        ],
    },
    "edge_3": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-07-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2023-10-01"},
        ],
    },
    "edge_4": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-10-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2024-01-01"},
        ],
    },
}
```
- `output_type`: The type of the output produced. Lists of Documents or ByteStreams can be specified.
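
As a rough mental model of how rules like the ones above select an output connection, the following standard-library sketch evaluates comparison and `AND`/`OR` rules against plain metadata dictionaries. It is a simplification and not Haystack's filter implementation: the names `COMPARATORS`, `rule_matches`, and `route_by_rules` are hypothetical, dates are compared as ISO-8601 strings (which sort chronologically when they share one format), and an item is sent to every rule it matches, which is an assumption.

```python
import operator

# Simplified, hypothetical subset of comparison operators
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def rule_matches(rule: dict, meta: dict) -> bool:
    """Evaluate one rule (a comparison or an AND/OR group) against a metadata dict."""
    if "conditions" in rule:
        results = [rule_matches(condition, meta) for condition in rule["conditions"]]
        if rule["operator"] == "AND":
            return all(results)
        if rule["operator"] == "OR":
            return any(results)
        raise ValueError(f"Unsupported logical operator: {rule['operator']}")
    key = rule["field"].removeprefix("meta.")  # "meta.created_at" -> "created_at"
    if key not in meta:
        return False
    return COMPARATORS[rule["operator"]](meta[key], rule["value"])


def route_by_rules(metas: list[dict], rules: dict[str, dict]) -> dict[str, list[dict]]:
    """Group metadata dicts by matching rule name, falling back to 'unmatched'."""
    routed = {name: [] for name in rules}
    routed["unmatched"] = []
    for meta in metas:
        names = [name for name, rule in rules.items() if rule_matches(rule, meta)]
        for name in names or ["unmatched"]:
            routed[name].append(meta)
    return routed


rules = {
    "edge_1": {"operator": "AND", "conditions": [
        {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
        {"field": "meta.created_at", "operator": "<", "value": "2023-04-01"},
    ]},
    "edge_2": {"operator": "AND", "conditions": [
        {"field": "meta.created_at", "operator": ">=", "value": "2023-04-01"},
        {"field": "meta.created_at", "operator": "<", "value": "2023-07-01"},
    ]},
}
metas = [{"created_at": "2023-02-15"}, {"created_at": "2023-05-01"}, {"created_at": "2025-01-01"}]
routed = route_by_rules(metas, rules)
```

Here the first item falls in the `edge_1` date range, the second in `edge_2`, and the third matches neither rule, so it ends up under `unmatched`.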

<a id="metadata_router.MetadataRouter.run"></a>

#### MetadataRouter.run

```python
def run(documents: Union[list[Document], list[ByteStream]])
```

Routes documents or byte streams to different connections based on their metadata fields.

If a document or byte stream does not match any of the rules, it's routed to a connection named "unmatched".

**Arguments**:

- `documents`: A list of `Document` or `ByteStream` objects to be routed based on their metadata.

**Returns**:

A dictionary where the keys are the names of the output connections (including `"unmatched"`)
and the values are lists of `Document` or `ByteStream` objects that matched the corresponding rules.

<a id="metadata_router.MetadataRouter.to_dict"></a>

#### MetadataRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="metadata_router.MetadataRouter.from_dict"></a>

#### MetadataRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "MetadataRouter"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="text_language_router"></a>

# Module text\_language\_router

<a id="text_language_router.TextLanguageRouter"></a>

## TextLanguageRouter

Routes text strings to different output connections based on their language.

Provide a list of languages during initialization. If the text doesn't match any of the
specified languages, it is routed to the "unmatched" output.
For routing documents based on their language, use the DocumentLanguageClassifier component,
followed by the MetadataRouter.

### Usage example

```python
from haystack import Pipeline, Document
from haystack.components.routers import TextLanguageRouter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

document_store = InMemoryDocumentStore()
document_store.write_documents([Document(content="Elvis Presley was an American singer and actor.")])

p = Pipeline()
p.add_component(instance=TextLanguageRouter(languages=["en"]), name="text_language_router")
p.add_component(instance=InMemoryBM25Retriever(document_store=document_store), name="retriever")
p.connect("text_language_router.en", "retriever.query")

result = p.run({"text_language_router": {"text": "Who was Elvis Presley?"}})
assert result["retriever"]["documents"][0].content == "Elvis Presley was an American singer and actor."

result = p.run({"text_language_router": {"text": "ένα ελληνικό κείμενο"}})
assert result["text_language_router"]["unmatched"] == "ένα ελληνικό κείμενο"
```

<a id="text_language_router.TextLanguageRouter.__init__"></a>

#### TextLanguageRouter.\_\_init\_\_

```python
def __init__(languages: Optional[list[str]] = None)
```

Initialize the TextLanguageRouter component.

**Arguments**:

- `languages`: A list of ISO language codes.
See the supported languages in [`langdetect` documentation](https://github.com/Mimino666/langdetect#languages).
If not specified, defaults to ["en"].

<a id="text_language_router.TextLanguageRouter.run"></a>

#### TextLanguageRouter.run

```python
def run(text: str) -> dict[str, str]
```

Routes the text strings to different output connections based on their language.

If the text doesn't match any of the specified languages, it is routed to the "unmatched" output.

**Arguments**:

- `text`: A text string to route.

**Raises**:

- `TypeError`: If the input is not a string.

**Returns**:

A dictionary in which the key is the language (or `"unmatched"`),
and the value is the text.

<a id="transformers_text_router"></a>

# Module transformers\_text\_router

<a id="transformers_text_router.TransformersTextRouter"></a>

## TransformersTextRouter

Routes the text strings to different connections based on a category label.

The labels are specific to each model and can be found in its description on Hugging Face.

### Usage example

```python
from haystack.core.pipeline import Pipeline
from haystack.components.routers import TransformersTextRouter
from haystack.components.builders import PromptBuilder
from haystack.components.generators import HuggingFaceLocalGenerator

p = Pipeline()
p.add_component(
    instance=TransformersTextRouter(model="papluca/xlm-roberta-base-language-detection"),
    name="text_router"
)
p.add_component(
    instance=PromptBuilder(template="Answer the question: {{query}}\nAnswer:"),
    name="english_prompt_builder"
)
p.add_component(
    instance=PromptBuilder(template="Beantworte die Frage: {{query}}\nAntwort:"),
    name="german_prompt_builder"
)

p.add_component(
    instance=HuggingFaceLocalGenerator(model="DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1"),
    name="german_llm"
)
p.add_component(
    instance=HuggingFaceLocalGenerator(model="microsoft/Phi-3-mini-4k-instruct"),
    name="english_llm"
)

p.connect("text_router.en", "english_prompt_builder.query")
p.connect("text_router.de", "german_prompt_builder.query")
p.connect("english_prompt_builder.prompt", "english_llm.prompt")
p.connect("german_prompt_builder.prompt", "german_llm.prompt")

# English Example
print(p.run({"text_router": {"text": "What is the capital of Germany?"}}))

# German Example
print(p.run({"text_router": {"text": "Was ist die Hauptstadt von Deutschland?"}}))
```

<a id="transformers_text_router.TransformersTextRouter.__init__"></a>

#### TransformersTextRouter.\_\_init\_\_

```python
def __init__(model: str,
             labels: Optional[list[str]] = None,
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             huggingface_pipeline_kwargs: Optional[dict[str, Any]] = None)
```

Initializes the TransformersTextRouter component.

**Arguments**:

- `model`: The name or path of a Hugging Face model for text classification.
- `labels`: The list of labels. If not provided, the component fetches the labels
from the model configuration file hosted on the Hugging Face Hub using
`transformers.AutoConfig.from_pretrained`.
- `device`: The device for loading the model. If `None`, automatically selects the default device.
If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- `token`: The API token used to download private models from Hugging Face.
By default, it is read from the `HF_API_TOKEN` or `HF_TOKEN` environment variable.
To generate these tokens, run `transformers-cli login`.
- `huggingface_pipeline_kwargs`: A dictionary of keyword arguments for initializing the Hugging Face
text classification pipeline.

<a id="transformers_text_router.TransformersTextRouter.warm_up"></a>

#### TransformersTextRouter.warm\_up

```python
def warm_up()
```

Initializes the component by loading the Hugging Face pipeline.
`run` raises a `RuntimeError` if this method has not been called first.

<a id="transformers_text_router.TransformersTextRouter.to_dict"></a>

#### TransformersTextRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="transformers_text_router.TransformersTextRouter.from_dict"></a>

#### TransformersTextRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "TransformersTextRouter"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Deserialized component.

<a id="transformers_text_router.TransformersTextRouter.run"></a>

#### TransformersTextRouter.run

```python
def run(text: str) -> dict[str, str]
```

Routes the text strings to different connections based on a category label.

**Arguments**:

- `text`: A string of text to route.

**Raises**:

- `TypeError`: If the input is not a str.
- `RuntimeError`: If the pipeline has not been loaded because warm_up() was not called before.

**Returns**:

A dictionary with the label as key and the text as value.

<a id="zero_shot_text_router"></a>

# Module zero\_shot\_text\_router

<a id="zero_shot_text_router.TransformersZeroShotTextRouter"></a>

## TransformersZeroShotTextRouter

Routes the text strings to different connections based on a category label.

Specify the set of labels for categorization when initializing the component.

### Usage example

```python
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.core.pipeline import Pipeline
from haystack.components.routers import TransformersZeroShotTextRouter
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack.components.retrievers import InMemoryEmbeddingRetriever

document_store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(model="intfloat/e5-base-v2")
doc_embedder.warm_up()
docs = [
    Document(
        content="Germany, officially the Federal Republic of Germany, is a country in the western region of "
        "Central Europe. The nation's capital and most populous city is Berlin and its main financial centre "
        "is Frankfurt; the largest urban area is the Ruhr."
    ),
    Document(
        content="France, officially the French Republic, is a country located primarily in Western Europe. "
        "France is a unitary semi-presidential republic with its capital in Paris, the country's largest city "
        "and main cultural and commercial centre; other major urban areas include Marseille, Lyon, Toulouse, "
        "Lille, Bordeaux, Strasbourg, Nantes and Nice."
    )
]
docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])

p = Pipeline()
p.add_component(instance=TransformersZeroShotTextRouter(labels=["passage", "query"]), name="text_router")
p.add_component(
    instance=SentenceTransformersTextEmbedder(model="intfloat/e5-base-v2", prefix="passage: "),
    name="passage_embedder"
)
p.add_component(
    instance=SentenceTransformersTextEmbedder(model="intfloat/e5-base-v2", prefix="query: "),
    name="query_embedder"
)
p.add_component(
    instance=InMemoryEmbeddingRetriever(document_store=document_store),
    name="query_retriever"
)
p.add_component(
    instance=InMemoryEmbeddingRetriever(document_store=document_store),
    name="passage_retriever"
)

p.connect("text_router.passage", "passage_embedder.text")
p.connect("passage_embedder.embedding", "passage_retriever.query_embedding")
p.connect("text_router.query", "query_embedder.text")
p.connect("query_embedder.embedding", "query_retriever.query_embedding")

# Query Example
p.run({"text_router": {"text": "What is the capital of Germany?"}})

# Passage Example
p.run({
    "text_router": {
        "text": "The United Kingdom of Great Britain and Northern Ireland, commonly known as the "
        "United Kingdom (UK) or Britain, is a country in Northwestern Europe, off the north-western coast of "
        "the continental mainland."
    }
})
```

<a id="zero_shot_text_router.TransformersZeroShotTextRouter.__init__"></a>

#### TransformersZeroShotTextRouter.\_\_init\_\_

```python
def __init__(labels: list[str],
             multi_label: bool = False,
             model: str = "MoritzLaurer/deberta-v3-base-zeroshot-v1.1-all-33",
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             huggingface_pipeline_kwargs: Optional[dict[str, Any]] = None)
```

Initializes the TransformersZeroShotTextRouter component.

**Arguments**:

- `labels`: The set of labels to use for classification. Can be a single label,
a string of comma-separated labels, or a list of labels.
- `multi_label`: Indicates if multiple labels can be true.
If `False`, label scores are normalized so their sum equals 1 for each sequence.
If `True`, the labels are considered independent and probabilities are normalized for each candidate by
doing a softmax of the entailment score vs. the contradiction score.
- `model`: The name or path of a Hugging Face model for zero-shot text classification.
- `device`: The device for loading the model. If `None`, automatically selects the default device.
If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- `token`: The API token used to download private models from Hugging Face.
By default, it is read from the `HF_API_TOKEN` or `HF_TOKEN` environment variable.
To generate these tokens, run `transformers-cli login`.
- `huggingface_pipeline_kwargs`: A dictionary of keyword arguments for initializing the Hugging Face
zero-shot text classification pipeline.
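
The effect of `multi_label` on the scores can be sketched in plain Python. This is only an illustration of the two normalizations described above, not the library's code: the function names and the example logits are hypothetical, and each label is assumed to come with a contradiction logit and an entailment logit from the underlying model.

```python
import math


def softmax(values: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def single_label_scores(entailment: dict[str, float]) -> dict[str, float]:
    """multi_label=False: softmax across labels, so the scores sum to 1."""
    labels = list(entailment)
    probs = softmax([entailment[label] for label in labels])
    return dict(zip(labels, probs))


def multi_label_scores(logits: dict[str, tuple[float, float]]) -> dict[str, float]:
    """multi_label=True: each label is scored on its own by a softmax of its
    (contradiction, entailment) logits; the entailment probability is kept."""
    return {label: softmax([contra, entail])[1] for label, (contra, entail) in logits.items()}


# Hypothetical (contradiction, entailment) logits for one input text.
example_logits = {"passage": (2.0, -1.0), "query": (-1.5, 3.0), "question": (-1.0, 2.5)}

single = single_label_scores({label: entail for label, (_, entail) in example_logits.items()})
multi = multi_label_scores(example_logits)

print(single)  # scores compete with each other and sum to 1
print(multi)   # scores are independent, so several labels can be close to 1
```

With `multi_label=False` a second plausible label lowers the score of the first one, while with `multi_label=True` both can score high at the same time.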

<a id="zero_shot_text_router.TransformersZeroShotTextRouter.warm_up"></a>

#### TransformersZeroShotTextRouter.warm\_up

```python
def warm_up()
```

Initializes the component by loading the Hugging Face pipeline.
`run` raises a `RuntimeError` if this method has not been called first.

<a id="zero_shot_text_router.TransformersZeroShotTextRouter.to_dict"></a>

#### TransformersZeroShotTextRouter.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="zero_shot_text_router.TransformersZeroShotTextRouter.from_dict"></a>

#### TransformersZeroShotTextRouter.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "TransformersZeroShotTextRouter"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Deserialized component.

<a id="zero_shot_text_router.TransformersZeroShotTextRouter.run"></a>

#### TransformersZeroShotTextRouter.run

```python
def run(text: str) -> dict[str, str]
```

Routes the text strings to different connections based on a category label.

**Arguments**:

- `text`: A string of text to route.

**Raises**:

- `TypeError`: If the input is not a str.
- `RuntimeError`: If the pipeline has not been loaded because `warm_up()` was not called before.

**Returns**:

A dictionary with the label as key and the text as value.
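
The `run` contract above (a `TypeError` for non-string input, a `RuntimeError` when `warm_up()` has not been called, and a single-entry dictionary keyed by the winning label) can be sketched with a stand-in class. `SketchRouter` and its keyword-counting scorer are hypothetical and are not part of Haystack; they only mimic the documented behavior without loading a model.

```python
from typing import Callable, Optional


class SketchRouter:
    """Hypothetical stand-in that mimics the documented warm_up()/run() contract."""

    def __init__(self, labels: list[str]):
        self.labels = labels
        self._scorer: Optional[Callable[[str, str], float]] = None

    def warm_up(self) -> None:
        # The real component loads a Hugging Face pipeline here; this sketch
        # installs a toy scorer instead (assumption for illustration only).
        if self._scorer is None:
            self._scorer = lambda text, label: float(text.lower().count(label.lower()))

    def run(self, text: str) -> dict[str, str]:
        if not isinstance(text, str):
            raise TypeError("SketchRouter expects a str as input.")
        if self._scorer is None:
            raise RuntimeError("The component was not warmed up. Call warm_up() before run().")
        # Route the text to the output named after the highest-scoring label.
        best_label = max(self.labels, key=lambda label: self._scorer(text, label))
        return {best_label: text}


router = SketchRouter(labels=["passage", "query"])
router.warm_up()
result = router.run(text="Is this a query?")
print(result)  # {'query': 'Is this a query?'}
```

In a pipeline, each key of the returned dictionary corresponds to an output such as `text_router.query`, so only the branch connected to the matching label receives the text.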