---
title: "Routers"
id: routers-api
description: "Routers are components that route queries or Documents to the components best suited to handle them."
slug: "/routers-api"
---


## conditional_router

### NoRouteSelectedException

Bases: <code>Exception</code>

Exception raised when no route is selected in ConditionalRouter.

### RouteConditionException

Bases: <code>Exception</code>

Exception raised when there is an error parsing or evaluating the condition expression in ConditionalRouter.

### ConditionalRouter

Routes data based on specific conditions.

You define these conditions in a list of dictionaries called `routes`.
Each dictionary in this list represents a single route. Each route has these four elements:

- `condition`: A Jinja2 string expression that determines if the route is selected.
- `output`: A Jinja2 expression defining the route's output value.
- `output_type`: The type of the output data (for example, `str`, `list[int]`).
- `output_name`: The name you want to use to publish `output`. This name is used to connect
  the router to other components in the pipeline.

### Usage example

```python
from haystack.components.routers import ConditionalRouter

routes = [
    {
        "condition": "{{streams|length > 2}}",
        "output": "{{streams}}",
        "output_name": "enough_streams",
        "output_type": list[int],
    },
    {
        "condition": "{{streams|length <= 2}}",
        "output": "{{streams}}",
        "output_name": "insufficient_streams",
        "output_type": list[int],
    },
]
router = ConditionalRouter(routes)
# When 'streams' has more than 2 items, 'enough_streams' output will activate, emitting the list [1, 2, 3]
kwargs = {"streams": [1, 2, 3], "query": "Haystack"}
result = router.run(**kwargs)
assert result == {"enough_streams": [1, 2, 3]}
```

In this example, we configure two routes.
The first route sends the 'streams' value to 'enough_streams' if the
stream count exceeds two. The second route directs 'streams' to 'insufficient_streams' if there
are two or fewer streams.

In the pipeline setup, the Router connects to other components using the output names. For example,
'enough_streams' might connect to a component that processes streams, while
'insufficient_streams' might connect to a component that fetches more streams.

Here is a pipeline that uses `ConditionalRouter` to send a message to different outputs
depending on the value of `count`:

```python
from haystack import Pipeline
from haystack.components.routers import ConditionalRouter

routes = [
    {
        "condition": "{{count > 5}}",
        "output": "Processing many items",
        "output_name": "many_items",
        "output_type": str,
    },
    {
        "condition": "{{count <= 5}}",
        "output": "Processing few items",
        "output_name": "few_items",
        "output_type": str,
    },
]

pipe = Pipeline()
pipe.add_component("router", ConditionalRouter(routes))

# Run with count > 5
result = pipe.run({"router": {"count": 10}})
print(result)
# >> {'router': {'many_items': 'Processing many items'}}

# Run with count <= 5
result = pipe.run({"router": {"count": 3}})
print(result)
# >> {'router': {'few_items': 'Processing few items'}}
```

#### __init__

```python
__init__(
    routes: list[Route],
    custom_filters: dict[str, Callable] | None = None,
    unsafe: bool = False,
    validate_output_type: bool = False,
    optional_variables: list[str] | None = None,
)
```

Initializes the `ConditionalRouter` with a list of routes detailing the conditions for routing.

**Parameters:**

- **routes** (<code>list\[Route\]</code>) – A list of dictionaries, each defining a route.
  Each route has these four elements:
  - `condition`: A Jinja2 string expression that determines if the route is selected.
  - `output`: A Jinja2 expression defining the route's output value.
  - `output_type`: The type of the output data (for example, `str`, `list[int]`).
  - `output_name`: The name you want to use to publish `output`. This name is used to connect
    the router to other components in the pipeline.
- **custom_filters** (<code>dict\[str, Callable\] | None</code>) – A dictionary of custom Jinja2 filters used in the condition expressions.
  For example, passing `{"my_filter": my_filter_fcn}` where:
  - `my_filter` is the name of the custom filter.
  - `my_filter_fcn` is a callable that takes `my_var:str` and returns `my_var[:3]`.
    `{{ my_var|my_filter }}` can then be used inside a route condition expression:
    `"condition": "{{ my_var|my_filter == 'foo' }}"`.
- **unsafe** (<code>bool</code>) – Enable execution of arbitrary code in the Jinja template.
  This should only be used if you trust the source of the template, as it can lead to remote code execution.
- **validate_output_type** (<code>bool</code>) – Enable validation of routes' output.
  If a route output doesn't match the declared type, a `ValueError` is raised when the router runs.
- **optional_variables** (<code>list\[str\] | None</code>) – A list of variable names that are optional in your route conditions and outputs.
  If these variables are not provided at runtime, they will be set to `None`.
  This allows you to write routes that can handle missing inputs gracefully without raising errors.
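
The plain-Python sketch below shows how `routes` and `optional_variables` interact. It assumes first-match evaluation, where routes are tried in the order they are listed, as described under `run`. Missing optional variables are filled with `None` before any condition is evaluated. This is a simplified illustration, not Haystack's implementation: lambdas stand in for the Jinja2 templates, `output_type` validation is omitted, and `route` and `NoRouteSelected` are hypothetical names.

```python
from typing import Any


class NoRouteSelected(Exception):
    """Hypothetical stand-in for NoRouteSelectedException."""


def route(routes: list[dict[str, Any]], optional_variables: list[str], **kwargs: Any) -> dict[str, Any]:
    # Optional variables that the caller did not pass are set to None.
    for name in optional_variables:
        kwargs.setdefault(name, None)
    # Routes are evaluated in order; the first condition that returns True wins.
    for r in routes:
        if r["condition"](**kwargs):
            return {r["output_name"]: r["output"](**kwargs)}
    raise NoRouteSelected("No route condition evaluated to True.")


# Lambdas stand in for the Jinja2 "condition" and "output" templates.
routes = [
    {
        "condition": lambda path, **_: path == "rag",
        "output": lambda question, **_: question,
        "output_name": "rag_route",
    },
    {
        "condition": lambda **_: True,  # fallback route
        "output": lambda question, **_: question,
        "output_name": "default_route",
    },
]

print(route(routes, ["path"], question="What?", path="rag"))  # {'rag_route': 'What?'}
print(route(routes, ["path"], question="What?"))  # {'default_route': 'What?'}
```

Without `"path"` in `optional_variables`, the first lambda would fail with a missing-argument error when `path` is absent. Setting it to `None` up front lets the condition simply evaluate to `False`, so the fallback route fires.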

Example usage with a default fallback route in a Pipeline:

```python
from haystack import Pipeline
from haystack.components.routers import ConditionalRouter

routes = [
    {
        "condition": '{{ path == "rag" }}',
        "output": "{{ question }}",
        "output_name": "rag_route",
        "output_type": str
    },
    {
        "condition": "{{ True }}",  # fallback route
        "output": "{{ question }}",
        "output_name": "default_route",
        "output_type": str
    }
]

router = ConditionalRouter(routes, optional_variables=["path"])
pipe = Pipeline()
pipe.add_component("router", router)

# When 'path' is provided in the pipeline:
result = pipe.run(data={"router": {"question": "What?", "path": "rag"}})
assert result["router"] == {"rag_route": "What?"}

# When 'path' is not provided, fallback route is taken:
result = pipe.run(data={"router": {"question": "What?"}})
assert result["router"] == {"default_route": "What?"}
```

This pattern is particularly useful when:

- You want to provide default/fallback behavior when certain inputs are missing
- Some variables are only needed for specific routing conditions
- You're building flexible pipelines where not all inputs are guaranteed to be present

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> ConditionalRouter
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>ConditionalRouter</code> – The deserialized component.

#### run

```python
run(**kwargs)
```

Executes the routing logic.

Executes the routing logic by evaluating the specified boolean condition expressions for each route in the
order they are listed. The method directs the flow of data to the output specified in the first route whose
`condition` is True.

**Parameters:**

- **kwargs** – All variables used in the `condition` expressions of the routes. When the component is used in a
  pipeline, these variables are passed from the previous component's output.

**Returns:**

- A dictionary where the key is the `output_name` of the selected route and the value is the `output`
  of the selected route.

**Raises:**

- <code>NoRouteSelectedException</code> – If no `condition` in the routes is `True`.
- <code>RouteConditionException</code> – If there is an error parsing or evaluating the `condition` expression in the routes.
- <code>ValueError</code> – If type validation is enabled and a route's declared `output_type` doesn't match the actual value type.

## document_length_router

### DocumentLengthRouter

Categorizes documents based on the length of the `content` field and routes them to the appropriate output.

A common use case for DocumentLengthRouter is handling documents obtained from PDFs that contain non-text
content, such as scanned pages or images. This component can detect empty or low-content documents and route them to
components that perform OCR, generate captions, or compute image embeddings.
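
The split documented for the `threshold` parameter can be sketched in plain Python. Documents whose `content` is `None`, or whose character count is at or below the threshold, go to `short_documents`; everything else goes to `long_documents`. `Doc` is a hypothetical stand-in for Haystack's `Document`, and `split_by_length` is a hypothetical helper, not the component's actual code.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for haystack's Document; only `content` matters here."""
    content: Optional[str]
    meta: dict[str, Any] = field(default_factory=dict)


def split_by_length(docs: list[Doc], threshold: int = 10) -> dict[str, list[Doc]]:
    short_docs: list[Doc] = []
    long_docs: list[Doc] = []
    for doc in docs:
        # None content, or a character count at or below the threshold, counts as short.
        if doc.content is None or len(doc.content) <= threshold:
            short_docs.append(doc)
        else:
            long_docs.append(doc)
    return {"short_documents": short_docs, "long_documents": long_docs}


docs = [Doc("Short"), Doc("Long document " * 20), Doc(None), Doc("")]

result = split_by_length(docs, threshold=10)
print([d.content for d in result["short_documents"]])  # ['Short', None, '']

# A negative threshold sends only documents with None content to short_documents.
result_neg = split_by_length(docs, threshold=-1)
print([d.content for d in result_neg["short_documents"]])  # [None]
```

An empty string has length 0, so it counts as short for any non-negative threshold. This is why a negative threshold is the way to isolate only `None` content.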

### Usage example

```python
from haystack.components.routers import DocumentLengthRouter
from haystack.dataclasses import Document

docs = [
    Document(content="Short"),
    Document(content="Long document " * 20),
]

router = DocumentLengthRouter(threshold=10)

result = router.run(documents=docs)
print(result)

# {
#     "short_documents": [Document(content="Short", ...)],
#     "long_documents": [Document(content="Long document ...", ...)],
# }
```

#### __init__

```python
__init__(*, threshold: int = 10) -> None
```

Initialize the DocumentLengthRouter component.

**Parameters:**

- **threshold** (<code>int</code>) – The threshold for the number of characters in the document `content` field. Documents where `content` is
  None or whose character count is less than or equal to the threshold will be routed to the `short_documents`
  output. Otherwise, they will be routed to the `long_documents` output.
  To route only documents with None content to `short_documents`, set the threshold to a negative number.

#### run

```python
run(documents: list[Document]) -> dict[str, list[Document]]
```

Categorize input documents into groups based on the length of the `content` field.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – A list of documents to be categorized.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
  - `short_documents`: A list of documents where `content` is None or the length of `content` is less than or
    equal to the threshold.
  - `long_documents`: A list of documents where the length of `content` is greater than the threshold.

## document_type_router

### DocumentTypeRouter

Routes documents by their MIME types.

DocumentTypeRouter is used to dynamically route documents within a pipeline based on their MIME types.
It supports exact MIME type matches and regex patterns.

MIME types can be extracted directly from document metadata or inferred from file paths using standard or
user-supplied MIME type mappings.

### Usage example

```python
from haystack.components.routers import DocumentTypeRouter
from haystack.dataclasses import Document

docs = [
    Document(content="Example text", meta={"file_path": "example.txt"}),
    Document(content="Another document", meta={"mime_type": "application/pdf"}),
    Document(content="Unknown type")
]

router = DocumentTypeRouter(
    mime_type_meta_field="mime_type",
    file_path_meta_field="file_path",
    mime_types=["text/plain", "application/pdf"]
)

result = router.run(documents=docs)
print(result)
```

Expected output:

```python
{
    "text/plain": [Document(...)],
    "application/pdf": [Document(...)],
    "unclassified": [Document(...)]
}
```

#### __init__

```python
__init__(
    *,
    mime_types: list[str],
    mime_type_meta_field: str | None = None,
    file_path_meta_field: str | None = None,
    additional_mimetypes: dict[str, str] | None = None
) -> None
```

Initialize the DocumentTypeRouter component.

**Parameters:**

- **mime_types** (<code>list\[str\]</code>) – A list of MIME types or regex patterns to classify the input documents
  (for example: `["text/plain", "audio/x-wav", "image/jpeg"]`).
- **mime_type_meta_field** (<code>str | None</code>) – Optional name of the metadata field that holds the MIME type.
- **file_path_meta_field** (<code>str | None</code>) – Optional name of the metadata field that holds the file path. Used to infer the MIME type if
  `mime_type_meta_field` is not provided or missing in a document.
- **additional_mimetypes** (<code>dict\[str, str\] | None</code>) – Optional dictionary mapping MIME types to file extensions to enhance or override the standard
  `mimetypes` module. Useful when working with uncommon or custom file types.
  For example: `{"application/vnd.custom-type": ".custom"}`.

**Raises:**

- <code>ValueError</code> – If `mime_types` is empty or if both `mime_type_meta_field` and `file_path_meta_field` are
  not provided.

#### run

```python
run(documents: list[Document]) -> dict[str, list[Document]]
```

Categorize input documents into groups based on their MIME type.

MIME types can either be directly available in document metadata or derived from file paths using the
standard Python `mimetypes` module and custom mappings.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – A list of documents to be categorized.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary where the keys are MIME types (or `"unclassified"`) and the values are lists of documents.

## file_type_router

### FileTypeRouter

Categorizes files or byte streams by their MIME types, helping in context-based routing.

FileTypeRouter supports both exact MIME type matching and regex patterns.

For file paths, MIME types come from extensions, while byte streams use metadata.
You can use regex patterns in the `mime_types` parameter to set broad categories
(such as `audio/.*` or `text/.*`) or specific types.
MIME types without regex patterns are treated as exact matches.
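
Here is a rough stdlib-only sketch of this classification. It guesses the MIME type from the file extension with Python's `mimetypes` module, then compares it against each configured entry, falling back to `unclassified`. The sketch makes three assumptions:

- Every entry is compiled as a regex and must match the whole MIME type, so a plain entry such as `text/plain` matches only itself.
- `classify` is a hypothetical helper that handles file paths only; byte streams, `meta`, and missing-file handling are left out.
- Registering an extension with `mimetypes.add_type` mirrors what `additional_mimetypes` does.

```python
import mimetypes
import re
from pathlib import Path


def classify(sources: list[Path], mime_types: list[str]) -> dict[str, list[Path]]:
    # Assumption: every entry is compiled as a regex and must match the whole MIME type,
    # so a plain entry like "text/plain" behaves as an exact match.
    patterns = [(entry, re.compile(entry)) for entry in mime_types]
    routed: dict[str, list[Path]] = {}
    for source in sources:
        # The MIME type is guessed from the file extension only; the file need not exist.
        guessed, _encoding = mimetypes.guess_type(source.name)
        for entry, pattern in patterns:
            if guessed is not None and pattern.fullmatch(guessed):
                routed.setdefault(entry, []).append(source)
                break
        else:
            # No entry matched, or the extension is unknown.
            routed.setdefault("unclassified", []).append(source)
    return routed


sources = [Path("file.txt"), Path("document.pdf"), Path("song.mp3")]
print(classify(sources, ["text/plain", "application/pdf"]))
print(classify(sources, [r"audio/.*", r"text/plain"]))

# Registering an extra extension makes an otherwise unknown file type classifiable.
mimetypes.add_type("application/vnd.custom-type", ".custom")
print(classify([Path("data.custom")], ["application/vnd.custom-type"]))
```

Using `fullmatch` rather than `search` keeps an entry like `text/plain` from matching a longer type that merely contains it.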

### Usage example

```python
from haystack.components.routers import FileTypeRouter
from pathlib import Path

# For exact MIME type matching
router = FileTypeRouter(mime_types=["text/plain", "application/pdf"])

# For flexible matching using regex, to handle all audio types
router_with_regex = FileTypeRouter(mime_types=[r"audio/.*", r"text/plain"])

sources = [Path("file.txt"), Path("document.pdf"), Path("song.mp3")]
print(router.run(sources=sources))
print(router_with_regex.run(sources=sources))

# Expected output:
# {'text/plain': [PosixPath('file.txt')], 'application/pdf': [PosixPath('document.pdf')],
#  'unclassified': [PosixPath('song.mp3')]}
# {'audio/.*': [PosixPath('song.mp3')], 'text/plain': [PosixPath('file.txt')],
#  'unclassified': [PosixPath('document.pdf')]}
```

#### __init__

```python
__init__(
    mime_types: list[str],
    additional_mimetypes: dict[str, str] | None = None,
    raise_on_failure: bool = False,
)
```

Initialize the FileTypeRouter component.

**Parameters:**

- **mime_types** (<code>list\[str\]</code>) – A list of MIME types or regex patterns to classify the input files or byte streams
  (for example: `["text/plain", "audio/x-wav", "image/jpeg"]`).
- **additional_mimetypes** (<code>dict\[str, str\] | None</code>) – A dictionary mapping MIME types to file extensions that is added to the `mimetypes` module,
  so that files of unsupported or non-native types are not left unclassified
  (for example: `{"application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx"}`).
- **raise_on_failure** (<code>bool</code>) – If True, raises FileNotFoundError when a file path doesn't exist.
  If False (default), only emits a warning when a file path doesn't exist.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> FileTypeRouter
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>FileTypeRouter</code> – The deserialized component.

#### run

```python
run(
    sources: list[str | Path | ByteStream],
    meta: dict[str, Any] | list[dict[str, Any]] | None = None,
) -> dict[str, list[ByteStream | Path]]
```

Categorize files or byte streams according to their MIME types.

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – A list of file paths or byte streams to categorize.
- **meta** (<code>dict\[str, Any\] | list\[dict\[str, Any\]\] | None</code>) – Optional metadata to attach to the sources.
  When provided, the sources are internally converted to ByteStream objects and the metadata is added.
  This value can be a list of dictionaries or a single dictionary.
  If it's a single dictionary, its content is added to the metadata of all ByteStream objects.
  If it's a list, its length must match the number of sources, as they are zipped together.

**Returns:**

- <code>dict\[str, list\[ByteStream | Path\]\]</code> – A dictionary where the keys are MIME types and the values are lists of data sources.
  Two extra keys may be returned: `"unclassified"` when a source's MIME type doesn't match any pattern
  and `"failed"` when a source cannot be processed (for example, a file path that doesn't exist).

**Raises:**

- <code>TypeError</code> – If a source is not a Path, str, or ByteStream.
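
The `meta` handling documented for `run` can be sketched as a small normalization step. A single dictionary is applied to every source, while a list is zipped with the sources position by position. `normalize_meta` is a hypothetical helper, and raising `ValueError` on a length mismatch is an assumption about how the mismatch is reported.

```python
from typing import Any, Union

Meta = dict[str, Any]


def normalize_meta(meta: Union[Meta, list[Meta], None], n_sources: int) -> list[Meta]:
    if meta is None:
        # No metadata: every source gets its own empty dict.
        return [{} for _ in range(n_sources)]
    if isinstance(meta, dict):
        # A single dict is copied onto every source.
        return [dict(meta) for _ in range(n_sources)]
    if len(meta) != n_sources:
        # Assumption: a length mismatch is reported as ValueError.
        raise ValueError("The length of the metadata list must match the number of sources.")
    # A list is paired with the sources position by position.
    return [dict(m) for m in meta]


sources = ["a.txt", "b.pdf"]
for source, m in zip(sources, normalize_meta({"batch": 7}, len(sources))):
    print(source, m)
for source, m in zip(sources, normalize_meta([{"page": 1}, {"page": 2}], len(sources))):
    print(source, m)
```

Copying the dictionary per source keeps later per-source edits from leaking into the metadata of other sources.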

## llm_messages_router

### LLMMessagesRouter

Routes Chat Messages to different connections using a generative Language Model to perform classification.

This component can be used with general-purpose LLMs and with specialized LLMs for moderation like Llama Guard.

### Usage example

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.components.routers.llm_messages_router import LLMMessagesRouter
from haystack.dataclasses import ChatMessage

# initialize a Chat Generator with a generative model for moderation
chat_generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "meta-llama/Llama-Guard-4-12B", "provider": "groq"},
)

router = LLMMessagesRouter(chat_generator=chat_generator,
                           output_names=["unsafe", "safe"],
                           output_patterns=["unsafe", "safe"])


print(router.run([ChatMessage.from_user("How to rob a bank?")]))

# {
#     'chat_generator_text': 'unsafe\nS2',
#     'unsafe': [
#         ChatMessage(
#             _role=<ChatRole.USER: 'user'>,
#             _content=[TextContent(text='How to rob a bank?')],
#             _name=None,
#             _meta={}
#         )
#     ]
# }
```

#### __init__

```python
__init__(
    chat_generator: ChatGenerator,
    output_names: list[str],
    output_patterns: list[str],
    system_prompt: str | None = None,
)
```

Initialize the LLMMessagesRouter component.

**Parameters:**

- **chat_generator** (<code>ChatGenerator</code>) – A ChatGenerator instance which represents the LLM.
- **output_names** (<code>list\[str\]</code>) – A list of output connection names. These can be used to connect the router to other
  components.
- **output_patterns** (<code>list\[str\]</code>) – A list of regular expressions to be matched against the output of the LLM. Each pattern
Each pattern 578 corresponds to an output name. Patterns are evaluated in order. 579 When using moderation models, refer to the model card to understand the expected outputs. 580 - **system_prompt** (<code>str | None</code>) – An optional system prompt to customize the behavior of the LLM. 581 For moderation models, refer to the model card for supported customization options. 582 583 **Raises:** 584 585 - <code>ValueError</code> – If output_names and output_patterns are not non-empty lists of the same length. 586 587 #### warm_up 588 589 ```python 590 warm_up() 591 ``` 592 593 Warm up the underlying LLM. 594 595 #### run 596 597 ```python 598 run(messages: list[ChatMessage]) -> dict[str, str | list[ChatMessage]] 599 ``` 600 601 Classify the messages based on LLM output and route them to the appropriate output connection. 602 603 **Parameters:** 604 605 - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessages to be routed. Only user and assistant messages are supported. 606 607 **Returns:** 608 609 - <code>dict\[str, str | list\[ChatMessage\]\]</code> – A dictionary with the following keys: 610 - "chat_generator_text": The text output of the LLM, useful for debugging. 611 - "output_names": Each contains the list of messages that matched the corresponding pattern. 612 - "unmatched": The messages that did not match any of the output patterns. 613 614 **Raises:** 615 616 - <code>ValueError</code> – If messages is an empty list or contains messages with unsupported roles. 617 618 #### to_dict 619 620 ```python 621 to_dict() -> dict[str, Any] 622 ``` 623 624 Serialize this component to a dictionary. 625 626 **Returns:** 627 628 - <code>dict\[str, Any\]</code> – The serialized component as a dictionary. 629 630 #### from_dict 631 632 ```python 633 from_dict(data: dict[str, Any]) -> LLMMessagesRouter 634 ``` 635 636 Deserialize this component from a dictionary. 

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>LLMMessagesRouter</code> – The deserialized component instance.

## metadata_router

### MetadataRouter

Routes documents or byte streams to different connections based on their metadata fields.

Specify the routing rules in the `__init__` method.
If a document or byte stream does not match any of the rules, it's routed to a connection named "unmatched".

### Usage examples

**Routing Documents by metadata:**

```python
from haystack import Document
from haystack.components.routers import MetadataRouter

docs = [Document(content="Paris is the capital of France.", meta={"language": "en"}),
        Document(content="Berlin ist die Hauptstadt von Deutschland.", meta={"language": "de"})]

router = MetadataRouter(rules={"en": {"field": "meta.language", "operator": "==", "value": "en"}})

print(router.run(documents=docs))
# {'en': [Document(id=..., content: 'Paris is the capital of France.', meta: {'language': 'en'})],
#  'unmatched': [Document(id=..., content: 'Berlin ist die Hauptstadt von Deutschland.', meta: {'language': 'de'})]}
```

**Routing ByteStreams by metadata:**

```python
from haystack.dataclasses import ByteStream
from haystack.components.routers import MetadataRouter

streams = [
    ByteStream.from_string("Hello world", meta={"language": "en"}),
    ByteStream.from_string("Bonjour le monde", meta={"language": "fr"})
]

router = MetadataRouter(
    rules={"english": {"field": "meta.language", "operator": "==", "value": "en"}},
    output_type=list[ByteStream]
)

result = router.run(documents=streams)
# {'english': [ByteStream(...)], 'unmatched': [ByteStream(...)]}
```

#### __init__

```python
__init__(rules: dict[str, dict], output_type: type = list[Document]) -> None
```

Initializes the MetadataRouter component.

**Parameters:**

- **rules** (<code>dict\[str, dict\]</code>) – A dictionary defining how to route documents or byte streams to output connections based on their
  metadata. Keys are output connection names, and values are dictionaries of
  [filtering expressions](https://docs.haystack.deepset.ai/docs/metadata-filtering) in Haystack.
  For example:

  ```python
  {
      "edge_1": {
          "operator": "AND",
          "conditions": [
              {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
              {"field": "meta.created_at", "operator": "<", "value": "2023-04-01"},
          ],
      },
      "edge_2": {
          "operator": "AND",
          "conditions": [
              {"field": "meta.created_at", "operator": ">=", "value": "2023-04-01"},
              {"field": "meta.created_at", "operator": "<", "value": "2023-07-01"},
          ],
      },
      "edge_3": {
          "operator": "AND",
          "conditions": [
              {"field": "meta.created_at", "operator": ">=", "value": "2023-07-01"},
              {"field": "meta.created_at", "operator": "<", "value": "2023-10-01"},
          ],
      },
      "edge_4": {
          "operator": "AND",
          "conditions": [
              {"field": "meta.created_at", "operator": ">=", "value": "2023-10-01"},
              {"field": "meta.created_at", "operator": "<", "value": "2024-01-01"},
          ],
      },
  }
  ```

- **output_type** (<code>type</code>) – The type of the output produced. Lists of Documents or ByteStreams can be specified.

#### run

```python
run(documents: list[Document] | list[ByteStream])
```

Routes documents or byte streams to different connections based on their metadata fields.

If a document or byte stream does not match any of the rules, it's routed to a connection named "unmatched".
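
Here is a simplified sketch of how a rule set like the `rules` example could be evaluated. Comparison conditions read a `meta.<key>` field, `AND`/`OR` combine nested `conditions`, and anything that matches no rule goes to `unmatched`. The sketch rests on these assumptions:

- It covers only a small subset of Haystack's filter syntax.
- `matches` and `route_by_meta` are hypothetical helpers that work on plain metadata dictionaries rather than `Document` or `ByteStream` objects.
- An item is sent to every rule it matches, not only the first.

```python
import operator
from typing import Any

Meta = dict[str, Any]

# Only a small subset of comparison operators; the real filter syntax supports more.
COMPARISONS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}


def matches(meta: Meta, rule: dict[str, Any]) -> bool:
    op = rule["operator"]
    if op in ("AND", "OR"):
        results = (matches(meta, cond) for cond in rule["conditions"])
        return all(results) if op == "AND" else any(results)
    # Fields are written as "meta.<key>"; strip the prefix to read the metadata dict.
    key = rule["field"].removeprefix("meta.")
    return key in meta and COMPARISONS[op](meta[key], rule["value"])


def route_by_meta(items: list[Meta], rules: dict[str, dict[str, Any]]) -> dict[str, list[Meta]]:
    routed: dict[str, list[Meta]] = {}
    for meta in items:
        hits = [name for name, rule in rules.items() if matches(meta, rule)]
        # Assumption: an item is sent to every rule it matches, not only the first.
        for name in hits:
            routed.setdefault(name, []).append(meta)
        if not hits:
            routed.setdefault("unmatched", []).append(meta)
    return routed


# ISO-8601 date strings compare correctly as plain strings.
rules = {
    "q1_2023": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2023-04-01"},
        ],
    },
    "en": {"field": "meta.language", "operator": "==", "value": "en"},
}
items = [
    {"created_at": "2023-02-10", "language": "en"},
    {"created_at": "2023-05-01", "language": "de"},
]
print(route_by_meta(items, rules))
```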

**Parameters:**

- **documents** (<code>list\[Document\] | list\[ByteStream\]</code>) – A list of `Document` or `ByteStream` objects to be routed based on their metadata.

**Returns:**

- A dictionary where the keys are the names of the output connections (including `"unmatched"`)
  and the values are lists of `Document` or `ByteStream` objects that matched the corresponding rules.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> MetadataRouter
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>MetadataRouter</code> – The deserialized component instance.

## text_language_router

### TextLanguageRouter

Routes text strings to different output connections based on their language.

Provide a list of languages during initialization. If the text doesn't match any of the
specified languages, it's routed to the `unmatched` output.
For routing documents based on their language, use the DocumentLanguageClassifier component,
followed by the MetadataRouter.
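
The routing contract can be sketched without any language-detection dependency by injecting the detector as a function. The detected language becomes the output key when it is in `languages`, and `unmatched` is used otherwise. `route_text` and `toy_detect` are hypothetical; the toy detector only checks for Greek characters and stands in for the `langdetect`-based detection linked in the `__init__` parameters.

```python
from typing import Callable


def route_text(text: str, languages: list[str], detect: Callable[[str], str]) -> dict[str, str]:
    # Mirror the documented TypeError for non-string input.
    if not isinstance(text, str):
        raise TypeError("route_text expects a str")
    lang = detect(text)
    # Known languages become the output key; everything else goes to "unmatched".
    return {lang: text} if lang in languages else {"unmatched": text}


def toy_detect(text: str) -> str:
    # Toy heuristic for illustration only: any Greek-and-Coptic character means "el".
    return "el" if any("\u0370" <= ch <= "\u03ff" for ch in text) else "en"


print(route_text("Who was Elvis Presley?", ["en"], toy_detect))  # {'en': 'Who was Elvis Presley?'}
print(route_text("ένα ελληνικό κείμενο", ["en"], toy_detect))  # {'unmatched': 'ένα ελληνικό κείμενο'}
```

Injecting the detector keeps the routing logic testable on its own; in a pipeline, each returned key corresponds to an output connection name such as `text_language_router.en`.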

### Usage example

```python
from haystack import Pipeline, Document
from haystack.components.routers import TextLanguageRouter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

document_store = InMemoryDocumentStore()
document_store.write_documents([Document(content="Elvis Presley was an American singer and actor.")])

p = Pipeline()
p.add_component(instance=TextLanguageRouter(languages=["en"]), name="text_language_router")
p.add_component(instance=InMemoryBM25Retriever(document_store=document_store), name="retriever")
p.connect("text_language_router.en", "retriever.query")

result = p.run({"text_language_router": {"text": "Who was Elvis Presley?"}})
assert result["retriever"]["documents"][0].content == "Elvis Presley was an American singer and actor."

result = p.run({"text_language_router": {"text": "ένα ελληνικό κείμενο"}})
assert result["text_language_router"]["unmatched"] == "ένα ελληνικό κείμενο"
```

#### __init__

```python
__init__(languages: list[str] | None = None)
```

Initialize the TextLanguageRouter component.

**Parameters:**

- **languages** (<code>list\[str\] | None</code>) – A list of ISO language codes.
  See the supported languages in [`langdetect` documentation](https://github.com/Mimino666/langdetect#languages).
  If not specified, defaults to `["en"]`.

#### run

```python
run(text: str) -> dict[str, str]
```

Routes the text strings to different output connections based on their language.

If the text doesn't match any of the specified languages, it's routed to the `unmatched` output.

**Parameters:**

- **text** (<code>str</code>) – A text string to route.

**Returns:**

- <code>dict\[str, str\]</code> – A dictionary in which the key is the language (or `"unmatched"`),
  and the value is the text.

**Raises:**

- <code>TypeError</code> – If the input is not a string.

## transformers_text_router

### TransformersTextRouter

Routes the text strings to different connections based on a category label.

The labels are specific to each model and can be found in its description on Hugging Face.

### Usage example

```python
from haystack.core.pipeline import Pipeline
from haystack.components.routers import TransformersTextRouter
from haystack.components.builders import PromptBuilder
from haystack.components.generators import HuggingFaceLocalGenerator

p = Pipeline()
p.add_component(
    instance=TransformersTextRouter(model="papluca/xlm-roberta-base-language-detection"),
    name="text_router"
)
p.add_component(
    instance=PromptBuilder(template="Answer the question: {{query}}\nAnswer:"),
    name="english_prompt_builder"
)
p.add_component(
    instance=PromptBuilder(template="Beantworte die Frage: {{query}}\nAntwort:"),
    name="german_prompt_builder"
)

p.add_component(
    instance=HuggingFaceLocalGenerator(model="DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1"),
    name="german_llm"
)
p.add_component(
    instance=HuggingFaceLocalGenerator(model="microsoft/Phi-3-mini-4k-instruct"),
    name="english_llm"
)

p.connect("text_router.en", "english_prompt_builder.query")
p.connect("text_router.de", "german_prompt_builder.query")
p.connect("english_prompt_builder.prompt", "english_llm.prompt")
p.connect("german_prompt_builder.prompt", "german_llm.prompt")

# English Example
print(p.run({"text_router": {"text": "What is the capital of Germany?"}}))

# German Example
print(p.run({"text_router": {"text": "Was ist die Hauptstadt von Deutschland?"}}))
```

#### __init__

```python
__init__(
    model: str,
    labels: list[str] | None = None,
    device: ComponentDevice | None = None,
    token: Secret | None = Secret.from_env_var(
        ["HF_API_TOKEN", "HF_TOKEN"], strict=False
    ),
    huggingface_pipeline_kwargs: dict[str, Any] | None = None,
)
```

Initializes the TransformersTextRouter component.

**Parameters:**

- **model** (<code>str</code>) – The name or path of a Hugging Face model for text classification.
- **labels** (<code>list\[str\] | None</code>) – The list of labels. If not provided, the component fetches the labels
  from the model configuration file hosted on the Hugging Face Hub using
  `transformers.AutoConfig.from_pretrained`.
- **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
  If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- **token** (<code>Secret | None</code>) – The API token used to download private models from Hugging Face.
  By default, the token is read from the `HF_API_TOKEN` or `HF_TOKEN` environment variable.
  To generate these tokens, run `transformers-cli login`.
- **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments for initializing the Hugging Face
  text classification pipeline.

#### warm_up

```python
warm_up()
```

Initializes the component.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> TransformersTextRouter
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>TransformersTextRouter</code> – Deserialized component.

#### run

```python
run(text: str) -> dict[str, str]
```

Routes the text strings to different connections based on a category label.

**Parameters:**

- **text** (<code>str</code>) – A string of text to route.

**Returns:**

- <code>dict\[str, str\]</code> – A dictionary with the label as key and the text as value.

**Raises:**

- <code>TypeError</code> – If the input is not a str.

## zero_shot_text_router

### TransformersZeroShotTextRouter

Routes the text strings to different connections based on a category label.

Specify the set of labels for categorization when initializing the component.

### Usage example

```python
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.core.pipeline import Pipeline
from haystack.components.routers import TransformersZeroShotTextRouter
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack.components.retrievers import InMemoryEmbeddingRetriever

document_store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(model="intfloat/e5-base-v2")
# Load the embedding model before calling run() outside of a pipeline
doc_embedder.warm_up()
docs = [
    Document(
        content="Germany, officially the Federal Republic of Germany, is a country in the western region of "
        "Central Europe. The nation's capital and most populous city is Berlin and its main financial centre "
        "is Frankfurt; the largest urban area is the Ruhr."
    ),
    Document(
        content="France, officially the French Republic, is a country located primarily in Western Europe. "
        "France is a unitary semi-presidential republic with its capital in Paris, the country's largest city "
        "and main cultural and commercial centre; other major urban areas include Marseille, Lyon, Toulouse, "
        "Lille, Bordeaux, Strasbourg, Nantes and Nice."
    )
]
docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])

p = Pipeline()
p.add_component(instance=TransformersZeroShotTextRouter(labels=["passage", "query"]), name="text_router")
p.add_component(
    instance=SentenceTransformersTextEmbedder(model="intfloat/e5-base-v2", prefix="passage: "),
    name="passage_embedder"
)
p.add_component(
    instance=SentenceTransformersTextEmbedder(model="intfloat/e5-base-v2", prefix="query: "),
    name="query_embedder"
)
p.add_component(
    instance=InMemoryEmbeddingRetriever(document_store=document_store),
    name="query_retriever"
)
p.add_component(
    instance=InMemoryEmbeddingRetriever(document_store=document_store),
    name="passage_retriever"
)

# Each label passed to the router becomes an output name
p.connect("text_router.passage", "passage_embedder.text")
p.connect("passage_embedder.embedding", "passage_retriever.query_embedding")
p.connect("text_router.query", "query_embedder.text")
p.connect("query_embedder.embedding", "query_retriever.query_embedding")

# Query example
p.run({"text_router": {"text": "What is the capital of Germany?"}})

# Passage example
p.run({
    "text_router": {
        "text": "The United Kingdom of Great Britain and Northern Ireland, commonly known as the "
        "United Kingdom (UK) or Britain, is a country in Northwestern Europe, off the north-western coast of "
        "the continental mainland."
    }
})
```

#### __init__

```python
__init__(
    labels: list[str],
    multi_label: bool = False,
    model: str = "MoritzLaurer/deberta-v3-base-zeroshot-v1.1-all-33",
    device: ComponentDevice | None = None,
    token: Secret | None = Secret.from_env_var(
        ["HF_API_TOKEN", "HF_TOKEN"], strict=False
    ),
    huggingface_pipeline_kwargs: dict[str, Any] | None = None,
)
```

Initializes the TransformersZeroShotTextRouter component.

**Parameters:**

- **labels** (<code>list\[str\]</code>) – The set of labels to use for classification. Can be a single label,
a string of comma-separated labels, or a list of labels.
- **multi_label** (<code>bool</code>) – Indicates if multiple labels can be true.
If `False`, label scores are normalized so their sum equals 1 for each sequence.
If `True`, the labels are considered independent and probabilities are normalized for each candidate by
doing a softmax of the entailment score vs. the contradiction score.
- **model** (<code>str</code>) – The name or path of a Hugging Face model for zero-shot text classification.
- **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- **token** (<code>Secret | None</code>) – The API token used to download private models from Hugging Face.
By default, the token is read from the `HF_API_TOKEN` or `HF_TOKEN` environment variable.
You can create a token in your Hugging Face account settings.
- **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments for initializing the Hugging Face
zero-shot text classification pipeline.

#### warm_up

```python
warm_up()
```

Initializes the component by loading the model.
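
The two normalization modes described for the `multi_label` parameter of `__init__` can be illustrated with a small pure-Python sketch. It assumes an NLI-style model that emits one contradiction logit and one entailment logit per candidate label. The logits below are invented for illustration, and `zero_shot_scores` is a hypothetical helper, not part of Haystack or `transformers`.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(logits, multi_label):
    """Turn per-label NLI logits into label scores (hypothetical helper).

    `logits` maps each candidate label to an assumed
    (contradiction_logit, entailment_logit) pair.
    """
    labels = list(logits)
    if multi_label:
        # Labels are independent: softmax over [contradiction, entailment]
        # for each label and keep the entailment probability.
        return {label: softmax(list(logits[label]))[1] for label in labels}
    # Single-label mode: softmax the entailment logits across all labels,
    # so the scores for one text sum to 1.
    entailment = softmax([logits[label][1] for label in labels])
    return dict(zip(labels, entailment))


# Made-up logits for illustration only; a real model produces its own values.
example = {"passage": (-1.0, 2.0), "query": (0.5, 1.0)}
single = zero_shot_scores(example, multi_label=False)
multi = zero_shot_scores(example, multi_label=True)
print(single)  # the two scores sum to 1
print(multi)   # each score lies in (0, 1) on its own; the sum can exceed 1
```

With `multi_label=False` the labels compete for a single probability mass, which suits routing to exactly one branch; with `multi_label=True` each label is judged on its own, so several labels can score highly at once.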

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> TransformersZeroShotTextRouter
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>TransformersZeroShotTextRouter</code> – Deserialized component.

#### run

```python
run(text: str) -> dict[str, str]
```

Routes the text strings to different connections based on a category label.

**Parameters:**

- **text** (<code>str</code>) – A string of text to route.

**Returns:**

- <code>dict\[str, str\]</code> – A dictionary with the label as key and the text as value.

**Raises:**

- <code>TypeError</code> – If the input is not a str.
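
As a rough illustration of the output shape of `run`, the sketch below picks the highest-scoring label and returns the text under that single key. In a pipeline this means only the output connection named after that label receives data, so the other branches do not run. The `route_text` helper and the scores are hypothetical; the real component obtains its scores from the Hugging Face zero-shot classification pipeline.

```python
def route_text(text, scores):
    """Return {best_label: text}, mimicking the documented output of `run`.

    `scores` is an assumed mapping from label to classifier score.
    """
    if not isinstance(text, str):
        # Mirrors the documented TypeError for non-string input
        raise TypeError("Expected a str as input.")
    best_label = max(scores, key=scores.get)
    return {best_label: text}


result = route_text("What is the capital of Germany?", {"passage": 0.12, "query": 0.88})
print(result)  # {'query': 'What is the capital of Germany?'}
```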