---
title: "Routers"
id: routers-api
description: "Routers are a group of components that route queries or Documents to other components that can handle them best."
slug: "/routers-api"
---


## conditional_router

### NoRouteSelectedException

Bases: <code>Exception</code>

Exception raised when no route is selected in ConditionalRouter.

### RouteConditionException

Bases: <code>Exception</code>

Exception raised when there is an error parsing or evaluating the condition expression in ConditionalRouter.

### ConditionalRouter

Routes data based on specific conditions.

You define these conditions in a list of dictionaries called `routes`.
Each dictionary in this list represents a single route. Each route has these four elements:

- `condition`: A Jinja2 string expression that determines if the route is selected.
- `output`: A Jinja2 expression defining the route's output value.
- `output_type`: The type of the output data (for example, `str`, `list[int]`).
- `output_name`: The name you want to use to publish `output`. This name is used to connect
  the router to other components in the pipeline.

### Usage example

```python
from haystack.components.routers import ConditionalRouter

routes = [
    {
        "condition": "{{streams|length > 2}}",
        "output": "{{streams}}",
        "output_name": "enough_streams",
        "output_type": list[int],
    },
    {
        "condition": "{{streams|length <= 2}}",
        "output": "{{streams}}",
        "output_name": "insufficient_streams",
        "output_type": list[int],
    },
]
router = ConditionalRouter(routes)
# When 'streams' has more than 2 items, 'enough_streams' output will activate, emitting the list [1, 2, 3]
kwargs = {"streams": [1, 2, 3], "query": "Haystack"}
result = router.run(**kwargs)
assert result == {"enough_streams": [1, 2, 3]}
```

In this example, we configure two routes.
The first route sends the 'streams' value to 'enough_streams' if the
stream count exceeds two. The second route directs 'streams' to 'insufficient_streams' if there
are two or fewer streams.

In the pipeline setup, the Router connects to other components using the output names. For example,
'enough_streams' might connect to a component that processes streams, while
'insufficient_streams' might connect to a component that fetches more streams.

Here is a pipeline that uses `ConditionalRouter` and selects a route depending on the value of `count`:

```python
from haystack import Pipeline
from haystack.components.routers import ConditionalRouter

routes = [
    {
        "condition": "{{count > 5}}",
        "output": "Processing many items",
        "output_name": "many_items",
        "output_type": str,
    },
    {
        "condition": "{{count <= 5}}",
        "output": "Processing few items",
        "output_name": "few_items",
        "output_type": str,
    },
]

pipe = Pipeline()
pipe.add_component("router", ConditionalRouter(routes))

# Run with count > 5
result = pipe.run({"router": {"count": 10}})
print(result)
# >> {'router': {'many_items': 'Processing many items'}}

# Run with count <= 5
result = pipe.run({"router": {"count": 3}})
print(result)
# >> {'router': {'few_items': 'Processing few items'}}
```

#### __init__

```python
__init__(
    routes: list[Route],
    custom_filters: dict[str, Callable] | None = None,
    unsafe: bool = False,
    validate_output_type: bool = False,
    optional_variables: list[str] | None = None,
) -> None
```

Initializes the `ConditionalRouter` with a list of routes detailing the conditions for routing.

**Parameters:**

- **routes** (<code>list\[Route\]</code>) – A list of dictionaries, each defining a route.
  Each route has these four elements:
  - `condition`: A Jinja2 string expression that determines if the route is selected.
  - `output`: A Jinja2 expression defining the route's output value.
  - `output_type`: The type of the output data (for example, `str`, `list[int]`).
  - `output_name`: The name you want to use to publish `output`. This name is used to connect
    the router to other components in the pipeline.
- **custom_filters** (<code>dict\[str, Callable\] | None</code>) – A dictionary of custom Jinja2 filters used in the condition expressions.
  For example, passing `{"my_filter": my_filter_fcn}` where:
  - `my_filter` is the name of the custom filter.
  - `my_filter_fcn` is a callable that takes `my_var:str` and returns `my_var[:3]`.

  `{{ my_var|my_filter }}` can then be used inside a route condition expression:
  `"condition": "{{ my_var|my_filter == 'foo' }}"`.
- **unsafe** (<code>bool</code>) – Enable execution of arbitrary code in the Jinja template.
  Use this only if you trust the source of the template, because it can lead to remote code execution.
- **validate_output_type** (<code>bool</code>) – Enable validation of routes' output.
  If a route output doesn't match the declared type, a `ValueError` is raised at runtime.
- **optional_variables** (<code>list\[str\] | None</code>) – A list of variable names that are optional in your route conditions and outputs.
  If these variables are not provided at runtime, they will be set to `None`.
  This allows you to write routes that can handle missing inputs gracefully without raising errors.

  Example usage with a default fallback route in a Pipeline:

  ```python
  from haystack import Pipeline
  from haystack.components.routers import ConditionalRouter

  routes = [
      {
          "condition": '{{ path == "rag" }}',
          "output": "{{ question }}",
          "output_name": "rag_route",
          "output_type": str
      },
      {
          "condition": "{{ True }}",  # fallback route
          "output": "{{ question }}",
          "output_name": "default_route",
          "output_type": str
      }
  ]

  router = ConditionalRouter(routes, optional_variables=["path"])
  pipe = Pipeline()
  pipe.add_component("router", router)

  # When 'path' is provided in the pipeline:
  result = pipe.run(data={"router": {"question": "What?", "path": "rag"}})
  assert result["router"] == {"rag_route": "What?"}

  # When 'path' is not provided, fallback route is taken:
  result = pipe.run(data={"router": {"question": "What?"}})
  assert result["router"] == {"default_route": "What?"}
  ```

  This pattern is particularly useful when:

  - You want to provide default/fallback behavior when certain inputs are missing
  - Some variables are only needed for specific routing conditions
  - You're building flexible pipelines where not all inputs are guaranteed to be present

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> ConditionalRouter
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>ConditionalRouter</code> – The deserialized component.

#### run

```python
run(**kwargs: Any) -> dict[str, Any]
```

Executes the routing logic.

Executes the routing logic by evaluating the specified boolean condition expressions for each route in the
order they are listed. The method directs the flow of data to the output specified in the first route whose
`condition` is True.

**Parameters:**

- **kwargs** (<code>Any</code>) – All variables used in the `condition` expressed in the routes. When the component is used in a
  pipeline, these variables are passed from the previous component's output.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary where the key is the `output_name` of the selected route and the value is the `output`
  of the selected route.

**Raises:**

- <code>NoRouteSelectedException</code> – If no `condition` in the routes is `True`.
- <code>RouteConditionException</code> – If there is an error parsing or evaluating the `condition` expression in the routes.
- <code>ValueError</code> – If type validation is enabled and the route's declared type doesn't match the actual value type.

## document_length_router

### DocumentLengthRouter

Categorizes documents based on the length of the `content` field and routes them to the appropriate output.

A common use case for DocumentLengthRouter is handling documents obtained from PDFs that contain non-text
content, such as scanned pages or images. This component can detect empty or low-content documents and route them to
components that perform OCR, generate captions, or compute image embeddings.
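
As a rough illustration of the length-based split, the sketch below applies the rule documented for the `threshold` parameter (content that is `None` or at most `threshold` characters long counts as short) in plain Python. `Doc` and `split_by_length` are hypothetical names introduced here for illustration. They are not part of Haystack, and this is not the component's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document; only `content` matters here."""

    content: Optional[str] = None


def split_by_length(docs, threshold=10):
    """Group docs into short and long buckets using the documented threshold rule."""
    short, long_ = [], []
    for doc in docs:
        # None content, or content of at most `threshold` characters, counts as short.
        if doc.content is None or len(doc.content) <= threshold:
            short.append(doc)
        else:
            long_.append(doc)
    return {"short_documents": short, "long_documents": long_}


groups = split_by_length([Doc("Short"), Doc(None), Doc("Long document " * 20)])
print({name: len(items) for name, items in groups.items()})
# {'short_documents': 2, 'long_documents': 1}
```

With a negative threshold, an empty string has length 0, which is greater than the threshold, so only documents whose content is `None` end up in the short bucket.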

### Usage example

```python
from haystack.components.routers import DocumentLengthRouter
from haystack.dataclasses import Document

docs = [
    Document(content="Short"),
    Document(content="Long document "*20),
]

router = DocumentLengthRouter(threshold=10)

result = router.run(documents=docs)
print(result)

# {
#     "short_documents": [Document(content="Short", ...)],
#     "long_documents": [Document(content="Long document ...", ...)],
# }
```

#### __init__

```python
__init__(*, threshold: int = 10) -> None
```

Initialize the DocumentLengthRouter component.

**Parameters:**

- **threshold** (<code>int</code>) – The threshold for the number of characters in the document `content` field. Documents where `content` is
  None or whose character count is less than or equal to the threshold will be routed to the `short_documents`
  output. Otherwise, they will be routed to the `long_documents` output.
  To route only documents with None content to `short_documents`, set the threshold to a negative number.

#### run

```python
run(documents: list[Document]) -> dict[str, list[Document]]
```

Categorize input documents into groups based on the length of the `content` field.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – A list of documents to be categorized.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
  - `short_documents`: A list of documents where `content` is None or the length of `content` is less than or
    equal to the threshold.
  - `long_documents`: A list of documents where the length of `content` is greater than the threshold.

## document_type_router

### DocumentTypeRouter

Routes documents by their MIME types.

DocumentTypeRouter is used to dynamically route documents within a pipeline based on their MIME types.
It supports exact MIME type matches and regex patterns.

MIME types can be extracted directly from document metadata or inferred from file paths using standard or
user-supplied MIME type mappings.

### Usage example

```python
from haystack.components.routers import DocumentTypeRouter
from haystack.dataclasses import Document

docs = [
    Document(content="Example text", meta={"file_path": "example.txt"}),
    Document(content="Another document", meta={"mime_type": "application/pdf"}),
    Document(content="Unknown type")
]

router = DocumentTypeRouter(
    mime_type_meta_field="mime_type",
    file_path_meta_field="file_path",
    mime_types=["text/plain", "application/pdf"]
)

result = router.run(documents=docs)
print(result)
```

Expected output:

```python
{
    "text/plain": [Document(...)],
    "application/pdf": [Document(...)],
    "unclassified": [Document(...)]
}
```

#### __init__

```python
__init__(
    *,
    mime_types: list[str],
    mime_type_meta_field: str | None = None,
    file_path_meta_field: str | None = None,
    additional_mimetypes: dict[str, str] | None = None
) -> None
```

Initialize the DocumentTypeRouter component.

**Parameters:**

- **mime_types** (<code>list\[str\]</code>) – A list of MIME types or regex patterns to classify the input documents.
  (for example: `["text/plain", "audio/x-wav", "image/jpeg"]`).
- **mime_type_meta_field** (<code>str | None</code>) – Optional name of the metadata field that holds the MIME type.
- **file_path_meta_field** (<code>str | None</code>) – Optional name of the metadata field that holds the file path. Used to infer the MIME type if
  `mime_type_meta_field` is not provided or missing in a document.
- **additional_mimetypes** (<code>dict\[str, str\] | None</code>) – Optional dictionary mapping MIME types to file extensions to enhance or override the standard
  `mimetypes` module. Useful when working with uncommon or custom file types.
  For example: `{"application/vnd.custom-type": ".custom"}`.

**Raises:**

- <code>ValueError</code> – If `mime_types` is empty or if neither `mime_type_meta_field` nor `file_path_meta_field` is
  provided.

#### run

```python
run(documents: list[Document]) -> dict[str, list[Document]]
```

Categorize input documents into groups based on their MIME type.

MIME types can either be directly available in document metadata or derived from file paths using the
standard Python `mimetypes` module and custom mappings.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – A list of documents to be categorized.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary where the keys are MIME types (or `"unclassified"`) and the values are lists of documents.

## file_type_router

### FileTypeRouter

Categorizes files or byte streams by their MIME types, helping in context-based routing.

FileTypeRouter supports both exact MIME type matching and regex patterns.

For file paths, MIME types come from extensions, while byte streams use metadata.
You can use regex patterns in the `mime_types` parameter to set broad categories
(such as `audio/.*` or `text/.*`) or specific types.
MIME types without regex patterns are treated as exact matches.
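
To make the matching rules concrete, here is a minimal sketch that guesses a MIME type from the file extension with the standard `mimetypes` module and compares it against each entry in a pattern list. The function `classify_by_mime` is hypothetical, and the use of `re.fullmatch` (so that a plain MIME type behaves as an exact match) is an assumption about how such matching could work, not a description of FileTypeRouter's internals.

```python
import mimetypes
import re
from pathlib import Path


def classify_by_mime(paths, patterns):
    """Group paths under the first pattern that fully matches their guessed MIME type."""
    compiled = [(pattern, re.compile(pattern)) for pattern in patterns]
    groups = {}
    for path in paths:
        # guess_type only looks at the extension; the file does not need to exist.
        mime_type, _ = mimetypes.guess_type(str(path))
        key = "unclassified"
        if mime_type is not None:
            for pattern, regex in compiled:
                # fullmatch: "text/plain" matches only itself, "audio/.*" matches any audio type.
                if regex.fullmatch(mime_type):
                    key = pattern
                    break
        groups.setdefault(key, []).append(path)
    return groups


sources = [Path("file.txt"), Path("document.pdf"), Path("song.mp3")]
print(classify_by_mime(sources, [r"audio/.*", r"text/plain"]))
```

In this sketch the PDF falls under `"unclassified"` because neither pattern matches `application/pdf`.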

### Usage example

```python
from haystack.components.routers import FileTypeRouter
from pathlib import Path

# For exact MIME type matching
router = FileTypeRouter(mime_types=["text/plain", "application/pdf"])

# For flexible matching using regex, to handle all audio types
router_with_regex = FileTypeRouter(mime_types=[r"audio/.*", r"text/plain"])

sources = [Path("file.txt"), Path("document.pdf"), Path("song.mp3")]
print(router.run(sources=sources))
print(router_with_regex.run(sources=sources))

# Expected output:
# {'text/plain': [PosixPath('file.txt')], 'application/pdf': [PosixPath('document.pdf')],
#  'unclassified': [PosixPath('song.mp3')]}
# {'audio/.*': [PosixPath('song.mp3')], 'text/plain': [PosixPath('file.txt')],
#  'unclassified': [PosixPath('document.pdf')]}
```

#### __init__

```python
__init__(
    mime_types: list[str],
    additional_mimetypes: dict[str, str] | None = None,
    raise_on_failure: bool = False,
) -> None
```

Initialize the FileTypeRouter component.

**Parameters:**

- **mime_types** (<code>list\[str\]</code>) – A list of MIME types or regex patterns to classify the input files or byte streams.
  (for example: `["text/plain", "audio/x-wav", "image/jpeg"]`).
- **additional_mimetypes** (<code>dict\[str, str\] | None</code>) – A dictionary mapping MIME types to file extensions. The entries are added to the `mimetypes`
  package so that file types it does not support natively are not left unclassified.
  (for example: `{"application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx"}`).
- **raise_on_failure** (<code>bool</code>) – If True, raises FileNotFoundError when a file path doesn't exist.
  If False (default), only emits a warning when a file path doesn't exist.
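
One way to picture what `additional_mimetypes` is for is the standard library call `mimetypes.add_type`, which registers an extra MIME type for an extension. The mapping below uses a made-up MIME type and extension purely for illustration, and whether FileTypeRouter registers its entries through this exact call is an assumption.

```python
import mimetypes

# Hypothetical custom mapping, in the same "MIME type -> extension" shape
# that `additional_mimetypes` expects.
additional_mimetypes = {"application/vnd.custom-type": ".custom"}

# Guess before registering the mapping (typically None for an unknown extension).
before, _ = mimetypes.guess_type("report.custom")

for mime_type, extension in additional_mimetypes.items():
    mimetypes.add_type(mime_type, extension)

# After registration the extension resolves to the custom MIME type.
after, _ = mimetypes.guess_type("report.custom")
print(before, "->", after)
```

Once the extension resolves to a MIME type, it can be matched by an entry in `mime_types` instead of falling through to `"unclassified"`.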

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> FileTypeRouter
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>FileTypeRouter</code> – The deserialized component.

#### run

```python
run(
    sources: list[str | Path | ByteStream],
    meta: dict[str, Any] | list[dict[str, Any]] | None = None,
) -> dict[str, list[ByteStream | Path]]
```

Categorize files or byte streams according to their MIME types.

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – A list of file paths or byte streams to categorize.
- **meta** (<code>dict\[str, Any\] | list\[dict\[str, Any\]\] | None</code>) – Optional metadata to attach to the sources.
  When provided, the sources are internally converted to ByteStream objects and the metadata is added.
  This value can be a list of dictionaries or a single dictionary.
  If it's a single dictionary, its content is added to the metadata of all ByteStream objects.
  If it's a list, its length must match the number of sources, as they are zipped together.

**Returns:**

- <code>dict\[str, list\[ByteStream | Path\]\]</code> – A dictionary where the keys are MIME types and the values are lists of data sources.
  Two extra keys may be returned: `"unclassified"` when a source's MIME type doesn't match any pattern
  and `"failed"` when a source cannot be processed (for example, a file path that doesn't exist).

**Raises:**

- <code>TypeError</code> – If a source is not a Path, str, or ByteStream.
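
The two accepted shapes of `meta` can be pictured with a small helper that expands the argument into one metadata dictionary per source. This is a sketch only: `normalize_meta` is a hypothetical function, and raising `ValueError` on a length mismatch is an assumption, since the reference above does not say which error the component raises in that case.

```python
from typing import Any, Optional, Union


def normalize_meta(
    meta: Optional[Union[dict[str, Any], list[dict[str, Any]]]],
    n_sources: int,
) -> list[dict[str, Any]]:
    """Return one metadata dict per source (hypothetical helper, not Haystack code)."""
    if meta is None:
        return [{} for _ in range(n_sources)]
    if isinstance(meta, dict):
        # A single dict is applied to every source; copy it so entries stay independent.
        return [dict(meta) for _ in range(n_sources)]
    if len(meta) != n_sources:
        # Assumed error type: a list must line up one-to-one with the sources.
        raise ValueError("The length of `meta` must match the number of sources.")
    return [dict(item) for item in meta]


print(normalize_meta({"source": "upload"}, 2))
# [{'source': 'upload'}, {'source': 'upload'}]
```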

## llm_messages_router

### LLMMessagesRouter

Routes Chat Messages to different connections using a generative Language Model to perform classification.

This component can be used with general-purpose LLMs and with specialized LLMs for moderation like Llama Guard.

### Usage example

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.components.routers.llm_messages_router import LLMMessagesRouter
from haystack.dataclasses import ChatMessage

# initialize a Chat Generator with a generative model for moderation
chat_generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "openai/gpt-oss-safeguard-20b", "provider": "groq"},
)

router = LLMMessagesRouter(chat_generator=chat_generator,
                            output_names=["unsafe", "safe"],
                            output_patterns=["unsafe", "safe"])


print(router.run([ChatMessage.from_user("How to rob a bank?")]))

# {
#     'chat_generator_text': 'unsafe\nS2',
#     'unsafe': [
#         ChatMessage(
#             _role=<ChatRole.USER: 'user'>,
#             _content=[TextContent(text='How to rob a bank?')],
#             _name=None,
#             _meta={}
#         )
#     ]
# }
```

#### __init__

```python
__init__(
    chat_generator: ChatGenerator,
    output_names: list[str],
    output_patterns: list[str],
    system_prompt: str | None = None,
) -> None
```

Initialize the LLMMessagesRouter component.

**Parameters:**

- **chat_generator** (<code>ChatGenerator</code>) – A ChatGenerator instance which represents the LLM.
- **output_names** (<code>list\[str\]</code>) – A list of output connection names. These can be used to connect the router to other
  components.
- **output_patterns** (<code>list\[str\]</code>) – A list of regular expressions to be matched against the output of the LLM.
  Each pattern corresponds to an output name. Patterns are evaluated in order.
  When using moderation models, refer to the model card to understand the expected outputs.
- **system_prompt** (<code>str | None</code>) – An optional system prompt to customize the behavior of the LLM.
  For moderation models, refer to the model card for supported customization options.

**Raises:**

- <code>ValueError</code> – If `output_names` and `output_patterns` are not non-empty lists of the same length.

#### warm_up

```python
warm_up() -> None
```

Warm up the underlying LLM.

#### run

```python
run(messages: list[ChatMessage]) -> dict[str, str | list[ChatMessage]]
```

Classify the messages based on LLM output and route them to the appropriate output connection.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessages to be routed. Only user and assistant messages are supported.

**Returns:**

- <code>dict\[str, str | list\[ChatMessage\]\]</code> – A dictionary with the following keys:
  - "chat_generator_text": The text output of the LLM, useful for debugging.
  - One key for each name in `output_names`: The list of messages that matched the corresponding pattern.
  - "unmatched": The messages that did not match any of the output patterns.

**Raises:**

- <code>ValueError</code> – If `messages` is an empty list or contains messages with unsupported roles.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> LLMMessagesRouter
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>LLMMessagesRouter</code> – The deserialized component instance.

## metadata_router

### MetadataRouter

Routes documents or byte streams to different connections based on their metadata fields.

Specify the routing rules in the `__init__` method.
If a document or byte stream does not match any of the rules, it's routed to a connection named "unmatched".

### Usage examples

**Routing Documents by metadata:**

```python
from haystack import Document
from haystack.components.routers import MetadataRouter

docs = [Document(content="Paris is the capital of France.", meta={"language": "en"}),
        Document(content="Berlin ist die Hauptstadt von Deutschland.", meta={"language": "de"})]

router = MetadataRouter(rules={"en": {"field": "meta.language", "operator": "==", "value": "en"}})

print(router.run(documents=docs))
# {'en': [Document(id=..., content: 'Paris is the capital of France.', meta: {'language': 'en'})],
# 'unmatched': [Document(id=..., content: 'Berlin ist die Hauptstadt von Deutschland.', meta: {'language': 'de'})]}
```

**Routing ByteStreams by metadata:**

```python
from haystack.dataclasses import ByteStream
from haystack.components.routers import MetadataRouter

streams = [
    ByteStream.from_string("Hello world", meta={"language": "en"}),
    ByteStream.from_string("Bonjour le monde", meta={"language": "fr"})
]

router = MetadataRouter(
    rules={"english": {"field": "meta.language", "operator": "==", "value": "en"}},
    output_type=list[ByteStream]
)

result = router.run(documents=streams)
# {'english': [ByteStream(...)], 'unmatched': [ByteStream(...)]}
```

#### __init__

```python
__init__(rules: dict[str, dict], output_type: type = list[Document]) -> None
```

Initializes the MetadataRouter component.

**Parameters:**

- **rules** (<code>dict\[str, dict\]</code>) – A dictionary defining how to route documents or byte streams to output connections based on their
  metadata. Keys are output connection names, and values are dictionaries of
  [filtering expressions](https://docs.haystack.deepset.ai/docs/metadata-filtering) in Haystack.
  For example:

  ```python
  {
      "edge_1": {
          "operator": "AND",
          "conditions": [
              {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
              {"field": "meta.created_at", "operator": "<", "value": "2023-04-01"},
          ],
      },
      "edge_2": {
          "operator": "AND",
          "conditions": [
              {"field": "meta.created_at", "operator": ">=", "value": "2023-04-01"},
              {"field": "meta.created_at", "operator": "<", "value": "2023-07-01"},
          ],
      },
      "edge_3": {
          "operator": "AND",
          "conditions": [
              {"field": "meta.created_at", "operator": ">=", "value": "2023-07-01"},
              {"field": "meta.created_at", "operator": "<", "value": "2023-10-01"},
          ],
      },
      "edge_4": {
          "operator": "AND",
          "conditions": [
              {"field": "meta.created_at", "operator": ">=", "value": "2023-10-01"},
              {"field": "meta.created_at", "operator": "<", "value": "2024-01-01"},
          ],
      },
  }
  ```

- **output_type** (<code>type</code>) – The type of the output produced. Lists of Documents or ByteStreams can be specified.

#### run

```python
run(
    documents: list[Document] | list[ByteStream],
) -> dict[str, list[Document] | list[ByteStream]]
```

Routes documents or byte streams to different connections based on their metadata fields.

If a document or byte stream does not match any of the rules, it's routed to a connection named "unmatched".
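
To show how a rule dictionary maps items to output connections, the sketch below evaluates a simplified subset of the filter syntax (comparison operators plus `AND`/`OR`) against plain dictionaries. It is not Haystack's filter engine: `matches` and `route` are hypothetical helpers, dates are compared as ISO-formatted strings rather than parsed, and sending an item to every rule it matches is an assumption.

```python
import operator

COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(rule, meta):
    """Evaluate a simplified filter expression against a metadata dict."""
    if "conditions" in rule:
        results = [matches(condition, meta) for condition in rule["conditions"]]
        if rule["operator"] == "AND":
            return all(results)
        if rule["operator"] == "OR":
            return any(results)
        raise ValueError(f"Unsupported logical operator: {rule['operator']}")
    key = rule["field"]
    if key.startswith("meta."):
        key = key[len("meta."):]
    if key not in meta:
        return False
    return COMPARATORS[rule["operator"]](meta[key], rule["value"])


def route(items, rules):
    """Group items by rule name; items matching no rule go to 'unmatched'."""
    groups = {name: [] for name in rules}
    groups["unmatched"] = []
    for item in items:
        matched_any = False
        for name, rule in rules.items():
            if matches(rule, item["meta"]):
                groups[name].append(item)
                matched_any = True
        if not matched_any:
            groups["unmatched"].append(item)
    return groups


rules = {
    "edge_1": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2023-04-01"},
        ],
    },
}
items = [{"meta": {"created_at": "2023-02-15"}}, {"meta": {"created_at": "2022-12-31"}}]
print(route(items, rules))
```

The string comparison works here only because ISO dates sort lexicographically, so treat it as a shortcut for the sketch rather than a general approach.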

**Parameters:**

- **documents** (<code>list\[Document\] | list\[ByteStream\]</code>) – A list of `Document` or `ByteStream` objects to be routed based on their metadata.

**Returns:**

- <code>dict\[str, list\[Document\] | list\[ByteStream\]\]</code> – A dictionary where the keys are the names of the output connections (including `"unmatched"`)
  and the values are lists of `Document` or `ByteStream` objects that matched the corresponding rules.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> MetadataRouter
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>MetadataRouter</code> – The deserialized component instance.

## text_language_router

### TextLanguageRouter

Routes text strings to different output connections based on their language.

Provide a list of languages during initialization. If the text doesn't match any of the
specified languages, it is routed to a connection named "unmatched".
For routing documents based on their language, use the DocumentLanguageClassifier component,
followed by the MetadataRouter.

### Usage example

```python
from haystack import Pipeline, Document
from haystack.components.routers import TextLanguageRouter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

document_store = InMemoryDocumentStore()
document_store.write_documents([Document(content="Elvis Presley was an American singer and actor.")])

p = Pipeline()
p.add_component(instance=TextLanguageRouter(languages=["en"]), name="text_language_router")
p.add_component(instance=InMemoryBM25Retriever(document_store=document_store), name="retriever")
p.connect("text_language_router.en", "retriever.query")

result = p.run({"text_language_router": {"text": "Who was Elvis Presley?"}})
assert result["retriever"]["documents"][0].content == "Elvis Presley was an American singer and actor."

result = p.run({"text_language_router": {"text": "ένα ελληνικό κείμενο"}})
assert result["text_language_router"]["unmatched"] == "ένα ελληνικό κείμενο"
```

#### __init__

```python
__init__(languages: list[str] | None = None) -> None
```

Initialize the TextLanguageRouter component.

**Parameters:**

- **languages** (<code>list\[str\] | None</code>) – A list of ISO language codes.
  See the supported languages in [`langdetect` documentation](https://github.com/Mimino666/langdetect#languages).
  If not specified, defaults to ["en"].

#### run

```python
run(text: str) -> dict[str, str]
```

Routes the text strings to different output connections based on their language.

If the text doesn't match any of the specified languages, it is routed to a connection named "unmatched".

**Parameters:**

- **text** (<code>str</code>) – A text string to route.

**Returns:**

- <code>dict\[str, str\]</code> – A dictionary in which the key is the language (or `"unmatched"`),
  and the value is the text.

**Raises:**

- <code>TypeError</code> – If the input is not a string.

## transformers_text_router

### TransformersTextRouter

Routes the text strings to different connections based on a category label.

The labels are specific to each model and can be found in its description on Hugging Face.

### Usage example

<!-- test-ignore -->

```python
from haystack.core.pipeline import Pipeline
from haystack.components.routers import TransformersTextRouter
from haystack.components.builders import PromptBuilder
from haystack.components.generators import HuggingFaceLocalGenerator

p = Pipeline()
p.add_component(
    instance=TransformersTextRouter(model="papluca/xlm-roberta-base-language-detection"),
    name="text_router"
)
p.add_component(
    instance=PromptBuilder(template="Answer the question: {{query}}\nAnswer:"),
    name="english_prompt_builder"
)
p.add_component(
    instance=PromptBuilder(template="Beantworte die Frage: {{query}}\nAntwort:"),
    name="german_prompt_builder"
)

p.add_component(
    instance=HuggingFaceLocalGenerator(model="DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1"),
    name="german_llm"
)
p.add_component(
    instance=HuggingFaceLocalGenerator(model="microsoft/Phi-3-mini-4k-instruct"),
    name="english_llm"
)

p.connect("text_router.en", "english_prompt_builder.query")
p.connect("text_router.de", "german_prompt_builder.query")
p.connect("english_prompt_builder.prompt", "english_llm.prompt")
p.connect("german_prompt_builder.prompt", "german_llm.prompt")

# English Example
print(p.run({"text_router": {"text": "What is the capital of Germany?"}}))

# German Example
print(p.run({"text_router": {"text": "Was ist die Hauptstadt von Deutschland?"}}))
```

#### __init__

```python
__init__(
    model: str,
    labels: list[str] | None = None,
    device: ComponentDevice | None = None,
    token: Secret | None = Secret.from_env_var(
        ["HF_API_TOKEN", "HF_TOKEN"], strict=False
    ),
    huggingface_pipeline_kwargs: dict[str, Any] | None = None,
) -> None
```

Initializes the TransformersTextRouter component.

**Parameters:**

- **model** (<code>str</code>) – The name or path of a Hugging Face model for text classification.
- **labels** (<code>list\[str\] | None</code>) – The list of labels. If not provided, the component fetches the labels
  from the model configuration file hosted on the Hugging Face Hub using
  `transformers.AutoConfig.from_pretrained`.
- **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
  If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- **token** (<code>Secret | None</code>) – The API token used to download private models from Hugging Face.
  By default, it is read from the `HF_API_TOKEN` or `HF_TOKEN` environment variable.
  To generate these tokens, run `transformers-cli login`.
- **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments for initializing the Hugging Face
  text classification pipeline.

#### warm_up

```python
warm_up() -> None
```

Initializes the component.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> TransformersTextRouter
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>TransformersTextRouter</code> – Deserialized component.

#### run

```python
run(text: str) -> dict[str, str]
```

Routes the text strings to different connections based on a category label.

**Parameters:**

- **text** (<code>str</code>) – A string of text to route.

**Returns:**

- <code>dict\[str, str\]</code> – A dictionary with the label as key and the text as value.

**Raises:**

- <code>TypeError</code> – If the input is not a str.

## zero_shot_text_router

### TransformersZeroShotTextRouter

Routes the text strings to different connections based on a category label.

Specify the set of labels for categorization when initializing the component.

### Usage example

```python
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.core.pipeline import Pipeline
from haystack.components.routers import TransformersZeroShotTextRouter
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack.components.retrievers import InMemoryEmbeddingRetriever

document_store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(model="intfloat/e5-base-v2")
docs = [
    Document(
        content="Germany, officially the Federal Republic of Germany, is a country in the western region of "
        "Central Europe. The nation's capital and most populous city is Berlin and its main financial centre "
        "is Frankfurt; the largest urban area is the Ruhr."
    ),
    Document(
        content="France, officially the French Republic, is a country located primarily in Western Europe. "
        "France is a unitary semi-presidential republic with its capital in Paris, the country's largest city "
        "and main cultural and commercial centre; other major urban areas include Marseille, Lyon, Toulouse, "
        "Lille, Bordeaux, Strasbourg, Nantes and Nice."
    )
]
docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])

p = Pipeline()
p.add_component(instance=TransformersZeroShotTextRouter(labels=["passage", "query"]), name="text_router")
p.add_component(
    instance=SentenceTransformersTextEmbedder(model="intfloat/e5-base-v2", prefix="passage: "),
    name="passage_embedder"
)
p.add_component(
    instance=SentenceTransformersTextEmbedder(model="intfloat/e5-base-v2", prefix="query: "),
    name="query_embedder"
)
p.add_component(
    instance=InMemoryEmbeddingRetriever(document_store=document_store),
    name="query_retriever"
)
p.add_component(
    instance=InMemoryEmbeddingRetriever(document_store=document_store),
    name="passage_retriever"
)

# Each label passed to the router becomes an output socket with the same name.
p.connect("text_router.passage", "passage_embedder.text")
p.connect("passage_embedder.embedding", "passage_retriever.query_embedding")
p.connect("text_router.query", "query_embedder.text")
p.connect("query_embedder.embedding", "query_retriever.query_embedding")

# Query Example
p.run({"text_router": {"text": "What is the capital of Germany?"}})

# Passage Example
p.run({
    "text_router": {
        "text": "The United Kingdom of Great Britain and Northern Ireland, commonly known as the "
        "United Kingdom (UK) or Britain, is a country in Northwestern Europe, off the north-western coast of "
        "the continental mainland."
    }
})
```

#### __init__

```python
__init__(
    labels: list[str],
    multi_label: bool = False,
    model: str = "MoritzLaurer/deberta-v3-base-zeroshot-v1.1-all-33",
    device: ComponentDevice | None = None,
    token: Secret | None = Secret.from_env_var(
        ["HF_API_TOKEN", "HF_TOKEN"], strict=False
    ),
    huggingface_pipeline_kwargs: dict[str, Any] | None = None,
) -> None
```

Initializes the TransformersZeroShotTextRouter component.

**Parameters:**

- **labels** (<code>list\[str\]</code>) – The set of labels to use for classification. Can be a single label,
  a string of comma-separated labels, or a list of labels.
- **multi_label** (<code>bool</code>) – Indicates if multiple labels can be true.
  If `False`, label scores are normalized so their sum equals 1 for each sequence.
  If `True`, the labels are considered independent and probabilities are normalized for each candidate by
  doing a softmax of the entailment score vs. the contradiction score.
- **model** (<code>str</code>) – The name or path of a Hugging Face model for zero-shot text classification.
- **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
  If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- **token** (<code>Secret | None</code>) – The API token used to download private models from Hugging Face.
  By default, the token is read from the `HF_API_TOKEN` or `HF_TOKEN` environment variable.
  To generate these tokens, run `transformers-cli login`.
- **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments for initializing the Hugging Face
  zero-shot text classification pipeline.

#### warm_up

```python
warm_up() -> None
```

Initializes the component.
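
To make the `multi_label` setting from `__init__` more concrete, here is a minimal sketch of the two normalization schemes it describes. It is plain Python and not the component's implementation: the function names and the logit values are hypothetical and chosen only for illustration.

```python
import math


def softmax(values: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def single_label_scores(entailment_logits: list[float]) -> list[float]:
    """multi_label=False: softmax across all labels, so the scores sum to 1."""
    return softmax(entailment_logits)


def multi_label_scores(
    entailment_logits: list[float], contradiction_logits: list[float]
) -> list[float]:
    """multi_label=True: for each label, softmax of entailment vs. contradiction."""
    return [
        softmax([entail, contra])[0]
        for entail, contra in zip(entailment_logits, contradiction_logits)
    ]


# Hypothetical logits for the labels ["passage", "query"] (made-up values).
labels = ["passage", "query"]
entailment = [2.0, 1.0]
contradiction = [-1.0, 0.5]

single = single_label_scores(entailment)
multi = multi_label_scores(entailment, contradiction)

print(dict(zip(labels, single)))  # scores compete and sum to 1
print(dict(zip(labels, multi)))   # scores are independent and need not sum to 1
```

With `multi_label=False` the labels compete for probability mass. With `multi_label=True` each label is judged on its own, so several labels can score high at once.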

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> TransformersZeroShotTextRouter
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>TransformersZeroShotTextRouter</code> – Deserialized component.

#### run

```python
run(text: str) -> dict[str, str]
```

Routes the text strings to different connections based on a category label.

**Parameters:**

- **text** (<code>str</code>) – A string of text to route.

**Returns:**

- <code>dict\[str, str\]</code> – A dictionary with the label as key and the text as value.

**Raises:**

- <code>TypeError</code> – If the input is not a str.
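
The input and output contract of `run` can be sketched without loading a model. The helper below is hypothetical (`route_by_label` is not part of Haystack) and the scores are made up. It assumes the router forwards the text under the highest-scoring label, which matches the documented return shape of one label mapped to the text.

```python
def route_by_label(text: str, scores: dict[str, float]) -> dict[str, str]:
    """Return {top_label: text}, mirroring the documented output shape of `run`.

    `scores` stands in for the classifier's per-label scores (hypothetical values).
    """
    if not isinstance(text, str):
        # Mirrors the documented behavior: non-string input raises TypeError.
        raise TypeError("route_by_label expects a str as input.")
    top_label = max(scores, key=scores.get)
    return {top_label: text}


hypothetical_scores = {"passage": 0.12, "query": 0.88}
result = route_by_label("What is the capital of Germany?", hypothetical_scores)
print(result)  # {'query': 'What is the capital of Germany?'}
```

Because only one key is present in the result, only the components connected to that label's output receive the text. In the usage example above, that means either the `query_embedder` branch or the `passage_embedder` branch runs.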