---
title: "SuperComponents"
id: supercomponents
slug: "/supercomponents"
description: "`SuperComponent` lets you wrap a complete pipeline and use it like a single component. This is helpful when you want to simplify the interface of a complex pipeline, reuse it in different contexts, or expose only the necessary inputs and outputs."
---

# SuperComponents

`SuperComponent` lets you wrap a complete pipeline and use it like a single component. This is helpful when you want to simplify the interface of a complex pipeline, reuse it in different contexts, or expose only the necessary inputs and outputs.

## `@super_component` decorator (recommended)

Haystack provides a `@super_component` decorator for wrapping a pipeline as a component. All you need to do is create a class with the decorator and give it a `pipeline` attribute.

With this decorator, `to_dict` and `from_dict` serialization is optional, as are the input and output mappings.

### Example

The custom `HybridRetriever` SuperComponent below turns your query into an embedding, then runs both a BM25 search and an embedding-based search. Finally, it merges the two result sets and returns the combined documents.

```python
# pip install haystack-ai datasets "sentence-transformers>=3.0.0"

from haystack import Document, Pipeline, super_component
from haystack.components.joiners import DocumentJoiner
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers import (
    InMemoryBM25Retriever,
    InMemoryEmbeddingRetriever,
)
from haystack.document_stores.in_memory import InMemoryDocumentStore

from datasets import load_dataset


@super_component
class HybridRetriever:
    def __init__(
        self,
        document_store: InMemoryDocumentStore,
        embedder_model: str = "BAAI/bge-small-en-v1.5",
    ):
        embedding_retriever = InMemoryEmbeddingRetriever(document_store)
        bm25_retriever = InMemoryBM25Retriever(document_store)
        text_embedder = SentenceTransformersTextEmbedder(embedder_model)
        document_joiner = DocumentJoiner()

        self.pipeline = Pipeline()
        self.pipeline.add_component("text_embedder", text_embedder)
        self.pipeline.add_component("embedding_retriever", embedding_retriever)
        self.pipeline.add_component("bm25_retriever", bm25_retriever)
        self.pipeline.add_component("document_joiner", document_joiner)

        self.pipeline.connect("text_embedder", "embedding_retriever")
        self.pipeline.connect("bm25_retriever", "document_joiner")
        self.pipeline.connect("embedding_retriever", "document_joiner")


dataset = load_dataset("HaystackBot/medrag-pubmed-chunk-with-embeddings", split="train")
docs = [
    Document(content=doc["contents"], embedding=doc["embedding"]) for doc in dataset
]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

query = "What treatments are available for chronic bronchitis?"

result = HybridRetriever(document_store).run(text=query, query=query)
print(result)
```

### Input Mapping

You can optionally map the input names of your SuperComponent to the actual sockets inside the pipeline. Each input name maps to a list of `component.socket` targets, so one input can feed several sockets.

```python
input_mapping = {"query": ["retriever.query", "prompt.query"]}
```

### Output Mapping

You can also choose which of the pipeline's output sockets to expose and map each one to an output name of the SuperComponent.

```python
output_mapping = {"llm.replies": "replies"}
```

If you don’t provide mappings, SuperComponent will try to auto-detect them. If multiple components have outputs with the same name, we recommend using `output_mapping` to avoid conflicts.

## SuperComponent class

Haystack also lets you inherit from the `SuperComponent` class. This option requires `to_dict` and `from_dict` serialization, as well as the input and output mappings described above.

### Example

Here is a simple example of initializing a `SuperComponent` with a pipeline:

```python
from haystack import Pipeline, SuperComponent

with open("pipeline.yaml", "r") as file:
    pipeline = Pipeline.load(file)

super_component = SuperComponent(pipeline)
```

The example pipeline below retrieves relevant documents based on a user query, builds a custom prompt using those documents, then sends the prompt to an `OpenAIChatGenerator` to create an answer. The `SuperComponent` wraps the pipeline so it can be run with a simple input (`query`) and returns a clean output (`replies`).

```python
from haystack import Pipeline, SuperComponent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.builders import ChatPromptBuilder
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.dataclasses.chat_message import ChatMessage
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.dataclasses import Document

document_store = InMemoryDocumentStore()
documents = [
    Document(content="Paris is the capital of France."),
    Document(content="London is the capital of England."),
]
document_store.write_documents(documents)

prompt_template = [
    ChatMessage.from_user(
        '''
        According to the following documents:
        {% for document in documents %}
        {{document.content}}
        {% endfor %}
        Answer the given question: {{query}}
        Answer:
        '''
    )
]

prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables="*")

pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm", OpenAIChatGenerator())
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.messages")

# Create a super component with simplified input/output mapping
wrapper = SuperComponent(
    pipeline=pipeline,
    input_mapping={
        "query": ["retriever.query", "prompt_builder.query"],
    },
    output_mapping={
        "llm.replies": "replies",
        "retriever.documents": "documents",
    },
)

# Run the pipeline with the simplified interface
result = wrapper.run(query="What is the capital of France?")
print(result)
# {'replies': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>,
#  _content=[TextContent(text='The capital of France is Paris.')],...)
```

## Type Checking and Static Code Analysis

Creating SuperComponents with the `@super_component` decorator can trigger type-checking or linting errors. One way to avoid these issues is to declare the exposed public methods on your SuperComponent inside an `if TYPE_CHECKING:` block in the class body, so static analyzers see them but they are never executed. Here's an example:

```python
from typing import TYPE_CHECKING

from haystack import Document, super_component


@super_component
class MySuperComponent:  # hypothetical name; the __init__ that sets self.pipeline is omitted
    if TYPE_CHECKING:
        # Stubs for static analyzers only; this branch never runs.
        def run(self, *, documents: list[Document]) -> dict[str, list[Document]]: ...

        def warm_up(self) -> None:  # noqa: D102
            ...
```

## Ready-Made SuperComponents

Haystack already includes these SuperComponent implementations:

- [DocumentPreprocessor](../../pipeline-components/preprocessors/documentpreprocessor.mdx)
- [MultiFileConverter](../../pipeline-components/converters/multifileconverter.mdx)
- [OpenSearchHybridRetriever](../../pipeline-components/retrievers/opensearchhybridretriever.mdx)
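
For intuition about what the input and output mappings described in the sections above do, the following dependency-free sketch mimics the routing with plain dictionaries. It is not Haystack's implementation: `fan_out_inputs` and `rename_outputs` are hypothetical helper names, and the pipeline outputs are made-up stand-in values. It only assumes the `component.socket` naming convention used in the mapping examples.

```python
from typing import Any

# Mappings copied from the wrapped-pipeline example in this document.
INPUT_MAPPING = {"query": ["retriever.query", "prompt_builder.query"]}
OUTPUT_MAPPING = {"llm.replies": "replies", "retriever.documents": "documents"}


def fan_out_inputs(
    inputs: dict[str, Any], mapping: dict[str, list[str]]
) -> dict[str, dict[str, Any]]:
    """Route each public input to every "component.socket" it is mapped to."""
    routed: dict[str, dict[str, Any]] = {}
    for name, value in inputs.items():
        for target in mapping[name]:
            component, socket = target.split(".", 1)
            routed.setdefault(component, {})[socket] = value
    return routed


def rename_outputs(
    pipeline_outputs: dict[str, dict[str, Any]], mapping: dict[str, str]
) -> dict[str, Any]:
    """Keep only the mapped "component.socket" outputs, under their public names."""
    exposed: dict[str, Any] = {}
    for source, public_name in mapping.items():
        component, socket = source.split(".", 1)
        if socket in pipeline_outputs.get(component, {}):
            exposed[public_name] = pipeline_outputs[component][socket]
    return exposed


# One public input reaches two internal sockets.
routed = fan_out_inputs({"query": "What is the capital of France?"}, INPUT_MAPPING)
print(routed)

# Stand-in for what a pipeline run might return (made-up values).
fake_outputs = {
    "llm": {"replies": ["Paris"], "meta": {}},
    "retriever": {"documents": ["doc-1"]},
}
exposed = rename_outputs(fake_outputs, OUTPUT_MAPPING)
print(exposed)
```

In this sketch, outputs that are not listed in the mapping (such as `llm.meta`) are simply dropped, which mirrors the idea of exposing only the necessary outputs.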