---
title: "SuperComponents"
id: supercomponents
slug: "/supercomponents"
description: "`SuperComponent` lets you wrap a complete pipeline and use it like a single component. This is helpful when you want to simplify the interface of a complex pipeline, reuse it in different contexts, or expose only the necessary inputs and outputs."
---

# SuperComponents

`SuperComponent` lets you wrap a complete pipeline and use it like a single component. This is helpful when you want to simplify the interface of a complex pipeline, reuse it in different contexts, or expose only the necessary inputs and outputs.

## `@super_component` decorator (recommended)

Haystack provides a `@super_component` decorator for wrapping a pipeline as a component. Create a class, decorate it with `@super_component`, and assign a `Pipeline` to its `pipeline` attribute in `__init__`.

With this decorator, `to_dict` and `from_dict` serialization is optional, as are the input and output mappings.

### Example

The custom `HybridRetriever` SuperComponent below embeds your query, then runs both a BM25 search and an embedding-based search over the same document store. Finally, it merges the two result sets with a `DocumentJoiner` and returns the combined documents.

```python
# pip install haystack-ai datasets "sentence-transformers>=3.0.0"

from haystack import Document, Pipeline, super_component
from haystack.components.joiners import DocumentJoiner
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers import (
    InMemoryBM25Retriever,
    InMemoryEmbeddingRetriever,
)
from haystack.document_stores.in_memory import InMemoryDocumentStore

from datasets import load_dataset


@super_component
class HybridRetriever:
    def __init__(
        self,
        document_store: InMemoryDocumentStore,
        embedder_model: str = "BAAI/bge-small-en-v1.5",
    ):
        embedding_retriever = InMemoryEmbeddingRetriever(document_store)
        bm25_retriever = InMemoryBM25Retriever(document_store)
        text_embedder = SentenceTransformersTextEmbedder(embedder_model)
        document_joiner = DocumentJoiner()

        self.pipeline = Pipeline()
        self.pipeline.add_component("text_embedder", text_embedder)
        self.pipeline.add_component("embedding_retriever", embedding_retriever)
        self.pipeline.add_component("bm25_retriever", bm25_retriever)
        self.pipeline.add_component("document_joiner", document_joiner)

        # The query embedding feeds the embedding retriever; both retrievers feed the joiner.
        self.pipeline.connect("text_embedder", "embedding_retriever")
        self.pipeline.connect("bm25_retriever", "document_joiner")
        self.pipeline.connect("embedding_retriever", "document_joiner")


# Load PubMed chunks that already include precomputed embeddings.
dataset = load_dataset("HaystackBot/medrag-pubmed-chunk-with-embeddings", split="train")
docs = [
    Document(content=doc["contents"], embedding=doc["embedding"]) for doc in dataset
]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

query = "What treatments are available for chronic bronchitis?"

# No mappings are defined, so input names are auto-detected:
# `text` goes to the text embedder and `query` goes to the BM25 retriever.
result = HybridRetriever(document_store).run(text=query, query=query)
print(result)
```

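### Input Mapping

You can optionally map the input names of your SuperComponent to the actual input sockets inside the pipeline. Each key is an input name that the SuperComponent exposes, and each value is a list of `"component_name.socket_name"` strings. In the snippet below, a single `query` input is passed to both the `retriever` and the `prompt` components.

```python
input_mapping = {"query": ["retriever.query", "prompt.query"]}
```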

### Output Mapping

You can also map the pipeline output sockets you want to expose to the SuperComponent's output names. Here the direction is reversed: each key is a `"component_name.socket_name"` string, and each value is the name the SuperComponent uses for that output.

```python
output_mapping = {"llm.replies": "replies"}
```

If you don't provide mappings, SuperComponent tries to detect them automatically. Auto-detection can produce conflicts when multiple components have outputs with the same name, so in that case we recommend defining `output_mapping` explicitly.

## SuperComponent class

Haystack also lets you inherit from the `SuperComponent` class. This option requires `to_dict` and `from_dict` serialization, as well as the input and output mappings described above.

### Example

Here is a simple example of initializing a `SuperComponent` with a pipeline loaded from a YAML file:

```python
from haystack import Pipeline, SuperComponent

# Load a pipeline that was previously saved to pipeline.yaml.
with open("pipeline.yaml", "r") as file:
    pipeline = Pipeline.load(file)

super_component = SuperComponent(pipeline)
```

The example pipeline below retrieves relevant documents based on a user query, builds a custom prompt from those documents, and sends the prompt to an `OpenAIChatGenerator` to generate an answer. The `SuperComponent` wraps the pipeline so that you can run it with a single input (`query`) and get back only the mapped outputs (`replies` and `documents`).

```python
# OpenAIChatGenerator reads your API key from the OPENAI_API_KEY environment variable.
from haystack import Pipeline, SuperComponent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.builders import ChatPromptBuilder
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.dataclasses.chat_message import ChatMessage
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.dataclasses import Document

document_store = InMemoryDocumentStore()
documents = [
    Document(content="Paris is the capital of France."),
    Document(content="London is the capital of England."),
]
document_store.write_documents(documents)

prompt_template = [
    ChatMessage.from_user(
    '''
    According to the following documents:
    {% for document in documents %}
    {{document.content}}
    {% endfor %}
    Answer the given question: {{query}}
    Answer:
    '''
    )
]

prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables="*")

pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm", OpenAIChatGenerator())
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.messages")

# Create a SuperComponent with a simplified input/output mapping
wrapper = SuperComponent(
    pipeline=pipeline,
    input_mapping={
        "query": ["retriever.query", "prompt_builder.query"],
    },
    output_mapping={
        "llm.replies": "replies",
        "retriever.documents": "documents"
    }
)

# Run the pipeline through the simplified interface
result = wrapper.run(query="What is the capital of France?")
print(result)
# Example output (abridged):
# {'replies': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>,
#  _content=[TextContent(text='The capital of France is Paris.')],...)
```
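
## Type Checking and Static Code Analysis

Because the `@super_component` decorator supplies methods such as `run` and `warm_up` dynamically, type checkers and linters can't see them and may report errors. One way to avoid these issues is to declare stubs for the public methods your SuperComponent exposes inside an `if TYPE_CHECKING:` block in the class body. Here's an example with a hypothetical `MyPreprocessor` class:

```python
from typing import TYPE_CHECKING

from haystack import Document, Pipeline, super_component


@super_component
class MyPreprocessor:  # hypothetical example class
    def __init__(self) -> None:
        self.pipeline = Pipeline()  # add and connect your components here

    if TYPE_CHECKING:
        # Only type checkers evaluate this block; at runtime the decorator supplies the real methods.
        def run(self, *, documents: list[Document]) -> dict[str, list[Document]]: ...
        def warm_up(self) -> None:  # noqa: D102
            ...
```
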
## Ready-Made SuperComponents

Haystack and its integrations already include these ready-made SuperComponents:

- [DocumentPreprocessor](../../pipeline-components/preprocessors/documentpreprocessor.mdx)
- [MultiFileConverter](../../pipeline-components/converters/multifileconverter.mdx)
- [OpenSearchHybridRetriever](../../pipeline-components/retrievers/opensearchhybridretriever.mdx)