---
title: "Migration Guide"
id: migration
slug: "/migration"
description: "Learn how to make the move to Haystack 2.x from Haystack 1.x."
---

# Migration Guide

Learn how to make the move to Haystack 2.x from Haystack 1.x.

This guide is designed for those with previous experience with Haystack who are interested in understanding the differences between Haystack 1.x and Haystack 2.x. If you're new to Haystack, skip this page and proceed directly to the Haystack 2.x [documentation](get-started.mdx).

## Major Changes

Haystack 2.x represents a significant overhaul of Haystack 1.x, and it's important to note that certain key concepts outlined in this section don't have a direct correlation between the two versions.

### Package Name

Haystack 1.x was distributed in a package called `farm-haystack`. To migrate your application, you must uninstall `farm-haystack` and install the new `haystack-ai` package for Haystack 2.x.

:::warning
The two versions of the project cannot coexist in the same Python environment. If both packages are installed in the same environment, remove them both and then install only one:

```bash
pip uninstall -y farm-haystack haystack-ai
pip install haystack-ai
```
:::

### Nodes

While Haystack 2.x continues to rely on the `Pipeline` abstraction, the elements linked in a pipeline graph are now referred to simply as _components_, replacing the terms _nodes_ and _pipeline components_ used in previous versions. The [_Migrating Components_](#migrating-components) section below outlines which Haystack 2.x component can replace a specific 1.x node.

### Pipelines

Pipelines continue to serve as the fundamental structure of all Haystack applications.
While the concept of the `Pipeline` abstraction remains consistent, Haystack 2.x introduces significant enhancements that address various limitations of its predecessor. For instance, pipelines now support loops. Pipelines also offer greater flexibility in their input, which is no longer restricted to queries, and they can route the output of a component to multiple recipients. This increased flexibility, however, comes with notable differences in how pipelines are defined in Haystack 2.x compared to the previous version.

In Haystack 1.x, a pipeline was built by adding one node after the other. In the resulting pipeline graph, edges were automatically added to connect the nodes in the order they were added.

Building a pipeline in Haystack 2.x is a two-step process:

1. First, components are added to the pipeline in no particular order by calling the `add_component` method.
2. Then, the components must be explicitly connected by calling the `connect` method to define the final graph.

To migrate an existing pipeline, the first step is to go through its nodes and identify their counterparts in Haystack 2.x (see the following section, [_Migrating Components_](#migrating-components), for guidance). If all the nodes can be replaced by corresponding components, add them to the pipeline with `add_component` and explicitly connect them with the appropriate calls to `connect`.

Here is an example:

**Haystack 1.x**

```python
pipeline = Pipeline()

node_1 = SomeNode()
node_2 = AnotherNode()

pipeline.add_node(node_1, name="Node_1", inputs=["Query"])
pipeline.add_node(node_2, name="Node_2", inputs=["Node_1"])
```

**Haystack 2.x**

```python
pipeline = Pipeline()

component_1 = SomeComponent()
component_2 = AnotherComponent()

pipeline.add_component("Comp_1", component_1)
pipeline.add_component("Comp_2", component_2)

pipeline.connect("Comp_1", "Comp_2")
```

In case a specific replacement component is not available for one of your nodes, migrating the pipeline might still be possible by:

- [Creating a custom component](../concepts/components/custom-components.mdx), or
- Changing the pipeline logic, as a last resort.

:::note
Check out the [Pipelines](../concepts/pipelines.mdx) section of our 2.x documentation to understand in more detail how the new pipelines work.
:::

### Document Stores

The fundamental concept of Document Stores as gateways to text and metadata stored in a database didn't change in Haystack 2.x, but there are significant differences compared with Haystack 1.x.

In Haystack 1.x, Document Stores were a special type of node that could be used in two ways:

- As the last node in an indexing pipeline (that is, a pipeline whose ultimate goal is storing data in a database).
- As a normal Python instance passed to a Retriever node.

In Haystack 2.x, the Document Store is not a component, so to migrate the two use cases above to version 2.x, you can respectively:

- Replace the Document Store at the end of the pipeline with a [`DocumentWriter`](../pipeline-components/writers/documentwriter.mdx) component.
- Identify the right Retriever component and create it by passing the Document Store instance, just as in Haystack 1.x.

### Retrievers

Haystack 1.x provided a set of nodes that filter relevant documents from different data sources according to a given query. Each of those nodes implements a certain retrieval algorithm and supports one or more types of Document Stores. For example, the `BM25Retriever` node in Haystack 1.x works seamlessly with OpenSearch and Elasticsearch but not with Qdrant; the `EmbeddingRetriever`, on the contrary, works with all three databases.

In Haystack 2.x, the concept is flipped: each Document Store provides one or more Retriever components, depending on which retrieval methods the underlying vector database supports. For example, the `OpenSearchDocumentStore` comes with [two Retriever components](../document-stores/opensearch-document-store.mdx#supported-retrievers), one relying on BM25 and the other on vector similarity.

To migrate a 1.x retrieval pipeline to 2.x, first identify the Document Store being used, then replace the Retriever node with the corresponding Haystack 2.x Retriever component for the Document Store of choice. For example, a `BM25Retriever` node using Elasticsearch in a Haystack 1.x pipeline should be replaced with the [`ElasticsearchBM25Retriever`](../pipeline-components/retrievers/elasticsearchbm25retriever.mdx) component.

### PromptNode

The `PromptNode` in Haystack 1.x represented the gateway to any Large Language Model (LLM) inference provider, whether locally available or remote. Based on the name of the model, Haystack inferred the right provider to call and forwarded the query.

In Haystack 2.x, the task of using LLMs is assigned to [Generators](../pipeline-components/generators.mdx): a set of highly specialized components, each tailored to a specific inference provider.

The first step when migrating a pipeline with a `PromptNode` is to identify the model provider used and to replace the node with two components:

- A Generator component for the model provider of choice,
- A `PromptBuilder` or `ChatPromptBuilder` component to build the prompt.

The [_Migration examples_](#migration-examples) section below shows how to port a `PromptNode` using OpenAI with a prompt template to a corresponding Haystack 2.x pipeline using the `OpenAIGenerator` in conjunction with a `PromptBuilder` component.

### Agents

The agentic approach makes it possible to answer questions that are significantly more complex than those typically addressed by extractive or generative question answering techniques.

Haystack 1.x provided Agents, enabling the use of LLMs in a loop.

Currently, in Haystack 2.x, you can build Agents using three main elements in a pipeline: Chat Generators, the `ToolInvoker` component, and Tools. A standalone Agent abstraction in Haystack 2.x is in an experimental phase.

:::note[Agents Documentation Page]
Take a look at our 2.x [Agents](../concepts/agents.mdx) documentation page for more information and detailed examples.
:::

### REST API

Haystack 1.x enabled the deployment of pipelines through a RESTful API over HTTP. This feature was provided by a separate application named `rest_api`, available only as [source code on GitHub](https://github.com/deepset-ai/haystack/tree/v1.x/rest_api).

Haystack 2.x takes the same RESTful approach, but the application used to deploy pipelines is called [Hayhooks](../development/hayhooks.mdx) and can be installed with `pip install hayhooks`.

At the moment, porting an existing Haystack 1.x deployment based on the `rest_api` project to Hayhooks requires a complete rewrite of the application.

## Dependencies

To minimize runtime errors, Haystack 1.x was distributed in a rather large package that tried to set up the Python environment with as many dependencies as possible.

In contrast, Haystack 2.x strives for a more streamlined approach, offering a minimal set of dependencies out of the box. It features a system that issues a warning when an additional dependency is required, providing the user with the necessary instructions.

To make sure all the dependencies are satisfied when migrating a Haystack 1.x application to version 2.x, a good strategy is to run end-to-end tests that cover all the execution paths, ensuring all the required dependencies are available in the target Python environment.

## Migrating Components

This table outlines which component (or group of components) can replace a certain node when porting a Haystack 1.x pipeline to the latest 2.x version. It's important to note that when a Haystack 2.x replacement is not available, this doesn't necessarily mean we are planning this feature.

If you need help migrating a 1.x node without a 2.x counterpart, open an [issue](https://github.com/deepset-ai/haystack/issues) in the Haystack GitHub repository.

### Data Handling

| Haystack 1.x | Description | Haystack 2.x |
| --- | --- | --- |
| Crawler | Scrapes text from websites. **Example usage:** To run searches on your website content. | Not Available |
| DocumentClassifier | Classifies documents by attaching metadata to them. **Example usage:** Labeling documents by their characteristics (for example, sentiment). | [TransformersZeroShotDocumentClassifier](../pipeline-components/classifiers/transformerszeroshotdocumentclassifier.mdx) |
| DocumentLanguageClassifier | Detects the language of the documents you pass to it and adds it to the document metadata. | [DocumentLanguageClassifier](../pipeline-components/classifiers/documentlanguageclassifier.mdx) |
| EntityExtractor | Extracts predefined entities out of a piece of text. **Example usage:** Named entity recognition (NER). | [NamedEntityExtractor](../pipeline-components/extractors/namedentityextractor.mdx) |
| FileClassifier | Distinguishes between text, PDF, Markdown, Docx, and HTML files. **Example usage:** Routing files to appropriate converters (for example, routing PDF files to `PDFToTextConverter`). | [FileTypeRouter](../pipeline-components/routers/filetyperouter.mdx) |
| FileConverter | Extracts text from files in different formats. **Example usage:** In indexing pipelines, extracting text from a file and casting it into the Document class format. | [Converters](../pipeline-components/converters.mdx) |
| PreProcessor | Cleans and splits documents. **Example usage:** Normalizing white spaces, getting rid of headers and footers, splitting documents into smaller ones. | [PreProcessors](../pipeline-components/preprocessors.mdx) |

### Semantic Search

| Haystack 1.x | Description | Haystack 2.x |
| --- | --- | --- |
| Ranker | Orders documents based on how relevant they are to the query. **Example usage:** In a query pipeline, after a keyword-based Retriever to rank the documents it returns. | [Rankers](../pipeline-components/rankers.mdx) |
| Reader | Finds an answer by selecting a text span in documents. **Example usage:** In a query pipeline when you want to know the location of the answer. | [ExtractiveReader](../pipeline-components/readers/extractivereader.mdx) |
| Retriever | Fetches relevant documents from the Document Store. **Example usage:** Coupling a Retriever with a Reader in a query pipeline to speed up the search (the Reader only goes through the documents it gets from the Retriever). | [Retrievers](../pipeline-components/retrievers.mdx) |
| QuestionGenerator | Given a document, it generates questions this document can answer. **Example usage:** Auto-suggested questions in your search app. | Prompt [Builders](../pipeline-components/builders.mdx) with a dedicated prompt, [Generators](../pipeline-components/generators.mdx) |

### Prompts and LLMs

| Haystack 1.x | Description | Haystack 2.x |
| --- | --- | --- |
| PromptNode | Uses large language models to perform various NLP tasks in a pipeline or on its own. **Example usage:** It's a very versatile component that can perform tasks like summarization, question answering, translation, and more. | Prompt [Builders](../pipeline-components/builders.mdx), [Generators](../pipeline-components/generators.mdx) |

### Routing

| Haystack 1.x | Description | Haystack 2.x |
| --- | --- | --- |
| QueryClassifier | Categorizes queries. **Example usage:** Distinguishing between keyword queries and natural language questions and routing them to the Retrievers that can handle them best. | [TransformersZeroShotTextRouter](../pipeline-components/routers/transformerszeroshottextrouter.mdx) <br />[TransformersTextRouter](../pipeline-components/routers/transformerstextrouter.mdx) |
| RouteDocuments | Routes documents to different branches of your pipeline based on their content type or metadata field. **Example usage:** Routing table data to `TableReader` and text data to `TransformersReader` for better handling. | [Routers](../pipeline-components/routers.mdx) |

### Utility Components

| Haystack 1.x | Description | Haystack 2.x |
| --- | --- | --- |
| DocumentMerger | Concatenates multiple documents into a single one. **Example usage:** Merging the documents to summarize in a summarization pipeline. | Prompt [Builders](../pipeline-components/builders.mdx) |
| Docs2Answers | Converts Documents into Answers. **Example usage:** When using the REST API for document retrieval. The REST API expects Answers as output, so you can use `Docs2Answers` as the last node to convert the retrieved documents to answers. | [AnswerBuilder](../pipeline-components/builders/answerbuilder.mdx) |
| JoinAnswers | Takes answers returned by multiple components and joins them into a single list of answers. **Example usage:** For running queries on different document types (for example, tables and text), where the documents are routed to different Readers, and each Reader returns a separate list of answers. | [AnswerJoiner](../pipeline-components/joiners/answerjoiner.mdx) |
| JoinDocuments | Takes documents returned by different components and joins them to form one list of documents. **Example usage:** In document retrieval pipelines where there are different types of documents, each routed to a different Retriever. Each Retriever returns a separate list of documents, and you can join them into one list using `JoinDocuments`. | [DocumentJoiner](../pipeline-components/joiners/documentjoiner.mdx) |
| Shaper | Functions mostly as a `PromptNode` helper, making sure the `PromptNode` input or output is correct. **Example usage:** In a question answering pipeline using `PromptNode`, where the `PromptTemplate` expects questions as input while Haystack pipelines use query. You can use Shaper to rename queries to questions. | Prompt [Builders](../pipeline-components/builders.mdx) |
| Summarizer | Creates an overview of a document. **Example usage:** To get a glimpse of the documents the Retriever is returning. | Prompt [Builders](../pipeline-components/builders.mdx) with a dedicated prompt, [Generators](../pipeline-components/generators.mdx) |
| TransformersImageToText | Generates captions for images. **Example usage:** Automatically generating captions for a list of images that you can later use in your knowledge base. | [VertexAIImageQA](../pipeline-components/generators/vertexaiimageqa.mdx) |
| Translator | Translates text from one language into another. **Example usage:** Running searches on documents in other languages. | Prompt [Builders](../pipeline-components/builders.mdx) with a dedicated prompt, [Generators](../pipeline-components/generators.mdx) |

### Extras

| Haystack 1.x | Description | Haystack 2.x |
| --- | --- | --- |
| AnswerToSpeech | Converts text answers into speech answers. **Example usage:** Improving the accessibility of your search system by providing a way to have the answer and its context read out loud. | [ElevenLabs](https://haystack.deepset.ai/integrations/elevenlabs) Integration |
| DocumentToSpeech | Converts text documents to speech documents. **Example usage:** Improving the accessibility of a document retrieval pipeline by providing the option to read documents out loud. | [ElevenLabs](https://haystack.deepset.ai/integrations/elevenlabs) Integration |

## Migration examples

:::note
This section might grow as we assist users with their use cases.
:::

### Indexing Pipeline

<details>

<summary>Haystack 1.x</summary>

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes.file_classifier import FileTypeClassifier
from haystack.nodes.file_converter import TextConverter
from haystack.nodes.preprocessor import PreProcessor
from haystack.pipelines import Pipeline

## Initialize a DocumentStore
document_store = InMemoryDocumentStore()

## Indexing Pipeline
indexing_pipeline = Pipeline()

## Makes sure the file is a TXT file (FileTypeClassifier node)
classifier = FileTypeClassifier()
indexing_pipeline.add_node(classifier, name="Classifier", inputs=["File"])

## Converts a file into text and performs basic cleaning (TextConverter node)
text_converter = TextConverter(remove_numeric_tables=True)
indexing_pipeline.add_node(
    text_converter,
    name="Text_converter",
    inputs=["Classifier.output_1"],
)

## Pre-processes the text by performing splits and adding metadata to the text (PreProcessor node)
preprocessor = PreProcessor(
    clean_whitespace=True,
    clean_empty_lines=True,
    split_length=100,
    split_overlap=50,
    split_respect_sentence_boundary=True,
)
indexing_pipeline.add_node(preprocessor, name="Preprocessor", inputs=["Text_converter"])

## Writes the resulting documents into the document store
indexing_pipeline.add_node(
    document_store,
    name="Document_Store",
    inputs=["Preprocessor"],
)

## Then we run it with the documents and their metadata as input
result = indexing_pipeline.run(file_paths=file_paths, meta=files_metadata)
```

</details>

<details>

<summary>Haystack 2.x</summary>

```python
from haystack import Pipeline
from haystack.components.routers import FileTypeRouter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.converters import TextFileToDocument
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
from haystack.components.writers import DocumentWriter

## Initialize a DocumentStore
document_store = InMemoryDocumentStore()

## Indexing Pipeline
indexing_pipeline = Pipeline()

## Makes sure the file is a TXT file (FileTypeRouter component)
classifier = FileTypeRouter(mime_types=["text/plain"])
indexing_pipeline.add_component("file_type_router", classifier)

## Converts a file into a Document (TextFileToDocument component)
text_converter = TextFileToDocument()
indexing_pipeline.add_component("text_converter", text_converter)

## Performs basic cleaning (DocumentCleaner component)
cleaner = DocumentCleaner(
    remove_empty_lines=True,
    remove_extra_whitespaces=True,
)
indexing_pipeline.add_component("cleaner", cleaner)

## Pre-processes the text by performing splits and adding metadata to the text (DocumentSplitter component)
preprocessor = DocumentSplitter(split_by="passage", split_length=100, split_overlap=50)
indexing_pipeline.add_component("preprocessor", preprocessor)

## Writes the resulting documents into the document store
indexing_pipeline.add_component("writer", DocumentWriter(document_store))

## Connect all the components
indexing_pipeline.connect("file_type_router.text/plain", "text_converter")
indexing_pipeline.connect("text_converter", "cleaner")
indexing_pipeline.connect("cleaner", "preprocessor")
indexing_pipeline.connect("preprocessor", "writer")

## Then we run it with the file paths as input
result = indexing_pipeline.run({"file_type_router": {"sources": file_paths}})
```

</details>

### Query Pipeline

<details>

<summary>Haystack 1.x</summary>

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.pipelines import ExtractiveQAPipeline
from haystack import Document
from haystack.nodes import BM25Retriever
from haystack.nodes import FARMReader

document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents(
    [
        Document(content="Paris is the capital of France."),
        Document(content="Berlin is the capital of Germany."),
        Document(content="Rome is the capital of Italy."),
        Document(content="Madrid is the capital of Spain."),
    ],
)

retriever = BM25Retriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")
extractive_qa_pipeline = ExtractiveQAPipeline(reader, retriever)

query = "What is the capital of France?"
result = extractive_qa_pipeline.run(
    query=query,
    params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}},
)
```

</details>

<details>

<summary>Haystack 2.x</summary>

```python
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.readers import ExtractiveReader

document_store = InMemoryDocumentStore()
document_store.write_documents(
    [
        Document(content="Paris is the capital of France."),
        Document(content="Berlin is the capital of Germany."),
        Document(content="Rome is the capital of Italy."),
        Document(content="Madrid is the capital of Spain."),
    ],
)

retriever = InMemoryBM25Retriever(document_store)
reader = ExtractiveReader(model="deepset/roberta-base-squad2")
extractive_qa_pipeline = Pipeline()
extractive_qa_pipeline.add_component("retriever", retriever)
extractive_qa_pipeline.add_component("reader", reader)
extractive_qa_pipeline.connect("retriever", "reader")

query = "What is the capital of France?"
result = extractive_qa_pipeline.run(
    data={
        "retriever": {"query": query, "top_k": 3},
        "reader": {"query": query, "top_k": 2},
    },
)
```

</details>

### RAG Pipeline

<details>

<summary>Haystack 1.x</summary>

```python
from datasets import load_dataset

from haystack.pipelines import Pipeline
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever, PromptNode, PromptTemplate, AnswerParser

document_store = InMemoryDocumentStore(embedding_dim=384)
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
document_store.write_documents(dataset)
retriever = EmbeddingRetriever(
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
    document_store=document_store,
    top_k=2,
)
document_store.update_embeddings(retriever)

rag_prompt = PromptTemplate(
    prompt="""Synthesize a comprehensive answer from the following text for the given question.
Provide a clear and concise response that summarizes the key points and information presented in the text.
Your answer should be in your own words and be no longer than 50 words.
\n\n Related text: {join(documents)} \n\n Question: {query} \n\n Answer:""",
    output_parser=AnswerParser(),
)

prompt_node = PromptNode(
    model_name_or_path="gpt-3.5-turbo",
    api_key=OPENAI_API_KEY,
    default_prompt_template=rag_prompt,
)

pipe = Pipeline()
pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
pipe.add_node(component=prompt_node, name="prompt_node", inputs=["retriever"])

output = pipe.run(query="What does Rhodes Statue look like?")
```

</details>

<details>

<summary>Haystack 2.x</summary>

```python
from datasets import load_dataset

from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers import InMemoryEmbeddingRetriever

document_store = InMemoryDocumentStore()
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
embedder = SentenceTransformersDocumentEmbedder(
    "sentence-transformers/all-MiniLM-L6-v2",
)
embedder.warm_up()
output = embedder.run([Document(**ds) for ds in dataset])
document_store.write_documents(output.get("documents"))

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}

Question: {{question}}
Answer:
"""
prompt_builder = PromptBuilder(template=template)

retriever = InMemoryEmbeddingRetriever(document_store=document_store, top_k=2)
generator = OpenAIGenerator(model="gpt-3.5-turbo")
query_embedder = SentenceTransformersTextEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",
)

basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("text_embedder", query_embedder)
basic_rag_pipeline.add_component("retriever", retriever)
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", generator)

basic_rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")

query = "What does Rhodes Statue look like?"
output = basic_rag_pipeline.run(
    {"text_embedder": {"text": query}, "prompt_builder": {"question": query}},
)
```

</details>

## Documentation and Tutorials for Haystack 1.x

You can access the old tutorials in the [GitHub history](https://github.com/deepset-ai/haystack-tutorials/tree/5917718cbfbb61410aab4121ee6fe754040a5dc7) and download the Haystack 1.x documentation as a [ZIP file](https://core-engineering.s3.eu-central-1.amazonaws.com/public/docs/haystack-v1-docs.zip).

The ZIP file contains documentation for all minor releases from version 1.0 to 1.26.

To download documentation for a specific release, replace the version number in the following URL: `https://core-engineering.s3.eu-central-1.amazonaws.com/public/docs/v1.26.zip`.