---
title: "WeaviateBM25Retriever"
id: weaviatebm25retriever
slug: "/weaviatebm25retriever"
description: "This is a keyword-based Retriever that fetches Documents matching a query from the Weaviate Document Store."
---

# WeaviateBM25Retriever

This is a keyword-based Retriever that fetches Documents matching a query from the Weaviate Document Store.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | 1. Before a [`PromptBuilder`](../builders/promptbuilder.mdx) in a RAG pipeline 2. The last component in the semantic search pipeline 3. Before an [`ExtractiveReader`](../readers/extractivereader.mdx) in an extractive QA pipeline |
| **Mandatory init variables** | `document_store`: An instance of a [WeaviateDocumentStore](../../document-stores/weaviatedocumentstore.mdx) |
| **Mandatory run variables** | `query`: A string |
| **Output variables** | `documents`: A list of documents (matching the query) |
| **API reference** | [Weaviate](/reference/integrations-weaviate) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/weaviate |

</div>

## Overview

`WeaviateBM25Retriever` is a keyword-based Retriever that fetches Documents matching a query from [`WeaviateDocumentStore`](../../document-stores/weaviatedocumentstore.mdx). It determines the similarity between Documents and the query based on the BM25 algorithm, which computes a weighted word overlap between the two strings.

Since the `WeaviateBM25Retriever` matches strings based on word overlap, it’s often used to find exact matches to names of persons or products, IDs, or well-defined error messages. The BM25 algorithm is very lightweight and simple. Beating it with more complex embedding-based approaches on out-of-domain data can be hard.
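
To make "weighted word overlap" more concrete, here is a minimal, self-contained sketch of the textbook Okapi BM25 scoring formula. It is illustrative only: the function name `bm25_scores`, the whitespace tokenizer, and the values `k1=1.2` and `b=0.75` are assumptions made for this example, and the code does not reproduce how Weaviate computes scores internally.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a textbook Okapi BM25 formula."""
    # Naive tokenization: lowercase and split on whitespace (an assumption for this sketch).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in set(query.lower().split()):
            if term not in term_freq:
                continue
            # Rare terms get a higher weight (inverse document frequency).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            tf = term_freq[term]
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "error code E42 raised by the payment service",
    "the payment service was restarted",
    "elephants recognize themselves in mirrors",
]
scores = bm25_scores("error E42", docs)
print(scores)
```

Only the first document shares terms with the query, so it is the only one with a non-zero score. This is why keyword retrieval tends to work well for identifiers and error codes, which either appear verbatim in a document or do not appear at all.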

If you want a semantic match between a query and documents, use the [`WeaviateEmbeddingRetriever`](weaviateembeddingretriever.mdx), which uses vectors created by embedding models to retrieve relevant information.

### Parameters

In addition to the `query`, the `WeaviateBM25Retriever` accepts other optional parameters, including `top_k` (the maximum number of Documents to retrieve) and `filters` to narrow down the search space.

### Usage

#### Installation

To start using Weaviate with Haystack, install the package with:

```shell
pip install weaviate-haystack
```

#### On its own

This Retriever needs an instance of `WeaviateDocumentStore` and indexed Documents to run.

```python
from haystack_integrations.document_stores.weaviate.document_store import (
    WeaviateDocumentStore,
)
from haystack_integrations.components.retrievers.weaviate import WeaviateBM25Retriever

# Connect to a Weaviate instance running locally
document_store = WeaviateDocumentStore(url="http://localhost:8080")

retriever = WeaviateBM25Retriever(document_store=document_store)

# Return at most three Documents matching the query
result = retriever.run(query="How to make a pizza", top_k=3)
print(result["documents"])
```

#### In a Pipeline

```python
from haystack_integrations.document_stores.weaviate.document_store import (
    WeaviateDocumentStore,
)
from haystack_integrations.components.retrievers.weaviate import (
    WeaviateBM25Retriever,
)

from haystack import Document
from haystack import Pipeline
from haystack.components.builders.answer_builder import AnswerBuilder
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores.types import DuplicatePolicy

# Create a RAG query pipeline
prompt_template = """
Given these documents, answer the question.\nDocuments:
{% for doc in documents %}
    {{ doc.content }}
{% endfor %}

\nQuestion: {{question}}
\nAnswer:
"""

document_store = WeaviateDocumentStore(url="http://localhost:8080")

# Add Documents
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]

# DuplicatePolicy.SKIP is optional, but it lets you run the script multiple times without errors
document_store.write_documents(documents=documents, policy=DuplicatePolicy.SKIP)

rag_pipeline = Pipeline()
rag_pipeline.add_component(
    name="retriever",
    instance=WeaviateBM25Retriever(document_store=document_store),
)
rag_pipeline.add_component(
    instance=PromptBuilder(template=prompt_template),
    name="prompt_builder",
)
rag_pipeline.add_component(instance=OpenAIGenerator(), name="llm")
rag_pipeline.add_component(instance=AnswerBuilder(), name="answer_builder")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")
rag_pipeline.connect("llm.replies", "answer_builder.replies")
# The generator exposes its metadata on the `meta` output
rag_pipeline.connect("llm.meta", "answer_builder.meta")
rag_pipeline.connect("retriever", "answer_builder.documents")

question = "How many languages are spoken around the world today?"
result = rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
        "answer_builder": {"query": question},
    },
)
print(result["answer_builder"]["answers"][0])
```
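
The optional `top_k` and `filters` run parameters described under Parameters can be combined in a single call. The sketch below assumes Haystack's dictionary-based metadata filter syntax and a hypothetical `meta.category` field on the indexed Documents. `build_category_filter` and `search` are illustrative helpers, not part of the integration, so check the filter shape against your Haystack version before relying on it.

```python
from typing import Any, Dict, List


def build_category_filter(category: str) -> Dict[str, Any]:
    """Build a filter matching Documents whose (hypothetical) meta.category equals `category`."""
    return {"field": "meta.category", "operator": "==", "value": category}


def search(retriever: Any, query: str, category: str, top_k: int = 3) -> List[Any]:
    """Run a retriever with a metadata filter and a limit on the number of results."""
    result = retriever.run(
        query=query,
        filters=build_category_filter(category),
        top_k=top_k,
    )
    return result["documents"]


# Usage, assuming `retriever` is a WeaviateBM25Retriever connected to a populated store:
# docs = search(retriever, "How to make a pizza", category="recipes", top_k=3)
```

Keeping the filter construction in a small helper makes it easy to reuse the same condition across several queries.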