---
title: "Supabase"
id: integrations-supabase
description: "Supabase integration for Haystack"
slug: "/integrations-supabase"
---

## haystack_integrations.components.retrievers.supabase.embedding_retriever

### SupabasePgvectorEmbeddingRetriever

Bases: <code>PgvectorEmbeddingRetriever</code>

Retrieves documents from the `SupabasePgvectorDocumentStore`, based on their dense embeddings.

This is a thin wrapper around `PgvectorEmbeddingRetriever`, adapted for use with
`SupabasePgvectorDocumentStore`.

Example usage:

Set the `SUPABASE_DB_URL` environment variable to the connection string of your Supabase database:

```bash
export SUPABASE_DB_URL=postgresql://postgres:postgres@localhost:5432/postgres
```

```python
from haystack import Document, Pipeline
from haystack.document_stores.types.policy import DuplicatePolicy
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder

from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore
from haystack_integrations.components.retrievers.supabase import SupabasePgvectorEmbeddingRetriever

document_store = SupabasePgvectorDocumentStore(
    embedding_dimension=768,
    vector_function="cosine_similarity",
    recreate_table=True,
)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to behave in a way that indicates..."),
    Document(content="In certain places, you can witness the phenomenon of bioluminescent waves."),
]

document_embedder = SentenceTransformersDocumentEmbedder()
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)
document_store.write_documents(documents_with_embeddings.get("documents"), policy=DuplicatePolicy.OVERWRITE)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", SupabasePgvectorEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

res = query_pipeline.run({"text_embedder": {"text": query}})
print(res['retriever']['documents'][0].content)
# >> "There are over 7,000 languages spoken around the world today."
```

#### __init__

```python
__init__(
    *,
    document_store: SupabasePgvectorDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    vector_function: (
        Literal["cosine_similarity", "inner_product", "l2_distance"] | None
    ) = None,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE
) -> None
```

Initialize the SupabasePgvectorEmbeddingRetriever.

**Parameters:**

- **document_store** (<code>SupabasePgvectorDocumentStore</code>) – An instance of `SupabasePgvectorDocumentStore`.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents.
- **top_k** (<code>int</code>) – Maximum number of Documents to return.
- **vector_function** (<code>Literal['cosine_similarity', 'inner_product', 'l2_distance'] | None</code>) – The similarity function to use when searching for similar embeddings.
  Defaults to the one set in the `document_store` instance.
  `"cosine_similarity"` and `"inner_product"` are similarity functions: higher scores indicate greater similarity between documents.
  `"l2_distance"` returns the straight-line distance between vectors: the most similar documents are the ones with the smallest scores.
  **Important**: if the document store uses the `"hnsw"` search strategy, the vector function
  should match the one used during index creation to take advantage of the index.
- **filter_policy** (<code>str | FilterPolicy</code>) – Policy to determine how filters are applied.

**Raises:**

- <code>ValueError</code> – If `document_store` is not an instance of `SupabasePgvectorDocumentStore` or if
  `vector_function` is not one of the valid options.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> SupabasePgvectorEmbeddingRetriever
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>SupabasePgvectorEmbeddingRetriever</code> – Deserialized component.

## haystack_integrations.components.retrievers.supabase.keyword_retriever

### SupabasePgvectorKeywordRetriever

Bases: <code>PgvectorKeywordRetriever</code>

Retrieves documents from the `SupabasePgvectorDocumentStore`, based on keywords.

This is a thin wrapper around `PgvectorKeywordRetriever`, adapted for use with
`SupabasePgvectorDocumentStore`.

To rank the documents, PostgreSQL's `ts_rank_cd` function is used.
It considers how often the query terms appear in the document, how close together the terms are,
and how important the part of the document where they occur is.

Example usage:

Set the `SUPABASE_DB_URL` environment variable to the connection string of your Supabase database:

```bash
export SUPABASE_DB_URL=postgresql://postgres:postgres@localhost:5432/postgres
```

```python
from haystack import Document, Pipeline
from haystack.document_stores.types.policy import DuplicatePolicy

from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore
from haystack_integrations.components.retrievers.supabase import SupabasePgvectorKeywordRetriever

document_store = SupabasePgvectorDocumentStore(
    embedding_dimension=768,
    recreate_table=True,
)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to behave in a way that indicates..."),
    Document(content="In certain places, you can witness the phenomenon of bioluminescent waves."),
]

document_store.write_documents(documents, policy=DuplicatePolicy.OVERWRITE)
retriever = SupabasePgvectorKeywordRetriever(document_store=document_store)
result = retriever.run(query="languages")

print(result['documents'][0].content)
# >> "There are over 7,000 languages spoken around the world today."
```

#### __init__

```python
__init__(
    *,
    document_store: SupabasePgvectorDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE
) -> None
```

Initialize the SupabasePgvectorKeywordRetriever.

**Parameters:**

- **document_store** (<code>SupabasePgvectorDocumentStore</code>) – An instance of `SupabasePgvectorDocumentStore`.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents.
- **top_k** (<code>int</code>) – Maximum number of Documents to return.
- **filter_policy** (<code>str | FilterPolicy</code>) – Policy to determine how filters are applied.
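
The `filters` argument accepts Haystack's standard filter syntax: either a single comparison condition or a logic node (`"AND"`/`"OR"`) over a list of conditions. A minimal sketch of building such a filter, with a toy in-memory matcher to illustrate the semantics (the `meta.category`/`meta.year` fields and their values are hypothetical; the document store actually translates filters into SQL `WHERE` clauses):

```python
# Haystack-style filter: documents whose meta "category" is "news"
# AND whose meta "year" is at least 2023. Fields/values are hypothetical.
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "news"},
        {"field": "meta.year", "operator": ">=", "value": 2023},
    ],
}

def matches(meta: dict, flt: dict) -> bool:
    """Toy matcher, only to illustrate the filter semantics."""
    if "conditions" in flt:  # logic node: AND/OR over sub-conditions
        results = [matches(meta, c) for c in flt["conditions"]]
        return all(results) if flt["operator"] == "AND" else any(results)
    field = flt["field"].removeprefix("meta.")
    value = meta.get(field)
    if flt["operator"] == "==":
        return value == flt["value"]
    if flt["operator"] == ">=":
        return value is not None and value >= flt["value"]
    raise ValueError(f"unsupported operator: {flt['operator']}")

print(matches({"category": "news", "year": 2024}, filters))  # True
print(matches({"category": "blog", "year": 2024}, filters))  # False
```

With `filter_policy=FilterPolicy.REPLACE` (the default), filters passed at runtime replace the ones set at initialization; with `MERGE`, the two sets are combined.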

**Raises:**

- <code>ValueError</code> – If `document_store` is not an instance of `SupabasePgvectorDocumentStore`.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> SupabasePgvectorKeywordRetriever
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>SupabasePgvectorKeywordRetriever</code> – Deserialized component.

## haystack_integrations.document_stores.supabase.document_store

### SupabasePgvectorDocumentStore

Bases: <code>PgvectorDocumentStore</code>

A Document Store for Supabase, using PostgreSQL with the pgvector extension.

It is intended for use with a Supabase-hosted PostgreSQL database.

This is a thin wrapper around `PgvectorDocumentStore` with Supabase-specific defaults:

- Reads the connection string from the `SUPABASE_DB_URL` environment variable.
- Defaults `create_extension` to `False`, since pgvector is pre-installed on Supabase.

**Connection notes:** Supabase offers two pooler ports: transaction mode (6543) and session mode (5432).
For best compatibility with pgvector operations, use session mode (port 5432) or a direct connection.

Example usage:

Set the `SUPABASE_DB_URL` environment variable to the connection string of your Supabase database:

```bash
export SUPABASE_DB_URL=postgresql://postgres:postgres@localhost:5432/postgres
```

```python
from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore

document_store = SupabasePgvectorDocumentStore(
    embedding_dimension=768,
    vector_function="cosine_similarity",
    recreate_table=True,
)
```

#### __init__

```python
__init__(
    *,
    connection_string: Secret = Secret.from_env_var("SUPABASE_DB_URL"),
    create_extension: bool = False,
    schema_name: str = "public",
    table_name: str = "haystack_documents",
    language: str = "english",
    embedding_dimension: int = 768,
    vector_type: Literal["vector", "halfvec"] = "vector",
    vector_function: Literal[
        "cosine_similarity", "inner_product", "l2_distance"
    ] = "cosine_similarity",
    recreate_table: bool = False,
    search_strategy: Literal[
        "exact_nearest_neighbor", "hnsw"
    ] = "exact_nearest_neighbor",
    hnsw_recreate_index_if_exists: bool = False,
    hnsw_index_creation_kwargs: dict[str, int] | None = None,
    hnsw_index_name: str = "haystack_hnsw_index",
    hnsw_ef_search: int | None = None,
    keyword_index_name: str = "haystack_keyword_index"
) -> None
```

Creates a new SupabasePgvectorDocumentStore instance.

**Parameters:**

- **connection_string** (<code>Secret</code>) – The connection string for the Supabase PostgreSQL database, read from an
  environment variable. Default: `SUPABASE_DB_URL`. Format:
  `postgresql://postgres.[project-ref]:[password]@aws-0-[region].pooler.supabase.com:5432/postgres`
- **create_extension** (<code>bool</code>) – Whether to create the pgvector extension if it doesn't exist.
  Defaults to `False`, since Supabase has pgvector pre-installed.
- **schema_name** (<code>str</code>) – The name of the schema the table is created in.
- **table_name** (<code>str</code>) – The name of the table used to store Haystack documents.
- **language** (<code>str</code>) – The language used to parse query and document content in keyword retrieval.
- **embedding_dimension** (<code>int</code>) – The dimension of the embedding.
- **vector_type** (<code>Literal['vector', 'halfvec']</code>) – The type of vector used for embedding storage: `"vector"` or `"halfvec"`.
- **vector_function** (<code>Literal['cosine_similarity', 'inner_product', 'l2_distance']</code>) – The similarity function to use when searching for similar embeddings.
- **recreate_table** (<code>bool</code>) – Whether to recreate the table if it already exists.
- **search_strategy** (<code>Literal['exact_nearest_neighbor', 'hnsw']</code>) – The search strategy to use: `"exact_nearest_neighbor"` or `"hnsw"`.
- **hnsw_recreate_index_if_exists** (<code>bool</code>) – Whether to recreate the HNSW index if it already exists.
- **hnsw_index_creation_kwargs** (<code>dict\[str, int\] | None</code>) – Additional keyword arguments for HNSW index creation.
- **hnsw_index_name** (<code>str</code>) – Index name for the HNSW index.
- **hnsw_ef_search** (<code>int | None</code>) – The `ef_search` parameter to use at query time for HNSW.
- **keyword_index_name** (<code>str</code>) – Index name for the keyword index.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> SupabasePgvectorDocumentStore
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>SupabasePgvectorDocumentStore</code> – Deserialized component.
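
The `to_dict`/`from_dict` pairs above follow Haystack's generic component-serialization envelope: a `type` key holding the class's import path, and an `init_parameters` dict holding the constructor arguments (secrets such as the connection string are serialized as env-var references, never as resolved values). A minimal sketch of that round trip, using a hypothetical stand-in class rather than the real store:

```python
# ToyStore is a hypothetical stand-in for a Haystack component; real
# components also serialize nested objects (e.g. the Secret-backed
# connection string as an env-var reference).
class ToyStore:
    def __init__(self, *, table_name: str = "haystack_documents", top_k: int = 10):
        self.table_name = table_name
        self.top_k = top_k

    def to_dict(self) -> dict:
        # "type" records the full import path used to re-locate the class.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {"table_name": self.table_name, "top_k": self.top_k},
        }

    @classmethod
    def from_dict(cls, data: dict) -> "ToyStore":
        return cls(**data["init_parameters"])

original = ToyStore(table_name="my_docs", top_k=3)
restored = ToyStore.from_dict(original.to_dict())
print(restored.table_name, restored.top_k)  # my_docs 3
```

This envelope is what makes components declarable in pipeline YAML: the `type` path locates the class, and `init_parameters` is splatted into `__init__`.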