---
title: "Supabase"
id: integrations-supabase
description: "Supabase integration for Haystack"
slug: "/integrations-supabase"
---

## haystack_integrations.components.retrievers.supabase.embedding_retriever

### SupabasePgvectorEmbeddingRetriever

Bases: <code>PgvectorEmbeddingRetriever</code>

Retrieves documents from the `SupabasePgvectorDocumentStore`, based on their dense embeddings.

This is a thin wrapper around `PgvectorEmbeddingRetriever`, adapted for use with
`SupabasePgvectorDocumentStore`.

Example usage:

Set the `SUPABASE_DB_URL` environment variable to the connection string of your Supabase database:

```bash
export SUPABASE_DB_URL=postgresql://postgres:postgres@localhost:5432/postgres
```

```python
from haystack import Document, Pipeline
from haystack.document_stores.types.policy import DuplicatePolicy
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder

from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore
from haystack_integrations.components.retrievers.supabase import SupabasePgvectorEmbeddingRetriever

document_store = SupabasePgvectorDocumentStore(
    embedding_dimension=768,
    vector_function="cosine_similarity",
    recreate_table=True,
)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to behave in a way that indicates..."),
    Document(content="In certain places, you can witness the phenomenon of bioluminescent waves."),
]

document_embedder = SentenceTransformersDocumentEmbedder()
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)
document_store.write_documents(documents_with_embeddings.get("documents"), policy=DuplicatePolicy.OVERWRITE)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", SupabasePgvectorEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

res = query_pipeline.run({"text_embedder": {"text": query}})
print(res['retriever']['documents'][0].content)
# >> "There are over 7,000 languages spoken around the world today."
```

#### __init__

```python
__init__(
    *,
    document_store: SupabasePgvectorDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    vector_function: (
        Literal["cosine_similarity", "inner_product", "l2_distance"] | None
    ) = None,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE
) -> None
```

Initialize the SupabasePgvectorEmbeddingRetriever.

**Parameters:**

- **document_store** (<code>SupabasePgvectorDocumentStore</code>) – An instance of `SupabasePgvectorDocumentStore`.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents.
- **top_k** (<code>int</code>) – Maximum number of Documents to return.
- **vector_function** (<code>Literal['cosine_similarity', 'inner_product', 'l2_distance'] | None</code>) – The similarity function to use when searching for similar embeddings.
  Defaults to the one set in the `document_store` instance.
  `"cosine_similarity"` and `"inner_product"` are similarity functions: higher scores indicate greater similarity between documents.
  `"l2_distance"` returns the straight-line distance between vectors: the most similar documents are the ones with the smallest scores.
  **Important**: if the document store uses the `"hnsw"` search strategy, the vector function
  should match the one used during index creation to take advantage of the index.
- **filter_policy** (<code>str | FilterPolicy</code>) – Policy to determine how filters are applied.

**Raises:**

- <code>ValueError</code> – If `document_store` is not an instance of `SupabasePgvectorDocumentStore` or if
  `vector_function` is not one of the valid options.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> SupabasePgvectorEmbeddingRetriever
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>SupabasePgvectorEmbeddingRetriever</code> – Deserialized component.

## haystack_integrations.components.retrievers.supabase.keyword_retriever

### SupabasePgvectorKeywordRetriever

Bases: <code>PgvectorKeywordRetriever</code>

Retrieves documents from the `SupabasePgvectorDocumentStore`, based on keywords.

This is a thin wrapper around `PgvectorKeywordRetriever`, adapted for use with
`SupabasePgvectorDocumentStore`.

To rank the documents, PostgreSQL's `ts_rank_cd` function is used.
It considers how often the query terms appear in the document, how close together the terms are,
and how important the part of the document where they occur is.

Example usage:

Set the `SUPABASE_DB_URL` environment variable to the connection string of your Supabase database:

```bash
export SUPABASE_DB_URL=postgresql://postgres:postgres@localhost:5432/postgres
```

```python
from haystack import Document, Pipeline
from haystack.document_stores.types.policy import DuplicatePolicy

from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore
from haystack_integrations.components.retrievers.supabase import SupabasePgvectorKeywordRetriever

document_store = SupabasePgvectorDocumentStore(
    embedding_dimension=768,
    recreate_table=True,
)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to behave in a way that indicates..."),
    Document(content="In certain places, you can witness the phenomenon of bioluminescent waves."),
]

document_store.write_documents(documents, policy=DuplicatePolicy.OVERWRITE)
retriever = SupabasePgvectorKeywordRetriever(document_store=document_store)
result = retriever.run(query="languages")

print(result['documents'][0].content)
# >> "There are over 7,000 languages spoken around the world today."
```

#### __init__

```python
__init__(
    *,
    document_store: SupabasePgvectorDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE
) -> None
```

Initialize the SupabasePgvectorKeywordRetriever.

**Parameters:**

- **document_store** (<code>SupabasePgvectorDocumentStore</code>) – An instance of `SupabasePgvectorDocumentStore`.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents.
- **top_k** (<code>int</code>) – Maximum number of Documents to return.
- **filter_policy** (<code>str | FilterPolicy</code>) – Policy to determine how filters are applied.
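
The `filters` argument accepts Haystack's standard filter syntax: either a single comparison condition or a logic node (`"AND"`/`"OR"`) over a list of conditions. A minimal sketch of building such a filter, with a toy in-memory matcher to illustrate the semantics (the `meta.category`/`meta.year` fields and their values are hypothetical; the document store actually translates filters into SQL `WHERE` clauses):

```python
# Haystack-style filter: documents whose meta "category" is "news"
# AND whose meta "year" is at least 2023. Fields/values are hypothetical.
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "news"},
        {"field": "meta.year", "operator": ">=", "value": 2023},
    ],
}

def matches(meta: dict, flt: dict) -> bool:
    """Toy matcher, only to illustrate the filter semantics."""
    if "conditions" in flt:  # logic node: AND/OR over sub-conditions
        results = [matches(meta, c) for c in flt["conditions"]]
        return all(results) if flt["operator"] == "AND" else any(results)
    field = flt["field"].removeprefix("meta.")
    value = meta.get(field)
    if flt["operator"] == "==":
        return value == flt["value"]
    if flt["operator"] == ">=":
        return value is not None and value >= flt["value"]
    raise ValueError(f"unsupported operator: {flt['operator']}")

print(matches({"category": "news", "year": 2024}, filters))  # True
print(matches({"category": "blog", "year": 2024}, filters))  # False
```

With `filter_policy=FilterPolicy.REPLACE` (the default), filters passed at runtime replace the ones set at initialization; with `MERGE`, the two sets are combined.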

**Raises:**

- <code>ValueError</code> – If `document_store` is not an instance of `SupabasePgvectorDocumentStore`.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> SupabasePgvectorKeywordRetriever
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>SupabasePgvectorKeywordRetriever</code> – Deserialized component.

## haystack_integrations.document_stores.supabase.document_store

### SupabasePgvectorDocumentStore

Bases: <code>PgvectorDocumentStore</code>

A Document Store for Supabase, using PostgreSQL with the pgvector extension.

It is intended for use with a Supabase-hosted PostgreSQL database.

This is a thin wrapper around `PgvectorDocumentStore` with Supabase-specific defaults:

- Reads the connection string from the `SUPABASE_DB_URL` environment variable.
- Defaults `create_extension` to `False`, since pgvector is pre-installed on Supabase.

**Connection notes:** Supabase offers two pooler ports: transaction mode (6543) and session mode (5432).
For best compatibility with pgvector operations, use session mode (port 5432) or a direct connection.

Example usage:

Set the `SUPABASE_DB_URL` environment variable to the connection string of your Supabase database:

```bash
export SUPABASE_DB_URL=postgresql://postgres:postgres@localhost:5432/postgres
```

```python
from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore

document_store = SupabasePgvectorDocumentStore(
    embedding_dimension=768,
    vector_function="cosine_similarity",
    recreate_table=True,
)
```

#### __init__

```python
__init__(
    *,
    connection_string: Secret = Secret.from_env_var("SUPABASE_DB_URL"),
    create_extension: bool = False,
    schema_name: str = "public",
    table_name: str = "haystack_documents",
    language: str = "english",
    embedding_dimension: int = 768,
    vector_type: Literal["vector", "halfvec"] = "vector",
    vector_function: Literal[
        "cosine_similarity", "inner_product", "l2_distance"
    ] = "cosine_similarity",
    recreate_table: bool = False,
    search_strategy: Literal[
        "exact_nearest_neighbor", "hnsw"
    ] = "exact_nearest_neighbor",
    hnsw_recreate_index_if_exists: bool = False,
    hnsw_index_creation_kwargs: dict[str, int] | None = None,
    hnsw_index_name: str = "haystack_hnsw_index",
    hnsw_ef_search: int | None = None,
    keyword_index_name: str = "haystack_keyword_index"
) -> None
```

Creates a new SupabasePgvectorDocumentStore instance.

**Parameters:**

- **connection_string** (<code>Secret</code>) – The connection string for the Supabase PostgreSQL database, read from an
  environment variable. Default: `SUPABASE_DB_URL`. Format:
  `postgresql://postgres.[project-ref]:[password]@aws-0-[region].pooler.supabase.com:5432/postgres`
- **create_extension** (<code>bool</code>) – Whether to create the pgvector extension if it doesn't exist.
  Defaults to `False`, since Supabase has pgvector pre-installed.
- **schema_name** (<code>str</code>) – The name of the schema the table is created in.
- **table_name** (<code>str</code>) – The name of the table used to store Haystack documents.
- **language** (<code>str</code>) – The language used to parse query and document content in keyword retrieval.
- **embedding_dimension** (<code>int</code>) – The dimension of the embedding.
- **vector_type** (<code>Literal['vector', 'halfvec']</code>) – The type of vector used for embedding storage: `"vector"` or `"halfvec"`.
- **vector_function** (<code>Literal['cosine_similarity', 'inner_product', 'l2_distance']</code>) – The similarity function to use when searching for similar embeddings.
- **recreate_table** (<code>bool</code>) – Whether to recreate the table if it already exists.
- **search_strategy** (<code>Literal['exact_nearest_neighbor', 'hnsw']</code>) – The search strategy to use: `"exact_nearest_neighbor"` or `"hnsw"`.
- **hnsw_recreate_index_if_exists** (<code>bool</code>) – Whether to recreate the HNSW index if it already exists.
- **hnsw_index_creation_kwargs** (<code>dict\[str, int\] | None</code>) – Additional keyword arguments for HNSW index creation.
- **hnsw_index_name** (<code>str</code>) – Index name for the HNSW index.
- **hnsw_ef_search** (<code>int | None</code>) – The `ef_search` parameter to use at query time for HNSW.
- **keyword_index_name** (<code>str</code>) – Index name for the keyword index.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> SupabasePgvectorDocumentStore
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>SupabasePgvectorDocumentStore</code> – Deserialized component.
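
The `to_dict`/`from_dict` pairs above follow Haystack's generic component-serialization envelope: a `type` key holding the class's import path, and an `init_parameters` dict holding the constructor arguments (secrets such as the connection string are serialized as env-var references, never as resolved values). A minimal sketch of that round trip, using a hypothetical stand-in class rather than the real store:

```python
# ToyStore is a hypothetical stand-in for a Haystack component; real
# components also serialize nested objects (e.g. the Secret-backed
# connection string as an env-var reference).
class ToyStore:
    def __init__(self, *, table_name: str = "haystack_documents", top_k: int = 10):
        self.table_name = table_name
        self.top_k = top_k

    def to_dict(self) -> dict:
        # "type" records the full import path used to re-locate the class.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {"table_name": self.table_name, "top_k": self.top_k},
        }

    @classmethod
    def from_dict(cls, data: dict) -> "ToyStore":
        return cls(**data["init_parameters"])

original = ToyStore(table_name="my_docs", top_k=3)
restored = ToyStore.from_dict(original.to_dict())
print(restored.table_name, restored.top_k)  # my_docs 3
```

This envelope is what makes components declarable in pipeline YAML: the `type` path locates the class, and `init_parameters` is splatted into `__init__`.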