# ANN

Approximate Nearest Neighbor (ANN) index configuration for storing vector embeddings.

## backend
```yaml
backend: faiss|hnsw|annoy|ggml|numpy|torch|pgvector|sqlite|custom
```

Sets the ANN backend. Defaults to `faiss`. Additional backends are available via the [ann](../../../install/#ann) extras package. Set a custom backend by setting this parameter to a fully resolvable class string.

Backend-specific settings are set with a corresponding configuration object having the same name as the backend (e.g. `annoy`, `faiss` or `hnsw`). These settings are optional and use defaults when omitted.
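For example, a config that selects a backend together with its configuration object might look like the following, shown as the Python dictionary a YAML config parses to. The custom class string is a hypothetical example.

```python
# Select the faiss backend with a backend-specific configuration
# object of the same name
config = {
    "backend": "faiss",
    "faiss": {"nprobe": 6, "quantize": True}
}

# A custom backend is set with a fully resolvable class string
# ("mymodule.ann.CustomANN" is a hypothetical placeholder)
custom = {"backend": "mymodule.ann.CustomANN"}

print(config["backend"], config["faiss"])
```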

### faiss
```yaml
faiss:
    components: comma separated list of components - defaults to "IDMap,Flat" for small
                indexes and "IVFx,Flat" for larger indexes where
                x = min(4 * sqrt(embeddings count), embeddings count / 39)
                automatically calculates the number of IVF cells when omitted (supports "IVF,Flat")
    nprobe: search probe setting (int) - defaults to x/16 (as defined above)
            for larger indexes
    nflip: same as nprobe - only used with binary hash indexes
    quantize: store vectors with x-bit precision vs 32-bit (boolean|int)
              true sets 8-bit precision, false disables, int sets specified
              precision
    mmap: load as on-disk index (boolean) - trades query response time for a
          smaller RAM footprint, defaults to false
    sample: percent of data to use for model training (0.0 - 1.0)
            reduces indexing time for larger (>1M row) indexes, defaults to 1.0
```
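The cell count and `nprobe` heuristics above can be computed directly. A minimal sketch of the defaults described above (illustrative only, not the backend's actual code):

```python
import math

def ivf_cells(count):
    """Number of IVF cells: x = min(4 * sqrt(count), count / 39)."""
    return int(min(4 * math.sqrt(count), count / 39))

def default_nprobe(count):
    """Default search probe setting for larger indexes: x / 16."""
    return max(ivf_cells(count) // 16, 1)

# For a 1M vector index: 4 * sqrt(1000000) = 4000 cells, nprobe = 250
print(ivf_cells(1_000_000), default_nprobe(1_000_000))
```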

Faiss supports both floating point and binary indexes. Floating point indexes are the default. Binary indexes are used when indexing scalar-quantized datasets.

See the following Faiss documentation links for more information.

- [Guidelines for choosing an index](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index)
- [Index configuration summary](https://github.com/facebookresearch/faiss/wiki/Faiss-indexes)
- [Index Factory](https://github.com/facebookresearch/faiss/wiki/The-index-factory)
- [Binary Indexes](https://github.com/facebookresearch/faiss/wiki/Binary-indexes)
- [Search Tuning](https://github.com/facebookresearch/faiss/wiki/Faster-search)

Note: For macOS users, an existing bug in an upstream package restricts the number of processing threads to 1. This limitation is managed internally to prevent system crashes.

### hnsw
```yaml
hnsw:
    efconstruction: ef_construction param for init_index (int) - defaults to 200
    m: M param for init_index (int) - defaults to 16
    randomseed: random_seed param for init_index (int) - defaults to 100
    efsearch: ef search param (int) - defaults to None and not set
```

See [Hnswlib documentation](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md) for more information on these parameters.
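The config keys above map onto hnswlib's `init_index` keyword arguments. A sketch of that translation, assuming hnswlib's parameter names (the backend performs this internally):

```python
def to_hnswlib_kwargs(config):
    """Map hnsw config keys to hnswlib init_index() keyword arguments."""
    mapping = {"efconstruction": "ef_construction", "m": "M", "randomseed": "random_seed"}
    return {mapping[key]: value for key, value in config.items() if key in mapping}

# efsearch is excluded: it is applied at query time (via set_ef), not at init
kwargs = to_hnswlib_kwargs({"efconstruction": 200, "m": 16, "randomseed": 100, "efsearch": 50})
print(kwargs)
```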

### annoy
```yaml
annoy:
    ntrees: number of trees (int) - defaults to 10
    searchk: search_k search setting (int) - defaults to -1
```

See [Annoy documentation](https://github.com/spotify/annoy#full-python-api) for more information on these parameters. Note that Annoy indexes cannot be modified after creation; upserts, deletes and other modifications are not supported.

### ggml
```yaml
ggml:
    gpu: enable GPU - defaults to True
    quantize: sets the tensor quantization - defaults to F32
    querysize: query buffer size - defaults to 64
```

The [GGML](https://github.com/ggml-org/ggml) backend is a k-nearest neighbors backend. It stores tensors using GGML and [GGUF](https://huggingface.co/docs/hub/en/gguf). It supports GPU-enabled operations and quantization. GGML is the framework used by [llama.cpp](https://github.com/ggml-org/llama.cpp).

See [ggml.h](https://github.com/ggml-org/ggml/blob/master/include/ggml.h#L379) for a list of quantization types.

### numpy

The NumPy backend is a k-nearest neighbors backend. It's designed for simplicity and works well with smaller datasets that fit into memory.

```yaml
numpy:
    safetensors: stores vectors using the safetensors format
                 defaults to NumPy array storage
```
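Conceptually, a k-nearest neighbors backend performs an exhaustive search over the stored array. A minimal sketch of brute-force search with normalized vectors (illustrative, not the backend's implementation):

```python
import numpy as np

def search(vectors, query, limit=3):
    """Exhaustive nearest neighbor search using the dot product.

    Assumes vectors and query are L2-normalized, so the dot product
    equals cosine similarity.
    """
    scores = vectors @ query
    ids = np.argsort(-scores)[:limit]
    return [(int(uid), float(scores[uid])) for uid in ids]

# Index 4 normalized 2d vectors and run a search
vectors = np.array([[1.0, 0.0], [0.0, 1.0], [0.7071, 0.7071], [-1.0, 0.0]])
results = search(vectors, np.array([1.0, 0.0]), limit=2)
print(results)  # best match is id 0, then id 2
```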

### torch

The Torch backend is a k-nearest neighbors backend like NumPy. It supports GPU-enabled operations. It also supports quantization, which enables fitting larger arrays into GPU memory.

When quantization is enabled, vectors are _always_ stored in safetensors. _Note that macOS support for quantization is limited._

```yaml
torch:
    safetensors: stores vectors using the safetensors format - defaults
                 to NumPy array storage if quantization is disabled
    quantize:
        type: quantization type (fp4, nf4, int8)
        blocksize: quantization block size parameter
```
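Blockwise quantization splits a vector into fixed-size blocks and scales each by its absolute maximum, which is the general idea behind the `blocksize` parameter. A rough NumPy sketch of blockwise int8 quantization (illustrative only; the backend uses its own quantization kernels):

```python
import numpy as np

def quantize_int8(vector, blocksize=4):
    """Quantize a vector to int8 one block at a time using absmax scaling."""
    blocks = vector.reshape(-1, blocksize)
    scales = np.abs(blocks).max(axis=1, keepdims=True)
    scales[scales == 0] = 1.0
    quantized = np.round(blocks / scales * 127).astype(np.int8)
    return quantized, scales

def dequantize_int8(quantized, scales):
    """Recover an approximation of the original vector."""
    return (quantized.astype(np.float32) / 127 * scales).reshape(-1)

vector = np.array([0.5, -1.0, 0.25, 0.0, 2.0, 1.0, -2.0, 0.5], dtype=np.float32)
quantized, scales = quantize_int8(vector)
restored = dequantize_int8(quantized, scales)
print(np.abs(vector - restored).max())  # small reconstruction error
```

Smaller blocks track outliers more closely at the cost of storing more scale factors.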

### pgvector
```yaml
pgvector:
    url: database url connection string, alternatively can be set via the
         ANN_URL environment variable
    schema: database schema to store vectors - defaults to being
            determined by the database
    table: database table to store vectors - defaults to `vectors`
    precision: vector float precision (half or full) - defaults to `full`
    efconstruction: ef_construction param (int) - defaults to 200
    m: M param for init_index (int) - defaults to 16
```

The pgvector backend stores embeddings in a Postgres database. See the [pgvector documentation](https://github.com/pgvector/pgvector-python?tab=readme-ov-file#sqlalchemy) for more information on these parameters. See the [SQLAlchemy documentation](https://docs.sqlalchemy.org/en/20/core/engines.html#database-urls) for more information on constructing URL connection strings.
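The connection string follows SQLAlchemy's database URL format. A sketch of setting it via the `ANN_URL` environment variable instead of config (the credentials, host and database names are placeholders):

```python
import os

# SQLAlchemy-style database URL - user, password, host and database are placeholders
os.environ["ANN_URL"] = "postgresql+psycopg2://user:password@localhost:5432/vectors"

# Remaining pgvector settings can still be passed via config
config = {
    "backend": "pgvector",
    "pgvector": {"table": "vectors", "precision": "full"}
}

print(os.environ["ANN_URL"])
```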

### sqlite
```yaml
sqlite:
    quantize: store vectors with x-bit precision vs 32-bit (boolean|int)
              true sets 8-bit precision, false disables, int sets specified
              precision
    table: database table to store vectors - defaults to `vectors`
```

The SQLite backend stores embeddings in a SQLite database using [sqlite-vec](https://github.com/asg017/sqlite-vec). This backend supports 1-bit and 8-bit quantization at the storage level.
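With 1-bit quantization, each vector component is reduced to its sign and distances become Hamming distances over the packed bits. A NumPy sketch of the idea (illustrative, not sqlite-vec's implementation):

```python
import numpy as np

def binarize(vectors):
    """1-bit quantization: keep only the sign of each component, packed into bytes."""
    return np.packbits(vectors > 0, axis=1)

def hamming(a, b):
    """Hamming distance between two packed bit vectors."""
    return int(np.unpackbits(a ^ b).sum())

vectors = np.array([[0.5, -1.0, 0.25, -0.1], [0.4, -0.9, 0.3, -0.2], [-1.0, 1.0, -1.0, 1.0]])
packed = binarize(vectors)
# First two vectors share the same sign pattern, the third is the inverse
print(hamming(packed[0], packed[1]), hamming(packed[0], packed[2]))
```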

See [this note](https://alexgarcia.xyz/sqlite-vec/python.html#macos-blocks-sqlite-extensions-by-default) on how to run this ANN on macOS.