# ANN

Approximate Nearest Neighbor (ANN) index configuration for storing vector embeddings.

## backend
```yaml
backend: faiss|hnsw|annoy|ggml|numpy|torch|pgvector|sqlite|custom
```

Sets the ANN backend. Defaults to `faiss`. Additional backends are available via the [ann](../../../install/#ann) extras package. To use a custom backend, set this parameter to the fully resolvable class string.

Backend-specific settings are set with a corresponding configuration object having the same name as the backend (e.g. annoy, faiss, or hnsw). These are optional and set to defaults if omitted.

### faiss
```yaml
faiss:
    components: comma separated list of components - defaults to "IDMap,Flat" for small
                indexes and "IVFx,Flat" for larger indexes where
                x = min(4 * sqrt(embeddings count), embeddings count / 39)
                automatically calculates number of IVF cells when omitted (supports "IVF,Flat")
    nprobe: search probe setting (int) - defaults to x/16 (as defined above)
            for larger indexes
    nflip: same as nprobe - only used with binary hash indexes
    quantize: store vectors with x-bit precision vs 32-bit (boolean|int)
              true sets 8-bit precision, false disables, int sets specified
              precision
    mmap: load as on-disk index (boolean) - trade query response time for a
          smaller RAM footprint, defaults to false
    sample: percent of data to use for model training (0.0 - 1.0)
            reduces indexing time for larger (>1M+ row) indexes, defaults to 1.0
```

Faiss supports both floating point and binary indexes. Floating point indexes are the default. Binary indexes are used when indexing scalar-quantized datasets.

See the following Faiss documentation links for more information.

- [Guidelines for choosing an index](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index)
- [Index configuration summary](https://github.com/facebookresearch/faiss/wiki/Faiss-indexes)
- [Index Factory](https://github.com/facebookresearch/faiss/wiki/The-index-factory)
- [Binary Indexes](https://github.com/facebookresearch/faiss/wiki/Binary-indexes)
- [Search Tuning](https://github.com/facebookresearch/faiss/wiki/Faster-search)

Note: For macOS users, an existing bug in an upstream package restricts the number of processing threads to 1. This limitation is managed internally to prevent system crashes.

### hnsw
```yaml
hnsw:
    efconstruction: ef_construction param for init_index (int) - defaults to 200
    m: M param for init_index (int) - defaults to 16
    randomseed: random-seed param for init_index (int) - defaults to 100
    efsearch: ef search param (int) - defaults to None and not set
```

See [Hnswlib documentation](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md) for more information on these parameters.

### annoy
```yaml
annoy:
    ntrees: number of trees (int) - defaults to 10
    searchk: search_k search setting (int) - defaults to -1
```

See [Annoy documentation](https://github.com/spotify/annoy#full-python-api) for more information on these parameters. Note that annoy indexes cannot be modified after creation; upserts, deletes, and other modifications are not supported.

### ggml
```yaml
ggml:
    gpu: enable GPU - defaults to True
    quantize: sets the tensor quantization - defaults to F32
    querysize: query buffer size - defaults to 64
```

The [GGML](https://github.com/ggml-org/ggml) backend is a k-nearest neighbors backend. It stores tensors using GGML and [GGUF](https://huggingface.co/docs/hub/en/gguf). It supports GPU-enabled operations and quantization.
GGML is the framework used by [llama.cpp](https://github.com/ggml-org/llama.cpp).

See [ggml.h](https://github.com/ggml-org/ggml/blob/master/include/ggml.h#L379) for a list of quantization types.

### numpy

The NumPy backend is a k-nearest neighbors backend. It's designed for simplicity and works well with smaller datasets that fit into memory.

```yaml
numpy:
    safetensors: stores vectors using the safetensors format
                 defaults to NumPy array storage
```

### torch

The Torch backend is a k-nearest neighbors backend like NumPy. It supports GPU-enabled operations. It also supports quantization, which enables fitting larger arrays into GPU memory.

When quantization is enabled, vectors are _always_ stored in safetensors. _Note that macOS support for quantization is limited._

```yaml
torch:
    safetensors: stores vectors using the safetensors format - defaults
                 to NumPy array storage if quantization is disabled
    quantize:
        type: quantization type (fp4, nf4, int8)
        blocksize: quantization block size parameter
```

### pgvector
```yaml
pgvector:
    url: database url connection string, alternatively can be set via
         ANN_URL environment variable
    schema: database schema to store vectors - defaults to being
            determined by the database
    table: database table to store vectors - defaults to `vectors`
    precision: vector float precision (half or full) - defaults to `full`
    efconstruction: ef_construction param (int) - defaults to 200
    m: M param for init_index (int) - defaults to 16
```

The pgvector backend stores embeddings in a Postgres database. See the [pgvector documentation](https://github.com/pgvector/pgvector-python?tab=readme-ov-file#sqlalchemy) for more information on these parameters. See the [SQLAlchemy](https://docs.sqlalchemy.org/en/20/core/engines.html#database-urls) documentation for more information on how to construct url connection strings.
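
As a rough illustration of how the pgvector settings above fit together, the following sketch selects the backend and fills in each option. The connection string, credentials, database name, and schema name are placeholders invented for this example, and the `postgresql+psycopg2` driver prefix is an assumption based on the SQLAlchemy URL format; adjust them to match your environment.

```yaml
# Illustrative values only - host, credentials, database and schema are placeholders
backend: pgvector
pgvector:
    # SQLAlchemy-style URL; can be omitted if the ANN_URL environment variable is set
    url: postgresql+psycopg2://user:password@localhost:5432/embeddings
    # Hypothetical schema name; omit to let the database decide
    schema: ann
    table: vectors
    # half precision trades accuracy for smaller storage
    precision: half
    # Index build parameters, shown here at their documented defaults
    efconstruction: 200
    m: 16
```

Setting the URL through `ANN_URL` instead of the configuration file keeps credentials out of version-controlled configuration.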

### sqlite
```yaml
sqlite:
    quantize: store vectors with x-bit precision vs 32-bit (boolean|int)
              true sets 8-bit precision, false disables, int sets specified
              precision
    table: database table to store vectors - defaults to `vectors`
```

The SQLite backend stores embeddings in a SQLite database using [sqlite-vec](https://github.com/asg017/sqlite-vec). This backend supports 1-bit and 8-bit quantization at the storage level.

See [this note](https://alexgarcia.xyz/sqlite-vec/python.html#macos-blocks-sqlite-extensions-by-default) on how to run this ANN on macOS.
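
A minimal sketch of a SQLite configuration, assuming 8-bit storage is acceptable for the dataset, might look like the following. The values are illustrative rather than recommended settings.

```yaml
# Illustrative values only
backend: sqlite
sqlite:
    # true stores vectors with 8-bit precision; per the note above, 1 would select 1-bit storage
    quantize: true
    # Matches the documented default table name
    table: vectors
```

Omitting the `sqlite` object entirely falls back to the defaults described above.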