README.md
1 # quantize 2 3 You can also use the [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space on Hugging Face to build your own quants without any setup. 4 5 Note: It is synced from llama.cpp `main` every 6 hours. 6 7 ## Llama 2 7B 8 9 | Quantization | Bits per Weight (BPW) | 10 |--------------|-----------------------| 11 | Q2_K | 3.35 | 12 | Q3_K_S | 3.50 | 13 | Q3_K_M | 3.91 | 14 | Q3_K_L | 4.27 | 15 | Q4_K_S | 4.58 | 16 | Q4_K_M | 4.84 | 17 | Q5_K_S | 5.52 | 18 | Q5_K_M | 5.68 | 19 | Q6_K | 6.56 | 20 21 ## Llama 2 13B 22 Quantization | Bits per Weight (BPW) 23 -- | -- 24 Q2_K | 3.34 25 Q3_K_S | 3.48 26 Q3_K_M | 3.89 27 Q3_K_L | 4.26 28 Q4_K_S | 4.56 29 Q4_K_M | 4.83 30 Q5_K_S | 5.51 31 Q5_K_M | 5.67 32 Q6_K | 6.56 33 34 # Llama 2 70B 35 36 Quantization | Bits per Weight (BPW) 37 -- | -- 38 Q2_K | 3.40 39 Q3_K_S | 3.47 40 Q3_K_M | 3.85 41 Q3_K_L | 4.19 42 Q4_K_S | 4.53 43 Q4_K_M | 4.80 44 Q5_K_S | 5.50 45 Q5_K_M | 5.65 46 Q6_K | 6.56