# quantize

You can also use the [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space on Hugging Face to build your own quants without any setup.

Note: It is synced from llama.cpp `main` every 6 hours.

## Llama 2 7B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.35                  |
| Q3_K_S       | 3.50                  |
| Q3_K_M       | 3.91                  |
| Q3_K_L       | 4.27                  |
| Q4_K_S       | 4.58                  |
| Q4_K_M       | 4.84                  |
| Q5_K_S       | 5.52                  |
| Q5_K_M       | 5.68                  |
| Q6_K         | 6.56                  |

## Llama 2 13B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.34                  |
| Q3_K_S       | 3.48                  |
| Q3_K_M       | 3.89                  |
| Q3_K_L       | 4.26                  |
| Q4_K_S       | 4.56                  |
| Q4_K_M       | 4.83                  |
| Q5_K_S       | 5.51                  |
| Q5_K_M       | 5.67                  |
| Q6_K         | 6.56                  |

## Llama 2 70B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.40                  |
| Q3_K_S       | 3.47                  |
| Q3_K_M       | 3.85                  |
| Q3_K_L       | 4.19                  |
| Q4_K_S       | 4.53                  |
| Q4_K_M       | 4.80                  |
| Q5_K_S       | 5.50                  |
| Q5_K_M       | 5.65                  |
| Q6_K         | 6.56                  |
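
As a rough sanity check, the BPW figures above translate into approximate file sizes via parameter count × BPW / 8. The sketch below uses the nominal 7B/13B/70B parameter counts as an illustrative assumption; real GGUF files also carry metadata and keep some tensors at higher precision, so actual sizes differ slightly:

```python
# Rough file-size estimate from bits-per-weight (BPW).
# Parameter counts are nominal (7B/13B/70B) -- an assumption for
# illustration; actual GGUF files include metadata and a few
# higher-precision tensors, so real sizes are somewhat larger.

def estimated_size_gib(n_params: float, bpw: float) -> float:
    """Size in GiB: parameters * bits-per-weight / 8 bits per byte."""
    return n_params * bpw / 8 / 1024**3

# Q4_K_M BPW values taken from the tables above.
Q4_K_M_BPW = {"7B": 4.84, "13B": 4.83, "70B": 4.80}
PARAMS = {"7B": 7e9, "13B": 13e9, "70B": 70e9}

for model in ("7B", "13B", "70B"):
    size = estimated_size_gib(PARAMS[model], Q4_K_M_BPW[model])
    print(f"Llama 2 {model} @ Q4_K_M ≈ {size:.1f} GiB")
```

For example, the 7B model at Q4_K_M works out to roughly 4 GiB, which is why these quants fit comfortably on consumer hardware.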