<a name="readme-top"></a>

<p align="center">
    <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2.5/Qwen2.5-Coder/qwen2.5-coder-logo" width="400"/>
<p>

<p align="center">
    <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2.5/Qwen2.5-Coder-Family/main_fig_32b_white.jpg" width="400"/>
<p>

<p align="center">
    🤗 <a href="https://huggingface.co/collections/Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f">Hugging Face</a> | 🤖 <a href="https://modelscope.cn/organization/qwen">ModelScope</a> | 💻 <a href="https://www.kaggle.com/models/qwen-lm/qwen2.5-coder">Kaggle</a> | 📑 <a href="https://qwenlm.github.io/blog/qwen2.5-coder-family">Blog</a> ｜ 📖 <a href="https://qwen.readthedocs.io/">Documentation</a>
<br>
    🖥️ <a href="https://huggingface.co/spaces/Qwen/Qwen2.5-Coder-demo">Demo</a> | 💼 <a href="https://huggingface.co/spaces/Qwen/Qwen2.5-Coder-Artifacts">Artifacts</a> | 💬 <a href="https://github.com/QwenLM/Qwen/blob/main/assets/wechat.png">WeChat (微信)</a> | 🫨 <a href="https://discord.gg/CV4E9rpNSD">Discord</a> | 📄 <a href="https://arxiv.org/abs/2409.12186">arXiv</a>
</p>

Visit our Hugging Face or ModelScope organization (click the links above), search for checkpoints whose names start with `Qwen2.5-Coder-`, and you will find all you need. Enjoy!

# Qwen2.5-Coder Series: Powerful, Diverse, Practical

## Introduction

Today, we are excited to open source the "Powerful", "Diverse", and "Practical" **Qwen2.5-Coder** series (formerly known as CodeQwen1.5), dedicated to continuously promoting the development of open CodeLLMs.

💻 Powerful: Qwen2.5-Coder-32B-Instruct is the current state-of-the-art open-source code model, matching the coding capabilities of GPT-4o. Beyond strong and comprehensive coding abilities, it also retains good general and mathematical skills;

📚 Diverse: Building on the two previously open-sourced sizes (1.5B / 7B), this release brings four more model sizes: 0.5B / 3B / 14B / 32B. Qwen2.5-Coder now covers six mainstream model sizes to meet the needs of different developers;

🛠 Practical: We explore the practicality of Qwen2.5-Coder in two scenarios, code assistants and Artifacts, with examples showcasing its potential applications in real-world settings;

## Basic information

1. ✨ Supports long-context understanding and generation, with a context length of 128K tokens;
2. ✨ Supports 92 coding languages:
```
['ada', 'agda', 'alloy', 'antlr', 'applescript', 'assembly', 'augeas', 'awk', 'batchfile', 'bluespec', 'c', 'c#', 'c++', 'clojure', 'cmake', 'coffeescript', 'common-lisp', 'css', 'cuda', 'dart', 'dockerfile', 'elixir', 'elm', 'emacs-lisp', 'erlang', 'f#', 'fortran', 'glsl', 'go', 'groovy', 'haskell', 'html', 'idris', 'isabelle', 'java', 'java-server-pages', 'javascript', 'json', 'julia', 'jupyter-notebook', 'kotlin', 'lean', 'literate-agda', 'literate-coffeescript', 'literate-haskell', 'lua', 'makefile', 'maple', 'markdown', 'mathematica', 'matlab', 'objectc++', 'ocaml', 'pascal', 'perl', 'php', 'powershell', 'prolog', 'protocol-buffer', 'python', 'r', 'racket', 'restructuredtext', 'rmarkdown', 'ruby', 'rust', 'sas', 'scala', 'scheme', 'shell', 'smalltalk', 'solidity', 'sparql', 'sql', 'stan', 'standard-ml', 'stata', 'swift', 'systemverilog', 'tcl', 'tcsh', 'tex', 'thrift', 'typescript', 'verilog', 'vhdl', 'visual-basic', 'vue', 'xslt', 'yacc', 'yaml', 'zig']
```
3. ✨ Retains the strengths in math and general capabilities of its base model.

> [!Important]
> We updated both the special tokens and their corresponding token IDs to maintain consistency with Qwen2.5. The new special tokens are the following:

```json
{
  "<|fim_prefix|>": 151659,
  "<|fim_middle|>": 151660,
  "<|fim_suffix|>": 151661,
  "<|fim_pad|>": 151662,
  "<|repo_name|>": 151663,
  "<|file_sep|>": 151664,
  "<|im_start|>": 151644,
  "<|im_end|>": 151645
}
```
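If you want to sanity-check that a tokenizer you have loaded uses these IDs, you can query it directly; a minimal sketch with `transformers`:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")

# Look up the ID assigned to each special token and compare with the table above.
for token in ["<|fim_prefix|>", "<|fim_middle|>", "<|fim_suffix|>", "<|fim_pad|>",
              "<|repo_name|>", "<|file_sep|>", "<|im_start|>", "<|im_end|>"]:
    print(token, tokenizer.convert_tokens_to_ids(token))
```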
| model name | type | context length | Download |
|-----------------------------|----------|--------|-----------------|
| Qwen2.5-Coder-0.5B | base | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-0.5B) |
| Qwen2.5-Coder-1.5B | base | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-1.5B) |
| Qwen2.5-Coder-3B | base | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-3B) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-3B) |
| Qwen2.5-Coder-7B | base | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-7B) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-7B) |
| Qwen2.5-Coder-14B | base | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-14B) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-14B) |
| Qwen2.5-Coder-32B | base | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-32B) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-32B) |
| Qwen2.5-Coder-0.5B-Instruct | instruct | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-0.5B-Instruct) |
| Qwen2.5-Coder-1.5B-Instruct | instruct | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-1.5B-Instruct) |
| Qwen2.5-Coder-3B-Instruct | instruct | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-3B-Instruct) |
| Qwen2.5-Coder-7B-Instruct | instruct | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-7B-Instruct) |
| Qwen2.5-Coder-14B-Instruct | instruct | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-14B-Instruct) |
| Qwen2.5-Coder-32B-Instruct | instruct | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-32B-Instruct) |
| Qwen2.5-Coder-0.5B-Instruct-AWQ | instruct | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ) |
| Qwen2.5-Coder-0.5B-Instruct-GGUF | instruct | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct-GGUF) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-0.5B-Instruct-GGUF) |
| Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4 | instruct | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4) |
| Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8 | instruct | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8) |
| Qwen2.5-Coder-1.5B-Instruct-AWQ | instruct | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ) |
| Qwen2.5-Coder-1.5B-Instruct-GGUF | instruct | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF) |
| Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4 | instruct | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4) |
| Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8 | instruct | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8) |
| Qwen2.5-Coder-3B-Instruct-AWQ | instruct | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct-AWQ) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-3B-Instruct-AWQ) |
| Qwen2.5-Coder-3B-Instruct-GGUF | instruct | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct-GGUF) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-3B-Instruct-GGUF) |
| Qwen2.5-Coder-3B-Instruct-GPTQ-Int4 | instruct | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4) |
| Qwen2.5-Coder-3B-Instruct-GPTQ-Int8 | instruct | 32k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8) |
| Qwen2.5-Coder-7B-Instruct-AWQ | instruct | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-AWQ) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-7B-Instruct-AWQ) |
| Qwen2.5-Coder-7B-Instruct-GGUF | instruct | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF) |
| Qwen2.5-Coder-7B-Instruct-GPTQ-Int4 | instruct | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4) |
| Qwen2.5-Coder-7B-Instruct-GPTQ-Int8 | instruct | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8) |
| Qwen2.5-Coder-14B-Instruct-AWQ | instruct | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct-AWQ) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-14B-Instruct-AWQ) |
| Qwen2.5-Coder-14B-Instruct-GGUF | instruct | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct-GGUF) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-14B-Instruct-GGUF) |
| Qwen2.5-Coder-14B-Instruct-GPTQ-Int4 | instruct | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4) |
| Qwen2.5-Coder-14B-Instruct-GPTQ-Int8 | instruct | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8) |
| Qwen2.5-Coder-32B-Instruct-AWQ | instruct | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct-AWQ) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-32B-Instruct-AWQ) |
| Qwen2.5-Coder-32B-Instruct-GGUF | instruct | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct-GGUF) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-32B-Instruct-GGUF) |
| Qwen2.5-Coder-32B-Instruct-GPTQ-Int4 | instruct | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4) |
| Qwen2.5-Coder-32B-Instruct-GPTQ-Int8 | instruct | 128k | 🤗 [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8) • 🤖 [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8) |

Detailed performance figures and an introduction are given in this <a href="https://qwenlm.github.io/blog/qwen2.5-coder-family">📑 blog</a>.

## Requirements
* `python>=3.9`
* `transformers>=4.37.0` for Qwen2.5 dense models.

> [!Warning]
> <div align="center">
> <b>
> 🚨 This is a must, because `transformers` has integrated Qwen2 codes since `4.37.0`.
> </b>
> </div>

You can install the required packages with the following command:
```bash
pip install -r requirements.txt
```
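If you are unsure whether your installed version is recent enough, a minimal sketch to check it programmatically (using `packaging`, which `transformers` already depends on):

```python
from packaging import version
import transformers

# Qwen2 model code was merged into transformers in 4.37.0.
assert version.parse(transformers.__version__) >= version.parse("4.37.0"), \
    f"transformers {transformers.__version__} is too old; please upgrade to >=4.37.0"
```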
## Quick Start

> [!Important]
> **Qwen2.5-Coder-\[0.5-32\]B-Instruct** are instruction-tuned models for chatting;
>
> **Qwen2.5-Coder-\[0.5-32\]B** are base models typically used for completion, and serve as a better starting point for fine-tuning.

### 👉🏻 Chat with Qwen2.5-Coder-32B-Instruct
You can chat with Qwen2.5-Coder-32B-Instruct in just a few lines of `transformers` code. Essentially, we build the tokenizer and the model with the `from_pretrained` method, and use the `generate` method to chat with the help of the chat template provided by the tokenizer. Below is an example of how to chat with Qwen2.5-Coder-32B-Instruct:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "write a quick sort algorithm."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
The `apply_chat_template()` function converts the messages into a format that the model can understand.
The `add_generation_prompt` argument appends a generation prompt, `<|im_start|>assistant\n`, to the input. Notably, we apply the ChatML template to chat models, following our previous practice.
The `max_new_tokens` argument sets the maximum length of the response, and `tokenizer.batch_decode()` decodes the generated tokens back into text. The `messages` above also show how to format your dialog history and system prompt.
You can use the other sizes of the instruct models in the same way.
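If you would rather see tokens printed as they are generated instead of waiting for the full response, `transformers` provides `TextStreamer`; a minimal sketch reusing the `model`, `tokenizer`, and `model_inputs` from the example above:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated;
# skip_prompt avoids re-printing the input prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)
```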
### 👉🏻 Code with Qwen2.5-Coder-32B

#### 1. Basic Usage
The model completes code snippets according to the given prompt, without any additional formatting; this is usually called `code completion` in code generation tasks.

Essentially, we build the tokenizer and the model with the `from_pretrained` method, and use the `generate` method to perform code completion. Below is an example of how to complete code with Qwen2.5-Coder-32B:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda"  # the device to load the model onto

# You no longer need to add "trust_remote_code=True"
TOKENIZER = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B")
MODEL = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-32B", device_map="auto").eval()

# tokenize the input into tokens
input_text = "#write a quick sort algorithm"
model_inputs = TOKENIZER([input_text], return_tensors="pt").to(device)

# Use `max_new_tokens` to control the maximum output length.
generated_ids = MODEL.generate(model_inputs.input_ids, max_new_tokens=512, do_sample=False)[0]
# The generated_ids include prompt_ids, so we only decode the tokens after prompt_ids.
output_text = TOKENIZER.decode(generated_ids[len(model_inputs.input_ids[0]):], skip_special_tokens=True)

print(f"Prompt: {input_text}\n\nGenerated text: {output_text}")
```
The `max_new_tokens` argument sets the maximum length of the response.
The `input_text` can be any text that you would like the model to continue.

#### 2. Processing Long Texts

The current `config.json` is set for a context length of up to 32,768 tokens.
To handle inputs exceeding 32,768 tokens, we utilize [YaRN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on long texts.

For supported frameworks, you can add the following to `config.json` to enable YaRN:
```json
{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

#### 3. File-Level Code Completion (Fill in the Middle)
The code insertion task, also referred to as the "fill-in-the-middle" (FIM) challenge, requires inserting code segments in a manner that bridges the gaps within a given code context.
For an approach aligned with best practices, we recommend following the formatting guidelines in the paper "Efficient Training of Language Models to Fill in the Middle" [[arXiv](https://arxiv.org/abs/2207.14255)]. This involves three special tokens, `<|fim_prefix|>`, `<|fim_suffix|>`, and `<|fim_middle|>`, to denote the respective segments of the code structure.
The prompt should be structured as follows:
```python
prompt = '<|fim_prefix|>' + prefix_code + '<|fim_suffix|>' + suffix_code + '<|fim_middle|>'
```
Following this approach, an example would be structured in this manner:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# load model
device = "cuda"  # the device to load the model onto

TOKENIZER = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B")
MODEL = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-32B", device_map="auto").eval()

input_text = """<|fim_prefix|>def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    <|fim_suffix|>
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)<|fim_middle|>"""

model_inputs = TOKENIZER([input_text], return_tensors="pt").to(device)

# Use `max_new_tokens` to control the maximum output length.
generated_ids = MODEL.generate(model_inputs.input_ids, max_new_tokens=512, do_sample=False)[0]
# The generated_ids include prompt_ids, so we only decode the tokens after prompt_ids.
output_text = TOKENIZER.decode(generated_ids[len(model_inputs.input_ids[0]):], skip_special_tokens=True)

print(f"Prompt: {input_text}\n\nGenerated text: {output_text}")
```
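For convenience, the prompt construction and decoding can be wrapped in a small helper. Below is a minimal sketch reusing `TOKENIZER` and `MODEL` from above; the `fim_complete` function is illustrative, not part of this repository:

```python
def fim_complete(prefix_code: str, suffix_code: str, max_new_tokens: int = 512) -> str:
    """Fill in the code between prefix_code and suffix_code using the FIM special tokens."""
    prompt = "<|fim_prefix|>" + prefix_code + "<|fim_suffix|>" + suffix_code + "<|fim_middle|>"
    model_inputs = TOKENIZER([prompt], return_tensors="pt").to(MODEL.device)
    generated_ids = MODEL.generate(model_inputs.input_ids, max_new_tokens=max_new_tokens, do_sample=False)[0]
    # Decode only the newly generated tokens after the prompt.
    return TOKENIZER.decode(generated_ids[len(model_inputs.input_ids[0]):], skip_special_tokens=True)

middle = fim_complete("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
print(middle)
```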
#### 4. Repository-Level Code Completion
The repository-level code completion task feeds the model the content of multiple files from the same repository. This enables the model to understand the interrelationships between calls across these files, thereby facilitating the completion of code content.
We recommend using the two special tokens `<|repo_name|>` and `<|file_sep|>` to indicate the repository structure.
For example, assuming the repository name is stored in `repo_name`, and it contains files with their respective paths and contents listed as [(`file_path1`, `file_content1`), (`file_path2`, `file_content2`)], the final input prompt would be formatted as follows:
```python
input_text = f'''<|repo_name|>{repo_name}
<|file_sep|>{file_path1}
{file_content1}
<|file_sep|>{file_path2}
{file_content2}'''
```
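In practice, such a prompt is usually assembled from files on disk. A minimal sketch of one way to do this; the `build_repo_prompt` helper and the `./library-system` path are illustrative assumptions, not part of this repository:

```python
import os

def build_repo_prompt(repo_root: str, repo_name: str, extensions=(".py",)) -> str:
    """Concatenate repository files into a repo-level completion prompt."""
    parts = [f"<|repo_name|>{repo_name}"]
    for dirpath, _, filenames in os.walk(repo_root):
        for filename in sorted(filenames):
            if filename.endswith(tuple(extensions)):
                path = os.path.join(dirpath, filename)
                rel_path = os.path.relpath(path, repo_root)
                with open(path, encoding="utf-8") as f:
                    parts.append(f"<|file_sep|>{rel_path}\n{f.read()}")
    return "\n".join(parts)

input_text = build_repo_prompt("./library-system", "library-system")
```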
<details><summary>👇🏻 Below is a complete example of a repository-level code completion task: <i>:: click to expand ::</i></summary>
<div>

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda"  # the device to load the model onto

# You no longer need to add "trust_remote_code=True"
TOKENIZER = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B")
MODEL = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-32B", device_map="auto").eval()

# tokenize the input into tokens
input_text = """<|repo_name|>library-system
<|file_sep|>library.py
class Book:
    def __init__(self, title, author, isbn, copies):
        self.title = title
        self.author = author
        self.isbn = isbn
        self.copies = copies

    def __str__(self):
        return f"Title: {self.title}, Author: {self.author}, ISBN: {self.isbn}, Copies: {self.copies}"

class Library:
    def __init__(self):
        self.books = []

    def add_book(self, title, author, isbn, copies):
        book = Book(title, author, isbn, copies)
        self.books.append(book)

    def find_book(self, isbn):
        for book in self.books:
            if book.isbn == isbn:
                return book
        return None

    def list_books(self):
        return self.books

<|file_sep|>student.py
class Student:
    def __init__(self, name, id):
        self.name = name
        self.id = id
        self.borrowed_books = []

    def borrow_book(self, book, library):
        if book and book.copies > 0:
            self.borrowed_books.append(book)
            book.copies -= 1
            return True
        return False

    def return_book(self, book, library):
        if book in self.borrowed_books:
            self.borrowed_books.remove(book)
            book.copies += 1
            return True
        return False

<|file_sep|>main.py
from library import Library
from student import Student

def main():
    # Set up the library with some books
    library = Library()
    library.add_book("The Great Gatsby", "F. Scott Fitzgerald", "1234567890", 3)
    library.add_book("To Kill a Mockingbird", "Harper Lee", "1234567891", 2)

    # Set up a student
    student = Student("Alice", "S1")

    # Student borrows a book
    """
model_inputs = TOKENIZER([input_text], return_tensors="pt").to(device)

# Use `max_new_tokens` to control the maximum output length.
generated_ids = MODEL.generate(model_inputs.input_ids, max_new_tokens=1024, do_sample=False)[0]
# The generated_ids include prompt_ids, so we only decode the tokens after prompt_ids.
output_text = TOKENIZER.decode(generated_ids[len(model_inputs.input_ids[0]):], skip_special_tokens=True)

print(f"Prompt: \n{input_text}\n\nGenerated text: \n{output_text}")
```
The expected output is as follows:
```python
Generated text:
    book = library.find_book("1234567890")
    if student.borrow_book(book, library):
        print(f"{student.name} borrowed {book.title}")
    else:
        print(f"{student.name} could not borrow {book.title}")

    # Student returns a book
    if student.return_book(book, library):
        print(f"{student.name} returned {book.title}")
    else:
        print(f"{student.name} could not return {book.title}")

    # List all books in the library
    print("All books in the library:")
    for book in library.list_books():
        print(book)

if __name__ == "__main__":
    main()
```

</div>
</details>

### 👉🏻 Deploying Qwen2.5-Coder with vLLM
As a member of the Qwen2.5 family, Qwen2.5-Coder is supported by vLLM. A detailed tutorial can be found in the [Qwen tutorial](https://qwen.readthedocs.io/en/latest/deployment/vllm.html).
Here, we give a simple example of offline batched inference with vLLM.

#### Offline Batched Inference
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Initialize the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B")

# Pass the default decoding hyperparameters of Qwen2.5-Coder-32B.
# max_tokens is the maximum length for generation.
sampling_params = SamplingParams(temperature=0.7, top_p=0.8, repetition_penalty=1.05, max_tokens=1024)

# Input the model name or path. Can be GPTQ or AWQ models.
llm = LLM(model="Qwen/Qwen2.5-Coder-32B")

# Prepare your prompts
prompt = "#write a quick sort algorithm.\ndef quick_sort("

# generate outputs
outputs = llm.generate([prompt], sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

#### Multi-GPU Distributed Serving
To scale up your serving throughput, distributed serving helps by leveraging more GPU devices. It is also necessary when inference on very long sequences would exhaust the memory of a single GPU. Below, we demonstrate how to run Qwen2.5-Coder-32B with tensor parallelism, simply by passing the `tensor_parallel_size` argument:
```python
llm = LLM(model="Qwen/Qwen2.5-Coder-32B", tensor_parallel_size=8)
```
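The same offline API also works with the instruct model if you render the conversation through the tokenizer's chat template first. A minimal sketch, assuming your GPUs have enough memory for Qwen2.5-Coder-32B-Instruct:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
llm = LLM(model="Qwen/Qwen2.5-Coder-32B-Instruct")
sampling_params = SamplingParams(temperature=0.7, top_p=0.8, repetition_penalty=1.05, max_tokens=1024)

messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": "write a quick sort algorithm."},
]
# Render the ChatML conversation into a plain-text prompt for vLLM.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([text], sampling_params)
print(outputs[0].outputs[0].text)
```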
### 👉🏻 Gradio interface 🤗

We also provide a Gradio <a href='https://github.com/gradio-app/gradio'><img src='https://img.shields.io/github/stars/gradio-app/gradio'></a> interface for a better experience; just run:

```bash
cd demo/chatbot/
# For Linux and Windows users (and macOS with Intel processors)
python app.py

# For macOS users with Apple Silicon; Intel is not supported, and this may be around 20x slower than an RTX 4090
PYTORCH_ENABLE_MPS_FALLBACK=1 python app.py
```

We also provide a Gradio interface for artifacts mode:
```bash
cd demo/artifacts/
python app.py
```

You can specify the `--server_port`, `--share`, and `--server_name` arguments to fit your needs!

**Or, try it out effortlessly on Hugging Face: [「chatbot demo」](https://huggingface.co/spaces/Qwen/Qwen2.5-Coder-demo) 🤗 [「artifacts demo」](https://huggingface.co/spaces/Qwen/Qwen2.5-Coder-Artifacts)**

## Performance
For more information, please refer to the <a href="https://arxiv.org/abs/2409.12186">Qwen2.5-Coder Technical Report</a>.

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=QwenLM/Qwen2.5-Coder&type=Date)](https://star-history.com/#QwenLM/Qwen2.5-Coder&Date)

## Citation
If you find our work helpful, feel free to cite us.

```bibtex
@article{hui2024qwen2,
  title={Qwen2.5-Coder Technical Report},
  author={Hui, Binyuan and Yang, Jian and Cui, Zeyu and Yang, Jiaxi and Liu, Dayiheng and Zhang, Lei and Liu, Tianyu and Zhang, Jiajun and Yu, Bowen and Dang, Kai and others},
  journal={arXiv preprint arXiv:2409.12186},
  year={2024}
}
@article{qwen2,
  title={Qwen2 Technical Report},
  author={An Yang and Baosong Yang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Zhou and Chengpeng Li and Chengyuan Li and Dayiheng Liu and Fei Huang and Guanting Dong and Haoran Wei and Huan Lin and Jialong Tang and Jialin Wang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Ma and Jin Xu and Jingren Zhou and Jinze Bai and Jinzheng He and Junyang Lin and Kai Dang and Keming Lu and Keqin Chen and Kexin Yang and Mei Li and Mingfeng Xue and Na Ni and Pei Zhang and Peng Wang and Ru Peng and Rui Men and Ruize Gao and Runji Lin and Shijie Wang and Shuai Bai and Sinan Tan and Tianhang Zhu and Tianhao Li and Tianyu Liu and Wenbin Ge and Xiaodong Deng and Xiaohuan Zhou and Xingzhang Ren and Xinyu Zhang and Xipin Wei and Xuancheng Ren and Yang Fan and Yang Yao and Yichang Zhang and Yu Wan and Yunfei Chu and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zhihao Fan},
  journal={arXiv preprint arXiv:2407.10671},
  year={2024}
}
```

## Contact Us
If you are interested in leaving a message to either our research team or product team, join our [Discord](https://discord.gg/z3GAxXZ9Ce) or [WeChat groups](https://github.com/QwenLM/Qwen/blob/main/assets/wechat.png)!

<p align="right" style="font-size: 14px; color: #555; margin-top: 20px;">
    <a href="#readme-top" style="text-decoration: none; color: #007bff; font-weight: bold;">
        ↑ Back to Top ↑
    </a>
</p>