/ examples / lllama-cpp-multiple-fn.ipynb
  1  {
  2   "cells": [
  3    {
  4     "cell_type": "markdown",
  5     "metadata": {},
  6     "source": [
  7      "## Multiple Connected Function Calling with llama.cpp\n",
  8      "### Adapted from the Ollama Notebook\n",
  9      "\n",
 10      "### Requirements\n",
 11      "\n",
 12      "#### 1. Install llama.cpp\n",
  13      "Installation instructions for llama.cpp and its Python bindings per OS (macOS, Linux, Windows) can be found in [the llama-cpp-python documentation](https://llama-cpp-python.readthedocs.io/en/latest/)."
 14     ]
 15    },
 16    {
 17     "cell_type": "markdown",
 18     "metadata": {},
 19     "source": [
 20      "#### 2. Python llama.cpp Library\n",
 21      "\n",
  22      "Install the Python bindings for llama.cpp:"
 23     ]
 24    },
 25    {
 26     "cell_type": "code",
 27     "execution_count": null,
 28     "metadata": {},
 29     "outputs": [],
 30     "source": [
 31      "%pip install llama-cpp-python"
 32     ]
 33    },
 34    {
 35     "cell_type": "markdown",
 36     "metadata": {},
 37     "source": [
  38      "#### 3. Download the model from HuggingFace\n",
 39      "\n",
  40      "Download the GGUF Hermes-2-Pro-Llama-3-8B from HuggingFace (uploaded by NousResearch) [here](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF):"
 41     ]
 42    },
 43    {
 44     "cell_type": "markdown",
 45     "metadata": {},
 46     "source": [
 47      "### Usage\n",
 48      "\n",
 49      "#### 1. Define Tools"
 50     ]
 51    },
 52    {
 53     "cell_type": "code",
 54     "execution_count": 1,
 55     "metadata": {},
 56     "outputs": [],
 57     "source": [
 58      "import random\n",
 59      "\n",
 60      "def get_weather_forecast(location: str) -> dict[str, str]:\n",
 61      "    \"\"\"Retrieves the weather forecast for a given location\"\"\"\n",
 62      "    # Mock values for test\n",
 63      "    return {\n",
 64      "        \"location\": location,\n",
 65      "        \"forecast\": \"sunny\",\n",
 66      "        \"temperature\": \"25°C\",\n",
 67      "    }\n",
 68      "\n",
 69      "def get_random_city() -> str:\n",
 70      "    \"\"\"Retrieves a random city from a list of cities\"\"\"\n",
 71      "    cities = [\"Groningen\", \"Enschede\", \"Amsterdam\", \"Istanbul\", \"Baghdad\", \"Rio de Janeiro\", \"Tokyo\", \"Kampala\"]\n",
 72      "    return random.choice(cities)\n",
 73      "\n",
 74      "def get_random_number() -> int:\n",
 75      "    \"\"\"Retrieves a random number\"\"\"\n",
 76      "    # Mock value for test\n",
 77      "    return 31"
 78     ]
 79    },
 80    {
 81     "cell_type": "markdown",
 82     "metadata": {},
 83     "source": [
 84      "#### 2. Define Function Caller\n",
 85      "\n",
  86      "For this Jupyter example, I'm simply putting the functions in a dictionary. In a Python project, you can use this implementation for inspiration: https://github.com/AtakanTekparmak/ollama_langhcain_fn_calling/tree/main"
 87     ]
 88    },
 89    {
 90     "cell_type": "code",
 91     "execution_count": 2,
 92     "metadata": {},
 93     "outputs": [],
 94     "source": [
 95      "import inspect\n",
 96      "\n",
 97      "class FunctionCaller:\n",
 98      "    \"\"\"\n",
 99      "    A class to call functions from tools.py.\n",
100      "    \"\"\"\n",
101      "\n",
102      "    def __init__(self):\n",
103      "        # Initialize the functions dictionary\n",
104      "        self.functions = {\n",
105      "            \"get_weather_forecast\": get_weather_forecast,\n",
106      "            \"get_random_city\": get_random_city,\n",
107      "            \"get_random_number\": get_random_number,\n",
108      "        }\n",
109      "        self.outputs = {}\n",
110      "\n",
111      "    def create_functions_metadata(self) -> list[dict]:\n",
112      "        \"\"\"Creates the functions metadata for the prompt. \"\"\"\n",
113      "        def format_type(p_type: str) -> str:\n",
114      "            \"\"\"Format the type of the parameter.\"\"\"\n",
115      "            # If p_type begins with \"<class\", then it is a class type\n",
116      "            if p_type.startswith(\"<class\"):\n",
117      "                # Get the class name from the type\n",
118      "                p_type = p_type.split(\"'\")[1]\n",
119      "            \n",
120      "            return p_type\n",
121      "            \n",
122      "        functions_metadata = []\n",
 124      "        for name, function in self.functions.items():\n",
126      "            descriptions = function.__doc__.split(\"\\n\")\n",
127      "            print(descriptions)\n",
128      "            functions_metadata.append({\n",
129      "                \"name\": name,\n",
130      "                \"description\": descriptions[0],\n",
131      "                \"parameters\": {\n",
132      "                    \"properties\": [ # Get the parameters for the function\n",
133      "                        {   \n",
134      "                            \"name\": param_name,\n",
135      "                            \"type\": format_type(str(param_type)),\n",
136      "                        }\n",
137      "                        # Remove the return type from the parameters\n",
138      "                        for param_name, param_type in function.__annotations__.items() if param_name != \"return\"\n",
139      "                    ],\n",
140      "                    \n",
141      "                    \"required\": [param_name for param_name in function.__annotations__ if param_name != \"return\"],\n",
142      "                } if function.__annotations__ else {},\n",
143      "                \"returns\": [\n",
144      "                    {\n",
145      "                        \"name\": name + \"_output\",\n",
146      "                        \"type\": {param_name: format_type(str(param_type)) for param_name, param_type in function.__annotations__.items() if param_name == \"return\"}[\"return\"]\n",
147      "                    }\n",
148      "                ]\n",
149      "            })\n",
150      "\n",
151      "        return functions_metadata\n",
152      "\n",
153      "    def call_function(self, function):\n",
154      "        \"\"\"\n",
155      "        Call the function from the given input.\n",
156      "\n",
157      "        Args:\n",
158      "            function (dict): A dictionary containing the function details.\n",
159      "        \"\"\"\n",
160      "    \n",
 161      "        def check_if_input_is_output(params: dict) -> dict:\n",
 162      "            \"\"\"Replace any param value that names a stored output with that stored output.\"\"\"\n",
 163      "            for key, value in params.items():\n",
 164      "                if value in self.outputs:\n",
 165      "                    params[key] = self.outputs[value]\n",
 166      "            return params\n",
167      "\n",
168      "        # Get the function name from the function dictionary\n",
169      "        function_name = function[\"name\"]\n",
170      "        \n",
171      "        # Get the function params from the function dictionary\n",
172      "        function_input = function[\"params\"] if \"params\" in function else None\n",
173      "        function_input = check_if_input_is_output(function_input) if function_input else None\n",
174      "    \n",
175      "        # Call the function from tools.py with the given input\n",
176      "        # pass all the arguments to the function from the function_input\n",
177      "        output = self.functions[function_name](**function_input) if function_input else self.functions[function_name]()\n",
178      "        self.outputs[function[\"output\"]] = output\n",
 179      "        return output"
182     ]
183    },
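  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check (an illustrative addition, not part of the original flow), the output-chaining behaviour of `call_function` can be exercised directly: the first call stores its result under the key `\"city\"`, and the second call's `\"location\"` param, whose value matches that key, is substituted with the stored city before the function runs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minimal sketch: chain two calls through the outputs dictionary.\n",
    "# `demo_caller` is a hypothetical fresh instance, separate from the pipeline below.\n",
    "demo_caller = FunctionCaller()\n",
    "demo_caller.call_function({\"name\": \"get_random_city\", \"params\": {}, \"output\": \"city\"})\n",
    "forecast = demo_caller.call_function({\"name\": \"get_weather_forecast\", \"params\": {\"location\": \"city\"}, \"output\": \"forecast\"})\n",
    "print(forecast)  # e.g. {'location': 'Tokyo', 'forecast': 'sunny', 'temperature': '25°C'}"
   ]
  },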
184    {
185     "cell_type": "markdown",
186     "metadata": {},
187     "source": [
188      "#### 3. Setup The Function Caller and Prompt"
189     ]
190    },
191    {
192     "cell_type": "code",
193     "execution_count": 3,
194     "metadata": {},
195     "outputs": [
196      {
197       "name": "stdout",
198       "output_type": "stream",
199       "text": [
200        "['Retrieves the weather forecast for a given location']\n",
201        "['Retrieves a random city from a list of cities']\n",
202        "['Retrieves a random number']\n"
203       ]
204      }
205     ],
206     "source": [
207      "# Initialize the FunctionCaller \n",
208      "function_caller = FunctionCaller()\n",
209      "\n",
210      "# Create the functions metadata\n",
211      "functions_metadata = function_caller.create_functions_metadata()"
212     ]
213    },
214    {
215     "cell_type": "code",
216     "execution_count": 4,
217     "metadata": {},
218     "outputs": [],
219     "source": [
220      "import json\n",
221      "\n",
222      "# Create the system prompt\n",
223      "prompt_beginning = \"\"\"\n",
224      "You are an AI assistant that can help the user with a variety of tasks. You have access to the following functions:\n",
225      "\n",
226      "\"\"\"\n",
227      "\n",
228      "system_prompt_end = \"\"\"\n",
229      "\n",
230      "When the user asks you a question, if you need to use functions, provide ONLY the function calls, and NOTHING ELSE, in the format:\n",
231      "<function_calls>    \n",
232      "[\n",
 233      "    { \"name\": \"function_name_1\", \"params\": { \"param_1\": \"value_1\", \"param_2\": \"value_2\" }, \"output\": \"The output variable name, to be possibly used as input for another function\"},\n",
234      "    { \"name\": \"function_name_2\", \"params\": { \"param_3\": \"value_3\", \"param_4\": \"output_1\"}, \"output\": \"The output variable name, to be possibly used as input for another function\"},\n",
235      "    ...\n",
236      "]\n",
237      "\"\"\"\n",
238      "system_prompt = prompt_beginning + f\"<tools> {json.dumps(functions_metadata, indent=4)} </tools>\" + system_prompt_end"
239     ]
240    },
241    {
242     "cell_type": "markdown",
243     "metadata": {},
244     "source": [
245      "#### 4. Load the model"
246     ]
247    },
248    {
249     "cell_type": "code",
250     "execution_count": 6,
251     "metadata": {},
252     "outputs": [
253      {
254       "name": "stderr",
255       "output_type": "stream",
256       "text": [
257        "llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from Hermes-2-Pro-Llama-3-Instruct-Merged-DPO-F16.gguf (version GGUF V3 (latest))\n",
258        "llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.\n",
259        "llama_model_loader: - kv   0:                       general.architecture str              = llama\n",
260        "llama_model_loader: - kv   1:                               general.name str              = Hermes-2-Pro-Llama-3-Instruct-Merged-DPO\n",
261        "llama_model_loader: - kv   2:                          llama.block_count u32              = 32\n",
262        "llama_model_loader: - kv   3:                       llama.context_length u32              = 8192\n",
263        "llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096\n",
264        "llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336\n",
265        "llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 32\n",
266        "llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8\n",
267        "llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 500000.000000\n",
268        "llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010\n",
269        "llama_model_loader: - kv  10:                          general.file_type u32              = 1\n",
270        "llama_model_loader: - kv  11:                           llama.vocab_size u32              = 128256\n",
271        "llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128\n",
272        "llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2\n",
273        "llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = llama-bpe\n",
274        "llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,128256]  = [\"!\", \"\\\"\", \"#\", \"$\", \"%\", \"&\", \"'\", ...\n",
275        "llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...\n",
276        "llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,280147]  = [\"Ġ Ġ\", \"Ġ ĠĠĠ\", \"ĠĠ ĠĠ\", \"...\n",
277        "llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 128000\n",
278        "llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 128003\n",
279        "llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 128001\n",
280        "llama_model_loader: - kv  21:                    tokenizer.chat_template str              = {{bos_token}}{% for message in messag...\n",
281        "llama_model_loader: - type  f32:   65 tensors\n",
282        "llama_model_loader: - type  f16:  226 tensors\n",
283        "llm_load_vocab: special tokens definition check successful ( 256/128256 ).\n",
284        "llm_load_print_meta: format           = GGUF V3 (latest)\n",
285        "llm_load_print_meta: arch             = llama\n",
286        "llm_load_print_meta: vocab type       = BPE\n",
287        "llm_load_print_meta: n_vocab          = 128256\n",
288        "llm_load_print_meta: n_merges         = 280147\n",
289        "llm_load_print_meta: n_ctx_train      = 8192\n",
290        "llm_load_print_meta: n_embd           = 4096\n",
291        "llm_load_print_meta: n_head           = 32\n",
292        "llm_load_print_meta: n_head_kv        = 8\n",
293        "llm_load_print_meta: n_layer          = 32\n",
294        "llm_load_print_meta: n_rot            = 128\n",
295        "llm_load_print_meta: n_embd_head_k    = 128\n",
296        "llm_load_print_meta: n_embd_head_v    = 128\n",
297        "llm_load_print_meta: n_gqa            = 4\n",
298        "llm_load_print_meta: n_embd_k_gqa     = 1024\n",
299        "llm_load_print_meta: n_embd_v_gqa     = 1024\n",
300        "llm_load_print_meta: f_norm_eps       = 0.0e+00\n",
301        "llm_load_print_meta: f_norm_rms_eps   = 1.0e-05\n",
302        "llm_load_print_meta: f_clamp_kqv      = 0.0e+00\n",
303        "llm_load_print_meta: f_max_alibi_bias = 0.0e+00\n",
304        "llm_load_print_meta: f_logit_scale    = 0.0e+00\n",
305        "llm_load_print_meta: n_ff             = 14336\n",
306        "llm_load_print_meta: n_expert         = 0\n",
307        "llm_load_print_meta: n_expert_used    = 0\n",
308        "llm_load_print_meta: causal attn      = 1\n",
309        "llm_load_print_meta: pooling type     = 0\n",
310        "llm_load_print_meta: rope type        = 0\n",
311        "llm_load_print_meta: rope scaling     = linear\n",
312        "llm_load_print_meta: freq_base_train  = 500000.0\n",
313        "llm_load_print_meta: freq_scale_train = 1\n",
314        "llm_load_print_meta: n_yarn_orig_ctx  = 8192\n",
315        "llm_load_print_meta: rope_finetuned   = unknown\n",
316        "llm_load_print_meta: ssm_d_conv       = 0\n",
317        "llm_load_print_meta: ssm_d_inner      = 0\n",
318        "llm_load_print_meta: ssm_d_state      = 0\n",
319        "llm_load_print_meta: ssm_dt_rank      = 0\n",
320        "llm_load_print_meta: model type       = 8B\n",
321        "llm_load_print_meta: model ftype      = F16\n",
322        "llm_load_print_meta: model params     = 8.03 B\n",
323        "llm_load_print_meta: model size       = 14.96 GiB (16.00 BPW) \n",
324        "llm_load_print_meta: general.name     = Hermes-2-Pro-Llama-3-Instruct-Merged-DPO\n",
325        "llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'\n",
326        "llm_load_print_meta: EOS token        = 128003 '<|im_end|>'\n",
327        "llm_load_print_meta: PAD token        = 128001 '<|end_of_text|>'\n",
328        "llm_load_print_meta: LF token         = 128 'Ä'\n",
329        "llm_load_print_meta: EOT token        = 128003 '<|im_end|>'\n",
330        "llm_load_tensors: ggml ctx size =    0.30 MiB\n",
331        "ggml_backend_metal_log_allocated_size: allocated buffer, size = 14315.02 MiB, (14315.08 / 49152.00)\n",
332        "llm_load_tensors: offloading 32 repeating layers to GPU\n",
333        "llm_load_tensors: offloaded 32/33 layers to GPU\n",
334        "llm_load_tensors:        CPU buffer size = 15317.02 MiB\n",
335        "llm_load_tensors:      Metal buffer size = 14315.00 MiB\n",
336        ".........................................................................................\n",
337        "llama_new_context_with_model: n_ctx      = 8192\n",
338        "llama_new_context_with_model: n_batch    = 512\n",
339        "llama_new_context_with_model: n_ubatch   = 512\n",
340        "llama_new_context_with_model: flash_attn = 1\n",
341        "llama_new_context_with_model: freq_base  = 500000.0\n",
342        "llama_new_context_with_model: freq_scale = 1\n",
343        "ggml_metal_init: allocating\n",
344        "ggml_metal_init: found device: Apple M1 Max\n",
345        "ggml_metal_init: picking default device: Apple M1 Max\n",
346        "ggml_metal_init: using embedded metal library\n",
347        "ggml_metal_init: GPU name:   Apple M1 Max\n",
348        "ggml_metal_init: GPU family: MTLGPUFamilyApple7  (1007)\n",
349        "ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)\n",
350        "ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)\n",
351        "ggml_metal_init: simdgroup reduction support   = true\n",
352        "ggml_metal_init: simdgroup matrix mul. support = true\n",
353        "ggml_metal_init: hasUnifiedMemory              = true\n",
354        "ggml_metal_init: recommendedMaxWorkingSetSize  = 51539.61 MB\n",
355        "llama_kv_cache_init:      Metal KV buffer size =  1024.00 MiB\n",
356        "llama_new_context_with_model: KV self size  = 1024.00 MiB, K (f16):  512.00 MiB, V (f16):  512.00 MiB\n",
357        "llama_new_context_with_model:        CPU  output buffer size =     0.49 MiB\n",
358        "llama_new_context_with_model:      Metal compute buffer size =    88.00 MiB\n",
359        "llama_new_context_with_model:        CPU compute buffer size =   258.50 MiB\n",
360        "llama_new_context_with_model: graph nodes  = 903\n",
361        "llama_new_context_with_model: graph splits = 3\n",
362        "AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | \n",
363        "Model metadata: {'tokenizer.chat_template': \"{{bos_token}}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>' + '\\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n' }}{% endif %}\", 'tokenizer.ggml.padding_token_id': '128001', 'tokenizer.ggml.eos_token_id': '128003', 'tokenizer.ggml.bos_token_id': '128000', 'tokenizer.ggml.pre': 'llama-bpe', 'tokenizer.ggml.model': 'gpt2', 'llama.vocab_size': '128256', 'llama.attention.head_count_kv': '8', 'llama.context_length': '8192', 'llama.attention.head_count': '32', 'general.file_type': '1', 'llama.feed_forward_length': '14336', 'llama.rope.dimension_count': '128', 'llama.rope.freq_base': '500000.000000', 'llama.embedding_length': '4096', 'general.architecture': 'llama', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'general.name': 'Hermes-2-Pro-Llama-3-Instruct-Merged-DPO', 'llama.block_count': '32'}\n",
364        "Available chat formats from metadata: chat_template.default\n",
365        "Using gguf chat template: {{bos_token}}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n",
366        "' + message['content'] + '<|im_end|>' + '\n",
367        "'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n",
368        "' }}{% endif %}\n",
369        "Using chat eos_token: <|im_end|>\n",
370        "Using chat bos_token: <|begin_of_text|>\n"
371       ]
372      }
373     ],
374     "source": [
375      "import llama_cpp\n",
376      "model = llama_cpp.Llama(\n",
377      "    model_path='Hermes-2-Pro-Llama-3-Instruct-Merged-DPO-F16.gguf',\n",
378      "    n_gpu_layers=32,\n",
379      "    n_threads=10,\n",
380      "    use_mlock=True,\n",
381      "    flash_attn=True,\n",
382      "    n_ctx=8192,\n",
383      ")"
384     ]
385    },
386    {
387     "cell_type": "markdown",
388     "metadata": {},
389     "source": [
390      "#### Inference"
391     ]
392    },
393    {
394     "cell_type": "code",
395     "execution_count": 7,
396     "metadata": {},
397     "outputs": [
398      {
399       "name": "stderr",
400       "output_type": "stream",
401       "text": [
402        "\n",
403        "llama_print_timings:        load time =    1707.16 ms\n",
404        "llama_print_timings:      sample time =      20.74 ms /    63 runs   (    0.33 ms per token,  3037.75 tokens per second)\n",
405        "llama_print_timings: prompt eval time =    1706.23 ms /   450 tokens (    3.79 ms per token,   263.74 tokens per second)\n",
406        "llama_print_timings:        eval time =    4579.55 ms /    62 runs   (   73.86 ms per token,    13.54 tokens per second)\n",
407        "llama_print_timings:       total time =    6663.81 ms /   512 tokens\n"
408       ]
409      },
410      {
411       "name": "stdout",
412       "output_type": "stream",
413       "text": [
414        "{'id': 'chatcmpl-8e53b561-32d6-433b-802d-5bb982660363', 'object': 'chat.completion', 'created': 1717239441, 'model': 'Hermes-2-Pro-Llama-3-Instruct-Merged-DPO-F16.gguf', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': '[\\n    {\\n        \"name\": \"get_random_city\",\\n        \"params\": {},\\n        \"output\": \"random_city\"\\n    },\\n    {\\n        \"name\": \"get_weather_forecast\",\\n        \"params\": {\"location\": \"random_city\"},\\n        \"output\": \"weather_forecast\"\\n    }\\n]'}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 450, 'completion_tokens': 62, 'total_tokens': 512}}\n"
415       ]
416      }
417     ],
418     "source": [
420      "# Compose the prompt \n",
421      "user_query = \"Whats the temperature in a random city?\"\n",
422      "\n",
423      "# Get the response from the model\n",
425      "messages = [\n",
 426      "    {'role': 'system', 'content': system_prompt},\n",
428      "    {'role': 'user', 'content': user_query}\n",
429      "]\n",
430      "response = model.create_chat_completion(messages=messages)\n",
 431      "print(response)"
433     ]
434    },
435    {
436     "cell_type": "code",
437     "execution_count": 8,
438     "metadata": {},
439     "outputs": [
440      {
441       "data": {
442        "text/plain": [
443         "'[\\n    {\\n        \"name\": \"get_random_city\",\\n        \"params\": {},\\n        \"output\": \"random_city\"\\n    },\\n    {\\n        \"name\": \"get_weather_forecast\",\\n        \"params\": {\"location\": \"random_city\"},\\n        \"output\": \"weather_forecast\"\\n    }\\n]'"
444        ]
445       },
446       "execution_count": 8,
447       "metadata": {},
448       "output_type": "execute_result"
449      }
450     ],
451     "source": [
452      "response['choices'][0]['message']['content']"
453     ]
454    },
455    {
456     "cell_type": "code",
457     "execution_count": 9,
458     "metadata": {},
459     "outputs": [
460      {
461       "name": "stdout",
462       "output_type": "stream",
463       "text": [
464        "Function calls:\n",
465        "[{'name': 'get_random_city', 'params': {}, 'output': 'random_city'}, {'name': 'get_weather_forecast', 'params': {'location': 'random_city'}, 'output': 'weather_forecast'}]\n"
466       ]
467      }
468     ],
469     "source": [
470      "function_calls = response['choices'][0]['message']['content']\n",
 471      "# If the response starts with a <function_calls> tag, take everything after it\n",
472      "if function_calls.startswith(\"<function_calls>\"):\n",
473      "    function_calls = function_calls.split(\"<function_calls>\")[1]\n",
474      "\n",
475      "# Read function calls as json\n",
476      "try:\n",
477      "    function_calls_json: list[dict[str, str]] = json.loads(function_calls)\n",
478      "except json.JSONDecodeError:\n",
479      "    function_calls_json = []\n",
 480      "    print(\"Model response not in desired JSON format\")\n",
481      "finally:\n",
482      "    print(\"Function calls:\")\n",
483      "    print(function_calls_json)"
484     ]
485    },
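  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The model may also wrap the list in a closing `</function_calls>` tag. A slightly more defensive extraction (an illustrative variant, not part of the original flow) strips both the opening and closing tags before parsing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "def extract_function_calls(content: str) -> list[dict]:\n",
    "    \"\"\"Strip optional <function_calls>...</function_calls> tags and parse the JSON list.\"\"\"\n",
    "    content = re.sub(r\"</?function_calls>\", \"\", content).strip()\n",
    "    try:\n",
    "        return json.loads(content)\n",
    "    except json.JSONDecodeError:\n",
    "        return []\n",
    "\n",
    "extract_function_calls(response['choices'][0]['message']['content'])"
   ]
  },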
486    {
487     "cell_type": "markdown",
488     "metadata": {},
489     "source": [
490      "#### Append the assistant message to the chat and call the functions"
491     ]
492    },
493    {
494     "cell_type": "code",
495     "execution_count": 14,
496     "metadata": {},
497     "outputs": [],
498     "source": [
 499      "# Wrap the function calls in <tool_call> tags, as specified in the chat template on Hugging Face\n",
500      "function_message = '<tool_call>' + str(function_calls_json) + '</tool_call>'\n",
501      "\n",
502      "messages.append({'role': 'assistant', 'content': function_message})"
503     ]
504    },
505    {
506     "cell_type": "code",
507     "execution_count": 15,
508     "metadata": {},
509     "outputs": [
510      {
511       "name": "stdout",
512       "output_type": "stream",
513       "text": [
514        "Tool Response: Amsterdam\n",
515        "Tool Response: {'location': 'Groningen', 'forecast': 'sunny', 'temperature': '25°C'}\n"
516       ]
517      }
518     ],
519     "source": [
520      "for function in function_calls_json:\n",
521      "    output = f\"Tool Response: {function_caller.call_function(function)}\"\n",
522      "    print(output)"
523     ]
524    },
525    {
526     "cell_type": "code",
527     "execution_count": 16,
528     "metadata": {},
529     "outputs": [],
530     "source": [
 531      "# Call the functions again, keeping only the final output\n",
532      "output = \"\"\n",
533      "for function in function_calls_json:\n",
534      "    output = f\"{function_caller.call_function(function)}\"\n",
535      "\n",
 536      "# Append the tool response to the messages with the chat format\n",
537      "tool_output = '<tool_response> ' + output + ' </tool_response>'\n",
538      "messages.append({'role': 'tool', 'content': tool_output})\n"
539     ]
540    },
541    {
542     "cell_type": "code",
543     "execution_count": 18,
544     "metadata": {},
545     "outputs": [
546      {
547       "data": {
548        "text/plain": [
549         "[{'role': 'system',\n",
 550         "  'content': '\\nYou are an AI assistant that can help the user with a variety of tasks. You have access to the following functions:\\n\\n<tools> [\\n    {\\n        \"name\": \"get_weather_forecast\",\\n        \"description\": \"Retrieves the weather forecast for a given location\",\\n        \"parameters\": {\\n            \"properties\": [\\n                {\\n                    \"name\": \"location\",\\n                    \"type\": \"str\"\\n                }\\n            ],\\n            \"required\": [\\n                \"location\"\\n            ]\\n        },\\n        \"returns\": [\\n            {\\n                \"name\": \"get_weather_forecast_output\",\\n                \"type\": \"dict[str, str]\"\\n            }\\n        ]\\n    },\\n    {\\n        \"name\": \"get_random_city\",\\n        \"description\": \"Retrieves a random city from a list of cities\",\\n        \"parameters\": {\\n            \"properties\": [],\\n            \"required\": []\\n        },\\n        \"returns\": [\\n            {\\n                \"name\": \"get_random_city_output\",\\n                \"type\": \"str\"\\n            }\\n        ]\\n    },\\n    {\\n        \"name\": \"get_random_number\",\\n        \"description\": \"Retrieves a random number\",\\n        \"parameters\": {\\n            \"properties\": [],\\n            \"required\": []\\n        },\\n        \"returns\": [\\n            {\\n                \"name\": \"get_random_number_output\",\\n                \"type\": \"int\"\\n            }\\n        ]\\n    }\\n] </tools>\\n\\nWhen the user asks you a question, if you need to use functions, provide ONLY the function calls, and NOTHING ELSE, in the format:\\n<function_calls>    \\n[\\n    { \"name\": \"function_name_1\", \"params\": { \"param_1\": \"value_1\", \"param_2\": \"value_2\" }, \"output\": \"The output variable name, to be possibly used as input for another function},\\n    { \"name\": \"function_name_2\", \"params\": { \"param_3\": \"value_3\", \"param_4\": \"output_1\"}, \"output\": \"The output variable name, to be possibly used as input for another function\"},\\n    ...\\n]\\n'},\n",
551         " {'role': 'user', 'content': 'Whats the temperature in a random city?'},\n",
552         " {'role': 'assistant',\n",
553         "  'content': \"<tool_call>[{'name': 'get_random_city', 'params': {}, 'output': 'random_city'}, {'name': 'get_weather_forecast', 'params': {'location': 'Groningen'}, 'output': 'weather_forecast'}]</tool_call>\"},\n",
554         " {'role': 'tool',\n",
555         "  'content': \"<tool_response> {'location': 'Groningen', 'forecast': 'sunny', 'temperature': '25°C'} </tool_response>\"}]"
556        ]
557       },
558       "execution_count": 18,
559       "metadata": {},
560       "output_type": "execute_result"
561      }
562     ],
563     "source": [
564      "messages"
565     ]
566    },
567    {
568     "cell_type": "markdown",
569     "metadata": {},
570     "source": [
 571      "#### Run inference again with the tool response"
572     ]
573    },
574    {
575     "cell_type": "code",
576     "execution_count": 19,
577     "metadata": {},
578     "outputs": [
579      {
580       "name": "stderr",
581       "output_type": "stream",
582       "text": [
583        "Llama.generate: prefix-match hit\n",
584        "\n",
585        "llama_print_timings:        load time =    1707.16 ms\n",
586        "llama_print_timings:      sample time =      14.45 ms /    20 runs   (    0.72 ms per token,  1384.27 tokens per second)\n",
587        "llama_print_timings: prompt eval time =     775.81 ms /    86 tokens (    9.02 ms per token,   110.85 tokens per second)\n",
588        "llama_print_timings:        eval time =    1416.85 ms /    19 runs   (   74.57 ms per token,    13.41 tokens per second)\n",
589        "llama_print_timings:       total time =    2321.63 ms /   105 tokens\n"
590       ]
591      },
592      {
593       "data": {
594        "text/plain": [
595         "{'id': 'chatcmpl-25dfeae2-2184-497f-b838-e08565ad078c',\n",
596         " 'object': 'chat.completion',\n",
597         " 'created': 1717239671,\n",
598         " 'model': 'Hermes-2-Pro-Llama-3-Instruct-Merged-DPO-F16.gguf',\n",
599         " 'choices': [{'index': 0,\n",
600         "   'message': {'role': 'assistant',\n",
601         "    'content': \"The temperature in the random city of Groningen is currently 25°C and it's sunny.\"},\n",
602         "   'logprobs': None,\n",
603         "   'finish_reason': 'stop'}],\n",
604         " 'usage': {'prompt_tokens': 536, 'completion_tokens': 19, 'total_tokens': 555}}"
605        ]
606       },
607       "execution_count": 19,
608       "metadata": {},
609       "output_type": "execute_result"
610      }
611     ],
612     "source": [
 613      "response = model.create_chat_completion(messages=messages, temperature=0)\n",
614      "response"
615     ]
616    }
617   ],
618   "metadata": {
619    "kernelspec": {
620     "display_name": "Python 3",
621     "language": "python",
622     "name": "python3"
623    },
624    "language_info": {
625     "codemirror_mode": {
626      "name": "ipython",
627      "version": 3
628     },
629     "file_extension": ".py",
630     "mimetype": "text/x-python",
631     "name": "python",
632     "nbconvert_exporter": "python",
633     "pygments_lexer": "ipython3",
634     "version": "3.11.9"
635    }
636   },
637   "nbformat": 4,
638   "nbformat_minor": 2
639  }