llama-cpp-multiple-fn.ipynb
1 { 2 "cells": [ 3 { 4 "cell_type": "markdown", 5 "metadata": {}, 6 "source": [ 7 "## Multiple Connected Function Calling with llama.cpp\n", 8 "### Adapted from the Ollama Notebook\n", 9 "\n", 10 "### Requirements\n", 11 "\n", 12 "#### 1. Install llama.cpp\n", 13 "Installation instructions for llama.cpp per OS (macOS, Linux, Windows) can be found on [the llama-cpp-python website](https://llama-cpp-python.readthedocs.io/en/latest/)." 14 ] 15 }, 16 { 17 "cell_type": "markdown", 18 "metadata": {}, 19 "source": [ 20 "#### 2. Python llama.cpp Library\n", 21 "\n", 22 "Install the Python bindings for llama.cpp:" 23 ] 24 }, 25 { 26 "cell_type": "code", 27 "execution_count": null, 28 "metadata": {}, 29 "outputs": [], 30 "source": [ 31 "%pip install llama-cpp-python" 32 ] 33 }, 34 { 35 "cell_type": "markdown", 36 "metadata": {}, 37 "source": [ 38 "#### 3. Pull the model from Hugging Face\n", 39 "\n", 40 "Download the GGUF Hermes-2-Pro-Llama-3-8B model from Hugging Face (uploaded by NousResearch) [here](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF)." 41 ] 42 }, 43 { 44 "cell_type": "markdown", 45 "metadata": {}, 46 "source": [ 47 "### Usage\n", 48 "\n", 49 "#### 1. 
Define Tools" 50 ] 51 }, 52 { 53 "cell_type": "code", 54 "execution_count": 1, 55 "metadata": {}, 56 "outputs": [], 57 "source": [ 58 "import random\n", 59 "\n", 60 "def get_weather_forecast(location: str) -> dict[str, str]:\n", 61 " \"\"\"Retrieves the weather forecast for a given location\"\"\"\n", 62 " # Mock values for test\n", 63 " return {\n", 64 " \"location\": location,\n", 65 " \"forecast\": \"sunny\",\n", 66 " \"temperature\": \"25°C\",\n", 67 " }\n", 68 "\n", 69 "def get_random_city() -> str:\n", 70 " \"\"\"Retrieves a random city from a list of cities\"\"\"\n", 71 " cities = [\"Groningen\", \"Enschede\", \"Amsterdam\", \"Istanbul\", \"Baghdad\", \"Rio de Janeiro\", \"Tokyo\", \"Kampala\"]\n", 72 " return random.choice(cities)\n", 73 "\n", 74 "def get_random_number() -> int:\n", 75 " \"\"\"Retrieves a random number\"\"\"\n", 76 " # Mock value for test\n", 77 " return 31" 78 ] 79 }, 80 { 81 "cell_type": "markdown", 82 "metadata": {}, 83 "source": [ 84 "#### 2. Define Function Caller\n", 85 "\n", 86 "For this example in notebook form, the functions are simply registered in a dictionary. In a Python project, you can use the implementation here as inspiration: https://github.com/AtakanTekparmak/ollama_langhcain_fn_calling/tree/main" 87 ] 88 }, 89 { 90 "cell_type": "code", 91 "execution_count": 2, 92 "metadata": {}, 93 "outputs": [], 94 "source": [ 97 "class FunctionCaller:\n", 98 " \"\"\"\n", 99 " A class to call the tool functions defined in this notebook.\n", 100 " \"\"\"\n", 101 "\n", 102 " def __init__(self):\n", 103 " # Initialize the functions dictionary\n", 104 " self.functions = {\n", 105 " \"get_weather_forecast\": get_weather_forecast,\n", 106 " \"get_random_city\": get_random_city,\n", 107 " \"get_random_number\": get_random_number,\n", 108 " }\n", 109 " self.outputs = {}\n", 110 "\n", 111 " def create_functions_metadata(self) -> list[dict]:\n", 112 " \"\"\"Creates the functions metadata for the prompt. 
\"\"\"\n", 113 " def format_type(p_type: str) -> str:\n", 114 " \"\"\"Format the type of the parameter.\"\"\"\n", 115 " # If p_type begins with \"<class\", then it is a class type\n", 116 " if p_type.startswith(\"<class\"):\n", 117 " # Get the class name from the type\n", 118 " p_type = p_type.split(\"'\")[1]\n", 119 " \n", 120 " return p_type\n", 121 " \n", 122 " functions_metadata = []\n", 124 " for name, function in self.functions.items():\n", 126 " descriptions = function.__doc__.split(\"\\n\")\n", 127 " print(descriptions)\n", 128 " functions_metadata.append({\n", 129 " \"name\": name,\n", 130 " \"description\": descriptions[0],\n", 131 " \"parameters\": {\n", 132 " \"properties\": [ # Get the parameters for the function\n", 133 " { \n", 134 " \"name\": param_name,\n", 135 " \"type\": format_type(str(param_type)),\n", 136 " }\n", 137 " # Remove the return type from the parameters\n", 138 " for param_name, param_type in function.__annotations__.items() if param_name != \"return\"\n", 139 " ],\n", 140 " \n", 141 " \"required\": [param_name for param_name in function.__annotations__ if param_name != \"return\"],\n", 142 " } if function.__annotations__ else {},\n", 143 " \"returns\": [\n", 144 " {\n", 145 " \"name\": name + \"_output\",\n", 146 " \"type\": {param_name: format_type(str(param_type)) for param_name, param_type in function.__annotations__.items() if param_name == \"return\"}[\"return\"]\n", 147 " }\n", 148 " ]\n", 149 " })\n", 150 "\n", 151 " return functions_metadata\n", 152 "\n", 153 " def call_function(self, function):\n", 154 " \"\"\"\n", 155 " Call the function from the given input.\n", 156 "\n", 157 " Args:\n", 158 " function (dict): A dictionary containing the function details.\n", 159 " \"\"\"\n", 160 " \n", 161 " def check_if_input_is_output(input: dict) -> dict:\n", 162 " \"\"\"Check if the input is an output from a previous function.\"\"\"\n", 163 " for key, value in input.items():\n", 164 " if value in 
self.outputs:\n", 165 " input[key] = self.outputs[value]\n", 166 " return input\n", 167 "\n", 168 " # Get the function name from the function dictionary\n", 169 " function_name = function[\"name\"]\n", 170 " \n", 171 " # Get the function params from the function dictionary\n", 172 " function_input = function[\"params\"] if \"params\" in function else None\n", 173 " function_input = check_if_input_is_output(function_input) if function_input else None\n", 174 " \n", 175 " # Call the matching tool function with the given input,\n", 176 " # passing all the arguments from function_input\n", 177 " output = self.functions[function_name](**function_input) if function_input else self.functions[function_name]()\n", 178 " self.outputs[function[\"output\"]] = output\n", 179 " return output\n", 180 "\n", 181 " " 182 ] 183 }, 184 { 185 "cell_type": "markdown", 186 "metadata": {}, 187 "source": [ 188 "#### 3. Set Up the Function Caller and Prompt" 189 ] 190 }, 191 { 192 "cell_type": "code", 193 "execution_count": 3, 194 "metadata": {}, 195 "outputs": [ 196 { 197 "name": "stdout", 198 "output_type": "stream", 199 "text": [ 200 "['Retrieves the weather forecast for a given location']\n", 201 "['Retrieves a random city from a list of cities']\n", 202 "['Retrieves a random number']\n" 203 ] 204 } 205 ], 206 "source": [ 207 "# Initialize the FunctionCaller\n", 208 "function_caller = FunctionCaller()\n", 209 "\n", 210 "# Create the functions metadata\n", 211 "functions_metadata = function_caller.create_functions_metadata()" 212 ] 213 }, 214 { 215 "cell_type": "code", 216 "execution_count": 4, 217 "metadata": {}, 218 "outputs": [], 219 "source": [ 220 "import json\n", 221 "\n", 222 "# Create the system prompt\n", 223 "prompt_beginning = \"\"\"\n", 224 "You are an AI assistant that can help the user with a variety of tasks. 
You have access to the following functions:\n", 225 "\n", 226 "\"\"\"\n", 227 "\n", 228 "system_prompt_end = \"\"\"\n", 229 "\n", 230 "When the user asks you a question, if you need to use functions, provide ONLY the function calls, and NOTHING ELSE, in the format:\n", 231 "<function_calls> \n", 232 "[\n", 233 " { \"name\": \"function_name_1\", \"params\": { \"param_1\": \"value_1\", \"param_2\": \"value_2\" }, \"output\": \"The output variable name, to be possibly used as input for another function\"},\n", 234 " { \"name\": \"function_name_2\", \"params\": { \"param_3\": \"value_3\", \"param_4\": \"output_1\"}, \"output\": \"The output variable name, to be possibly used as input for another function\"},\n", 235 " ...\n", 236 "]\n", 237 "\"\"\"\n", 238 "system_prompt = prompt_beginning + f\"<tools> {json.dumps(functions_metadata, indent=4)} </tools>\" + system_prompt_end" 239 ] 240 }, 241 { 242 "cell_type": "markdown", 243 "metadata": {}, 244 "source": [ 245 "#### 4. Load the model" 246 ] 247 }, 248 { 249 "cell_type": "code", 250 "execution_count": 6, 251 "metadata": {}, 252 "outputs": [ 253 { 254 "name": "stderr", 255 "output_type": "stream", 256 "text": [ 257 "llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from Hermes-2-Pro-Llama-3-Instruct-Merged-DPO-F16.gguf (version GGUF V3 (latest))\n", 258 "llama_model_loader: Dumping metadata keys/values. 
Note: KV overrides do not apply in this output.\n", 259 "llama_model_loader: - kv 0: general.architecture str = llama\n", 260 "llama_model_loader: - kv 1: general.name str = Hermes-2-Pro-Llama-3-Instruct-Merged-DPO\n", 261 "llama_model_loader: - kv 2: llama.block_count u32 = 32\n", 262 "llama_model_loader: - kv 3: llama.context_length u32 = 8192\n", 263 "llama_model_loader: - kv 4: llama.embedding_length u32 = 4096\n", 264 "llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336\n", 265 "llama_model_loader: - kv 6: llama.attention.head_count u32 = 32\n", 266 "llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8\n", 267 "llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000\n", 268 "llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010\n", 269 "llama_model_loader: - kv 10: general.file_type u32 = 1\n", 270 "llama_model_loader: - kv 11: llama.vocab_size u32 = 128256\n", 271 "llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128\n", 272 "llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2\n", 273 "llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe\n", 274 "llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = [\"!\", \"\\\"\", \"#\", \"$\", \"%\", \"&\", \"'\", ...\n", 275 "llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...\n", 276 "llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = [\"Ġ Ġ\", \"Ġ ĠĠĠ\", \"ĠĠ ĠĠ\", \"...\n", 277 "llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000\n", 278 "llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128003\n", 279 "llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 128001\n", 280 "llama_model_loader: - kv 21: tokenizer.chat_template str = {{bos_token}}{% for message in messag...\n", 281 "llama_model_loader: - type f32: 65 tensors\n", 282 "llama_model_loader: - type f16: 226 tensors\n", 283 
"llm_load_vocab: special tokens definition check successful ( 256/128256 ).\n", 284 "llm_load_print_meta: format = GGUF V3 (latest)\n", 285 "llm_load_print_meta: arch = llama\n", 286 "llm_load_print_meta: vocab type = BPE\n", 287 "llm_load_print_meta: n_vocab = 128256\n", 288 "llm_load_print_meta: n_merges = 280147\n", 289 "llm_load_print_meta: n_ctx_train = 8192\n", 290 "llm_load_print_meta: n_embd = 4096\n", 291 "llm_load_print_meta: n_head = 32\n", 292 "llm_load_print_meta: n_head_kv = 8\n", 293 "llm_load_print_meta: n_layer = 32\n", 294 "llm_load_print_meta: n_rot = 128\n", 295 "llm_load_print_meta: n_embd_head_k = 128\n", 296 "llm_load_print_meta: n_embd_head_v = 128\n", 297 "llm_load_print_meta: n_gqa = 4\n", 298 "llm_load_print_meta: n_embd_k_gqa = 1024\n", 299 "llm_load_print_meta: n_embd_v_gqa = 1024\n", 300 "llm_load_print_meta: f_norm_eps = 0.0e+00\n", 301 "llm_load_print_meta: f_norm_rms_eps = 1.0e-05\n", 302 "llm_load_print_meta: f_clamp_kqv = 0.0e+00\n", 303 "llm_load_print_meta: f_max_alibi_bias = 0.0e+00\n", 304 "llm_load_print_meta: f_logit_scale = 0.0e+00\n", 305 "llm_load_print_meta: n_ff = 14336\n", 306 "llm_load_print_meta: n_expert = 0\n", 307 "llm_load_print_meta: n_expert_used = 0\n", 308 "llm_load_print_meta: causal attn = 1\n", 309 "llm_load_print_meta: pooling type = 0\n", 310 "llm_load_print_meta: rope type = 0\n", 311 "llm_load_print_meta: rope scaling = linear\n", 312 "llm_load_print_meta: freq_base_train = 500000.0\n", 313 "llm_load_print_meta: freq_scale_train = 1\n", 314 "llm_load_print_meta: n_yarn_orig_ctx = 8192\n", 315 "llm_load_print_meta: rope_finetuned = unknown\n", 316 "llm_load_print_meta: ssm_d_conv = 0\n", 317 "llm_load_print_meta: ssm_d_inner = 0\n", 318 "llm_load_print_meta: ssm_d_state = 0\n", 319 "llm_load_print_meta: ssm_dt_rank = 0\n", 320 "llm_load_print_meta: model type = 8B\n", 321 "llm_load_print_meta: model ftype = F16\n", 322 "llm_load_print_meta: model params = 8.03 B\n", 323 "llm_load_print_meta: model size 
= 14.96 GiB (16.00 BPW) \n", 324 "llm_load_print_meta: general.name = Hermes-2-Pro-Llama-3-Instruct-Merged-DPO\n", 325 "llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'\n", 326 "llm_load_print_meta: EOS token = 128003 '<|im_end|>'\n", 327 "llm_load_print_meta: PAD token = 128001 '<|end_of_text|>'\n", 328 "llm_load_print_meta: LF token = 128 'Ä'\n", 329 "llm_load_print_meta: EOT token = 128003 '<|im_end|>'\n", 330 "llm_load_tensors: ggml ctx size = 0.30 MiB\n", 331 "ggml_backend_metal_log_allocated_size: allocated buffer, size = 14315.02 MiB, (14315.08 / 49152.00)\n", 332 "llm_load_tensors: offloading 32 repeating layers to GPU\n", 333 "llm_load_tensors: offloaded 32/33 layers to GPU\n", 334 "llm_load_tensors: CPU buffer size = 15317.02 MiB\n", 335 "llm_load_tensors: Metal buffer size = 14315.00 MiB\n", 336 ".........................................................................................\n", 337 "llama_new_context_with_model: n_ctx = 8192\n", 338 "llama_new_context_with_model: n_batch = 512\n", 339 "llama_new_context_with_model: n_ubatch = 512\n", 340 "llama_new_context_with_model: flash_attn = 1\n", 341 "llama_new_context_with_model: freq_base = 500000.0\n", 342 "llama_new_context_with_model: freq_scale = 1\n", 343 "ggml_metal_init: allocating\n", 344 "ggml_metal_init: found device: Apple M1 Max\n", 345 "ggml_metal_init: picking default device: Apple M1 Max\n", 346 "ggml_metal_init: using embedded metal library\n", 347 "ggml_metal_init: GPU name: Apple M1 Max\n", 348 "ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)\n", 349 "ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)\n", 350 "ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)\n", 351 "ggml_metal_init: simdgroup reduction support = true\n", 352 "ggml_metal_init: simdgroup matrix mul. 
support = true\n", 353 "ggml_metal_init: hasUnifiedMemory = true\n", 354 "ggml_metal_init: recommendedMaxWorkingSetSize = 51539.61 MB\n", 355 "llama_kv_cache_init: Metal KV buffer size = 1024.00 MiB\n", 356 "llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB\n", 357 "llama_new_context_with_model: CPU output buffer size = 0.49 MiB\n", 358 "llama_new_context_with_model: Metal compute buffer size = 88.00 MiB\n", 359 "llama_new_context_with_model: CPU compute buffer size = 258.50 MiB\n", 360 "llama_new_context_with_model: graph nodes = 903\n", 361 "llama_new_context_with_model: graph splits = 3\n", 362 "AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | \n", 363 "Model metadata: {'tokenizer.chat_template': \"{{bos_token}}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>' + '\\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n' }}{% endif %}\", 'tokenizer.ggml.padding_token_id': '128001', 'tokenizer.ggml.eos_token_id': '128003', 'tokenizer.ggml.bos_token_id': '128000', 'tokenizer.ggml.pre': 'llama-bpe', 'tokenizer.ggml.model': 'gpt2', 'llama.vocab_size': '128256', 'llama.attention.head_count_kv': '8', 'llama.context_length': '8192', 'llama.attention.head_count': '32', 'general.file_type': '1', 'llama.feed_forward_length': '14336', 'llama.rope.dimension_count': '128', 'llama.rope.freq_base': '500000.000000', 'llama.embedding_length': '4096', 'general.architecture': 'llama', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'general.name': 'Hermes-2-Pro-Llama-3-Instruct-Merged-DPO', 'llama.block_count': '32'}\n", 364 "Available chat formats from metadata: chat_template.default\n", 365 "Using gguf chat template: {{bos_token}}{% for message in messages 
%}{{'<|im_start|>' + message['role'] + '\n", 366 "' + message['content'] + '<|im_end|>' + '\n", 367 "'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n", 368 "' }}{% endif %}\n", 369 "Using chat eos_token: <|im_end|>\n", 370 "Using chat bos_token: <|begin_of_text|>\n" 371 ] 372 } 373 ], 374 "source": [ 375 "import llama_cpp\n", 376 "model = llama_cpp.Llama(\n", 377 " model_path='Hermes-2-Pro-Llama-3-Instruct-Merged-DPO-F16.gguf',\n", 378 " n_gpu_layers=32,\n", 379 " n_threads=10,\n", 380 " use_mlock=True,\n", 381 " flash_attn=True,\n", 382 " n_ctx=8192,\n", 383 ")" 384 ] 385 }, 386 { 387 "cell_type": "markdown", 388 "metadata": {}, 389 "source": [ 390 "#### Inference" 391 ] 392 }, 393 { 394 "cell_type": "code", 395 "execution_count": 7, 396 "metadata": {}, 397 "outputs": [ 398 { 399 "name": "stderr", 400 "output_type": "stream", 401 "text": [ 402 "\n", 403 "llama_print_timings: load time = 1707.16 ms\n", 404 "llama_print_timings: sample time = 20.74 ms / 63 runs ( 0.33 ms per token, 3037.75 tokens per second)\n", 405 "llama_print_timings: prompt eval time = 1706.23 ms / 450 tokens ( 3.79 ms per token, 263.74 tokens per second)\n", 406 "llama_print_timings: eval time = 4579.55 ms / 62 runs ( 73.86 ms per token, 13.54 tokens per second)\n", 407 "llama_print_timings: total time = 6663.81 ms / 512 tokens\n" 408 ] 409 }, 410 { 411 "name": "stdout", 412 "output_type": "stream", 413 "text": [ 414 "{'id': 'chatcmpl-8e53b561-32d6-433b-802d-5bb982660363', 'object': 'chat.completion', 'created': 1717239441, 'model': 'Hermes-2-Pro-Llama-3-Instruct-Merged-DPO-F16.gguf', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': '[\\n {\\n \"name\": \"get_random_city\",\\n \"params\": {},\\n \"output\": \"random_city\"\\n },\\n {\\n \"name\": \"get_weather_forecast\",\\n \"params\": {\"location\": \"random_city\"},\\n \"output\": \"weather_forecast\"\\n }\\n]'}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 450, 
'completion_tokens': 62, 'total_tokens': 512}}\n" 415 ] 416 } 417 ], 418 "source": [ 419 "\n", 420 "# Compose the prompt\n", 421 "user_query = \"Whats the temperature in a random city?\"\n", 422 "\n", 423 "# Get the response from the model\n", 425 "messages = [\n", 426 " {'role': 'system', 'content': system_prompt},\n", 428 " {'role': 'user', 'content': user_query}\n", 429 "]\n", 430 "response = model.create_chat_completion(messages=messages)\n", 431 "print(response)\n", 432 "# Get the function calls from the response\n" 433 ] 434 }, 435 { 436 "cell_type": "code", 437 "execution_count": 8, 438 "metadata": {}, 439 "outputs": [ 440 { 441 "data": { 442 "text/plain": [ 443 "'[\\n {\\n \"name\": \"get_random_city\",\\n \"params\": {},\\n \"output\": \"random_city\"\\n },\\n {\\n \"name\": \"get_weather_forecast\",\\n \"params\": {\"location\": \"random_city\"},\\n \"output\": \"weather_forecast\"\\n }\\n]'" 444 ] 445 }, 446 "execution_count": 8, 447 "metadata": {}, 448 "output_type": "execute_result" 449 } 450 ], 451 "source": [ 452 "response['choices'][0]['message']['content']" 453 ] 454 }, 455 { 456 "cell_type": "code", 457 "execution_count": 9, 458 "metadata": {}, 459 "outputs": [ 460 { 461 "name": "stdout", 462 "output_type": "stream", 463 "text": [ 464 "Function calls:\n", 465 "[{'name': 'get_random_city', 'params': {}, 'output': 'random_city'}, {'name': 'get_weather_forecast', 'params': {'location': 'random_city'}, 'output': 'weather_forecast'}]\n" 466 ] 467 } 468 ], 469 "source": [ 470 "function_calls = response['choices'][0]['message']['content']\n", 471 "# If the response starts with a <function_calls> tag, take everything after it\n", 472 "if function_calls.startswith(\"<function_calls>\"):\n", 473 " function_calls = function_calls.split(\"<function_calls>\")[1]\n", 474 "\n", 475 "# Read function calls as json\n", 476 "try:\n", 477 " function_calls_json: list[dict[str, str]] = 
json.loads(function_calls)\n", 478 "except json.JSONDecodeError:\n", 479 " function_calls_json = []\n", 480 " print(\"Model response not in desired JSON format\")\n", 481 "finally:\n", 482 " print(\"Function calls:\")\n", 483 " print(function_calls_json)" 484 ] 485 }, 486 { 487 "cell_type": "markdown", 488 "metadata": {}, 489 "source": [ 490 "#### Append the assistant message to the chat and call the functions" 491 ] 492 }, 493 { 494 "cell_type": "code", 495 "execution_count": 14, 496 "metadata": {}, 497 "outputs": [], 498 "source": [ 499 "# Add <tool_call> tags around the function calls, as specified in the chat template on Hugging Face\n", 500 "function_message = '<tool_call>' + str(function_calls_json) + '</tool_call>'\n", 501 "\n", 502 "messages.append({'role': 'assistant', 'content': function_message})" 503 ] 504 }, 505 { 506 "cell_type": "code", 507 "execution_count": 15, 508 "metadata": {}, 509 "outputs": [ 510 { 511 "name": "stdout", 512 "output_type": "stream", 513 "text": [ 514 "Tool Response: Amsterdam\n", 515 "Tool Response: {'location': 'Groningen', 'forecast': 'sunny', 'temperature': '25°C'}\n" 516 ] 517 } 518 ], 519 "source": [ 520 "for function in function_calls_json:\n", 521 " output = f\"Tool Response: {function_caller.call_function(function)}\"\n", 522 " print(output)" 523 ] 524 }, 525 { 526 "cell_type": "code", 527 "execution_count": 16, 528 "metadata": {}, 529 "outputs": [], 530 "source": [ 531 "# Call the functions, keeping only the last output\n", 532 "output = \"\"\n", 533 "for function in function_calls_json:\n", 534 " output = f\"{function_caller.call_function(function)}\"\n", 535 "\n", 536 "# Append the tool response to the messages with the chat format\n", 537 "tool_output = '<tool_response> ' + output + ' </tool_response>'\n", 538 "messages.append({'role': 'tool', 'content': tool_output})\n" 539 ] 540 }, 541 { 542 "cell_type": "code", 543 "execution_count": 18, 544 "metadata": {}, 545 "outputs": [ 546 { 547 "data": { 548 "text/plain": [ 549 "[{'role': 'system',\n", 550 " 
'content': '\\nYou are an AI assistant that can help the user with a variety of tasks. You have access to the following functions:\\n\\n<tools> [\\n {\\n \"name\": \"get_weather_forecast\",\\n \"description\": \"Retrieves the weather forecast for a given location\",\\n \"parameters\": {\\n \"properties\": [\\n {\\n \"name\": \"location\",\\n \"type\": \"str\"\\n }\\n ],\\n \"required\": [\\n \"location\"\\n ]\\n },\\n \"returns\": [\\n {\\n \"name\": \"get_weather_forecast_output\",\\n \"type\": \"dict[str, str]\"\\n }\\n ]\\n },\\n {\\n \"name\": \"get_random_city\",\\n \"description\": \"Retrieves a random city from a list of cities\",\\n \"parameters\": {\\n \"properties\": [],\\n \"required\": []\\n },\\n \"returns\": [\\n {\\n \"name\": \"get_random_city_output\",\\n \"type\": \"str\"\\n }\\n ]\\n },\\n {\\n \"name\": \"get_random_number\",\\n \"description\": \"Retrieves a random number\",\\n \"parameters\": {\\n \"properties\": [],\\n \"required\": []\\n },\\n \"returns\": [\\n {\\n \"name\": \"get_random_number_output\",\\n \"type\": \"int\"\\n }\\n ]\\n }\\n] </tools>\\n\\nWhen the user asks you a question, if you need to use functions, provide ONLY the function calls, and NOTHING ELSE, in the format:\\n<function_calls> \\n[\\n { \"name\": \"function_name_1\", \"params\": { \"param_1\": \"value_1\", \"param_2\": \"value_2\" }, \"output\": \"The output variable name, to be possibly used as input for another function},\\n { \"name\": \"function_name_2\", \"params\": { \"param_3\": \"value_3\", \"param_4\": \"output_1\"}, \"output\": \"The output variable name, to be possibly used as input for another function\"},\\n ...\\n]\\n'},\n", 551 " {'role': 'user', 'content': 'Whats the temperature in a random city?'},\n", 552 " {'role': 'assistant',\n", 553 " 'content': \"<tool_call>[{'name': 'get_random_city', 'params': {}, 'output': 'random_city'}, {'name': 'get_weather_forecast', 'params': {'location': 'Groningen'}, 'output': 
'weather_forecast'}]</tool_call>\"},\n", 554 " {'role': 'tool',\n", 555 " 'content': \"<tool_response> {'location': 'Groningen', 'forecast': 'sunny', 'temperature': '25°C'} </tool_response>\"}]" 556 ] 557 }, 558 "execution_count": 18, 559 "metadata": {}, 560 "output_type": "execute_result" 561 } 562 ], 563 "source": [ 564 "messages" 565 ] 566 }, 567 { 568 "cell_type": "markdown", 569 "metadata": {}, 570 "source": [ 571 "#### Run inference again with the tool response" 572 ] 573 }, 574 { 575 "cell_type": "code", 576 "execution_count": 19, 577 "metadata": {}, 578 "outputs": [ 579 { 580 "name": "stderr", 581 "output_type": "stream", 582 "text": [ 583 "Llama.generate: prefix-match hit\n", 584 "\n", 585 "llama_print_timings: load time = 1707.16 ms\n", 586 "llama_print_timings: sample time = 14.45 ms / 20 runs ( 0.72 ms per token, 1384.27 tokens per second)\n", 587 "llama_print_timings: prompt eval time = 775.81 ms / 86 tokens ( 9.02 ms per token, 110.85 tokens per second)\n", 588 "llama_print_timings: eval time = 1416.85 ms / 19 runs ( 74.57 ms per token, 13.41 tokens per second)\n", 589 "llama_print_timings: total time = 2321.63 ms / 105 tokens\n" 590 ] 591 }, 592 { 593 "data": { 594 "text/plain": [ 595 "{'id': 'chatcmpl-25dfeae2-2184-497f-b838-e08565ad078c',\n", 596 " 'object': 'chat.completion',\n", 597 " 'created': 1717239671,\n", 598 " 'model': 'Hermes-2-Pro-Llama-3-Instruct-Merged-DPO-F16.gguf',\n", 599 " 'choices': [{'index': 0,\n", 600 " 'message': {'role': 'assistant',\n", 601 " 'content': \"The temperature in the random city of Groningen is currently 25°C and it's sunny.\"},\n", 602 " 'logprobs': None,\n", 603 " 'finish_reason': 'stop'}],\n", 604 " 'usage': {'prompt_tokens': 536, 'completion_tokens': 19, 'total_tokens': 555}}" 605 ] 606 }, 607 "execution_count": 19, 608 "metadata": {}, 609 "output_type": "execute_result" 610 } 611 ], 612 "source": [ 613 "response = model.create_chat_completion(messages=messages, temperature=0)\n", 614 "response" 615 ] 616 } 
617 ], 618 "metadata": { 619 "kernelspec": { 620 "display_name": "Python 3", 621 "language": "python", 622 "name": "python3" 623 }, 624 "language_info": { 625 "codemirror_mode": { 626 "name": "ipython", 627 "version": 3 628 }, 629 "file_extension": ".py", 630 "mimetype": "text/x-python", 631 "name": "python", 632 "nbconvert_exporter": "python", 633 "pygments_lexer": "ipython3", 634 "version": "3.11.9" 635 } 636 }, 637 "nbformat": 4, 638 "nbformat_minor": 2 639 }