# examples/server/tests/features/slotsave.feature
@llama.cpp
@slotsave
Feature: llama.cpp server slot management

  Background: Server startup
    Given a server listening on localhost:8080
    And   a model file tinyllamas/stories260K.gguf from HF repo ggml-org/models
    And   prompt caching is enabled
    And   2 slots
    And   . as slot save path
    And   2048 KV cache size
    And   42 as server seed
    And   24 max tokens to predict
    Then  the server is starting
    Then  the server is healthy
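    # A rough sketch of the launch this Background describes (flag names are
    # assumptions based on llama-server's documented options, not part of the
    # test harness itself):
    #   llama-server -m stories260K.gguf --host localhost --port 8080 \
    #                --parallel 2 --slot-save-path . -c 2048 --seed 42 -n 24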

  Scenario: Save and Restore Slot
    # First prompt in slot 1 should be fully processed
    Given a user prompt "What is the capital of France?"
    And   using slot id 1
    And   a completion request with no api error
    Then  24 tokens are predicted matching (Lily|cake)
    And   22 prompt tokens are processed
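    # A minimal sketch of the HTTP request those steps drive (assumption: the
    # server's /completion endpoint with id_slot, n_predict and cache_prompt
    # fields):
    #   import requests
    #   requests.post("http://localhost:8080/completion",
    #                 json={"prompt": "What is the capital of France?",
    #                       "id_slot": 1, "n_predict": 24, "cache_prompt": True})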
    When  the slot 1 is saved with filename "slot1.bin"
    Then  the server responds with status code 200
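    # The save step maps roughly to (assumption: the documented
    # POST /slots/{id_slot}?action=save endpoint; requests as above):
    #   requests.post("http://localhost:8080/slots/1?action=save",
    #                 json={"filename": "slot1.bin"})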
    # Since the shared prefix is cached, only the last tokens of the
    # new prompt need to be processed
    Given a user prompt "What is the capital of Germany?"
    And   a completion request with no api error
    Then  24 tokens are predicted matching (Thank|special)
    And   7 prompt tokens are processed
    # Loading the original cache into slot 0,
    # we should only be processing 1 prompt token and get the same output
    When  the slot 0 is restored with filename "slot1.bin"
    Then  the server responds with status code 200
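    # The restore step maps roughly to (assumption: the documented
    # POST /slots/{id_slot}?action=restore endpoint):
    #   requests.post("http://localhost:8080/slots/0?action=restore",
    #                 json={"filename": "slot1.bin"})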
    Given a user prompt "What is the capital of France?"
    And   using slot id 0
    And   a completion request with no api error
    Then  24 tokens are predicted matching (Lily|cake)
    And   1 prompt tokens are processed
    # Verify slot 1 was not corrupted by the slot 0 restore: same check again
    Given a user prompt "What is the capital of Germany?"
    And   using slot id 1
    And   a completion request with no api error
    Then  24 tokens are predicted matching (Thank|special)
    And   1 prompt tokens are processed

  Scenario: Erase Slot
    Given a user prompt "What is the capital of France?"
    And   using slot id 1
    And   a completion request with no api error
    Then  24 tokens are predicted matching (Lily|cake)
    And   22 prompt tokens are processed
    When  the slot 1 is erased
    Then  the server responds with status code 200
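    # The erase step maps roughly to (assumption: the documented
    # POST /slots/{id_slot}?action=erase endpoint, no JSON body needed):
    #   requests.post("http://localhost:8080/slots/1?action=erase")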
    # With the cache erased, the same prompt must be fully processed again
    Given a user prompt "What is the capital of France?"
    And   a completion request with no api error
    Then  24 tokens are predicted matching (Lily|cake)
    And   22 prompt tokens are processed