embeddings.feature
@llama.cpp
@embeddings
Feature: llama.cpp server

  Background: Server startup
    Given a server listening on localhost:8080
    And a model url https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf
    And a model file bert-bge-small.gguf
    And a model alias bert-bge-small
    And 42 as server seed
    And 2 slots
    And 1024 as batch size
    And 1024 as ubatch size
    And 2048 KV cache size
    And embeddings extraction
    Then the server is starting
    Then the server is healthy

  Scenario: Embedding
    When embeddings are computed for:
      """
      What is the capital of Bulgaria ?
      """
    Then embeddings are generated

  Scenario: OAI Embeddings compatibility
    Given a model bert-bge-small
    When an OAI compatible embeddings computation request for:
      """
      What is the capital of Spain ?
      """
    Then embeddings are generated

  Scenario: OAI Embeddings compatibility with multiple inputs
    Given a model bert-bge-small
    Given a prompt:
      """
      In which country Paris is located ?
      """
    And a prompt:
      """
      Is Madrid the capital of Spain ?
      """
    When an OAI compatible embeddings computation request for multiple inputs
    Then embeddings are generated

  Scenario: Multi users embeddings
    Given a prompt:
      """
      Write a very long story about AI.
      """
    And a prompt:
      """
      Write another very long music lyrics.
      """
    And a prompt:
      """
      Write a very long poem.
      """
    And a prompt:
      """
      Write a very long joke.
      """
    Given concurrent embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated

  Scenario: Multi users OAI compatibility embeddings
    Given a prompt:
      """
      In which country Paris is located ?
      """
    And a prompt:
      """
      Is Madrid the capital of Spain ?
      """
    And a prompt:
      """
      What is the biggest US city ?
      """
    And a prompt:
      """
      What is the capital of Bulgaria ?
      """
    And a model bert-bge-small
    Given concurrent OAI embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated

  Scenario: All embeddings should be the same
    Given 10 fixed prompts
    And a model bert-bge-small
    Given concurrent OAI embedding requests
    Then all embeddings are the same