# examples/server/tests/features/embeddings.feature
@llama.cpp
@embeddings
Feature: llama.cpp server

  Background: Server startup
    Given a server listening on localhost:8080
    And   a model url https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf
    And   a model file bert-bge-small.gguf
    And   a model alias bert-bge-small
    And   42 as server seed
    And   2 slots
    And   1024 as batch size
    And   1024 as ubatch size
    And   2048 KV cache size
    And   embeddings extraction
    Then  the server is starting
    Then  the server is healthy
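
A Background like this corresponds roughly to launching the server by hand. A sketch, assuming a recent llama.cpp build; exact flag spellings (e.g. `--embedding` vs `--embeddings`) vary between versions, and the model path is a local assumption:

```shell
# Start llama-server with the settings mirrored from the Background above:
# fixed seed, 2 slots (-np), 1024 batch/ubatch sizes, a 2048-token KV cache,
# and embeddings extraction enabled.
./llama-server \
  --host localhost --port 8080 \
  -m bert-bge-small.gguf --alias bert-bge-small \
  --seed 42 -np 2 -b 1024 -ub 1024 -c 2048 \
  --embedding
```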

  Scenario: Embedding
    When embeddings are computed for:
    """
    What is the capital of Bulgaria ?
    """
    Then embeddings are generated
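
The "embeddings are computed for" step boils down to a POST against the server's native `/embedding` endpoint. A minimal sketch of the request the test driver would send, assuming the endpoint path and `content` field used by current llama.cpp builds (both may differ across versions); the actual network call is left commented out so the snippet stands alone:

```python
import json

BASE_URL = "http://localhost:8080"  # matches the Background above

def build_embedding_request(content: str):
    """Build the native /embedding request: target URL plus JSON body."""
    url = f"{BASE_URL}/embedding"
    body = json.dumps({"content": content})
    return url, body

url, body = build_embedding_request("What is the capital of Bulgaria ?")
# To actually send it (requires the running server):
#   import urllib.request
#   req = urllib.request.Request(url, body.encode(),
#                                {"Content-Type": "application/json"})
#   embedding = json.load(urllib.request.urlopen(req))["embedding"]
```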

  Scenario: OAI Embeddings compatibility
    Given a model bert-bge-small
    When an OAI compatible embeddings computation request for:
    """
    What is the capital of Spain ?
    """
    Then embeddings are generated

  Scenario: OAI Embeddings compatibility with multiple inputs
    Given a model bert-bge-small
    Given a prompt:
      """
      In which country Paris is located ?
      """
    And a prompt:
      """
      Is Madrid the capital of Spain ?
      """
    When an OAI compatible embeddings computation request for multiple inputs
    Then embeddings are generated
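
The two OAI-compatibility scenarios exercise the OpenAI-style `/v1/embeddings` endpoint, where `input` accepts either a single string or a list of strings. A sketch of the request bodies, assuming the OpenAI embeddings request schema (the suite's real helper may differ):

```python
import json

def build_oai_embeddings_request(inputs, model="bert-bge-small"):
    """Build an OpenAI-compatible /v1/embeddings JSON body.
    `inputs` may be a single string or a list of strings."""
    return json.dumps({"model": model, "input": inputs})

# Single input (Scenario: OAI Embeddings compatibility):
single = build_oai_embeddings_request("What is the capital of Spain ?")

# Multiple inputs (this scenario): one request carrying a list of prompts;
# the response's "data" array holds one embedding per list entry.
multi = build_oai_embeddings_request([
    "In which country Paris is located ?",
    "Is Madrid the capital of Spain ?",
])
```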

  Scenario: Multi users embeddings
    Given a prompt:
      """
      Write a very long story about AI.
      """
    And a prompt:
      """
      Write another very long music lyrics.
      """
    And a prompt:
      """
      Write a very long poem.
      """
    And a prompt:
      """
      Write a very long joke.
      """
    Given concurrent embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated
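
The "concurrent embedding requests" step fires all queued prompts at the server in parallel, which is why a server with only 2 slots is first observed busy and then idle once the queue drains. A sketch of that driver logic, with a placeholder `embed` function standing in for the real HTTP call (an assumption for illustration, not the suite's actual helper):

```python
from concurrent.futures import ThreadPoolExecutor

def embed(prompt: str):
    # Placeholder for the real POST to /embedding; returns a fake vector.
    return [float(len(prompt))]

def embed_concurrently(prompts):
    """Issue one embedding request per prompt in parallel and
    collect the results in prompt order."""
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        return list(pool.map(embed, prompts))

prompts = [
    "Write a very long story about AI.",
    "Write another very long music lyrics.",
    "Write a very long poem.",
    "Write a very long joke.",
]
embeddings = embed_concurrently(prompts)
assert len(embeddings) == len(prompts)  # all embeddings are generated
```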

  Scenario: Multi users OAI compatibility embeddings
    Given a prompt:
      """
      In which country Paris is located ?
      """
    And a prompt:
      """
      Is Madrid the capital of Spain ?
      """
    And a prompt:
      """
      What is the biggest US city ?
      """
    And a prompt:
      """
      What is the capital of Bulgaria ?
      """
    And   a model bert-bge-small
    Given concurrent OAI embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated

  Scenario: All embeddings should be the same
    Given 10 fixed prompts
    And   a model bert-bge-small
    Given concurrent OAI embedding requests
    Then all embeddings are the same
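
The last scenario asserts determinism: with a fixed server seed and identical prompts, every returned vector must match. One plausible way such a check can be implemented is via cosine similarity between vectors; a self-contained sketch (plain Python, no external dependencies, not necessarily the suite's actual comparison):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def all_same(embeddings, tol=1e-6):
    """True if every embedding is numerically identical to the first."""
    first = embeddings[0]
    return all(cosine_similarity(first, e) > 1.0 - tol for e in embeddings)

# Ten identical requests should yield ten identical vectors:
assert all_same([[0.1, 0.2, 0.3]] * 10)
```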