---
title: "Builders"
id: builders-api
description: "Extract the output of a Generator to an Answer format, and build prompts."
slug: "/builders-api"
---


## answer_builder

### AnswerBuilder

Converts a query and Generator replies into a `GeneratedAnswer` object.

AnswerBuilder parses Generator replies using custom regular expressions.
Check out the usage example below to see how it works.
Optionally, it can also take documents and metadata from the Generator to add to the `GeneratedAnswer` object.
AnswerBuilder works with both non-chat and chat Generators.

### Usage example

```python
from haystack.components.builders import AnswerBuilder

builder = AnswerBuilder(pattern="Answer: (.*)")
builder.run(query="What's the answer?", replies=["This is an argument. Answer: This is the answer."])
```

### Usage example with documents and reference pattern

```python
from haystack import Document
from haystack.components.builders import AnswerBuilder

replies = ["The capital of France is Paris [2]."]

docs = [
    Document(content="Berlin is the capital of Germany."),
    Document(content="Paris is the capital of France."),
    Document(content="Rome is the capital of Italy."),
]

# Use a raw string so the regex backslashes are not treated as string escapes.
builder = AnswerBuilder(reference_pattern=r"\[(\d+)\]", return_only_referenced_documents=False)
result = builder.run(query="What is the capital of France?", replies=replies, documents=docs)["answers"][0]

print(f"Answer: {result.data}")
print("References:")
for doc in result.documents:
    if doc.meta["referenced"]:
        print(f"[{doc.meta['source_index']}] {doc.content}")
print("Other sources:")
for doc in result.documents:
    if not doc.meta["referenced"]:
        print(f"[{doc.meta['source_index']}] {doc.content}")

# Answer: The capital of France is Paris
# References:
# [2] Paris is the capital of France.
# Other sources:
# [1] Berlin is the capital of Germany.
# [3] Rome is the capital of Italy.
```

#### __init__

```python
__init__(
    pattern: str | None = None,
    reference_pattern: str | None = None,
    last_message_only: bool = False,
    *,
    return_only_referenced_documents: bool = True
)
```

Creates an instance of the AnswerBuilder component.

**Parameters:**

- **pattern** (<code>str | None</code>) – The regular expression pattern to extract the answer text from the Generator.
  If not specified, the entire response is used as the answer.
  The regular expression can have one capture group at most.
  If present, the capture group text is used as the answer.
  If no capture group is present, the whole match is used as the answer.
  Examples:
  `[^\n]+$` finds "this is an answer" in a string "this is an argument.\\nthis is an answer".
  `Answer: (.*)` finds "this is an answer" in a string "this is an argument. Answer: this is an answer".
- **reference_pattern** (<code>str | None</code>) – The regular expression pattern used for parsing the document references.
  If not specified, no parsing is done, and all documents are returned.
  References need to be specified as indices of the input documents and start at [1].
  Example: `\[(\d+)\]` finds "1" in a string "this is an answer[1]".
  If this parameter is provided, each document's metadata contains a "referenced" key with a boolean value.
- **last_message_only** (<code>bool</code>) – If False (default value), all messages are used as the answer.
  If True, only the last message is used as the answer.
- **return_only_referenced_documents** (<code>bool</code>) – To be used in conjunction with `reference_pattern`.
  If True (default value), only the documents that were actually referenced in `replies` are returned.
  If False, all documents are returned.
  If `reference_pattern` is not provided, this parameter has no effect, and all documents are returned.
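
As a rough illustration of the `pattern` rules above, the following sketch mimics the documented behavior with Python's standard `re` module. The helper `extract_answer` is hypothetical and is not AnswerBuilder's actual implementation. Returning an empty string when nothing matches is an assumption made for this sketch.

```python
from __future__ import annotations

import re


def extract_answer(reply: str, pattern: str | None = None) -> str:
    """Hypothetical helper that mimics the documented `pattern` rules."""
    if pattern is None:
        # No pattern: the entire reply is used as the answer.
        return reply
    if re.compile(pattern).groups > 1:
        raise ValueError("The pattern can have one capture group at most.")
    match = re.search(pattern, reply)
    if match is None:
        # Assumption for this sketch: no match yields an empty answer.
        return ""
    # Use the capture group if the pattern has one, otherwise the whole match.
    return match.group(1) if match.re.groups == 1 else match.group(0)


no_pattern = extract_answer("this is an answer")
whole_match = extract_answer("this is an argument.\nthis is an answer", r"[^\n]+$")
capture_group = extract_answer("this is an argument. Answer: this is an answer", r"Answer: (.*)")
print(no_pattern, whole_match, capture_group, sep=" | ")
```

All three calls return the same text, "this is an answer", each by a different route: no pattern, a whole match, and a single capture group.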

#### run

```python
run(
    query: str,
    replies: list[str] | list[ChatMessage],
    meta: list[dict[str, Any]] | None = None,
    documents: list[Document] | None = None,
    pattern: str | None = None,
    reference_pattern: str | None = None,
)
```

Turns the output of a Generator into `GeneratedAnswer` objects using regular expressions.

**Parameters:**

- **query** (<code>str</code>) – The input query used as the Generator prompt.
- **replies** (<code>list\[str\] | list\[ChatMessage\]</code>) – The output of the Generator. Can be a list of strings or a list of `ChatMessage` objects.
- **meta** (<code>list\[dict\[str, Any\]\] | None</code>) – The metadata returned by the Generator. If not specified, the generated answer will contain no metadata.
- **documents** (<code>list\[Document\] | None</code>) – The documents used as the Generator inputs. If specified, they are added to
  the `GeneratedAnswer` objects.
  Each Document.meta includes a "source_index" key, representing its 1-based position in the input list.
  When `reference_pattern` is provided:
  - A "referenced" key is added to the Document.meta, indicating if the document was referenced in the output.
  - The `return_only_referenced_documents` init parameter controls if all or only referenced documents are
    returned.
- **pattern** (<code>str | None</code>) – The regular expression pattern to extract the answer text from the Generator.
  If not specified, the entire response is used as the answer.
  The regular expression can have one capture group at most.
  If present, the capture group text is used as the answer.
  If no capture group is present, the whole match is used as the answer.
  Examples:
  `[^\n]+$` finds "this is an answer" in a string "this is an argument.\\nthis is an answer".
  `Answer: (.*)` finds "this is an answer" in a string "this is an argument. Answer: this is an answer".
- **reference_pattern** (<code>str | None</code>) – The regular expression pattern used for parsing the document references.
  If not specified, no parsing is done, and all documents are returned.
  References need to be specified as indices of the input documents and start at [1].
  Example: `\[(\d+)\]` finds "1" in a string "this is an answer[1]".

**Returns:**

- A dictionary with the following keys:
  - `answers`: The answers received from the output of the Generator.
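
The reference handling described above can be sketched with plain dictionaries standing in for `Document` objects. `annotate_documents` is a hypothetical helper, not part of Haystack. It only mirrors the documented rules: a 1-based `source_index`, a boolean `referenced` flag, and optional filtering.

```python
from __future__ import annotations

import re


def annotate_documents(reply, contents, reference_pattern=None, only_referenced=True):
    """Hypothetical helper: tag each document with the metadata keys described above."""
    # source_index is the 1-based position in the input list.
    docs = [{"content": c, "meta": {"source_index": i}} for i, c in enumerate(contents, start=1)]
    if reference_pattern is None:
        # No reference parsing: all documents are returned, without a "referenced" key.
        return docs
    referenced = {int(idx) for idx in re.findall(reference_pattern, reply)}
    for doc in docs:
        doc["meta"]["referenced"] = doc["meta"]["source_index"] in referenced
    if only_referenced:
        return [doc for doc in docs if doc["meta"]["referenced"]]
    return docs


contents = [
    "Berlin is the capital of Germany.",
    "Paris is the capital of France.",
    "Rome is the capital of Italy.",
]
reply = "The capital of France is Paris [2]."

referenced_only = annotate_documents(reply, contents, reference_pattern=r"\[(\d+)\]")
all_docs = annotate_documents(reply, contents, reference_pattern=r"\[(\d+)\]", only_referenced=False)
print([d["meta"]["source_index"] for d in referenced_only])
print([d["meta"]["referenced"] for d in all_docs])
```

With filtering on, only the second document survives. With filtering off, all three are kept and only the second is flagged as referenced.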

## chat_prompt_builder

### ChatPromptBuilder

Renders a chat prompt from a template using Jinja2 syntax.

A template can be a list of `ChatMessage` objects, or a special string, as shown in the usage examples.

It constructs prompts using static or dynamic templates, which you can update for each pipeline run.

Variables in the template are optional unless specified otherwise.
If an optional variable isn't provided, it defaults to an empty string. Use `variables` and `required_variables`
to define input types and required variables.

### Usage examples

#### Static ChatMessage prompt template

```python
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage

template = [ChatMessage.from_user("Translate to {{ target_language }}. Context: {{ snippet }}; Translation:")]
builder = ChatPromptBuilder(template=template)
builder.run(target_language="spanish", snippet="I can't speak spanish.")
```

#### Overriding static ChatMessage template at runtime

```python
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage

template = [ChatMessage.from_user("Translate to {{ target_language }}. Context: {{ snippet }}; Translation:")]
builder = ChatPromptBuilder(template=template)
builder.run(target_language="spanish", snippet="I can't speak spanish.")

# Pass a different template to run() to override the one given at init.
msg = "Translate to {{ target_language }} and summarize. Context: {{ snippet }}; Summary:"
summary_template = [ChatMessage.from_user(msg)]
builder.run(target_language="spanish", snippet="I can't speak spanish.", template=summary_template)
```

#### Dynamic ChatMessage prompt template

```python
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack import Pipeline

# No init parameters: the template and its variables are supplied at runtime.
prompt_builder = ChatPromptBuilder()
llm = OpenAIChatGenerator(model="gpt-5-mini")

pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("prompt_builder.prompt", "llm.messages")

location = "Berlin"
language = "English"
system_message = ChatMessage.from_system("You are an assistant giving information to tourists in {{language}}")
messages = [system_message, ChatMessage.from_user("Tell me about {{location}}")]

res = pipe.run(data={"prompt_builder": {"template_variables": {"location": location, "language": language},
                                    "template": messages}})
print(res)
# >> {'llm': {'replies': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
# "Berlin is the capital city of Germany and one of the most vibrant
# and diverse cities in Europe. Here are some key things to know...Enjoy your time exploring the vibrant and dynamic
# capital of Germany!")], _name=None, _meta={'model': 'gpt-5-mini',
# 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 27, 'completion_tokens': 681, 'total_tokens':
# 708}})]}}

messages = [
    system_message,
    ChatMessage.from_user("What's the weather forecast for {{location}} in the next {{day_count}} days?"),
]

res = pipe.run(data={"prompt_builder": {"template_variables": {"location": location, "day_count": "5"},
                                    "template": messages}})

print(res)
# >> {'llm': {'replies': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
# "Here is the weather forecast for Berlin in the next 5
# days:\n\nDay 1: Mostly cloudy with a high of 22°C (72°F) and...so it's always a good idea to check for updates
# closer to your visit.")], _name=None, _meta={'model': 'gpt-5-mini',
# 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 37, 'completion_tokens': 201,
# 'total_tokens': 238}})]}}
```

#### String prompt template

```python
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses.image_content import ImageContent

template = """
{% message role="system" %}
You are a helpful assistant.
{% endmessage %}

{% message role="user" %}
Hello! I am {{user_name}}. What's the difference between the following images?
{% for image in images %}
{{ image | templatize_part }}
{% endfor %}
{% endmessage %}
"""

images = [ImageContent.from_file_path("test/test_files/images/apple.jpg"),
          ImageContent.from_file_path("test/test_files/images/haystack-logo.png")]

builder = ChatPromptBuilder(template=template)
builder.run(user_name="John", images=images)
```

#### __init__

```python
__init__(
    template: list[ChatMessage] | str | None = None,
    required_variables: list[str] | Literal["*"] | None = None,
    variables: list[str] | None = None,
)
```

Constructs a ChatPromptBuilder component.

**Parameters:**

- **template** (<code>list\[ChatMessage\] | str | None</code>) – A list of `ChatMessage` objects or a string template. The component looks for Jinja2 template syntax and
  renders the prompt with the provided variables. Provide the template in either
  the `init` method or the `run` method.
- **required_variables** (<code>list\[str\] | Literal['\*'] | None</code>) – List variables that must be provided as input to ChatPromptBuilder.
  If a variable listed as required is not provided, an exception is raised.
  If set to `"*"`, all variables found in the prompt are required. Optional.
- **variables** (<code>list\[str\] | None</code>) – List input variables to use in prompt templates instead of the ones inferred from the
  `template` parameter. For example, to use more variables during prompt engineering than the ones present
  in the default template, you can provide them here.
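
To make the `required_variables` rules concrete, here is a simplified sketch that uses neither Jinja2 nor Haystack. It handles only bare `{{ name }}` placeholders through a regular expression, and `find_variables` and `render` are hypothetical helpers. The real component relies on Jinja2 and supports its full syntax.

```python
from __future__ import annotations

import re

# Matches simple placeholders such as {{ name }} (a deliberate simplification).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def find_variables(template: str) -> set[str]:
    """Hypothetical helper: collect the placeholder names used in the template."""
    return set(PLACEHOLDER.findall(template))


def render(template: str, required_variables=None, **values) -> str:
    """Hypothetical helper: fill placeholders, enforcing required variables."""
    # "*" means every variable found in the template is required.
    if required_variables == "*":
        required = find_variables(template)
    else:
        required = set(required_variables or [])
    missing = sorted(required - set(values))
    if missing:
        raise ValueError(f"Missing required variables: {missing}")
    # Optional variables that were not provided render as an empty string.
    return PLACEHOLDER.sub(lambda m: str(values.get(m.group(1), "")), template)


template = "Translate to {{ target_language }}. Context: {{ snippet }}; Translation:"

optional = render(template, target_language="spanish")
print(optional)

try:
    render(template, required_variables="*", target_language="spanish")
except ValueError as err:
    print(err)
```

In the first call, `snippet` is optional and silently becomes an empty string. In the second call, `"*"` makes it required, so the missing value raises an error instead.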

#### run

```python
run(
    template: list[ChatMessage] | str | None = None,
    template_variables: dict[str, Any] | None = None,
    **kwargs: dict[str, Any] | None
) -> dict[str, list[ChatMessage]]
```

Renders the prompt template with the provided variables.

It applies the template variables to render the final prompt. You can provide variables with pipeline kwargs.
To overwrite the default template, you can set the `template` parameter.
To overwrite pipeline kwargs, you can set the `template_variables` parameter.

**Parameters:**

- **template** (<code>list\[ChatMessage\] | str | None</code>) – An optional list of `ChatMessage` objects or string template to overwrite ChatPromptBuilder's default
  template.
  If `None`, the default template provided at initialization is used.
- **template_variables** (<code>dict\[str, Any\] | None</code>) – An optional dictionary of template variables to overwrite the pipeline variables.
- **kwargs** – Pipeline variables used for rendering the prompt.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
  - `prompt`: The updated list of `ChatMessage` objects after rendering the templates.

**Raises:**

- <code>ValueError</code> – If `template` is empty or contains elements that are not instances of `ChatMessage`.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Returns a dictionary representation of the component.

**Returns:**

- <code>dict\[str, Any\]</code> – Serialized dictionary representation of the component.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> ChatPromptBuilder
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize and create the component.

**Returns:**

- <code>ChatPromptBuilder</code> – The deserialized component.

## prompt_builder

### PromptBuilder

Renders a prompt, filling in any variables, so that it can be sent to a Generator.

The prompt uses Jinja2 template syntax.
The variables in the default template are used as PromptBuilder's input and are all optional.
If they're not provided, they're replaced with an empty string in the rendered prompt.
To try out different prompts, you can replace the prompt template at runtime by
providing a template for each pipeline run invocation.

### Usage examples

#### On its own

This example uses PromptBuilder to render a prompt template and fill it with `target_language`
and `snippet`. PromptBuilder returns a prompt with the string "Translate the following context to spanish.
Context: I can't speak spanish.; Translation:".

```python
from haystack.components.builders import PromptBuilder

template = "Translate the following context to {{ target_language }}. Context: {{ snippet }}; Translation:"
builder = PromptBuilder(template=template)
builder.run(target_language="spanish", snippet="I can't speak spanish.")
```

#### In a Pipeline

This is an example of a RAG pipeline where PromptBuilder renders a custom prompt template and fills it
with the contents of the retrieved documents and a query. The rendered prompt is then sent to a Generator.

```python
from haystack import Pipeline, Document
from haystack.utils import Secret
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders.prompt_builder import PromptBuilder

# in a real world use case documents could come from a retriever, web, or any other source
documents = [Document(content="Joe lives in Berlin"), Document(content="Joe is a software engineer")]
prompt_template = """
    Given these documents, answer the question.
    Documents:
    {% for doc in documents %}
        {{ doc.content }}
    {% endfor %}

    Question: {{query}}
    Answer:
    """
p = Pipeline()
p.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
p.add_component(instance=OpenAIGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY")), name="llm")
p.connect("prompt_builder", "llm")

question = "Where does Joe live?"
result = p.run({"prompt_builder": {"documents": documents, "query": question}})
print(result)
```

#### Changing the template at runtime (prompt engineering)

You can change the prompt template of an existing pipeline, like in this example:

```python
documents = [
    Document(content="Joe lives in Berlin", meta={"name": "doc1"}),
    Document(content="Joe is a software engineer", meta={"name": "doc1"}),
]
new_template = """
    You are a helpful assistant.
    Given these documents, answer the question.
    Documents:
    {% for doc in documents %}
        Document {{ loop.index }}:
        Document name: {{ doc.meta['name'] }}
        {{ doc.content }}
    {% endfor %}

    Question: {{ query }}
    Answer:
    """
p.run({
    "prompt_builder": {
        "documents": documents,
        "query": question,
        "template": new_template,
    },
})
```

To replace the variables in the default template when testing your prompt,
pass the new variables in the `variables` parameter.

#### Overwriting variables at runtime

To overwrite the values of variables, use `template_variables` during runtime:

```python
language_template = """
You are a helpful assistant.
Given these documents, answer the question.
Documents:
{% for doc in documents %}
    Document {{ loop.index }}:
    Document name: {{ doc.meta['name'] }}
    {{ doc.content }}
{% endfor %}

Question: {{ query }}
Please provide your answer in {{ answer_language | default('English') }}
Answer:
"""
p.run({
    "prompt_builder": {
        "documents": documents,
        "query": question,
        "template": language_template,
        "template_variables": {"answer_language": "German"},
    },
})
```

Note that `language_template` introduces the variable `answer_language`, which is not bound to any pipeline variable.
Unless set otherwise, it uses its default value, 'English'.
This example overwrites its value with 'German'.
You can also use `template_variables` to overwrite pipeline variables, such as `documents`.

#### __init__

```python
__init__(
    template: str,
    required_variables: list[str] | Literal["*"] | None = None,
    variables: list[str] | None = None,
)
```

Constructs a PromptBuilder component.

**Parameters:**

- **template** (<code>str</code>) – A prompt template that uses Jinja2 syntax to add variables. For example:
  `"Summarize this document: {{ documents[0].content }}\nSummary:"`
  It's used to render the prompt.
  The variables in the default template are input for PromptBuilder and are all optional,
  unless explicitly specified.
  If an optional variable is not provided, it's replaced with an empty string in the rendered prompt.
- **required_variables** (<code>list\[str\] | Literal['\*'] | None</code>) – List variables that must be provided as input to PromptBuilder.
  If a variable listed as required is not provided, an exception is raised.
  If set to `"*"`, all variables found in the prompt are required. Optional.
- **variables** (<code>list\[str\] | None</code>) – List input variables to use in prompt templates instead of the ones inferred from the
  `template` parameter. For example, to use more variables during prompt engineering than the ones present
  in the default template, you can provide them here.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Returns a dictionary representation of the component.

**Returns:**

- <code>dict\[str, Any\]</code> – Serialized dictionary representation of the component.

#### run

```python
run(
    template: str | None = None,
    template_variables: dict[str, Any] | None = None,
    **kwargs: dict[str, Any] | None
)
```

Renders the prompt template with the provided variables.

It applies the template variables to render the final prompt. You can provide variables with pipeline kwargs.
To overwrite the default template, you can set the `template` parameter.
To overwrite pipeline kwargs, you can set the `template_variables` parameter.

**Parameters:**

- **template** (<code>str | None</code>) – An optional string template to overwrite PromptBuilder's default template. If `None`, the default template
  provided at initialization is used.
- **template_variables** (<code>dict\[str, Any\] | None</code>) – An optional dictionary of template variables to overwrite the pipeline variables.
- **kwargs** – Pipeline variables used for rendering the prompt.

**Returns:**

- A dictionary with the following keys:
  - `prompt`: The updated prompt text after rendering the prompt template.

**Raises:**

- <code>ValueError</code> – If any of the required template variables is not provided.
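
The precedence between pipeline kwargs and `template_variables` can be pictured as a dictionary merge in which the later source wins. The helper below, `combine_variables`, is hypothetical and only illustrates the rule stated above. It is not PromptBuilder's code.

```python
from __future__ import annotations

from typing import Any


def combine_variables(
    kwargs: dict[str, Any],
    template_variables: dict[str, Any] | None = None,
) -> dict[str, Any]:
    """Hypothetical helper: values in `template_variables` overwrite pipeline kwargs."""
    # Later entries win, so template_variables take precedence over kwargs.
    return {**kwargs, **(template_variables or {})}


pipeline_kwargs = {"query": "Where does Joe live?", "answer_language": "English"}
combined = combine_variables(pipeline_kwargs, {"answer_language": "German"})
print(combined)
```

Here `query` comes through unchanged from the pipeline kwargs, while `answer_language` is replaced by the value given in `template_variables`.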