Cradicle Explorer

/ docs-website / reference_versioned_docs / version-2.28 / haystack-api / builders_api.md
builders_api.md
  1  ---
  2  title: "Builders"
  3  id: builders-api
  4  description: "Extract the output of a Generator to an Answer format, and build prompts."
  5  slug: "/builders-api"
  6  ---
  7  
  8  
  9  ## answer_builder
 10  
 11  ### AnswerBuilder
 12  
 13  Converts a query and Generator replies into a `GeneratedAnswer` object.
 14  
 15  AnswerBuilder parses Generator replies using custom regular expressions.
 16  Check out the usage example below to see how it works.
 17  Optionally, it can also take documents and metadata from the Generator to add to the `GeneratedAnswer` object.
 18  AnswerBuilder works with both non-chat and chat Generators.
 19  
 20  ### Usage example
 21  
 22  ```python
 23  from haystack.components.builders import AnswerBuilder
 24  
 25  builder = AnswerBuilder(pattern="Answer: (.*)")
 26  builder.run(query="What's the answer?", replies=["This is an argument. Answer: This is the answer."])
 27  ```
 28  
 29  ### Usage example with documents and reference pattern
 30  
 31  ```python
 32  from haystack import Document
 33  from haystack.components.builders import AnswerBuilder
 34  
 35  replies = ["The capital of France is Paris [2]."]
 36  
 37  docs = [
 38      Document(content="Berlin is the capital of Germany."),
 39      Document(content="Paris is the capital of France."),
 40      Document(content="Rome is the capital of Italy."),
 41  ]
 42  
 43  builder = AnswerBuilder(reference_pattern="\[(\d+)\]", return_only_referenced_documents=False)
 44  result = builder.run(query="What is the capital of France?", replies=replies, documents=docs)["answers"][0]
 45  
 46  print(f"Answer: {result.data}")
 47  print("References:")
 48  for doc in result.documents:
 49      if doc.meta["referenced"]:
 50          print(f"[{doc.meta['source_index']}] {doc.content}")
 51  print("Other sources:")
 52  for doc in result.documents:
 53      if not doc.meta["referenced"]:
 54          print(f"[{doc.meta['source_index']}] {doc.content}")
 55  
 56  # >> Answer: The capital of France is Paris
 57  # >> References:
 58  # >> [2] Paris is the capital of France.
 59  # >> Other sources:
 60  # >> [1] Berlin is the capital of Germany.
 61  # >> [3] Rome is the capital of Italy.
 62  ```
 63  
 64  #### __init__
 65  
 66  ```python
 67  __init__(
 68      pattern: str | None = None,
 69      reference_pattern: str | None = None,
 70      last_message_only: bool = False,
 71      *,
 72      return_only_referenced_documents: bool = True
 73  ) -> None
 74  ```
 75  
 76  Creates an instance of the AnswerBuilder component.
 77  
 78  **Parameters:**
 79  
 80  - **pattern** (<code>str | None</code>) – The regular expression pattern to extract the answer text from the Generator.
 81    If not specified, the entire response is used as the answer.
 82    The regular expression can have one capture group at most.
 83    If present, the capture group text
 84    is used as the answer. If no capture group is present, the whole match is used as the answer.
 85    Examples:
 86    `[^\n]+$` finds "this is an answer" in a string "this is an argument.\\nthis is an answer".
 87    `Answer: (.*)` finds "this is an answer" in a string "this is an argument. Answer: this is an answer".
 88  - **reference_pattern** (<code>str | None</code>) – The regular expression pattern used for parsing the document references.
 89    If not specified, no parsing is done, and all documents are returned.
 90    References need to be specified as indices of the input documents and start at [1].
 91    Example: `\[(\d+)\]` finds "1" in a string "this is an answer[1]".
 92    If this parameter is provided, documents metadata will contain a "referenced" key with a boolean value.
 93  - **last_message_only** (<code>bool</code>) – If False (default value), all messages are used as the answer.
 94    If True, only the last message is used as the answer.
 95  - **return_only_referenced_documents** (<code>bool</code>) – To be used in conjunction with `reference_pattern`.
 96    If True (default value), only the documents that were actually referenced in `replies` are returned.
 97    If False, all documents are returned.
 98    If `reference_pattern` is not provided, this parameter has no effect, and all documents are returned.
 99  
100  #### run
101  
102  ```python
103  run(
104      query: str,
105      replies: list[str] | list[ChatMessage],
106      meta: list[dict[str, Any]] | None = None,
107      documents: list[Document] | None = None,
108      pattern: str | None = None,
109      reference_pattern: str | None = None,
110  ) -> dict[str, Any]
111  ```
112  
113  Turns the output of a Generator into `GeneratedAnswer` objects using regular expressions.
114  
115  **Parameters:**
116  
117  - **query** (<code>str</code>) – The input query used as the Generator prompt.
118  - **replies** (<code>list\[str\] | list\[ChatMessage\]</code>) – The output of the Generator. Can be a list of strings or a list of `ChatMessage` objects.
119  - **meta** (<code>list\[dict\[str, Any\]\] | None</code>) – The metadata returned by the Generator. If not specified, the generated answer will contain no metadata.
120  - **documents** (<code>list\[Document\] | None</code>) – The documents used as the Generator inputs. If specified, they are added to
121    the `GeneratedAnswer` objects.
122    Each Document.meta includes a "source_index" key, representing its 1-based position in the input list.
123    When `reference_pattern` is provided:
124  - "referenced" key is added to the Document.meta, indicating if the document was referenced in the output.
125  - `return_only_referenced_documents` init parameter controls if all or only referenced documents are
126    returned.
127  - **pattern** (<code>str | None</code>) – The regular expression pattern to extract the answer text from the Generator.
128    If not specified, the entire response is used as the answer.
129    The regular expression can have one capture group at most.
130    If present, the capture group text
131    is used as the answer. If no capture group is present, the whole match is used as the answer.
132    Examples:
133    `[^\n]+$` finds "this is an answer" in a string "this is an argument.\\nthis is an answer".
134    `Answer: (.*)` finds "this is an answer" in a string
135    "this is an argument. Answer: this is an answer".
136  - **reference_pattern** (<code>str | None</code>) – The regular expression pattern used for parsing the document references.
137    If not specified, no parsing is done, and all documents are returned.
138    References need to be specified as indices of the input documents and start at [1].
139    Example: `\[(\d+)\]` finds "1" in a string "this is an answer[1]".
140  
141  **Returns:**
142  
143  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
144  - `answers`: The answers received from the output of the Generator.
145  
146  ## chat_prompt_builder
147  
148  ### ChatPromptBuilder
149  
150  Renders a chat prompt from a template using Jinja2 syntax.
151  
152  A template can be a list of `ChatMessage` objects, or a special string, as shown in the usage examples.
153  
154  It constructs prompts using static or dynamic templates, which you can update for each pipeline run.
155  
156  Template variables in the template are optional unless specified otherwise.
157  If an optional variable isn't provided, it defaults to an empty string. Use `variable` and `required_variables`
158  to define input types and required variables.
159  
160  ### Usage examples
161  
162  #### Static ChatMessage prompt template
163  
164  ```python
165  template = [ChatMessage.from_user("Translate to {{ target_language }}. Context: {{ snippet }}; Translation:")]
166  builder = ChatPromptBuilder(template=template)
167  builder.run(target_language="spanish", snippet="I can't speak spanish.")
168  ```
169  
170  #### Overriding static ChatMessage template at runtime
171  
172  ```python
173  template = [ChatMessage.from_user("Translate to {{ target_language }}. Context: {{ snippet }}; Translation:")]
174  builder = ChatPromptBuilder(template=template)
175  builder.run(target_language="spanish", snippet="I can't speak spanish.")
176  
177  msg = "Translate to {{ target_language }} and summarize. Context: {{ snippet }}; Summary:"
178  summary_template = [ChatMessage.from_user(msg)]
179  builder.run(target_language="spanish", snippet="I can't speak spanish.", template=summary_template)
180  ```
181  
182  #### Dynamic ChatMessage prompt template
183  
184  ```python
185  from haystack.components.builders import ChatPromptBuilder
186  from haystack.components.generators.chat import OpenAIChatGenerator
187  from haystack.dataclasses import ChatMessage
188  from haystack import Pipeline
189  
190  # no parameter init, we don't use any runtime template variables
191  prompt_builder = ChatPromptBuilder()
192  llm = OpenAIChatGenerator(model="gpt-5-mini")
193  
194  pipe = Pipeline()
195  pipe.add_component("prompt_builder", prompt_builder)
196  pipe.add_component("llm", llm)
197  pipe.connect("prompt_builder.prompt", "llm.messages")
198  
199  location = "Berlin"
200  language = "English"
201  system_message = ChatMessage.from_system("You are an assistant giving information to tourists in {{language}}")
202  messages = [system_message, ChatMessage.from_user("Tell me about {{location}}")]
203  
204  res = pipe.run(data={"prompt_builder": {"template_variables": {"location": location, "language": language},
205                                      "template": messages}})
206  print(res)
207  # >> {'llm': {'replies': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
208  # "Berlin is the capital city of Germany and one of the most vibrant
209  # and diverse cities in Europe. Here are some key things to know...Enjoy your time exploring the vibrant and dynamic
210  # capital of Germany!")], _name=None, _meta={'model': 'gpt-5-mini',
211  # 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 27, 'completion_tokens': 681, 'total_tokens':
212  # 708}})]}}
213  
214  messages = [system_message, ChatMessage.from_user("What's the weather forecast for {{location}} in the next {{day_count}} days?")]
215  
216  res = pipe.run(data={"prompt_builder": {"template_variables": {"location": location, "day_count": "5"},
217                                      "template": messages}})
218  
219  print(res)
220  # >> {'llm': {'replies': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
221  # "Here is the weather forecast for Berlin in the next 5
222  # days:\n\nDay 1: Mostly cloudy with a high of 22°C (72°F) and...so it's always a good idea to check for updates
223  # closer to your visit.")], _name=None, _meta={'model': 'gpt-5-mini',
224  # 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 37, 'completion_tokens': 201,
225  # 'total_tokens': 238}})]}}
226  ```
227  
228  #### String prompt template
229  
230  ```python
231  from haystack.components.builders import ChatPromptBuilder
232  from haystack.dataclasses.image_content import ImageContent
233  
234  template = """
235  {% message role="system" %}
236  You are a helpful assistant.
237  {% endmessage %}
238  
239  {% message role="user" %}
240  Hello! I am {{user_name}}. What's the difference between the following images?
241  {% for image in images %}
242  {{ image | templatize_part }}
243  {% endfor %}
244  {% endmessage %}
245  """
246  
247  images = [ImageContent.from_file_path("test/test_files/images/apple.jpg"),
248            ImageContent.from_file_path("test/test_files/images/haystack-logo.png")]
249  
250  builder = ChatPromptBuilder(template=template)
251  builder.run(user_name="John", images=images)
252  ```
253  
254  #### __init__
255  
256  ```python
257  __init__(
258      template: list[ChatMessage] | str | None = None,
259      required_variables: list[str] | Literal["*"] | None = None,
260      variables: list[str] | None = None,
261  ) -> None
262  ```
263  
264  Constructs a ChatPromptBuilder component.
265  
266  **Parameters:**
267  
268  - **template** (<code>list\[ChatMessage\] | str | None</code>) – A list of `ChatMessage` objects or a string template. The component looks for Jinja2 template syntax and
269    renders the prompt with the provided variables. Provide the template in either
270    the `init` method`or the`run\` method.
271  - **required_variables** (<code>list\[str\] | Literal['\*'] | None</code>) – List variables that must be provided as input to ChatPromptBuilder.
272    If a variable listed as required is not provided, an exception is raised.
273    If set to `"*"`, all variables found in the prompt are required. Optional.
274  - **variables** (<code>list\[str\] | None</code>) – List input variables to use in prompt templates instead of the ones inferred from the
275    `template` parameter. For example, to use more variables during prompt engineering than the ones present
276    in the default template, you can provide them here.
277  
278  #### run
279  
280  ```python
281  run(
282      template: list[ChatMessage] | str | None = None,
283      template_variables: dict[str, Any] | None = None,
284      **kwargs: Any
285  ) -> dict[str, list[ChatMessage]]
286  ```
287  
288  Renders the prompt template with the provided variables.
289  
290  It applies the template variables to render the final prompt. You can provide variables with pipeline kwargs.
291  To overwrite the default template, you can set the `template` parameter.
292  To overwrite pipeline kwargs, you can set the `template_variables` parameter.
293  
294  **Parameters:**
295  
296  - **template** (<code>list\[ChatMessage\] | str | None</code>) – An optional list of `ChatMessage` objects or string template to overwrite ChatPromptBuilder's default
297    template.
298    If `None`, the default template provided at initialization is used.
299  - **template_variables** (<code>dict\[str, Any\] | None</code>) – An optional dictionary of template variables to overwrite the pipeline variables.
300  - **kwargs** (<code>Any</code>) – Pipeline variables used for rendering the prompt.
301  
302  **Returns:**
303  
304  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
305  - `prompt`: The updated list of `ChatMessage` objects after rendering the templates.
306  
307  **Raises:**
308  
309  - <code>ValueError</code> – If `chat_messages` is empty or contains elements that are not instances of `ChatMessage`.
310  
311  #### to_dict
312  
313  ```python
314  to_dict() -> dict[str, Any]
315  ```
316  
317  Returns a dictionary representation of the component.
318  
319  **Returns:**
320  
321  - <code>dict\[str, Any\]</code> – Serialized dictionary representation of the component.
322  
323  #### from_dict
324  
325  ```python
326  from_dict(data: dict[str, Any]) -> ChatPromptBuilder
327  ```
328  
329  Deserialize this component from a dictionary.
330  
331  **Parameters:**
332  
333  - **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize and create the component.
334  
335  **Returns:**
336  
337  - <code>ChatPromptBuilder</code> – The deserialized component.
338  
339  ## prompt_builder
340  
341  ### PromptBuilder
342  
343  Renders a prompt filling in any variables so that it can send it to a Generator.
344  
345  The prompt uses Jinja2 template syntax.
346  The variables in the default template are used as PromptBuilder's input and are all optional.
347  If they're not provided, they're replaced with an empty string in the rendered prompt.
348  To try out different prompts, you can replace the prompt template at runtime by
349  providing a template for each pipeline run invocation.
350  
351  ### Usage examples
352  
353  #### On its own
354  
355  This example uses PromptBuilder to render a prompt template and fill it with `target_language`
356  and `snippet`. PromptBuilder returns a prompt with the string "Translate the following context to Spanish.
357  Context: I can't speak Spanish.; Translation:".
358  
359  ```python
360  from haystack.components.builders import PromptBuilder
361  
362  template = "Translate the following context to {{ target_language }}. Context: {{ snippet }}; Translation:"
363  builder = PromptBuilder(template=template)
364  builder.run(target_language="spanish", snippet="I can't speak spanish.")
365  ```
366  
367  #### In a Pipeline
368  
369  This is an example of a RAG pipeline where PromptBuilder renders a custom prompt template and fills it
370  with the contents of the retrieved documents and a query. The rendered prompt is then sent to a Generator.
371  
372  ```python
373  from haystack import Pipeline, Document
374  from haystack.utils import Secret
375  from haystack.components.generators import OpenAIGenerator
376  from haystack.components.builders.prompt_builder import PromptBuilder
377  
378  # in a real world use case documents could come from a retriever, web, or any other source
379  documents = [Document(content="Joe lives in Berlin"), Document(content="Joe is a software engineer")]
380  prompt_template = """
381      Given these documents, answer the question.
382      Documents:
383      {% for doc in documents %}
384          {{ doc.content }}
385      {% endfor %}
386  
387      Question: {{query}}
388      Answer:
389      """
390  p = Pipeline()
391  p.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
392  p.add_component(instance=OpenAIGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY")), name="llm")
393  p.connect("prompt_builder", "llm")
394  
395  question = "Where does Joe live?"
396  result = p.run({"prompt_builder": {"documents": documents, "query": question}})
397  print(result)
398  ```
399  
400  #### Changing the template at runtime (prompt engineering)
401  
402  You can change the prompt template of an existing pipeline, like in this example:
403  
404  ```python
405  documents = [
406      Document(content="Joe lives in Berlin", meta={"name": "doc1"}),
407      Document(content="Joe is a software engineer", meta={"name": "doc1"}),
408  ]
409  new_template = """
410      You are a helpful assistant.
411      Given these documents, answer the question.
412      Documents:
413      {% for doc in documents %}
414          Document {{ loop.index }}:
415          Document name: {{ doc.meta['name'] }}
416          {{ doc.content }}
417      {% endfor %}
418  
419      Question: {{ query }}
420      Answer:
421      """
422  p.run({
423      "prompt_builder": {
424          "documents": documents,
425          "query": question,
426          "template": new_template,
427      },
428  })
429  ```
430  
431  To replace the variables in the default template when testing your prompt,
432  pass the new variables in the `variables` parameter.
433  
434  #### Overwriting variables at runtime
435  
436  To overwrite the values of variables, use `template_variables` during runtime:
437  
438  ```python
439  language_template = """
440  You are a helpful assistant.
441  Given these documents, answer the question.
442  Documents:
443  {% for doc in documents %}
444      Document {{ loop.index }}:
445      Document name: {{ doc.meta['name'] }}
446      {{ doc.content }}
447  {% endfor %}
448  
449  Question: {{ query }}
450  Please provide your answer in {{ answer_language | default('English') }}
451  Answer:
452  """
453  p.run({
454      "prompt_builder": {
455          "documents": documents,
456          "query": question,
457          "template": language_template,
458          "template_variables": {"answer_language": "German"},
459      },
460  })
461  ```
462  
463  Note that `language_template` introduces variable `answer_language` which is not bound to any pipeline variable.
464  If not set otherwise, it will use its default value 'English'.
465  This example overwrites its value to 'German'.
466  Use `template_variables` to overwrite pipeline variables (such as documents) as well.
467  
468  #### __init__
469  
470  ```python
471  __init__(
472      template: str,
473      required_variables: list[str] | Literal["*"] | None = None,
474      variables: list[str] | None = None,
475  ) -> None
476  ```
477  
478  Constructs a PromptBuilder component.
479  
480  **Parameters:**
481  
482  - **template** (<code>str</code>) – A prompt template that uses Jinja2 syntax to add variables. For example:
483    `"Summarize this document: {{ documents[0].content }}\nSummary:"`
484    It's used to render the prompt.
485    The variables in the default template are input for PromptBuilder and are all optional,
486    unless explicitly specified.
487    If an optional variable is not provided, it's replaced with an empty string in the rendered prompt.
488  - **required_variables** (<code>list\[str\] | Literal['\*'] | None</code>) – List variables that must be provided as input to PromptBuilder.
489    If a variable listed as required is not provided, an exception is raised.
490    If set to `"*"`, all variables found in the prompt are required. Optional.
491  - **variables** (<code>list\[str\] | None</code>) – List input variables to use in prompt templates instead of the ones inferred from the
492    `template` parameter. For example, to use more variables during prompt engineering than the ones present
493    in the default template, you can provide them here.
494  
495  #### to_dict
496  
497  ```python
498  to_dict() -> dict[str, Any]
499  ```
500  
501  Returns a dictionary representation of the component.
502  
503  **Returns:**
504  
505  - <code>dict\[str, Any\]</code> – Serialized dictionary representation of the component.
506  
507  #### run
508  
509  ```python
510  run(
511      template: str | None = None,
512      template_variables: dict[str, Any] | None = None,
513      **kwargs: Any
514  ) -> dict[str, Any]
515  ```
516  
517  Renders the prompt template with the provided variables.
518  
519  It applies the template variables to render the final prompt. You can provide variables via pipeline kwargs.
520  In order to overwrite the default template, you can set the `template` parameter.
521  In order to overwrite pipeline kwargs, you can set the `template_variables` parameter.
522  
523  **Parameters:**
524  
525  - **template** (<code>str | None</code>) – An optional string template to overwrite PromptBuilder's default template. If None, the default template
526    provided at initialization is used.
527  - **template_variables** (<code>dict\[str, Any\] | None</code>) – An optional dictionary of template variables to overwrite the pipeline variables.
528  - **kwargs** (<code>Any</code>) – Pipeline variables used for rendering the prompt.
529  
530  **Returns:**
531  
532  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
533  - `prompt`: The updated prompt text after rendering the prompt template.
534  
535  **Raises:**
536  
537  - <code>ValueError</code> – If any of the required template variables is not provided.