---
title: "Ragas"
id: integrations-ragas
description: "Ragas integration for Haystack"
slug: "/integrations-ragas"
---

## haystack_integrations.components.evaluators.ragas.evaluator

### RagasEvaluator

A component that uses the Ragas framework to evaluate inputs against specified Ragas metrics.

See the [Ragas framework](https://docs.ragas.io/) for more details.

This component supports the modern Ragas metrics API (`ragas.metrics.collections`).
Each metric must be a `SimpleBaseMetric` instance with its LLM configured at construction time.

Usage example:

```python
from openai import AsyncOpenAI
from ragas.llms import llm_factory
from ragas.metrics.collections import Faithfulness
from haystack_integrations.components.evaluators.ragas import RagasEvaluator

client = AsyncOpenAI()
llm = llm_factory("gpt-4o-mini", client=client)

evaluator = RagasEvaluator(
    ragas_metrics=[Faithfulness(llm=llm)],
)
output = evaluator.run(
    query="Which is the most popular global sport?",
    documents=[
        "Football is undoubtedly the world's most popular sport with"
        " major events like the FIFA World Cup and sports personalities"
        " like Ronaldo and Messi, drawing a followership of more than 4"
        " billion people."
    ],
    reference="Football is the most popular sport with around 4 billion"
    " followers worldwide",
)

output["result"]
```

#### __init__

```python
__init__(ragas_metrics: list[SimpleBaseMetric]) -> None
```

Constructs a new Ragas evaluator.

**Parameters:**

- **ragas_metrics** (<code>list\[SimpleBaseMetric\]</code>) – A list of modern Ragas metrics from `ragas.metrics.collections`.
  Each metric must be fully configured (including its LLM) at construction time.
  Available metrics can be found in the
  [Ragas documentation](https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/).
#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> RagasEvaluator
```

Deserialize this component from a dictionary.

Metrics are reconstructed from their stored class path and LLM/embedding
configuration. Only the `openai` provider is supported for automatic
deserialization; the API key is read from the `OPENAI_API_KEY` environment
variable at load time.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>RagasEvaluator</code> – Deserialized component.

#### run

```python
run(
    query: str | None = None,
    response: list[ChatMessage] | str | None = None,
    documents: list[Document | str] | None = None,
    reference_contexts: list[str] | None = None,
    multi_responses: list[str] | None = None,
    reference: str | None = None,
    rubrics: dict[str, str] | None = None,
) -> dict[str, dict[str, MetricResult]]
```

Evaluates the provided inputs against each metric and returns the results.

**Parameters:**

- **query** (<code>str | None</code>) – The input query from the user.
- **response** (<code>list\[ChatMessage\] | str | None</code>) – A list of ChatMessage responses (typically from a language model or agent), or a single response string.
- **documents** (<code>list\[Document | str\] | None</code>) – A list of Haystack Documents or strings that were retrieved for the query.
- **reference_contexts** (<code>list\[str\] | None</code>) – A list of reference contexts that should have been retrieved for the query.
- **multi_responses** (<code>list\[str\] | None</code>) – A list of multiple responses generated for the query.
- **reference** (<code>str | None</code>) – A string reference answer for the query.
- **rubrics** (<code>dict\[str, str\] | None</code>) – A dictionary of evaluation rubrics, where keys represent the score
  and values represent the corresponding evaluation criteria.

**Returns:**

- <code>dict\[str, dict\[str, MetricResult\]\]</code> – A dictionary with key `result` mapping metric names to their `MetricResult`.
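To illustrate the shape of the `run` output, the sketch below unpacks the nested `{"result": {metric_name: MetricResult}}` mapping into a flat dictionary of scores. It uses a plain dataclass as a stand-in for Ragas' `MetricResult` (the `value` field here is an assumption for this sketch, not a confirmed Ragas attribute):

```python
from dataclasses import dataclass


@dataclass
class FakeMetricResult:
    """Stand-in for a Ragas MetricResult; `value` is assumed for illustration."""
    value: float


# RagasEvaluator.run() returns {"result": {metric_name: MetricResult}};
# here we mock that structure with one metric entry.
output = {"result": {"faithfulness": FakeMetricResult(value=0.92)}}

# Flatten the nested result into a simple {metric_name: score} mapping.
scores = {name: res.value for name, res in output["result"].items()}
print(scores)
```

With real evaluator output, the same comprehension would iterate over one entry per metric passed to `ragas_metrics`.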