---
title: "Evaluation"
id: evaluation-api
description: "Represents the results of evaluation."
slug: "/evaluation-api"
---

<a id="eval_run_result"></a>

## Module eval\_run\_result

<a id="eval_run_result.EvaluationRunResult"></a>

### EvaluationRunResult

Contains the inputs and the outputs of an evaluation pipeline and provides methods to inspect them.

<a id="eval_run_result.EvaluationRunResult.__init__"></a>

#### EvaluationRunResult.\_\_init\_\_

```python
def __init__(run_name: str, inputs: dict[str, list[Any]],
             results: dict[str, dict[str, Any]])
```

Initialize a new evaluation run result.
**Arguments**:

- `run_name`: Name of the evaluation run.
- `inputs`: Dictionary containing the inputs used for the run. Each key is the name of an input and its value is a list
of input values. All lists must have the same length.
- `results`: Dictionary containing the results of the evaluators used in the evaluation pipeline. Each key is the name
of the metric and its value is a dictionary with the following keys:
  - `score`: The aggregated score for the metric.
  - `individual_scores`: A list of scores for each input sample.

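As a sketch of the expected shapes, here is an `inputs`/`results` pair for a two-sample run (the input and metric names are illustrative, not part of the API):

```python
# Illustrative inputs: one list per input column, all the same length.
inputs = {
    "questions": ["What is the capital of France?", "What is 2 + 2?"],
    "contexts": ["Paris is the capital of France.", "2 + 2 equals 4."],
}

# Illustrative results: one entry per metric, with an aggregated
# "score" and one entry in "individual_scores" per input sample.
results = {
    "faithfulness": {"score": 0.9, "individual_scores": [1.0, 0.8]},
    "exact_match": {"score": 0.5, "individual_scores": [0.0, 1.0]},
}

# Invariant from the docstring: every input list and every
# "individual_scores" list covers the same number of samples.
n_samples = len(next(iter(inputs.values())))
assert all(len(values) == n_samples for values in inputs.values())
assert all(len(r["individual_scores"]) == n_samples for r in results.values())

# These dictionaries can then be passed to the constructor, e.g.
# EvaluationRunResult(run_name="run_1", inputs=inputs, results=results)
```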
<a id="eval_run_result.EvaluationRunResult.aggregated_report"></a>

#### EvaluationRunResult.aggregated\_report

```python
def aggregated_report(
    output_format: Literal["json", "csv", "df"] = "json",
    csv_file: str | None = None
) -> Union[dict[str, list[Any]], "DataFrame", str]
```

Generates a report with aggregated scores for each metric.

**Arguments**:

- `output_format`: The output format for the report: "json", "csv", or "df". Defaults to "json".
- `csv_file`: Filepath for the CSV output; must be provided if `output_format` is "csv".

**Returns**:

A dictionary (for "json") or a DataFrame (for "df") with the aggregated scores. If `output_format` is "csv", returns a
message confirming the successful write or an error message.

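As a minimal sketch of what aggregation means here (the report layout below is an illustration, not the library's exact output format):

```python
# Illustrative per-metric results, as passed to EvaluationRunResult.
results = {
    "faithfulness": {"score": 0.9, "individual_scores": [1.0, 0.8]},
    "exact_match": {"score": 0.5, "individual_scores": [0.0, 1.0]},
}

# An aggregated report keeps only the single aggregated score per
# metric, dropping the per-sample scores.
aggregated = {
    "metrics": list(results),
    "score": [results[name]["score"] for name in results],
}

print(aggregated)
```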
<a id="eval_run_result.EvaluationRunResult.detailed_report"></a>

#### EvaluationRunResult.detailed\_report

```python
def detailed_report(
    output_format: Literal["json", "csv", "df"] = "json",
    csv_file: str | None = None
) -> Union[dict[str, list[Any]], "DataFrame", str]
```

Generates a report with detailed scores for each metric.

**Arguments**:

- `output_format`: The output format for the report: "json", "csv", or "df". Defaults to "json".
- `csv_file`: Filepath for the CSV output; must be provided if `output_format` is "csv".

**Returns**:

A dictionary (for "json") or a DataFrame (for "df") with the detailed scores. If `output_format` is "csv", returns a
message confirming the successful write or an error message.

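As a sketch, a detailed report pairs each input column with the per-sample scores of every metric, one row per input sample (the column layout here is an illustration, not the library's exact output format):

```python
inputs = {
    "questions": ["What is the capital of France?", "What is 2 + 2?"],
}
results = {
    "faithfulness": {"score": 0.9, "individual_scores": [1.0, 0.8]},
    "exact_match": {"score": 0.5, "individual_scores": [0.0, 1.0]},
}

# One column per input plus one column per metric; each column has
# one entry per input sample.
detailed = {
    **inputs,
    **{name: data["individual_scores"] for name, data in results.items()},
}

for column, values in detailed.items():
    print(column, values)
```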
<a id="eval_run_result.EvaluationRunResult.comparative_detailed_report"></a>

#### EvaluationRunResult.comparative\_detailed\_report

```python
def comparative_detailed_report(
    other: "EvaluationRunResult",
    keep_columns: list[str] | None = None,
    output_format: Literal["json", "csv", "df"] = "json",
    csv_file: str | None = None
) -> Union[str, "DataFrame", None]
```

Generates a report comparing the detailed scores of two evaluation runs, metric by metric.

**Arguments**:

- `other`: Results of another evaluation run to compare with.
- `keep_columns`: List of common column names to keep from the inputs of the evaluation runs being compared.
- `output_format`: The output format for the report: "json", "csv", or "df". Defaults to "json".
- `csv_file`: Filepath for the CSV output; must be provided if `output_format` is "csv".

**Returns**:

A dictionary (for "json") or a DataFrame (for "df") with a comparison of the detailed scores. If `output_format` is
"csv", returns a message confirming the successful write or an error message.
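As a sketch of the comparison idea, the detailed scores of two runs over the same inputs can be placed side by side, with each run's metric columns prefixed by its run name so they stay distinguishable (the exact column-naming scheme used by the library is an assumption here):

```python
# Illustrative detailed scores from two runs over the same inputs.
run_1 = {"name": "baseline",
         "scores": {"exact_match": [0.0, 1.0], "faithfulness": [1.0, 0.8]}}
run_2 = {"name": "reranked",
         "scores": {"exact_match": [1.0, 1.0], "faithfulness": [0.9, 0.9]}}

# A shared input column kept for context (cf. `keep_columns`).
questions = ["What is the capital of France?", "What is 2 + 2?"]

# Side-by-side comparison: kept input columns appear once, metric
# columns are prefixed with the run name.
comparison = {"questions": questions}
for run in (run_1, run_2):
    for metric, values in run["scores"].items():
        comparison[f"{run['name']}_{metric}"] = values

print(sorted(comparison))
```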