---
title: "Evaluation"
id: evaluation-api
description: "Represents the results of an evaluation run."
slug: "/evaluation-api"
---

## eval_run_result

### EvaluationRunResult

Contains the inputs and the outputs of an evaluation pipeline and provides methods to inspect them.

#### __init__

```python
__init__(
    run_name: str,
    inputs: dict[str, list[Any]],
    results: dict[str, dict[str, Any]],
)
```

Initialize a new evaluation run result.

**Parameters:**

- **run_name** (<code>str</code>) – Name of the evaluation run.
- **inputs** (<code>dict\[str, list\[Any\]\]</code>) – Dictionary containing the inputs used for the run. Each key is the name of an input and its value is a list
  of input values. All lists must have the same length.
- **results** (<code>dict\[str, dict\[str, Any\]\]</code>) – Dictionary containing the results of the evaluators used in the evaluation pipeline. Each key is the name
  of a metric and its value is a dictionary with the following keys:
  - 'score': The aggregated score for the metric.
  - 'individual_scores': A list of scores for each input sample.
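
As a rough usage sketch, the snippet below constructs a result object from the documented parameters. The import path is inferred from the module name above, and the run name, input columns, and metric values are made up for illustration:

```python
# Import path assumed from the `eval_run_result` module name above.
from haystack.evaluation.eval_run_result import EvaluationRunResult

result = EvaluationRunResult(
    run_name="rag_baseline",
    inputs={
        # Each key maps to one list of values; all lists have the same length.
        "questions": ["What is Haystack?", "What is RAG?"],
        "contexts": ["Haystack is an LLM framework.", "RAG combines retrieval with generation."],
    },
    results={
        # One entry per evaluator in the pipeline, keyed by metric name.
        "exact_match": {
            "score": 0.5,                     # aggregated score for the metric
            "individual_scores": [1.0, 0.0],  # one score per input sample
        },
    },
)
```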

#### aggregated_report

```python
aggregated_report(
    output_format: Literal["json", "csv", "df"] = "json",
    csv_file: str | None = None,
) -> Union[dict[str, list[Any]], DataFrame, str]
```

Generates a report with aggregated scores for each metric.

**Parameters:**

- **output_format** (<code>Literal['json', 'csv', 'df']</code>) – The output format for the report: "json", "csv", or "df". Defaults to "json".
- **csv_file** (<code>str | None</code>) – Filepath for saving the CSV output; must be provided if `output_format` is "csv".

**Returns:**

- <code>Union\[dict\[str, list\[Any\]\], DataFrame, str\]</code> – A dictionary or DataFrame with the aggregated scores. If the output format is "csv", a message
  confirming the successful write or an error message.
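
As a brief example, continuing with the hypothetical `result` object from the `__init__` sketch above:

```python
# Default "json" format returns a dict of lists, per the return type above.
scores = result.aggregated_report()

# "df" returns the same report as a pandas DataFrame.
df = result.aggregated_report(output_format="df")

# "csv" writes to disk and returns a status message; csv_file is required.
status = result.aggregated_report(output_format="csv", csv_file="aggregated.csv")
print(status)
```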

#### detailed_report

```python
detailed_report(
    output_format: Literal["json", "csv", "df"] = "json",
    csv_file: str | None = None,
) -> Union[dict[str, list[Any]], DataFrame, str]
```

Generates a report with detailed scores for each metric.

**Parameters:**

- **output_format** (<code>Literal['json', 'csv', 'df']</code>) – The output format for the report: "json", "csv", or "df". Defaults to "json".
- **csv_file** (<code>str | None</code>) – Filepath for saving the CSV output; must be provided if `output_format` is "csv".

**Returns:**

- <code>Union\[dict\[str, list\[Any\]\], DataFrame, str\]</code> – A dictionary or DataFrame with the detailed scores. If the output format is "csv", a message
  confirming the successful write or an error message.
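
For example, with the same hypothetical `result` object as above, the per-sample scores can be inspected alongside the aggregated ones:

```python
# A DataFrame view is convenient for spotting individual low-scoring samples.
detailed = result.detailed_report(output_format="df")
print(detailed.head())

# Or persist the full per-sample table for later analysis.
result.detailed_report(output_format="csv", csv_file="detailed.csv")
```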

#### comparative_detailed_report

```python
comparative_detailed_report(
    other: EvaluationRunResult,
    keep_columns: list[str] | None = None,
    output_format: Literal["json", "csv", "df"] = "json",
    csv_file: str | None = None,
) -> Union[str, DataFrame, None]
```

Generates a report with detailed scores for each metric from two evaluation runs, for comparison.

**Parameters:**

- **other** (<code>EvaluationRunResult</code>) – Results of another evaluation run to compare with.
- **keep_columns** (<code>list\[str\] | None</code>) – List of common column names to keep from the inputs of the evaluation runs being compared.
- **output_format** (<code>Literal['json', 'csv', 'df']</code>) – The output format for the report: "json", "csv", or "df". Defaults to "json".
- **csv_file** (<code>str | None</code>) – Filepath for saving the CSV output; must be provided if `output_format` is "csv".

**Returns:**

- <code>Union\[str, DataFrame, None\]</code> – JSON or DataFrame with a comparison of the detailed scores. If the output format is "csv", a message
  confirming the successful write or an error message.

**Raises:**

- <code>TypeError</code> – If `other` is not an EvaluationRunResult instance, or if the detailed reports are not
  dictionaries.
- <code>ValueError</code> – If the `other` parameter is missing required attributes.
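
As an illustrative sketch, the call below compares a hypothetical baseline run against a candidate run built from the same inputs; the `questions` column name follows the `__init__` example above:

```python
# `baseline` and `candidate` are EvaluationRunResult instances created from
# the same inputs, e.g. the same pipeline evaluated with different settings.
comparison = baseline.comparative_detailed_report(
    candidate,
    keep_columns=["questions"],  # common input columns to keep in the report
    output_format="df",
)
print(comparison.head())
```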