---
title: "Evaluation"
id: evaluation-api
description: "Represents the results of evaluation."
slug: "/evaluation-api"
---

## eval_run_result

### EvaluationRunResult

Contains the inputs and the outputs of an evaluation pipeline and provides methods to inspect them.

#### __init__

```python
__init__(
    run_name: str,
    inputs: dict[str, list[Any]],
    results: dict[str, dict[str, Any]],
) -> None
```

Initialize a new evaluation run result.

**Parameters:**

- **run_name** (<code>str</code>) – Name of the evaluation run.
- **inputs** (<code>dict\[str, list\[Any\]\]</code>) – Dictionary containing the inputs used for the run. Each key is the name of an input and its value is a list of input values. All lists must have the same length.
- **results** (<code>dict\[str, dict\[str, Any\]\]</code>) – Dictionary containing the results of the evaluators used in the evaluation pipeline. Each key is the name of a metric and its value is a dictionary with the following keys:
  - `'score'`: The aggregated score for the metric.
  - `'individual_scores'`: A list of scores, one per input sample.

#### aggregated_report

```python
aggregated_report(
    output_format: Literal["json", "csv", "df"] = "json",
    csv_file: str | None = None,
) -> Union[dict[str, list[Any]], DataFrame, str]
```

Generates a report with the aggregated score for each metric.

**Parameters:**

- **output_format** (<code>Literal['json', 'csv', 'df']</code>) – The output format for the report: "json", "csv", or "df". Defaults to "json".
- **csv_file** (<code>str | None</code>) – Filepath for saving the CSV output. Required when `output_format` is "csv".

**Returns:**

- <code>Union\[dict\[str, list\[Any\]\], DataFrame, str\]</code> – A JSON dictionary or DataFrame with the aggregated scores; if the output is set to a CSV file, a message confirming the successful write or an error message.
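As a rough illustration of the data shapes described above, here is a hedged sketch of a three-sample run. The input names (`questions`, `contexts`), the metric name (`exact_match`), and the shape of the aggregated report are illustrative assumptions, not part of the documented API:

```python
# Hypothetical sketch of the data shapes EvaluationRunResult expects.
# The input names ("questions", "contexts") and the metric name
# ("exact_match") are illustrative, not part of the API.
inputs = {
    "questions": ["q1", "q2", "q3"],
    "contexts": ["c1", "c2", "c3"],  # every list must have the same length
}

individual = [1.0, 0.0, 1.0]  # one score per input sample
results = {
    "exact_match": {
        "individual_scores": individual,
        "score": sum(individual) / len(individual),  # aggregated score
    }
}

# A "json"-format aggregated report pairs metric names with their
# aggregated scores, roughly like this:
aggregated = {
    "metrics": list(results),
    "score": [results[m]["score"] for m in results],
}
```

Passing these two dictionaries to the constructor (together with a `run_name`) is what makes the report methods below meaningful: each metric contributes one aggregated value to `aggregated_report` and one per-sample column to `detailed_report`.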
#### detailed_report

```python
detailed_report(
    output_format: Literal["json", "csv", "df"] = "json",
    csv_file: str | None = None,
) -> Union[dict[str, list[Any]], DataFrame, str]
```

Generates a report with the detailed scores for each metric.

**Parameters:**

- **output_format** (<code>Literal['json', 'csv', 'df']</code>) – The output format for the report: "json", "csv", or "df". Defaults to "json".
- **csv_file** (<code>str | None</code>) – Filepath for saving the CSV output. Required when `output_format` is "csv".

**Returns:**

- <code>Union\[dict\[str, list\[Any\]\], DataFrame, str\]</code> – A JSON dictionary or DataFrame with the detailed scores; if the output is set to a CSV file, a message confirming the successful write or an error message.

#### comparative_detailed_report

```python
comparative_detailed_report(
    other: EvaluationRunResult,
    keep_columns: list[str] | None = None,
    output_format: Literal["json", "csv", "df"] = "json",
    csv_file: str | None = None,
) -> Union[str, DataFrame, None]
```

Generates a report comparing the detailed scores for each metric from two evaluation runs.

**Parameters:**

- **other** (<code>EvaluationRunResult</code>) – Results of another evaluation run to compare with.
- **keep_columns** (<code>list\[str\] | None</code>) – List of common column names to keep from the inputs of the evaluation runs being compared.
- **output_format** (<code>Literal['json', 'csv', 'df']</code>) – The output format for the report: "json", "csv", or "df". Defaults to "json".
- **csv_file** (<code>str | None</code>) – Filepath for saving the CSV output. Required when `output_format` is "csv".

**Returns:**

- <code>Union\[str, DataFrame, None\]</code> – A JSON dictionary or DataFrame comparing the detailed scores; if the output is set to a CSV file, a message confirming the successful write or an error message.
**Raises:**

- <code>TypeError</code> – If `other` is not an `EvaluationRunResult` instance, or if the detailed reports are not dictionaries.
- <code>ValueError</code> – If the `other` parameter is missing required attributes.
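To make the comparison concrete, here is a hedged sketch of what a comparative detailed report conceptually contains for two runs over the same inputs. The run names, column names, and the run-name prefixing convention are assumptions chosen for illustration, not the method's documented behavior:

```python
# Hypothetical sketch of comparing the detailed scores of two runs.
# All names below ("baseline", "reranked", "questions", "exact_match")
# are illustrative assumptions.
run_a = {
    "name": "baseline",
    "inputs": {"questions": ["q1", "q2"]},
    "scores": {"exact_match": [1.0, 0.0]},
}
run_b = {
    "name": "reranked",
    "inputs": {"questions": ["q1", "q2"]},
    "scores": {"exact_match": [0.0, 1.0]},
}

keep_columns = ["questions"]  # shared input columns to carry over

# Keep the selected input columns once, then add each run's per-sample
# scores; prefixing metric columns with the run name keeps them apart.
comparison = {col: run_a["inputs"][col] for col in keep_columns}
for run in (run_a, run_b):
    for metric, scores in run["scores"].items():
        comparison[f"{run['name']}_{metric}"] = scores
```

The resulting dictionary of equal-length lists lines up both runs' per-sample scores against the shared input columns, which is the shape a "json" report or a DataFrame built from it would take.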