---
title: "Evaluation"
id: evaluation-api
description: "Represents the results of evaluation."
slug: "/evaluation-api"
---

<a id="eval_run_result"></a>

## Module eval\_run\_result

<a id="eval_run_result.EvaluationRunResult"></a>

### EvaluationRunResult

Contains the inputs and outputs of an evaluation pipeline and provides methods to inspect them.

<a id="eval_run_result.EvaluationRunResult.__init__"></a>

#### EvaluationRunResult.\_\_init\_\_

```python
def __init__(run_name: str, inputs: dict[str, list[Any]],
             results: dict[str, dict[str, Any]])
```

Initialize a new evaluation run result.

**Arguments**:

- `run_name`: Name of the evaluation run.
- `inputs`: Dictionary containing the inputs used for the run. Each key is the name of an input and its value is a list of input values. All lists must have the same length.
- `results`: Dictionary containing the results of the evaluators used in the evaluation pipeline. Each key is the name of a metric and its value is a dictionary with the following keys:
  - `score`: The aggregated score for the metric.
  - `individual_scores`: A list of scores for each input sample.
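For illustration, a minimal sketch of constructing a result by hand follows. The import path is an assumption based on the module name above, and all input and metric names are made up:

```python
# A hedged sketch: building an EvaluationRunResult manually. The import path
# follows the module name documented above; adjust it to wherever the class
# lives in your installation. Input and metric names here are hypothetical.
from eval_run_result import EvaluationRunResult

inputs = {
    # Each key is an input name; each value is a list with one entry per sample.
    "question": ["What is the capital of France?", "Who wrote Hamlet?"],
    "predicted_answer": ["Paris", "Christopher Marlowe"],
}
results = {
    # One entry per metric: an aggregated score plus one score per sample.
    "exact_match": {"score": 0.5, "individual_scores": [1.0, 0.0]},
    "semantic_similarity": {"score": 0.67, "individual_scores": [0.98, 0.35]},
}

run = EvaluationRunResult(run_name="baseline", inputs=inputs, results=results)
```

Note that the per-sample lists are all the same length (two samples here), matching the constraint on `inputs` described above.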
<a id="eval_run_result.EvaluationRunResult.aggregated_report"></a>

#### EvaluationRunResult.aggregated\_report

```python
def aggregated_report(
    output_format: Literal["json", "csv", "df"] = "json",
    csv_file: str | None = None
) -> Union[dict[str, list[Any]], "DataFrame", str]
```

Generates a report with aggregated scores for each metric.

**Arguments**:

- `output_format`: The output format for the report: "json", "csv", or "df". Defaults to "json".
- `csv_file`: Filepath for saving the CSV output; required when `output_format` is "csv".

**Returns**:

JSON or DataFrame with the aggregated scores. If the output is written to a CSV file, a message confirming the successful write or an error message.

<a id="eval_run_result.EvaluationRunResult.detailed_report"></a>

#### EvaluationRunResult.detailed\_report

```python
def detailed_report(
    output_format: Literal["json", "csv", "df"] = "json",
    csv_file: str | None = None
) -> Union[dict[str, list[Any]], "DataFrame", str]
```

Generates a report with detailed scores for each metric.

**Arguments**:

- `output_format`: The output format for the report: "json", "csv", or "df". Defaults to "json".
- `csv_file`: Filepath for saving the CSV output; required when `output_format` is "csv".

**Returns**:

JSON or DataFrame with the detailed scores. If the output is written to a CSV file, a message confirming the successful write or an error message.

<a id="eval_run_result.EvaluationRunResult.comparative_detailed_report"></a>

#### EvaluationRunResult.comparative\_detailed\_report

```python
def comparative_detailed_report(
    other: "EvaluationRunResult",
    keep_columns: list[str] | None = None,
    output_format: Literal["json", "csv", "df"] = "json",
    csv_file: str | None = None) -> Union[str, "DataFrame", None]
```

Generates a report with detailed scores for each metric from two evaluation runs for comparison.

**Arguments**:

- `other`: Results of another evaluation run to compare with.
- `keep_columns`: List of common column names to keep from the inputs of the evaluation runs being compared.
- `output_format`: The output format for the report: "json", "csv", or "df". Defaults to "json".
- `csv_file`: Filepath for saving the CSV output; required when `output_format` is "csv".

**Returns**:

JSON or DataFrame with a comparison of the detailed scores. If the output is written to a CSV file, a message confirming the successful write or an error message.
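Continuing the sketch started after `__init__`, the two single-run report methods can be called as follows. The `"df"` format assumes pandas is installed, and the CSV path returns a status message rather than the report itself:

```python
# Aggregated scores per metric, as a JSON-style dictionary (the default).
aggregated = run.aggregated_report()

# Per-sample scores as a pandas DataFrame (requires pandas).
detailed_df = run.detailed_report(output_format="df")

# Writing to CSV: the return value is a message confirming the write
# (or describing the error), not the report data.
status = run.detailed_report(output_format="csv", csv_file="detailed_scores.csv")
```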
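And a comparison between two runs might look like the sketch below. `candidate_run` stands in for a second, hypothetical `EvaluationRunResult` built over the same inputs; here it simply reuses the data from the earlier sketch:

```python
# "candidate_run" is a hypothetical second run over the same inputs;
# in practice it would carry different scores than the baseline.
candidate_run = EvaluationRunResult(
    run_name="candidate", inputs=inputs, results=results
)

# keep_columns retains the shared "question" input column alongside
# the detailed scores from both runs.
comparison = run.comparative_detailed_report(
    other=candidate_run,
    keep_columns=["question"],
    output_format="df",
)
```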