Generic evaluation entry point for DSPrrr modules. Executes the module on a dataset, applies a metric to each example, and returns aggregate statistics together with the predictions and metadata required for downstream analysis.
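A minimal usage sketch. This page does not show the generic's exported name, so `evaluate()` is an assumption here, as are the `module()` signature string, the chat model, the column names, and the way the built-in metric is constructed:

```r
# Illustrative sketch only: `evaluate()` is an assumed name for this generic, and
# the `module()` signature string, chat model, and metric call are placeholders.
library(DSPrrr)
library(ellmer)

qa   <- module("question -> answer")         # hypothetical signature syntax
chat <- chat_openai(model = "gpt-4o-mini")   # any ellmer chat object

dataset <- data.frame(
  question = c("What is 2 + 2?", "What is the capital of France?"),
  answer   = c("4", "Paris")                 # expected field used by the metric
)

res <- evaluate(
  qa,
  data      = dataset,
  metric    = metric_exact_match(),          # built-in metric; calling convention assumed
  .llm      = chat,
  .progress = TRUE
)
res$mean_score
```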
Arguments
- module: A DSPrrr module created with `module()`.
- ...: Arguments passed to methods:
  - data: A data frame or tibble containing columns that match the module's signature inputs, plus any expected fields used by `metric`.
  - metric: A function applied per example with signature `metric(prediction, expected_row)`.

  Additional arguments passed to `run_dataset()`:
  - .llm: Optional ellmer chat object.
  - .parallel: Logical; whether to allow parallel execution.
  - .progress: Logical; whether to display progress while evaluating.
  - .return_format: Character; "simple" returns just scores and predictions, "structured" (default) includes full metadata and data.
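The `metric` argument is usually the only piece you write yourself. A minimal sketch of a custom metric with the documented signature, assuming `prediction` is a named list of module outputs and `expected_row` is the matching row of the data; the `answer` field names are illustrative:

```r
# Custom metric with the documented signature metric(prediction, expected_row).
# Returns a logical, which is coerced to a 0/1 numeric score during aggregation.
answer_contains_expected <- function(prediction, expected_row) {
  isTRUE(grepl(
    tolower(expected_row$answer),   # expected value from the data row
    tolower(prediction$answer),     # model output field (assumed name)
    fixed = TRUE
  ))
}

# Passed like any built-in metric (again assuming the generic is `evaluate()`):
# res <- evaluate(qa, data = dataset, metric = answer_contains_expected)
```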
Value
A list whose elements depend on `.return_format`. When `.return_format = "structured"` (the default), the list contains:
- mean_score: numeric mean over all successful metric evaluations.
- scores: per-example numeric scores (coerced from logical metrics).
- predictions: list of model outputs.
- metadata: list of metadata captured from `run()`.
- n_evaluated: number of successful evaluations.
- n_errors: number of metric failures.
- errors: character vector of error messages, when any.
- data: input data augmented with prediction metadata.
When `.return_format = "simple"`, the list contains:
- mean_score, scores, predictions, n_evaluated, n_errors, and errors (metadata and data are omitted for a lighter-weight result).
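A sketch of working with both return formats, under the same naming assumptions as the examples above:

```r
# Structured result (default): metadata and augmented data are included.
res <- evaluate(qa, data = dataset, metric = metric_exact_match())
res$mean_score                     # aggregate score over successful evaluations
res$scores                         # per-example numeric scores
if (res$n_errors > 0) res$errors   # messages for any metric failures
head(res$data)                     # input rows augmented with prediction metadata

# Simple result: same core fields, but metadata and data are dropped.
res_light <- evaluate(
  qa,
  data           = dataset,
  metric         = metric_exact_match(),
  .return_format = "simple"
)
names(res_light)
```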
See also
- `run()` for executing a module without metrics
- `run_dataset()` for batch execution without metrics
- `optimize_grid()` for parameter optimization
- `metric_exact_match()` and `metric_contains()` for built-in metrics
