
Evaluate the performance of a DSPrrr module, compiled or not, on a test dataset.

Usage

evaluate_dsp(module, data, metric, .llm = NULL, verbose = TRUE)

Arguments

module

A DSPrrr module, either compiled or uncompiled.

data

Test data as a data frame or tibble.

metric

A metric function, typically created by one of the metric_*() functions such as metric_exact_match().

.llm

Optional LLM connection used to run the module.

verbose

Logical; whether to show progress during evaluation. Defaults to TRUE.

Value

A list of evaluation results, including the mean score (mean_score) and the per-example scores.
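To make the returned structure concrete, here is a minimal base-R sketch of the kind of loop an evaluation like this performs: run the module on each row, score each prediction with the metric, then average the scores. It does not call DSPrrr. The helper evaluate_sketch(), the predict_fn(row) and metric_fn(prediction, row) signatures, and the scores field name are hypothetical assumptions, not the package's actual internals. Only mean_score matches a field used in the Examples section.

```r
# Hypothetical sketch of an evaluation loop; NOT DSPrrr's implementation.
# Assumes `predict_fn(row)` returns a prediction for one example and
# `metric_fn(prediction, row)` returns a single numeric score (e.g. 0 or 1).
evaluate_sketch <- function(predict_fn, data, metric_fn) {
  scores <- vapply(seq_len(nrow(data)), function(i) {
    row <- data[i, , drop = FALSE]
    metric_fn(predict_fn(row), row)
  }, numeric(1))
  # Hypothetical field names; only `mean_score` mirrors the documented example
  list(mean_score = mean(scores), scores = scores)
}

# Toy usage: a keyword "classifier" scored with an exact-match metric
test_data <- data.frame(
  text = c("great movie", "awful plot", "great acting"),
  sentiment = c("positive", "negative", "negative"),
  stringsAsFactors = FALSE
)
predict_fn <- function(row) if (grepl("great", row$text)) "positive" else "negative"
exact_match <- function(prediction, row) as.numeric(prediction == row$sentiment)

res <- evaluate_sketch(predict_fn, test_data, exact_match)
res$scores      # 1 1 0
res$mean_score  # 0.6666667
```

The third row is misclassified, so the mean score is 2/3. Keeping the per-example scores alongside the mean makes it easy to inspect which test cases a module fails.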

Examples

if (FALSE) { # \dontrun{
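# Evaluate a module on held-out test data.
# `optimized_classifier`, `test_data`, and `llm_connection` are assumed
# to have been created earlier (e.g. by compiling a module and connecting to an LLM).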
results <- evaluate_dsp(
  module = optimized_classifier,
  data = test_data,
  metric = metric_exact_match(field = "sentiment"),
  .llm = llm_connection
)

print(results$mean_score)
} # }