Execute a module over the rows of a data frame or tibble with optimized batch processing.

Usage

run_dataset(module, ...)

# S3 method for class 'Module'
run_dataset(
  module,
  data,
  .llm = NULL,
  .verbose = FALSE,
  .parallel = FALSE,
  .parallel_method = c("ellmer", "mirai"),
  .progress = TRUE,
  .return_format = "simple",
  ...
)

Arguments

module

A DSPrrr module (e.g., created with module()).

...

Additional arguments passed to run().

data

A tibble or data frame with columns matching the module's inputs.

.llm

Optional ellmer Chat object used for LLM calls.

.verbose

Logical; whether to print verbose output.

.parallel

Logical; whether to enable parallel processing.

.parallel_method

Character; either "ellmer" (the default) or "mirai". "ellmer" uses ellmer's parallel_chat_structured() for native async HTTP parallelism (more efficient; runs in a single process). "mirai" uses mirai for multi-process parallelism and requires .llm = NULL. A usage sketch follows this argument list.

.progress

Logical; whether to show a progress bar.

.return_format

Character; either "simple" (the default) or "structured". A sketch of the structured form appears at the end of the Examples.
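
A minimal sketch of the two parallel modes, assuming a module and data frame built as in the Examples below (mod and df are placeholders, and whether mirai workers must be started manually is an assumption, not something this page documents):

# Native async HTTP parallelism via ellmer (single R process)
results <- run_dataset(mod, df,
  .llm = ellmer::chat_openai(),
  .parallel = TRUE,
  .parallel_method = "ellmer"
)

# Multi-process parallelism via mirai; .llm must stay NULL
# mirai::daemons(4)  # assumption: you may need to start workers yourself
results <- run_dataset(mod, df,
  .parallel = TRUE,
  .parallel_method = "mirai"
)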

Value

A tibble with the input columns plus a result column containing the module's outputs.

Examples

if (FALSE) { # \dontrun{
# Example input data
df <- tibble::tibble(
  text = c("I love this!", "This is bad", "Okay product")
)

llm <- ellmer::chat_openai()

# Build a one-step sentiment module and run it over each row of df
results <- signature("text -> sentiment") |>
  module(type = "predict") |>
  run_dataset(df, .llm = llm)
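
# A rough sketch of the "structured" return format (an assumption based on
# the .return_format argument above; the exact extra columns are not
# documented here, so inspect the result rather than relying on names)
detailed <- signature("text -> sentiment") |>
  module(type = "predict") |>
  run_dataset(df, .llm = llm, .return_format = "structured")
str(detailed)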
} # }