Execute a module over the rows of a data frame or tibble with optimized batch processing.

Usage

run_dataset(module, ...)

# S3 method for class 'Module'
run_dataset(
  module,
  data,
  .llm = NULL,
  .verbose = FALSE,
  .parallel = FALSE,
  .parallel_method = c("ellmer", "mirai"),
  .progress = TRUE,
  .return_format = "simple",
  ...
)

Arguments

module

A DSPrrr module (e.g., created with module()).

...

Additional arguments passed to run().

data

A tibble or data frame with columns matching the module's inputs.

.llm

Optional ellmer Chat object used for LLM calls.

.verbose

Logical; whether to print verbose output.

.parallel

Logical; whether to enable parallel processing.

.parallel_method

Character; either "ellmer" (the default) or "mirai". "ellmer" uses ellmer's parallel_chat_structured() for native async HTTP parallelism (more efficient; runs in a single process). "mirai" uses mirai for multi-process parallelism and requires .llm = NULL. A usage sketch follows this argument list.

.progress

Logical; whether to show a progress bar.

.return_format

Character; either "simple" (the default) or "structured". A sketch of the structured form appears at the end of the Examples.
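
A minimal sketch of the two parallel modes, assuming a module and data frame built as in the Examples below (mod and df are placeholders, and whether mirai workers must be started manually is an assumption, not something this page documents):

# Native async HTTP parallelism via ellmer (single R process)
results <- run_dataset(mod, df,
  .llm = ellmer::chat_openai(),
  .parallel = TRUE,
  .parallel_method = "ellmer"
)

# Multi-process parallelism via mirai; .llm must stay NULL
# mirai::daemons(4)  # assumption: you may need to start workers yourself
results <- run_dataset(mod, df,
  .parallel = TRUE,
  .parallel_method = "mirai"
)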

Value

A tibble with the input columns plus a result column containing the module's outputs.

Examples

if (FALSE) { # \dontrun{
# Example input data
df <- tibble::tibble(
  text = c("I love this!", "This is bad", "Okay product")
)

llm <- ellmer::chat_openai()

# Build a one-step sentiment module and run it over each row of df
results <- signature("text -> sentiment") |>
  module(type = "predict") |>
  run_dataset(df, .llm = llm)
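
# A rough sketch of the "structured" return format (an assumption based on
# the .return_format argument above; the exact extra columns are not
# documented here, so inspect the result rather than relying on names)
detailed <- signature("text -> sentiment") |>
  module(type = "predict") |>
  run_dataset(df, .llm = llm, .return_format = "structured")
str(detailed)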
} # }