Execute a module on a data frame/tibble with optimized batch processing.
Usage
run_dataset(module, ...)
# S3 method for class 'Module'
run_dataset(
  module,
  data,
  .llm = NULL,
  .verbose = FALSE,
  .parallel = FALSE,
  .parallel_method = c("ellmer", "mirai"),
  .progress = TRUE,
  .return_format = "simple",
  ...
)

Arguments
- module
A DSPrrr module (e.g., created with module()).
- ...
Additional arguments passed to run().
- data
A tibble or data frame with columns matching the module's inputs.
- .llm
Optional ellmer Chat object for LLM calls.
- .verbose
Logical; whether to print verbose output.
- .parallel
Logical; whether to enable parallel processing.
- .parallel_method
Character; either "ellmer" (default) or "mirai". "ellmer" uses ellmer's
parallel_chat_structured() for native async HTTP parallelism (more efficient, single process). "mirai" uses mirai for multi-process parallelism (requires .llm = NULL). A parallel call is sketched after the examples below.
- .progress
Logical; whether to show a progress bar.
- .return_format
Character; either "simple" or "structured".
Examples
if (FALSE) { # \dontrun{
# Process data
df <- tibble::tibble(
text = c("I love this!", "This is bad", "Okay product")
)
llm <- ellmer::chat_openai()
results <- signature("text -> sentiment") |>
module(type = "predict") |>
run_dataset(df, .llm = llm)
} # }
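A further illustrative sketch, assuming only the arguments documented above: a parallel run via the "ellmer" method with a structured return. The data, column names, and signature string are placeholders, and the exact shape of the structured result is not specified here.
if (FALSE) { # \dontrun{
# Parallel processing with ellmer's native async HTTP parallelism and a
# structured return (illustrative data; result shape not asserted)
df <- tibble::tibble(
  text = c("Great service", "Terrible delay", "Average experience")
)
llm <- ellmer::chat_openai()
results <- signature("text -> sentiment") |>
  module(type = "predict") |>
  run_dataset(
    df,
    .llm = llm,
    .parallel = TRUE,
    .parallel_method = "ellmer",
    .return_format = "structured",
    .progress = TRUE
  )
} # }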
