Introduction
Building LLM applications in development is one thing; running them reliably in production is another. This vignette covers dsprrr’s orchestration features that help you:
- Persist module configurations and traces across sessions
- Orchestrate complex pipelines with targets
- Report on experiments with Quarto
- Validate workflows before expensive LLM operations
These features integrate with the broader R ecosystem:
- pins for versioned artifact storage
- targets for reproducible pipelines
- Quarto for rich reporting
The Challenge of Production LLM Workflows
LLM applications present unique challenges for production deployment:
- Configuration Drift: Optimized prompts and parameters can be lost between sessions
- Cost Tracking: LLM API calls are expensive and need monitoring
- Reproducibility: Results should be reproducible for debugging and auditing
- Collaboration: Team members need to share optimized modules
dsprrr’s orchestration helpers address these challenges by providing a structured approach to persisting and sharing LLM workflow artifacts.
Persisting Module Configurations with pins
The pins package provides a simple way to store and version R objects. dsprrr integrates with pins to save module configurations, making it easy to:
- Share optimized modules across projects
- Version control your LLM configurations
- Restore modules in new sessions without re-optimization
Setting Up a Pins Board
First, create a pins board. You can use local storage, cloud providers (S3, Azure, GCS), or Posit Connect:
library(pins)
# Local board for development
board <- board_folder("pins", versioned = TRUE)
# Cloud boards for production
# board <- board_s3("my-bucket/dsprrr-pins")
# board <- board_connect()  # Posit Connect
Pinning a Module Configuration
After optimizing a module, save its configuration:
library(dsprrr)
library(ellmer)
# Create and optimize a module
mod <- signature("text -> sentiment: enum('positive', 'negative', 'neutral')") |>
module(type = "predict")
# Run optimization (in practice, with real data)
# optimize_grid(mod, devset = train_data, metric = metric_exact_match(), .llm = llm)
# Pin the configuration
pin_module_config(
board = board,
name = "sentiment-classifier-v1",
module = mod,
description = "Production sentiment classifier, optimized on customer feedback"
)
The pinned configuration includes:
- Signature: Input/output specifications
- Configuration: Temperature, prompt style, template
- Optimization state: Best parameters, trial history
- Metadata: Package version, timestamps (a metadata-inspection sketch follows this list)
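To confirm what a pinned configuration contains without restoring it, you can read the pin’s metadata. This is plain pins functionality (pin_meta()), not a dsprrr helper:
# Inspect pin-level metadata: description, creation time, and version info
meta <- pin_meta(board, "sentiment-classifier-v1")
meta$description
meta$created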
Restoring a Module
In a new session or different project, restore the module:
# Read the pinned configuration
config <- pin_read(board, "sentiment-classifier-v1")
# Restore the module
mod <- restore_module_config(config)
# Use immediately - no re-optimization needed!
result <- run(mod, text = "Great product!", .llm = llm)
Versioning and Rollback
Pins automatically versions your configurations:
# List versions
pin_versions(board, "sentiment-classifier-v1")
# Read a specific version
old_config <- pin_read(board, "sentiment-classifier-v1", version = "20240115T120000Z")
old_mod <- restore_module_config(old_config)
Saving Execution Traces
Traces capture detailed information about LLM calls: timing, token usage, prompts, and outputs. Pinning traces, as sketched after the list below, enables:
- Cost analysis across experiments
- Performance debugging
- Audit trails for compliance
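As a minimal sketch of the pinning step itself, the pin_trace() call below mirrors the one shown in the complete workflow at the end of this vignette; the named arguments are assumed to follow the same pattern as pin_module_config():
# Persist the module's execution traces after a batch of runs
pin_trace(
  board = board,
  name = "experiment-2024-01-traces",  # pin read back in the next section
  module = mod,
  include_prompts = TRUE               # keep full prompts for audit trails
)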
Analyzing Traces
Load pinned traces for analysis:
library(dplyr)
library(ggplot2)
# Load trace data
trace_data <- pin_read(board, "experiment-2024-01-traces")
# Access the traces tibble
traces_df <- trace_data$traces
# Analyze token usage
traces_df |>
summarize(
total_tokens = sum(total_tokens),
avg_latency = mean(latency_ms),
total_cost = sum(cost, na.rm = TRUE)
)
# Plot latency over time
ggplot(traces_df, aes(x = timestamp, y = latency_ms)) +
geom_line() +
geom_smooth(method = "loess") +
labs(title = "Request Latency Over Time", y = "Latency (ms)")
Saving Evaluation Results
Evaluation results from evaluate() or vitals Tasks can be pinned for tracking model performance over time:
# Run evaluation
eval_result <- evaluate(
mod,
dataset = test_data,
metric = metric_exact_match(),
.llm = llm
)
# Pin the results
pin_vitals_log(
board = board,
name = "sentiment-eval-2024-01",
eval_result = eval_result,
module = mod,
description = "Monthly evaluation on customer feedback test set"
)
Tracking Performance Over Time
Compare evaluations across time:
# Load multiple evaluation results
eval_jan <- pin_read(board, "sentiment-eval-2024-01")
eval_feb <- pin_read(board, "sentiment-eval-2024-02")
# Compare scores
tibble(
month = c("January", "February"),
accuracy = c(eval_jan$mean_score, eval_feb$mean_score),
n_samples = c(eval_jan$n_evaluated, eval_feb$n_evaluated)
)
Orchestrating Pipelines with targets
For complex workflows, the targets package provides a powerful framework for:
- Dependency tracking and caching
- Parallel execution (see the sketch after this list)
- Reproducible pipelines
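On the parallel-execution point: targets can distribute independent targets (for example, optimizing or evaluating several modules) across workers via the crew package. A minimal sketch, assuming crew is installed and a recent targets version:
# _targets.R (excerpt): register a local crew controller so independent
# targets can run concurrently
library(targets)

tar_option_set(
  controller = crew::crew_controller_local(workers = 2)
)
Keep the worker count modest so that concurrent LLM calls stay within your provider’s rate limits.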
Using the targets Template
dsprrr provides a ready-to-use targets template:
# Copy the template to your project
use_dsprrr_template("targets")
# This creates _targets.R with a complete pipeline
Anatomy of the Pipeline
The template includes these stages:
# 1. Data Preparation
tar_target(train_data, load_training_data())
tar_target(test_data, load_test_data())
# 2. Module Definition
tar_target(module_definition, {
signature("text -> sentiment") |>
module(type = "predict")
})
# 3. Optimization
tar_target(optimized_module, {
mod <- module_definition$clone(deep = TRUE)
optimize_grid(mod, devset = train_data, ...)
mod
})
# 4. Evaluation
tar_target(evaluation_results, {
evaluate(optimized_module, dataset = test_data, ...)
})
# 5. Persistence
tar_target(pinned_config, {
pin_module_config(board, "model", optimized_module)
})
Running the Pipeline
Execute your pipeline with targets:
library(targets)
# Run the full pipeline
tar_make()
# Visualize dependencies
tar_visnetwork()
# Read specific targets
eval_results <- tar_read(evaluation_results)
Incremental Updates
targets caches results and only reruns what’s changed:
# Modify only the test data
# targets will skip optimization and only rerun evaluation
tar_outdated() # See what will rerun
tar_make() # Only evaluation runs
Generating Reports with Quarto
Quarto documents provide rich, reproducible reports. dsprrr’s template generates professional experiment reports:
# Copy the Quarto template
use_dsprrr_template("quarto")
# This creates report.qmd
Customizing the Report
The template reads from your pins board and generates:
- Module configuration summary
- Evaluation metrics and visualizations
- Trace analysis (latency, tokens, costs)
- Reproducibility information
Configure the report parameters in the YAML header of the generated report.qmd; you can also override them at render time (see the sketch after the rendering example below).
Rendering the Report
Render your report:
# From R
quarto::quarto_render("report.qmd")
# From terminal
# quarto render report.qmd
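If the generated report.qmd declares parameters in its YAML header, you can override them at render time with execute_params (a standard quarto_render() argument). The parameter names below (board_path, eval_pin) are hypothetical placeholders; use whatever params your template actually defines:
# Hypothetical parameter names -- check the params block in report.qmd
quarto::quarto_render(
  "report.qmd",
  execute_params = list(
    board_path = "pins",                    # folder backing the pins board
    eval_pin   = "sentiment-eval-2024-01"   # pinned evaluation to report on
  )
)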
Validating Workflows
Before running expensive LLM operations, validate your workflow:
# Check that everything is configured correctly
validate_workflow(
module = mod,
dataset = test_data,
board = board
)
# Output:
# -- Workflow Validation --------------------------------
# v module: Module type: PredictModule
# v signature: 1 input(s) defined
# v dataset: 100 rows, 1 required columns present
# v board: Board type: pins_board_folder
# v Workflow validation passed
This catches common issues:
- Missing required columns in datasets
- Invalid module configurations
- Inaccessible pins boards
Complete Production Workflow
Here’s a complete example bringing everything together:
library(dsprrr)
library(ellmer)
library(pins)
library(targets)
library(readr)
# ---- Setup ----
board <- board_folder("pins", versioned = TRUE)
llm <- chat_claude()
# ---- Define Module ----
mod <- signature(
"feedback -> sentiment: enum('positive', 'negative', 'neutral'), issues: array(string)",
instructions = "Analyze customer feedback. Identify sentiment and extract specific issues mentioned."
) |>
module(type = "predict")
# ---- Optimize ----
train_data <- read_csv("data/train.csv")
optimize_grid(
mod,
devset = train_data,
metric = metric_exact_match(field = "sentiment"),
parameters = list(temperature = c(0, 0.3, 0.7)),
.llm = llm
)
# ---- Validate ----
test_data <- read_csv("data/test.csv")
validate_workflow(mod, dataset = test_data, board = board)
# ---- Evaluate ----
eval_result <- evaluate(
mod,
dataset = test_data,
metric = metric_exact_match(field = "sentiment"),
.llm = llm
)
# ---- Persist ----
pin_module_config(board, "feedback-analyzer-v2", mod)
pin_trace(board, "feedback-eval-traces", mod, include_prompts = TRUE)
pin_vitals_log(board, "feedback-eval-v2", eval_result, module = mod)
# ---- Report ----
use_dsprrr_template("quarto")
quarto::quarto_render("report.qmd")
Best Practices
1. Version Your Configurations
Always use versioned pins boards:
board <- board_folder("pins", versioned = TRUE)
This enables rollback and audit trails.
2. Include Descriptive Metadata
Add descriptions to your pins:
pin_module_config(
board, "classifier-v1", mod,
description = "Trained on Q1 2024 data, optimized for precision"
)
3. Separate Development and Production
Use different boards for different environments:
dev_board <- board_folder("pins-dev")
prod_board <- board_s3("prod-bucket/dsprrr-pins")
4. Validate Before Running
Always validate workflows, especially in production:
validation <- validate_workflow(mod, dataset, board)
if (!validation$valid) {
stop("Workflow validation failed")
}
5. Track Costs
Monitor token usage and costs via traces:
traces <- pin_read(board, "latest-traces")
total_cost <- sum(traces$traces$cost, na.rm = TRUE)
cli::cli_alert_info("Total API cost: ${total_cost}")
Integration with vitals
dsprrr’s orchestration integrates with the vitals package for rigorous LLM evaluation. See vignette("vitals-integration") for details on:
- Converting dsprrr modules to vitals solvers
- Using vitals scorers in dsprrr metrics
- Combining vitals Tasks with dsprrr optimization
Next Steps
- Getting Started: vignette("getting-started") for dsprrr basics
- Optimization: vignette("compilation-optimization") for tuning modules
- Vitals: vignette("vitals-integration") for evaluation workflows
Summary
dsprrr’s orchestration features enable production-ready LLM workflows:
| Feature | Purpose | Package |
|---|---|---|
| pin_module_config() | Save/share optimized modules | pins |
| pin_trace() | Persist execution traces | pins |
| pin_vitals_log() | Store evaluation results | pins |
| use_dsprrr_template("targets") | Pipeline orchestration | targets |
| use_dsprrr_template("quarto") | Experiment reporting | Quarto |
| validate_workflow() | Pre-flight checks | dsprrr |
These tools integrate dsprrr into the broader R ecosystem, making it easy to build reliable, reproducible, and collaborative LLM applications.
