Programming—not prompting—LLMs in R

dsprrr brings the power of DSPy to R. Instead of wrestling with prompt strings, declare what you want, compose modules into pipelines, and let optimization find the best prompts automatically.

# Install
pak::pak("JamesHWade/dsprrr")

# That's it. Start using LLMs.
library(dsprrr)
dsp("question -> answer", question = "What is the capital of France?")
#> "Paris"

Compose programs with reusable primitives

Every dsprrr program is built from the same three pieces. Learn these and the rest of the package falls into place.

Signatures

Declare your task. Define typed inputs and outputs instead of wrestling with prompt strings. Portable, maintainable, and easy to iterate on.

# Route a support ticket
sig <- signature(
  "ticket -> urgency: enum('low', 'high'), team: string"
)
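
To make the notation concrete, here is a minimal base R sketch of how a signature string splits into named, typed input and output fields. `parse_signature_sketch()` is a hypothetical helper written only for illustration. It is not dsprrr's parser, and it assumes that a field without an explicit type defaults to `string`.

```r
# Hypothetical illustration only: NOT how dsprrr parses signatures.
parse_signature_sketch <- function(sig) {
  # Inputs sit left of the arrow, outputs right of it.
  sides <- trimws(strsplit(sig, "->", fixed = TRUE)[[1]])
  stopifnot(length(sides) == 2)

  parse_fields <- function(side) {
    # Split on commas that are not inside parentheses, so that
    # enum('low', 'high') stays in one piece.
    fields <- trimws(strsplit(side, ",(?![^(]*\\))", perl = TRUE)[[1]])
    name <- trimws(sub(":.*$", "", fields))
    type <- ifelse(
      grepl(":", fields, fixed = TRUE),
      trimws(sub("^[^:]*:", "", fields)),
      "string"  # assumption: untyped fields are treated as strings
    )
    data.frame(name = name, type = type, stringsAsFactors = FALSE)
  }

  list(inputs = parse_fields(sides[1]), outputs = parse_fields(sides[2]))
}

parsed <- parse_signature_sketch(
  "ticket -> urgency: enum('low', 'high'), team: string"
)
parsed$outputs
```

Under these assumptions the ticket-routing signature has one input (`ticket`) and two outputs (`urgency`, constrained to two values, and `team`, free text).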

Modules

Same interface, different strategy. Modules control how a signature executes—reason step by step, use tools, or run ensembles—without rewriting the task.

sig <- signature(
  "ticket -> urgency: enum('low', 'high'), team: string"
)

# Direct completion
classify <- module(sig, type = "predict")

# Add step-by-step reasoning
classify <- module(sig, type = "chain_of_thought")

# Add a tool-use loop
lookup_tool <- ellmer::tool(
  function(query) paste("Found:", query),
  description = "Look up support policy details",
  arguments = list(query = ellmer::type_string())
)
classify <- module(sig, type = "react", tools = list(lookup_tool))

Optimizers

Compile your program against a metric. Give dsprrr examples and a scoring function; it tunes prompts and demos automatically until quality converges.

route_sig <- signature("ticket -> urgency: enum('low', 'high')")
router <- module(route_sig, type = "predict")
trainset <- dsp_trainset(
  ticket  = c("Package lost", "Need a receipt"),
  urgency = c("high", "low")
)

tp <- GEPA(metric = metric_exact_match(field = "urgency"))
optimized <- compile(tp, router, trainset)

board <- pins::board_temp()
pin_module_config(board, "ticket-router-v2", optimized)
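
The "scoring function" is the piece that tells an optimizer whether a prediction is good. As a rough base R sketch of what an exact-match score computes, consider the hypothetical `exact_match_sketch()` below. It is a stand-in for illustration, not dsprrr's `metric_exact_match()`, which may normalize or compare values differently. The predictions are made up.

```r
# Hypothetical stand-in for illustration; not dsprrr's metric_exact_match().
exact_match_sketch <- function(predicted, expected) {
  stopifnot(length(predicted) == length(expected))
  # Assumption: compare case-insensitively after trimming whitespace.
  as.numeric(tolower(trimws(predicted)) == tolower(trimws(expected)))
}

expected  <- c("high", "low")   # labels from the trainset above
predicted <- c("high", "high")  # made-up model outputs
scores <- exact_match_sketch(predicted, expected)
mean(scores)
#> [1] 0.5
```

An optimizer then searches for the prompt and demo set that pushes this average as high as it can on the examples you supply.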

Getting Started: Configure Your LLM

# Load both packages once, then pick one provider.
library(dsprrr)
library(ellmer)

# OpenAI
chat <- chat_openai(model = "gpt-4o-mini")
chat |> dsp("question -> answer", question = "What is 2+2?")
#> "4"

# Anthropic
chat <- chat_claude(model = "claude-sonnet-4-20250514")
chat |> dsp("question -> answer", question = "What is 2+2?")
#> "4"

# Google Gemini
chat <- chat_google_gemini(model = "gemini-2.0-flash")
chat |> dsp("question -> answer", question = "What is 2+2?")
#> "4"

# Ollama (local)
chat <- chat_ollama(model = "llama3.2")
chat |> dsp("question -> answer", question = "What is 2+2?")
#> "4"

# Auto-detect: no chat object needed. dsprrr reads
# OPENAI_API_KEY, ANTHROPIC_API_KEY, or GOOGLE_API_KEY from the environment.
dsp("question -> answer", question = "What is 2+2?")
#> "4"
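
Before relying on auto-detection, it can help to check which of those keys your R session can actually see. The base R sketch below uses a hypothetical helper, `available_keys_sketch()`. It only reports which variables are set and does not reproduce dsprrr's detection logic or its order of preference.

```r
# Hypothetical helper: lists which provider keys are set in the environment.
# It does not reproduce dsprrr's own detection logic.
available_keys_sketch <- function(
    vars = c("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")) {
  values <- Sys.getenv(vars, unset = "")
  vars[nzchar(values)]  # keep only the variables with a non-empty value
}

found <- available_keys_sketch()
if (length(found) == 0) {
  message("No provider key found; set one (for example in .Renviron) first.")
} else {
  message("Keys available: ", paste(found, collapse = ", "))
}
```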

Define a task. Grow it into a system.

Start with a single signature and grow it into a multi-step program—the same building blocks scale from a one-line extractor to a full pipeline.
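
As a rough sketch of that growth path, the function below chains two `dsp()` calls of the kind shown earlier: one condenses a ticket, the next classifies the result. The field names `summary` and `urgency`, the function name `triage_ticket()`, and the use of an `enum` type inside a `dsp()` signature string are assumptions made for illustration. Calling the function requires dsprrr and a configured LLM provider.

```r
# Sketch only: a two-step program composed from dsp() calls.
# Defining the function is safe offline; calling it needs dsprrr and an LLM.
triage_ticket <- function(ticket) {
  # Step 1: condense the raw ticket into a short summary.
  summary <- dsprrr::dsp("ticket -> summary", ticket = ticket)

  # Step 2: classify urgency from the summary (enum syntax assumed to be
  # accepted here, as it is by signature()).
  urgency <- dsprrr::dsp(
    "summary -> urgency: enum('low', 'high')",
    summary = summary
  )

  list(summary = summary, urgency = urgency)
}
```

Each step keeps its own signature, so either one can later be swapped for a module with a different strategy without touching the other.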

Automatic Optimization

dsprrr can automatically optimize your prompts using your data.

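# Assumed setup: a sentiment classifier used in the snippets that follow
classifier <- module(signature("text -> sentiment"), type = "predict")

# Add examples automatically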
trainset <- dsp_trainset(
  text = c("Great product!", "Awful experience", "It works"),
  sentiment = c("positive", "negative", "neutral")
)

optimized <- compile(
  LabeledFewShot(k = 3),
  classifier,
  trainset
)

# Now includes 3 examples in every prompt
optimized$predict(text = "Amazing service!")
#> "positive"

Result: Few-shot examples can improve accuracy, especially on edge cases.

# Search over configurations
classifier$optimize_grid(
  devset = validation_data,
  metric = metric_exact_match(),
  parameters = list(
    temperature = c(0.1, 0.5, 1.0),
    prompt_style = c("concise", "detailed")
  )
)

# View results
module_trials(classifier)
#> # A tibble: 6 × 4
#>   temperature prompt_style score     n
#>         <dbl> <chr>        <dbl> <int>
#> 1         0.1 concise       0.92   100
#> 2         0.1 detailed      0.88   100
#> ...

Result: Find the best configuration for your task.
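
The six rows in the trials table come from crossing every value of one parameter with every value of the other. The base R sketch below reproduces just that search space with `expand.grid()`. It illustrates the size of the grid only and is not dsprrr's internal code.

```r
# Base R illustration of the search space; not dsprrr's internal code.
parameters <- list(
  temperature  = c(0.1, 0.5, 1.0),
  prompt_style = c("concise", "detailed")
)

# One row per combination of temperature and prompt style.
grid <- expand.grid(parameters, stringsAsFactors = FALSE)
nrow(grid)
#> [1] 6
```

Each added parameter multiplies the number of configurations, and every configuration is scored on the full devset, so a wide grid can get expensive quickly.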

# Rigorous evaluation with metrics
results <- evaluate(
  classifier,
  test_data,
  metric = metric_exact_match()
)

results$mean_score
#> 0.94

# Integrate with vitals for advanced evaluation
library(vitals)
solver <- as_vitals_solver(classifier)

Result: Measure and track performance systematically.

Why dsprrr?

Declarative

Define what you want, not how to prompt. Signatures like “text -> sentiment” describe your task clearly.

Composable

Build complex pipelines from simple modules. Each module is testable, optimizable, and reusable.

Optimizable

Automatically improve prompts with your data. Few-shot learning, grid search, and advanced teleprompters.

Integrated

Built on ellmer for LLM access and vitals for evaluation. Works with the tidyverse.

Observable

Every LLM call is traced. Inspect prompts, debug failures, track costs.

Production-Ready

Persistence with pins, orchestration with targets, deployment with vetiver.

Ecosystem

dsprrr integrates with much of Posit’s LLM ecosystem:

Package     Purpose
ellmer      Chat with LLMs from R
vitals      LLM evaluation framework
shinychat   Chat UIs for Shiny
Inspired by DSPy from Stanford NLP.