dsprrr is an R implementation of DSPy’s programming model, built on ellmer and tidyverse conventions. If you know DSPy, this page tells you what carries over, what is different, and what is not (yet) available. It reflects DSPy 3.x as of mid-2026.
dsprrr is not a line-by-line port. It follows DSPy’s concepts — signatures, modules, metrics, and optimizers (“teleprompters”) — while embracing R idioms: tibbles in and out, S7/R6 objects, and ellmer for all provider communication.
Modules
| DSPy | dsprrr | Notes |
|---|---|---|
| `dspy.Predict` | `module(sig, type = "predict")` | Core predictor |
| `dspy.ChainOfThought` | `chain_of_thought()`, `with_reasoning()` | Implemented as signature transforms |
| `dspy.ReAct` | `module(sig, type = "react")` | Tool-calling agent loop via ellmer tools |
| `dspy.ProgramOfThought` | `program_of_thought()` | Generates and executes R code (not Python) |
| `dspy.CodeAct` | `code_act()` | Hybrid tools + R code execution; requires an explicit, opt-in `r_code_runner()` |
| `dspy.BestOfN` | `best_of_n()` | Reward-function-guided retries |
| `dspy.Refine` | `refine()` | Retries with LLM-generated feedback |
| `dspy.MultiChainComparison` | `multi_chain_comparison()` | |
| `dspy.RLM` | `rlm_module()` | Recursive language models over an R REPL |
| `dspy.Parallel` / `Module.batch` | `run_dataset()`, `run(..., .parallel = TRUE)` | Batch over a data frame; heterogeneous (module, example) fan-out is not yet a dedicated module |
| `dspy.majority` | `ensemble()` with `reduce_majority()` | Plus `reduce_weighted_vote()`, `reduce_best_by_metric()` |
| `dspy.KNN` | `KNNFewShot` teleprompter / `KNN` module | Bring-your-own vectorizer (e.g., `ragnar::embed_openai()`) |
| Retrieval (custom functions) | `rag_module()` + ragnar | First-class ragnar retriever integration |

One notable difference: DSPy 3.0 removed `dspy.Assert`/`dspy.Suggest` in favor of `BestOfN`/`Refine`. dsprrr keeps both styles: declarative assertions with retry/backtracking (`with_assertions()`, `assert_output()`, `suggest_output()`) and the `best_of_n()`/`refine()` wrappers. If you prefer the modern DSPy style, use the wrappers; use assertions when you want declarative output contracts with automatic feedback injection.
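
To make the difference between the two wrapper styles concrete, here is a minimal base-R sketch. It does not call dsprrr and is not how `best_of_n()` or `refine()` are implemented; `fake_generate()`, `reward_integer()`, and `sketch_retry()` are hypothetical names invented for this illustration, with a canned function standing in for the LLM. Without an `explain` function the loop behaves like reward-guided resampling; with one, each failed attempt feeds a hint into the next.

```r
# Illustrative only: plain-R stand-ins, not dsprrr's best_of_n()/refine().

# Hypothetical stand-in for an LLM call. Returns a canned answer per attempt;
# when feedback is supplied, it pretends the model fixes its output format.
fake_generate <- function(attempt, feedback = NULL) {
  if (!is.null(feedback)) return("42")
  candidates <- c("forty-two", "42.0", "42")
  candidates[[min(attempt, length(candidates))]]
}

# Reward function: 1 if the answer is a bare integer string, otherwise 0.
reward_integer <- function(answer) as.numeric(grepl("^[0-9]+$", answer))

# Retry loop that keeps the best-scoring answer and stops at the threshold.
# If `explain` is given, its output is passed to the next attempt as feedback.
sketch_retry <- function(generate, reward, n = 3, threshold = 1, explain = NULL) {
  best <- list(answer = NA_character_, score = -Inf, attempts = 0L)
  feedback <- NULL
  for (i in seq_len(n)) {
    answer <- generate(i, feedback)
    score <- reward(answer)
    if (score > best$score) {
      best$answer <- answer
      best$score <- score
    }
    best$attempts <- i
    if (score >= threshold) break
    if (!is.null(explain)) feedback <- explain(answer)
  }
  best
}

# Best-of-N style: resample until the reward passes.
bon <- sketch_retry(fake_generate, reward_integer)

# Refine style: same loop, but failures produce feedback for the next try.
refined <- sketch_retry(
  fake_generate, reward_integer,
  explain = function(answer) "Return digits only."
)

str(bon)
str(refined)
```

In this toy setup the feedback-driven loop needs fewer attempts because the stub reacts to the hint; with a real model that is the hoped-for effect, not a guarantee.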
Optimizers (teleprompters)
| DSPy | dsprrr | Fidelity notes |
|---|---|---|
| `LabeledFewShot` | `LabeledFewShot` | Equivalent |
| `BootstrapFewShot` | `BootstrapFewShot` | Equivalent; compiles pipelines jointly (demos for every step harvested from end-to-end traces) |
| `BootstrapFewShotWithRandomSearch` | `BootstrapFewShotWithRandomSearch` | Equivalent |
| `MIPROv2` | `MIPROv2` | Discrete Bayesian optimization with UCB over instruction + demo candidates |
| `SIMBA` | `SIMBA` | Adapted: hard-example mining + LLM-generated rules; simplified vs. the full introspective algorithm |
| `GEPA` | `GEPA` | Adapted (“GEPA-lite”): reflective mutation + Pareto selection; supports feedback metrics via `metric_with_feedback()`; no per-component selection or inference-time search yet |
| `COPRO` | `COPRO` | Equivalent (coordinate ascent over instructions) |
| `KNNFewShot` | `KNNFewShot` | Equivalent |
| `Ensemble` | `Ensemble` | Equivalent |
| `BetterTogether` | `BetterTogether` | Chains prompt optimizers via strategy strings; does not alternate prompt/weight optimization (no finetuning backend) |
| `BootstrapFinetune` | — | Not implemented (planned); dsprrr currently optimizes prompts, not weights |
| `GRPO` (RL via Arbor) | — | Not implemented |
| `BootstrapFewShotWithOptuna`, `AvatarOptimizer`, `InferRules` | — | Niche/legacy in DSPy; not planned |
| — | `GridSearchTeleprompter`, `optimize_grid()` | dsprrr addition: tidymodels-style grid search over module parameters |

GEPA feedback metrics
DSPy’s GEPA expects metrics that return a score and textual feedback. dsprrr supports the same protocol:

```r
metric <- metric_with_feedback(
  function(prediction, expected) {
    if (identical(prediction$answer, expected)) {
      list(score = 1, feedback = "Correct.")
    } else {
      list(
        score = 0,
        feedback = paste("Wrong: expected", expected, "- check the arithmetic.")
      )
    }
  },
  field = "answer"
)

tp <- GEPA(metric = metric, generations = 5L)
compiled <- compile(tp, mod, trainset, .llm = llm)
```

The feedback for failed examples is injected into GEPA’s reflection prompt, so the reflection LLM learns why outputs failed, not just that they did.
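
As a rough picture of what "injected into the reflection prompt" can mean, the base-R sketch below scores a few hand-written predictions, keeps the feedback from the failures, and pastes it into a prompt string. It does not call dsprrr; the prompt wording, the `score_answer()` helper, and the toy data are all assumptions made for this illustration and do not reproduce GEPA's actual prompt.

```r
# Illustrative only: shows feedback being collected into a prompt string.
# This is not GEPA's real reflection prompt or dsprrr's internal code.

# Hypothetical metric returning both a score and textual feedback.
score_answer <- function(prediction, expected) {
  if (identical(prediction, expected)) {
    list(score = 1, feedback = "Correct.")
  } else {
    list(
      score = 0,
      feedback = paste("Wrong: expected", expected, "- check the arithmetic.")
    )
  }
}

# Toy predictions; the second and third are deliberately wrong.
examples <- data.frame(
  question   = c("2 + 2", "3 * 5", "10 - 7"),
  expected   = c("4", "15", "3"),
  prediction = c("4", "35", "4"),
  stringsAsFactors = FALSE
)

# Score each row and pull the two metric components apart.
results  <- Map(score_answer, examples$prediction, examples$expected)
scores   <- vapply(results, function(r) r$score, numeric(1))
feedback <- vapply(results, function(r) r$feedback, character(1))

# Only failed examples contribute feedback lines to the prompt.
failed <- scores < 1
reflection_prompt <- paste(
  c(
    "The current instruction produced these failures:",
    sprintf(
      "- Q: %s | got: %s | feedback: %s",
      examples$question[failed], examples$prediction[failed], feedback[failed]
    ),
    "Propose a revised instruction that avoids them."
  ),
  collapse = "\n"
)

cat(reflection_prompt, "\n")
```

The point of the richer metric is visible in the prompt: a bare score of 0 would tell the reflection model only that two examples failed, while the feedback lines say what the expected values were.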
Signatures and types
| DSPy | dsprrr |
|---|---|
| `"question -> answer: int"` string signatures | `signature("question -> answer: integer")` |
| Class-based signatures with `InputField`/`OutputField` | `signature(inputs = list(input(...)), output_type = ...)` |
| Pydantic-typed outputs | ellmer type objects (`type_string()`, `type_enum()`, `type_object()`, `type_array()`) |
| `dspy.Image`, `dspy.Audio`, `dspy.File` | ellmer `Content` objects (images, PDFs) passed as inputs |
| `dspy.History` | Implicit via ellmer `Chat$get_turns()`; not a signature type |
| `dspy.Tool`, `dspy.ToolCalls` | ellmer `ToolDef` via `as_ellmer_tool()` / `register_dsprrr_tool()` |
| `dspy.Reasoning` (native reasoning traces) | Not yet first-class; `with_reasoning()` adds a prompted reasoning field |

Programs and composition
DSPy composes programs as Python classes with multiple predictors. dsprrr composes pipelines:

```r
program <- mod_retrieve %>>%
  map_inputs(mod_answer, documents = "context") %>>%
  mod_format
```

`BootstrapFewShot` compiles pipelines jointly, like DSPy: the teacher pipeline runs end-to-end, final outputs are scored, and each step harvests demonstrations from passing traces. Other teleprompters currently optimize a pipeline’s steps individually (instruction-level optimizers operate on single modules).
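
The joint harvesting idea can be sketched in base R without any LLM. The code below is an assumption-laden illustration, not dsprrr's implementation: the two "steps" are ordinary functions, `run_with_trace()` is a hypothetical helper, and the training data is made up. What it shows is the shape of the procedure: run the whole pipeline, score only the final output, and let each passing run donate one input/output demo to every step.

```r
# Illustrative only: plain-R picture of joint demo harvesting.
# Not dsprrr's BootstrapFewShot; steps are ordinary functions, not modules.

# Hypothetical two-step pipeline.
steps <- list(
  retrieve = function(x) paste("context for", x),
  answer   = function(x) toupper(sub("^context for ", "", x))
)

# Run every step in order, recording each step's input and output.
run_with_trace <- function(steps, input) {
  trace <- list()
  value <- input
  for (name in names(steps)) {
    output <- steps[[name]](value)
    trace[[name]] <- list(input = value, output = output)
    value <- output
  }
  list(final = value, trace = trace)
}

# Toy training set; the last expected value is chosen so that run fails.
trainset <- data.frame(
  question = c("paris", "rome", "oslo"),
  expected = c("PARIS", "ROME", "BERGEN"),
  stringsAsFactors = FALSE
)

# Score only the final output; passing runs contribute a demo to every step.
demos <- setNames(vector("list", length(steps)), names(steps))
for (i in seq_len(nrow(trainset))) {
  run <- run_with_trace(steps, trainset$question[i])
  if (identical(run$final, trainset$expected[i])) {
    for (name in names(steps)) {
      demos[[name]] <- c(demos[[name]], list(run$trace[[name]]))
    }
  }
}

lengths(demos)
```

Because intermediate steps have no labels of their own, the end-to-end score is the only filter: a step's demo is trusted because the run it came from produced a correct final answer.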
Infrastructure
| Capability | DSPy | dsprrr |
|---|---|---|
| LM client | `dspy.LM` (LiteLLM; decoupling in 3.2+) | ellmer `Chat` (100+ providers via ellmer) |
| Configuration | `dspy.configure()` / `dspy.context()` | `dsp_configure()`, `with_lm()`, `local_lm()` |
| Caching | Two-tier memory + disk | Two-tier memory + disk (`configure_cache()`) |
| Async | `acall`/`aforward`, `asyncify` | `run_async()` with promises |
| Streaming | `streamify()` + `StreamListener` | `run_stream()` + `stream_listener()`; token streaming for single string fields, status events per pipeline step |
| Usage tracking | `track_usage` | `get_tokens()`, `get_cost()`, `session_cost()` |
| Parallel evaluation | `Evaluate(num_threads = ...)` | `evaluate(.parallel = TRUE)` via mirai or ellmer’s native parallelism |
| Saving programs | `save`/`load`, whole-program serialization | `pin_module_config()` / `restore_module_config()` (pins-based) |
| Observability | MLflow autolog, OpenTelemetry callbacks | Traces tibble, `inspect_history()`, `export_traces()`; MLflow integration planned |
| Adapters (Chat/JSON/XML/TwoStep/BAML) | Yes | No adapter layer; ellmer’s `chat_structured()` handles structured output |
| Evaluation framework | `dspy.Evaluate` | `evaluate()`, `eval_program()`, plus vitals integration |

What dsprrr has that DSPy doesn’t
- tidymodels integration: use modules as parsnip engines, tune with dials parameters (`temperature`, `top_p`, `reasoning_effort`).
- vitals integration: bridge modules and metrics to the vitals evaluation framework (`as_vitals_solver()`, `as_dsprrr_metric()`).
- ragnar integration: production RAG with `rag_module()` and `ragnar_tool()`.
- Assertions with backtracking: kept and maintained (removed in DSPy 3.0).
- Grid search compilation: `optimize_grid()` for explicit, tidymodels-style parameter sweeps.

Known gaps (roadmap)
In rough priority order:
- Weight optimization: `BootstrapFinetune` / RL-based optimizers.
- Native reasoning-trace capture as a typed output (analogous to `dspy.Reasoning`).
- Joint multi-step optimization for instruction optimizers (MIPROv2, GEPA per-component selection); demo bootstrapping is already joint.
- Adapter-style fallbacks for models with weak structured-output support (analogous to `TwoStepAdapter`).
- MLflow / OpenTelemetry observability.

If one of these blocks your use case, please open an issue.