dsprrr vs. DSPy: Feature Comparison

dsprrr is an R implementation of DSPy’s programming model, built on ellmer and tidyverse conventions. If you know DSPy, this page tells you what carries over, what is different, and what is not (yet) available.

Version baseline

This comparison was checked against DSPy 3.2.1 (the latest stable release on 2026-07-09) and 3.3.0b1 (beta). The beta is especially useful as a design signal: it introduces ReActV2, a typed provider-neutral LM boundary, normalized LM error classes, and sanitized state serialization. Those beta APIs may still change, so dsprrr follows the durable contracts rather than copying unstable Python interfaces. See the official DSPy releases.

dsprrr is not a line-by-line port. It follows DSPy’s concepts — signatures, modules, metrics, and optimizers (“teleprompters”) — while embracing R idioms: tibbles in and out, S7/R6 objects, and ellmer for all provider communication.

Modules

DSPy	dsprrr	Notes
`dspy.Predict`	`module(sig, type = "predict")`	Core predictor
`dspy.ChainOfThought`	`chain_of_thought()`, `with_reasoning()`	Implemented as signature transforms
`dspy.ReAct` / experimental `ReActV2`	`module(sig, type = "react")`	Native ellmer turn history, tool-call IDs, parallel calls per assistant turn, enforced iteration limit, then structured finalization
`dspy.ProgramOfThought`	`program_of_thought()`	Generates and executes R code (not Python)
`dspy.CodeAct`	`code_act()`	Hybrid tools + R code execution; the built-in runner is trusted-input-only, and sandboxed backends can implement the runner protocol
`dspy.BestOfN`	`best_of_n()`	Reward-function-guided retries
`dspy.Refine`	`refine()`	Retries with LLM-generated feedback
`dspy.MultiChainComparison`	`multi_chain_comparison()`
`dspy.RLM`	`rlm_module()`	Recursive language models over an R REPL
`dspy.Parallel` / `Module.batch`	`run_dataset()`, `run(..., .parallel = TRUE)`	Batch over a data frame; heterogeneous (module, example) fan-out is not yet a dedicated module
`dspy.majority`	`ensemble()` with `reduce_majority()`	Plus `reduce_weighted_vote()`, `reduce_best_by_metric()`
`dspy.KNN`	`KNNFewShot` teleprompter / KNN module	Bring-your-own vectorizer (e.g., `ragnar::embed_openai()`)
Retrieval (custom functions)	`rag_module()` + ragnar	First-class ragnar retriever integration
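
To make the majority-vote row concrete, here is a minimal base-R sketch of voting over several completions. It is an illustration only and is not the code behind `reduce_majority()`; the helper name `majority_vote()`, the light normalization, and the alphabetical tie-break are assumptions made for the example.

```r
# Majority vote over completions: a base-R illustration only,
# not dsprrr's reduce_majority() implementation.
majority_vote <- function(answers) {
  # Normalize lightly so "Paris" and "paris " count as the same answer.
  normalized <- tolower(trimws(answers))
  counts <- table(normalized)
  # which.max() returns the first maximum; table() sorts its names,
  # so ties resolve alphabetically in this sketch.
  winner <- names(counts)[which.max(counts)]
  # Return the first original answer that matches the winning form.
  answers[match(winner, normalized)]
}

votes <- c("Paris", "paris ", "Lyon", "Paris", "Marseille")
majority_vote(votes)
```

A weighted or metric-based reducer would follow the same shape, replacing the count with a sum of weights or a score per candidate.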

One notable difference: DSPy 3.0 removed `dspy.Assert`/`dspy.Suggest` in favor of `BestOfN`/`Refine`. dsprrr keeps both styles: declarative assertions with retry/backtracking (`with_assertions()`, `assert_output()`, `suggest_output()`) and the `best_of_n()`/`refine()` wrappers. If you prefer the modern DSPy style, use the wrappers; use assertions when you want declarative output contracts with automatic feedback injection.
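
The wrapper style boils down to a reward-guided retry loop. The sketch below shows that idea in base R with a deterministic stand-in for the LLM call. The function `best_of_n_sketch()` and its arguments are hypothetical and do not mirror the actual signature of `best_of_n()`.

```r
# Reward-guided retries in the best-of-N style: a base-R illustration,
# not the implementation behind dsprrr's best_of_n().
best_of_n_sketch <- function(generate, reward, n = 3L, threshold = 1) {
  best <- NULL
  best_score <- -Inf
  attempts <- 0L
  for (attempt in seq_len(n)) {
    attempts <- attempt
    candidate <- generate(attempt)
    score <- reward(candidate)
    # Keep the highest-scoring candidate seen so far.
    if (score > best_score) {
      best <- candidate
      best_score <- score
    }
    # Stop early once a candidate is good enough.
    if (best_score >= threshold) break
  }
  list(output = best, score = best_score, attempts = attempts)
}

# Stand-in for an LLM call: a fixed list of drafts, one per attempt.
drafts <- c("The answer is forty-two", "42 apples", "42")
generate <- function(attempt) drafts[[attempt]]
# Reward 1 when the output is a bare integer, 0 otherwise.
reward <- function(x) as.numeric(grepl("^[0-9]+$", x))

result <- best_of_n_sketch(generate, reward, n = 3L)
result$output
```

A refine-style loop differs mainly in that each retry also receives feedback about why the previous attempt scored poorly.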

Optimizers (teleprompters)

DSPy	dsprrr	Fidelity notes
`LabeledFewShot`	`LabeledFewShot`	Equivalent
`BootstrapFewShot`	`BootstrapFewShot`	Equivalent; compiles pipelines jointly (demos for every step harvested from end-to-end traces)
`BootstrapFewShotWithRandomSearch`	`BootstrapFewShotWithRandomSearch`	Equivalent
`MIPROv2`	`MIPROv2`	Discrete Bayesian optimization with UCB over instruction + demo candidates
`SIMBA`	`SIMBA`	Adapted: hard-example mining + LLM-generated rules; simplified vs. the full introspective algorithm
`GEPA`	`GEPA`	Adapted (“GEPA-lite”): reflective mutation + Pareto selection; supports feedback metrics via `metric_with_feedback()`; no per-component selection or inference-time search yet
`COPRO`	`COPRO`	Equivalent (coordinate ascent over instructions)
`KNNFewShot`	`KNNFewShot`	Equivalent
`Ensemble`	`Ensemble`	Equivalent
`BetterTogether`	`BetterTogether`	Chains prompt optimizers via strategy strings; does not alternate prompt/weight optimization (no finetuning backend)
`BootstrapFinetune`	—	Not implemented (planned); dsprrr currently optimizes prompts, not weights
`GRPO` (RL via Arbor)	—	Not implemented
`BootstrapFewShotWithOptuna`, `AvatarOptimizer`, `InferRules`	—	Niche/legacy in DSPy; not planned
—	`GridSearchTeleprompter`, `optimize_grid()`	dsprrr addition: tidymodels-style grid search over module parameters
—	`Omni`	dsprrr addition: independent best-of exploration plus a fresh continuation optimizer, with common validation scoring and optional mirai concurrency
—	`AutoResearch`	dsprrr addition: persistent research-agent loop over validated, jointly editable module snapshots with sandboxed R analysis
—	`MetaHarness`	dsprrr addition: fresh batch proposers plus host-owned frontier selection, lineage, budgets, and checkpoint resume
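
The MIPROv2 row mentions UCB over discrete candidates. As a rough illustration of that selection rule, the base-R sketch below runs a UCB1-style search over three candidates with fixed, noise-free scores. The function `ucb_search()`, the `explore` constant, and the toy scores are assumptions for the example; the package's optimizer may differ in its details.

```r
# UCB1-style selection over a discrete candidate set: a base-R illustration,
# not the optimizer inside dsprrr's MIPROv2.
ucb_search <- function(evaluate, n_candidates, budget, explore = 1) {
  stopifnot(budget >= n_candidates)
  pulls <- integer(n_candidates)
  totals <- numeric(n_candidates)
  for (trial in seq_len(budget)) {
    if (trial <= n_candidates) {
      # Evaluate every candidate once before trusting any estimate.
      pick <- trial
    } else {
      means <- totals / pulls
      # The bonus shrinks as a candidate accumulates evaluations.
      bonus <- explore * sqrt(log(trial - 1) / pulls)
      pick <- which.max(means + bonus)
    }
    totals[pick] <- totals[pick] + evaluate(pick)
    pulls[pick] <- pulls[pick] + 1L
  }
  means <- totals / pulls
  list(pulls = pulls, means = means, best = which.max(means))
}

# Stand-in for scoring a candidate on a minibatch: fixed scores, no noise.
true_scores <- c(0.2, 0.5, 0.8)
evaluate <- function(i) true_scores[[i]]

search <- ucb_search(evaluate, n_candidates = 3L, budget = 30L)
search$pulls
```

The evaluation budget concentrates on the strongest candidate while weaker ones still receive occasional re-checks, which is the trade-off UCB is meant to manage.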

GEPA feedback metrics

DSPy’s GEPA expects metrics that return a score and textual feedback. dsprrr supports the same protocol:

```r
library(dsprrr)

# Wrap a scoring function so each result carries a score and textual feedback.
metric <- metric_with_feedback(
  function(prediction, expected) {
    if (identical(prediction$answer, expected)) {
      list(score = 1, feedback = "Correct.")
    } else {
      list(
        score = 0,
        feedback = paste("Wrong: expected", expected, "- check the arithmetic.")
      )
    }
  },
  field = "answer"
)

# `mod`, `trainset`, and `llm` are assumed to be defined elsewhere.
tp <- GEPA(metric = metric, generations = 5L)
compiled <- compile(tp, mod, trainset, .llm = llm)
```

The feedback for failed examples is injected into GEPA’s reflection prompt, so the reflection LLM learns why outputs failed, not just that they did.
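
Per-example scores from a metric like this are also what Pareto selection, listed for GEPA in the optimizer table, operates on. The base-R sketch below filters candidates down to the non-dominated set. It is a generic illustration under assumed names (`pareto_front()`, the `prompt_*` rows), not the selection code used by dsprrr's GEPA.

```r
# Non-dominated (Pareto) filtering over per-example scores: a base-R
# illustration, not dsprrr's GEPA selection code.
pareto_front <- function(scores) {
  # scores: matrix with one row per candidate and one column per example.
  n <- nrow(scores)
  dominated <- logical(n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      # j dominates i if it is at least as good on every example
      # and strictly better on at least one.
      if (all(scores[j, ] >= scores[i, ]) && any(scores[j, ] > scores[i, ])) {
        dominated[i] <- TRUE
        break
      }
    }
  }
  which(!dominated)
}

scores <- rbind(
  prompt_a = c(1, 0, 1),
  prompt_b = c(0, 1, 1),
  prompt_c = c(0, 0, 1),
  prompt_d = c(1, 0, 0)
)
rownames(scores)[pareto_front(scores)]
```

Here `prompt_a` and `prompt_b` both survive even though neither is better on every example, which is what keeps complementary candidates available for later mutation.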

Signatures and types

DSPy	dsprrr
`"question -> answer: int"` string signatures	`signature("question -> answer: integer")`
Class-based signatures with `InputField`/`OutputField`	`signature(inputs = list(input(...)), output_type = ...)`
Pydantic-typed outputs	ellmer type objects (`type_string()`, `type_enum()`, `type_object()`, `type_array()`)
`dspy.Image`, `dspy.Audio`, `dspy.File`	ellmer `Content` objects (images, PDFs) passed as inputs
`dspy.History`	Native ellmer turns preserved in ReAct metadata and traces; not a signature type
`dspy.Tool`, `dspy.ToolCalls`, `ToolCallResults`	ellmer `ToolDef`, `ContentToolRequest`, and `ContentToolResult`; IDs remain attached to native turns
`dspy.Reasoning` (native reasoning traces)	Not yet first-class; `with_reasoning()` adds a prompted reasoning field
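
To show what the string notation in the first row encodes, here is a small base-R parser that splits a signature string into named, typed input and output fields. It is a sketch for illustration, not the parser inside `signature()`; the helper names and the default type of `"string"` for untyped fields are assumptions.

```r
# Parse DSPy-style signature strings: a base-R illustration of what the
# notation encodes, not dsprrr's signature() parser.
parse_field <- function(spec) {
  parts <- trimws(strsplit(spec, ":", fixed = TRUE)[[1]])
  # Fields without an explicit type default to "string" in this sketch.
  list(
    name = parts[[1]],
    type = if (length(parts) > 1) parts[[2]] else "string"
  )
}

parse_signature <- function(sig) {
  # Inputs sit left of the arrow, outputs to the right.
  sides <- trimws(strsplit(sig, "->", fixed = TRUE)[[1]])
  stopifnot(length(sides) == 2)
  split_fields <- function(side) {
    lapply(trimws(strsplit(side, ",", fixed = TRUE)[[1]]), parse_field)
  }
  list(inputs = split_fields(sides[[1]]), outputs = split_fields(sides[[2]]))
}

parsed <- parse_signature("context, question -> answer: integer")
str(parsed$outputs)
```

The typed output field is the part that a structured-output layer can turn into a schema, which is the role the ellmer type objects play in the table above.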

Programs and composition

DSPy composes programs as Python classes with multiple predictors. dsprrr composes pipelines:

```r
# mod_retrieve, mod_answer, and mod_format are modules defined elsewhere.
program <- mod_retrieve %>>%
  map_inputs(mod_answer, documents = "context") %>>%
  mod_format
```

BootstrapFewShot compiles pipelines jointly, like DSPy: the teacher pipeline runs end-to-end, final outputs are scored, and each step harvests demonstrations from passing traces. Other teleprompters currently optimize a pipeline’s steps individually (instruction-level optimizers operate on single modules).
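
The joint procedure described above can be sketched in base R with toy steps in place of LLM-backed modules: run the pipeline end to end, score only the final output, and keep each step's input/output pair when the run passes. All names here (`run_with_trace()`, `bootstrap_demos()`, the toy steps) are hypothetical, and the sketch leaves out details a real implementation needs, such as demo limits and teacher settings.

```r
# Joint demo bootstrapping over a two-step pipeline: a base-R illustration
# with toy steps, not dsprrr's BootstrapFewShot implementation.
steps <- list(
  retrieve = function(x) list(question = x$question, context = toupper(x$question)),
  answer   = function(x) list(answer = nchar(x$context))
)

run_with_trace <- function(steps, input) {
  trace <- list()
  current <- input
  for (name in names(steps)) {
    output <- steps[[name]](current)
    # Record what each step saw and produced.
    trace[[name]] <- list(input = current, output = output)
    current <- output
  }
  list(output = current, trace = trace)
}

bootstrap_demos <- function(steps, trainset, metric) {
  demos <- setNames(vector("list", length(steps)), names(steps))
  for (example in trainset) {
    run <- run_with_trace(steps, example$input)
    # Only the final output is scored; intermediate steps have no labels.
    if (!metric(run$output, example$expected)) next
    for (name in names(steps)) {
      demos[[name]] <- c(demos[[name]], list(run$trace[[name]]))
    }
  }
  demos
}

trainset <- list(
  list(input = list(question = "abc"), expected = 3),
  list(input = list(question = "hello"), expected = 4)  # the toy pipeline fails this one
)
metric <- function(output, expected) identical(as.numeric(output$answer), expected)

demos <- bootstrap_demos(steps, trainset, metric)
lengths(demos)
```

Because the failing run contributes nothing, every harvested demonstration comes from a trace whose final answer was correct, including the demonstrations for the intermediate step.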

Infrastructure

Capability	DSPy	dsprrr
LM client	`dspy.LM`; experimental typed `LMRequest -> LMResponse` boundary in 3.3	ellmer `Chat`; `build_module_request()` normalizes prompt/content input, but a complete package-wide invocation record is still planned
Configuration	`dspy.configure()` / `dspy.context()`	`dsp_configure()`, `with_lm()`, `local_lm()`
Caching	Two-tier memory + disk	Two-tier memory + disk (`configure_cache()`)
Async	`acall`/`aforward`, `asyncify`	`run_async()` with promises
Streaming	`streamify()` + `StreamListener`	`run_stream()` + `stream_listener()`; token streaming for single string fields, status events per pipeline step
Usage tracking	`track_usage`	`get_tokens()`, `get_cost()`, `session_cost()`
Parallel evaluation	`Evaluate(num_threads = ...)`	`evaluate(.parallel = TRUE)` via mirai or ellmer’s native parallelism
Saving programs	`save`/`load`; sanitized LM state and explicit unsafe-class opt-in in 3.3 beta	Versioned whole-program artifacts via `save_program()` / `load_program()` or pins, with registry-backed runtime IDs and explicit trusted opt-in
Observability	MLflow autolog, OpenTelemetry callbacks	Traces tibble, `inspect_history()`, `export_traces()`; package-level OpenTelemetry spans are planned on top of ellmer
Adapters (Chat/JSON/XML/TwoStep/BAML)	Yes	No adapter layer; ellmer’s `chat_structured()` handles structured output
Evaluation framework	`dspy.Evaluate`	`evaluate()`, `eval_program()`, plus vitals integration
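
For readers unfamiliar with the two-tier caching listed in the table, the base-R sketch below checks an in-memory environment first, then an on-disk store, and only then computes the value. It is an illustration of the general pattern, not the code behind `configure_cache()`. The names `make_cache()`, `fetch()`, and `clear_memory()` are hypothetical, and keys are assumed to be filename-safe strings such as hashes.

```r
# Two-tier (memory, then disk) cache: a base-R illustration, not the code
# behind dsprrr's configure_cache().
make_cache <- function(dir = tempfile("cache_")) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  memory <- new.env(parent = emptyenv())
  # Keys are assumed to be filename-safe (for example, a hash of the request).
  path_for <- function(key) file.path(dir, paste0(key, ".rds"))

  fetch <- function(key, compute) {
    if (exists(key, envir = memory, inherits = FALSE)) {
      return(list(value = memory[[key]], source = "memory"))
    }
    if (file.exists(path_for(key))) {
      value <- readRDS(path_for(key))
      memory[[key]] <- value  # promote to the fast tier
      return(list(value = value, source = "disk"))
    }
    # Miss on both tiers: compute once and store in both.
    value <- compute()
    memory[[key]] <- value
    saveRDS(value, path_for(key))
    list(value = value, source = "computed")
  }

  clear_memory <- function() {
    rm(list = ls(memory, all.names = TRUE), envir = memory)
  }

  list(fetch = fetch, clear_memory = clear_memory)
}

cache <- make_cache()
slow_call <- function() "Paris"  # stand-in for an LLM request

first  <- cache$fetch("capital_france", slow_call)
second <- cache$fetch("capital_france", slow_call)
cache$clear_memory()  # simulates a fresh R session
third  <- cache$fetch("capital_france", slow_call)
c(first$source, second$source, third$source)
```

The disk tier is what lets results survive a restart, while the memory tier avoids repeated file reads within a session.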

What dsprrr has that DSPy doesn’t

- tidymodels integration: use modules as parsnip engines, tune with dials parameters (`temperature`, `top_p`, `reasoning_effort`).
- vitals integration: bridge modules and metrics to the vitals evaluation framework (`as_vitals_solver()`, `as_dsprrr_metric()`).
- ragnar integration: production RAG with `rag_module()` and `ragnar_tool()`.
- Assertions with backtracking: kept and maintained (removed in DSPy 3.0).
- Grid search compilation: `optimize_grid()` for explicit, tidymodels-style parameter sweeps.

Known gaps (roadmap)

In rough priority order, based on the stable DSPy 3.2 runtime and the 3.3 beta direction:

1. Versioned, safe whole-program state: nested pipelines without secrets, with explicit opt-in before restoring trusted custom code or classes.
2. One provider-neutral invocation/result contract carrying native turns, usage, cost, cache state, timing, and normalized errors across every module.
3. Package-level OpenTelemetry spans for module, optimizer, evaluation, cache, and tool activity, composed with ellmer’s provider telemetry.
4. MCP tools through ellmer’s tool abstraction, without a second transport or competing tool schema.
5. Native reasoning-trace capture as a typed output (analogous to `dspy.Reasoning`).
6. Joint multi-step optimization for instruction optimizers (`MIPROv2`, `GEPA` per-component selection); demo bootstrapping is already joint.
7. Adapter-style fallbacks for models with weak structured-output support (analogous to `TwoStepAdapter`).
8. Weight and RL optimization, after provider-neutral training data, reproducibility, cost accounting, and artifact contracts are stable.

If one of these blocks your use case, please open an issue.