Overview
Chaining lets you build multi-step LLM workflows by passing outputs from one module to the next. dsprrr provides two approaches:
- %>>% for quick, readable pipelines with automatic field matching
- pipeline() + step() for explicit control over mappings and selections
Both approaches produce a pipeline module you can run with run() or run_dataset() just like any other module.
Create Building-Block Modules
Start with small modules that do one thing well:
```r
library(dsprrr)
library(ellmer)

mod_extract <- signature("document -> facts") |>
  module(type = "predict", template = "Extract key facts: {document}")

mod_answer <- signature("facts, question -> answer") |>
  module(type = "predict", template = "Use facts: {facts}\nQ: {question}")

mod_format <- signature("answer -> response") |>
  module(type = "predict", template = "Format: {answer}")
```
Simple Chaining with %>>%
When field names align, outputs connect automatically:
```r
qa_pipeline <- mod_extract %>>% mod_answer %>>% mod_format

llm <- chat_openai()

result <- run(
  qa_pipeline,
  document = "...",
  question = "What happened?",
  .llm = llm
)
```
Map Inputs When Names Differ
If a downstream module expects a different input name, map fields explicitly:
```r
mod_retrieve <- signature("query -> documents") |>
  module(type = "predict", template = "Retrieve docs for: {query}")

mod_summarize <- signature("context -> summary") |>
  module(type = "predict", template = "Summarize: {context}")

# The upstream "documents" output feeds the downstream "context" input.
rag_pipeline <- mod_retrieve %>>%
  map_inputs(mod_summarize, documents = "context")

result <- run(rag_pipeline, query = "dsprrr pipelines", .llm = llm)
```
Inject Static Inputs
Use with_inputs() to add constants that don't come from
upstream outputs:
```r
mod_answer <- signature("facts, question, tone -> answer") |>
  module(type = "predict")

qa_pipeline <- mod_extract %>>%
  with_inputs(mod_answer, tone = "concise") %>>%
  mod_format
```
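Because tone is already bound by with_inputs(), you only pass the remaining inputs at run time, using the same run() call shown earlier:

```r
# tone is fixed to "concise"; document feeds mod_extract and
# question flows through to mod_answer.
result <- run(
  qa_pipeline,
  document = "...",
  question = "What happened?",
  .llm = llm
)
```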
Explicit Pipelines with pipeline() and step()
For more control (or when you prefer not to use %>>%), build a pipeline explicitly:
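The exact arguments step() accepts aren't shown in this section, so treat the following as a sketch: it rebuilds rag_pipeline from above, assuming step() wraps a module and takes an input mapping in the same style as map_inputs() (the inputs argument name is a guess; see ?step for the real interface):

```r
# Hypothetical argument names: "inputs" for the field mapping is an
# assumption, mirroring the map_inputs() convention above.
rag_pipeline <- pipeline(
  step(mod_retrieve),
  step(mod_summarize, inputs = c(documents = "context"))
)

result <- run(rag_pipeline, query = "dsprrr pipelines", .llm = llm)
```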
Batch Execution with run_dataset()
Pipelines work with data frames the same way as single modules.
run_dataset() adds a result column to your
input data:
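A minimal sketch, assuming run_dataset() takes the pipeline, a data frame whose columns match the pipeline's inputs, and the same .llm argument as run() (the exact argument order is an assumption; see ?run_dataset):

```r
# Column names match the pipeline's input fields.
qa_data <- data.frame(
  document = c("Meeting notes ...", "Incident report ..."),
  question = c("What was decided?", "What happened?")
)

# Assumed call shape, mirroring run(); the return value is qa_data
# with an added result column.
results <- run_dataset(qa_pipeline, qa_data, .llm = llm)
```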
Tips
- Prefer small, composable modules. They are easier to debug and optimize.
- Use trace_summary() and export_traces() to inspect multi-step behavior (a sketch follows this list).
- When column names in your data frame don't match the pipeline inputs, rename them first or wrap the pipeline with map_inputs() to align fields.
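How these helpers are called isn't shown here, so the sketch below assumes both operate on the result of a run; the arguments are illustrative (check the package reference for the actual signatures):

```r
# Hypothetical usage: summarize per-step traces from a run, then
# export them for offline inspection. The function names come from
# the tips above; the arguments are assumptions.
trace_summary(result)
export_traces(result, path = "traces.json")
```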