Tutorial 2: Building a Reusable Classifier
Source: vignettes/tutorial-build-classifier.Rmd

In Tutorial 1, you made one-off LLM calls with `dsp()`. But what if you need to classify hundreds of texts? Creating a new call each time is tedious and slow.
In this tutorial, you’ll build a reusable module—a classifier you can use over and over.
Time: 20-25 minutes
What You’ll Build
A sentiment classifier that:

- Processes single texts or batches
- Remembers its configuration
- Can be saved and reused
Prerequisites
- Completed Tutorial 1
- `OPENAI_API_KEY` set in your environment
Step 1: The Problem with dsp()
In Tutorial 1, you classified sentiment like this:

```r
chat <- chat_openai()
chat |> dsp("text -> sentiment: enum('positive', 'negative', 'neutral')", text = "Great!")
chat |> dsp("text -> sentiment: enum('positive', 'negative', 'neutral')", text = "Awful")
chat |> dsp("text -> sentiment: enum('positive', 'negative', 'neutral')", text = "Meh")
```

This works, but you're repeating the signature every time. If you want to change the signature, you have to change it everywhere.
Step 2: Create a Reusable Module
The `as_module()` function wraps a signature into a reusable object:

```r
chat <- chat_openai()
classifier <- chat |>
  as_module("text -> sentiment: enum('positive', 'negative', 'neutral')")
classifier
```

Now `classifier` is an object you can use repeatedly.
Step 3: Classify Single Texts
Use the `$predict()` method to classify:

```r
classifier$predict(text = "I absolutely loved this movie!")
```

Try a few more:

```r
classifier$predict(text = "This was a complete waste of time.")
classifier$predict(text = "It was okay, I guess.")
classifier$predict(text = "The service was terrible but the food was amazing.")
```

Step 4: Batch Processing
Here’s where modules shine. Process multiple texts at once by passing a vector:

```r
reviews <- c(
  "Best purchase I've ever made!",
  "Broke after one day. Total garbage.",
  "Does what it says. Nothing special.",
  "Exceeded all my expectations!",
  "Would not recommend to anyone."
)
classifier$predict(text = reviews)
```

All five classifications came back in a single call, which is much more efficient than five separate calls.
Step 5: The Full Control Approach
`as_module()` is convenient, but sometimes you need more control. The `signature()` + `module()` approach gives you that:

```r
# Define the signature separately
sig <- signature(
  "text -> sentiment: enum('positive', 'negative', 'neutral')",
  instructions = "Classify the overall sentiment. If mixed, choose the dominant emotion."
)
sig
```

Now create a module from the signature:

```r
classifier2 <- module(sig, type = "predict")
classifier2
```

Step 6: Running with run()
With the full-control approach, use `run()` to execute:

```r
run(classifier2, text = "This is fantastic!", .llm = chat)
```

Notice you pass the chat object via `.llm`. This gives you flexibility: you can use a different LLM for different calls.
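For example, a sketch of swapping in a different backend (the model name here is an assumption; substitute any chat object your setup supports):

```r
# Hypothetical example: reuse the same module with a different chat backend.
# The model name is an assumption -- replace it with one available to you.
chat_mini <- chat_openai(model = "gpt-4o-mini")
run(classifier2, text = "This is fantastic!", .llm = chat_mini)
```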
Batch processing works the same way:
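A sketch, assuming `run()` accepts a character vector for the `text` input just as `$predict()` does:

```r
# Hypothetical batch call: pass a vector to the input declared in the signature.
reviews <- c(
  "Best purchase I've ever made!",
  "Broke after one day. Total garbage."
)
run(classifier2, text = reviews, .llm = chat)
```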
Step 7: Working with Data Frames
Real data often comes in data frames. Use `run_dataset()`:

```r
library(tibble)
reviews_df <- tibble(
  id = 1:4,
  text = c(
    "Absolutely wonderful experience!",
    "Never buying from them again.",
    "Solid product, fair price.",
    "Changed my life for the better."
  )
)
results <- run_dataset(classifier2, reviews_df, .llm = chat)
results
```

The results include your original columns plus the classification.
Step 8: Adding Descriptions
Make your inputs more informative with descriptions:

```r
sig <- signature(
  inputs = list(
    input("review_text", description = "Customer review to classify")
  ),
  output_type = type_enum(values = c("positive", "negative", "neutral")),
  instructions = "Classify the customer sentiment."
)
detailed_classifier <- module(sig, type = "predict")
run(
  detailed_classifier,
  review_text = "Five stars! Would buy again!",
  .llm = chat
)
```

Descriptions help the LLM understand what it’s working with.
Step 9: Checking Your Work
Modules track their calls. See what happened:
```r
classifier2$trace_summary()
```

This shows you how many calls were made and the token costs.
What You Learned
In this tutorial, you:

- Created reusable modules with `as_module()`
- Used `$predict()` for single and batch processing
- Built modules with full control using `signature()` + `module()`
- Processed data frames with `run_dataset()`
- Added input descriptions for clarity
- Checked your work with `trace_summary()`
When to Use Each Approach
| Approach | Best For |
|---|---|
| `dsp()` | Quick one-off calls, exploration |
| `as_module()` | Simple reusable modules, prototyping |
| `signature()` + `module()` | Production code, optimization workflows |
The Module Advantage
Why bother with modules when `dsp()` works?
- Reusability: Define once, use everywhere
- Efficiency: Batch processing reduces API calls
- Configuration: Change settings in one place
- Optimization: Modules can be improved with training data (covered in Tutorial 4)
- Tracing: Track what happened for debugging
Next Steps
Your classifier works, but can it handle more complex outputs? Continue to:
- Tutorial 3: Extracting Structured Data — Get multiple fields and nested structures
- Quick Reference — Module types and methods
- Understanding Signatures & Modules — Why S7 for signatures, R6 for modules