Overview
This vignette describes measure’s internal class system. While most users won’t need to interact with these internals directly, understanding them is useful if you’re:
- Debugging unexpected behavior
- Contributing to measure
- Building extensions that work with measure data
Motivation
Early versions of measure relied on a column named .measures to store spectral data. This worked but had limitations:
- Name clashes if users had their own .measures column
- No way to have multiple measure columns
- Detection relied on column names, not types
Following Issue #16, measure now uses custom S3 classes. This enables robust detection via inherits() and supports multiple measure columns per dataset (see Multiple Measure Columns below).
The two classes
measure uses a two-level class hierarchy:
measure_tbl
A single measurement - a tibble with location and value columns:
# After preprocessing, each row's .measures element is a measure_tbl
rec <- recipe(water + fat + protein ~ ., data = meats_long) |>
  update_role(id, new_role = "id") |>
  step_measure_input_long(transmittance, location = vars(channel)) |>
  prep()
result <- bake(rec, new_data = NULL)
# Extract one measurement
one_measurement <- result$.measures[[1]]
one_measurement
#> <measure_tbl [100 x 2]>
#> # A tibble: 100 × 2
#> location value
#> <int> <dbl>
#> 1 1 2.62
#> 2 2 2.62
#> 3 3 2.62
#> 4 4 2.62
#> 5 5 2.62
#> 6 6 2.62
#> 7 7 2.62
#> 8 8 2.62
#> 9 9 2.63
#> 10 10 2.63
#> # ℹ 90 more rows
# Check the class
class(one_measurement)
#> [1] "measure_tbl" "tbl_df" "tbl" "data.frame"
is_measure_tbl(one_measurement)
#> [1] TRUE
measure_list
A list column containing multiple measure_tbl objects - one per row in your data:
# The .measures column itself is a measure_list
class(result$.measures)
#> [1] "measure_list" "vctrs_list_of" "vctrs_vctr" "list"
is_measure_list(result$.measures)
#> [1] TRUE
# Nice printing in tibbles
result
#> # A tibble: 215 × 5
#> id water fat protein .measures
#> <int> <dbl> <dbl> <dbl> <meas>
#> 1 1 60.5 22.5 16.7 [100 × 2]
#> 2 2 46 40.1 13.5 [100 × 2]
#> 3 3 71 8.4 20.5 [100 × 2]
#> 4 4 72.8 5.9 20.7 [100 × 2]
#> 5 5 58.3 25.5 15.5 [100 × 2]
#> 6 6 44 42.7 13.7 [100 × 2]
#> 7 7 44 42.7 13.7 [100 × 2]
#> 8 8 69.3 10.6 19.3 [100 × 2]
#> 9 9 61.4 19.9 17.7 [100 × 2]
#> 10 10 61.4 19.9 17.7 [100 × 2]
#> # ℹ 205 more rows
Detecting measure columns
measure provides helper functions to find and validate measure columns:
is_measure_list() and is_measure_tbl()
Test if an object has the appropriate class:
is_measure_list(result$.measures)
#> [1] TRUE
is_measure_tbl(result$.measures[[1]])
#> [1] TRUE
# Regular lists and tibbles return FALSE
is_measure_list(list())
#> [1] FALSE
is_measure_tbl(tibble::tibble(location = 1:5, value = rnorm(5)))
#> [1] FALSE
find_measure_cols()
Find all measure columns in a data frame:
find_measure_cols(result)
#> [1] ".measures"
has_measure_col()
Check that a data frame has at least one measure column, erroring if not:
has_measure_col(result)
This is used internally by processing steps to validate input.
Why this matters
The class-based approach provides several benefits:
- Robust detection: Steps use inherits(x, "measure_list") instead of checking column names
- Nice printing: Tibbles show a compact <meas> column with entries such as [100 × 2] instead of raw list output
- Multiple columns: You can have multiple measure columns per dataset (e.g., UV and MS spectra)
- Validation: The classes enforce that measurements have the expected structure
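To make the robust-detection point concrete, here is a minimal base R sketch. It does not use measure itself: toy_measure_list is a hypothetical class name standing in for measure_list, and the detection logic is an assumption about how class-based lookup can work, not the package's actual implementation.

```r
# Hypothetical stand-in for a measure column: a plain list with an extra S3 class.
new_toy_measure_list <- function(x) {
  structure(x, class = c("toy_measure_list", "list"))
}

spectrum <- data.frame(location = 1:3, value = c(0.1, 0.2, 0.3))

df <- data.frame(id = 1:2)
df$.measures <- list("a", "b")  # user column that merely reuses the old name
df$.uv <- new_toy_measure_list(list(spectrum, spectrum))

# Name-based detection is fooled by the clashing column name
name_based <- intersect(names(df), ".measures")

# Class-based detection only picks up columns that carry the class
is_toy <- vapply(df, inherits, logical(1), what = "toy_measure_list")
class_based <- names(df)[is_toy]

name_based   # ".measures" (a false positive)
class_based  # ".uv"
```

Because the check depends on the column's type rather than its name, it also works unchanged when a data frame holds several such columns under arbitrary names.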
For package developers
If you’re writing functions that work with measure data:
my_function <- function(data) {
  # Validate input has measure columns
  has_measure_col(data)
  # Find measure columns
  meas_cols <- find_measure_cols(data)
  # Work with the measure_list
  for (col in meas_cols) {
    measurements <- data[[col]]
    # Each element is a measure_tbl with $location and $value
  }
}
The helper functions measure_to_matrix() and matrix_to_measure() in R/helpers.R convert between measure lists and matrices for bulk operations.
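The list-to-matrix round trip can be sketched in base R as follows. The helpers toy_to_matrix() and toy_from_matrix() are hypothetical and written only for this illustration; they are not the functions in R/helpers.R, whose signatures are not shown here. The sketch also assumes every sample shares the same locations.

```r
# Hypothetical helper: stack each sample's values into one matrix row.
toy_to_matrix <- function(measurements) {
  do.call(rbind, lapply(measurements, function(m) m$value))
}

# Hypothetical helper: split matrix rows back into location/value data frames.
toy_from_matrix <- function(mat, location) {
  lapply(seq_len(nrow(mat)), function(i) {
    data.frame(location = location, value = mat[i, ])
  })
}

samples <- list(
  data.frame(location = 1:3, value = c(1, 2, 3)),
  data.frame(location = 1:3, value = c(4, 6, 8))
)

mat <- toy_to_matrix(samples)  # 2 samples x 3 locations

# Bulk operation: centre every sample (row) in one vectorised call
centered <- sweep(mat, 1, rowMeans(mat))

round_trip <- toy_from_matrix(centered, location = samples[[1]]$location)
round_trip[[2]]$value  # -2 0 2
```

Working on the matrix lets a transformation run once over all samples instead of looping over list elements, which is the usual reason to convert before a bulk operation.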
Working with Measure Data Interactively
While recipe steps are the primary interface for production pipelines, measure provides utility functions for interactive exploration and debugging.
measure_map(): Prototyping transformations
When developing a custom transformation, use measure_map() to test it interactively:
# Apply a transformation to each sample's measurements
centered <- measure_map(result, ~ {
  .x$value <- .x$value - mean(.x$value)
  .x
})
# Check the result
mean(centered$.measures[[1]]$value) # Should be ~0
#> [1] -1.599431e-16
Important: measure_map() is for exploration only. Once your transformation works, move it to step_measure_map() for reproducible pipelines:
# For production use
rec <- recipe(...) |>
  step_measure_input_long(...) |>
  step_measure_map(~ { .x$value <- .x$value - mean(.x$value); .x })
measure_map_safely(): Fault-tolerant exploration
When exploring data that might have problematic samples, use the safer variant:
result <- measure_map_safely(data, risky_function)
# Check which samples failed
result$errors
# result$result contains data with successful transforms
# (failed samples keep original values)
measure_summarize(): Understanding your data
Compute summary statistics across all samples at each measurement location:
# Default: mean and SD at each location
stats <- measure_summarize(result)
head(stats)
#> # A tibble: 6 × 3
#> location mean sd
#> <int> <dbl> <dbl>
#> 1 1 2.81 0.411
#> 2 2 2.81 0.413
#> 3 3 2.81 0.416
#> 4 4 2.82 0.418
#> 5 5 2.82 0.421
#> 6 6 2.82 0.424
This is useful for:
- Computing reference spectra (e.g., for MSC-style corrections)
- Identifying high-variability regions
- Quality control and outlier detection
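As a rough illustration of what a per-location summary involves, the following base R sketch computes the mean and SD at each location for three toy samples. It is an assumption about the general technique, not the internals of measure_summarize(), and it assumes all samples are measured at the same locations.

```r
samples <- list(
  data.frame(location = 1:3, value = c(1, 2, 3)),
  data.frame(location = 1:3, value = c(3, 2, 5)),
  data.frame(location = 1:3, value = c(5, 2, 7))
)

# Stack all samples into one long table, then summarise by location
long <- do.call(rbind, samples)
stats <- data.frame(
  location = sort(unique(long$location)),
  mean     = as.numeric(tapply(long$value, long$location, mean)),
  sd       = as.numeric(tapply(long$value, long$location, sd))
)

# The mean column can serve as a reference spectrum;
# the sd column flags high-variability regions.
variable_locations <- stats$location[stats$sd > 1]
variable_locations  # 1 3
```

Here location 2 has zero spread across samples, so only locations 1 and 3 are flagged; the threshold of 1 is arbitrary and chosen just for this toy data.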
Multiple Measure Columns
measure supports multiple measure columns in a single dataset. This is useful when you have different types of measurements (e.g., UV and MS spectra) that need separate processing.
Creating multiple measure columns
Use the col_name parameter in input steps:
rec <- recipe(outcome ~ ., data = my_data) |>
  step_measure_input_wide(
    starts_with("uv_"),
    col_name = ".uv_spectrum"
  ) |>
  step_measure_input_wide(
    starts_with("ms_"),
    col_name = ".ms_spectrum"
  )
Processing steps
By default, processing steps operate on all measure columns:
rec <- rec |>
  step_measure_snv() # Applies to both .uv_spectrum and .ms_spectrum
To process specific columns, use the measures parameter:
rec <- rec |>
  step_measure_snv(measures = ".uv_spectrum") # Only UV
Output steps
When multiple measure columns exist, output steps require you to specify which column to output:
rec <- rec |>
  step_measure_output_wide(measures = ".uv_spectrum", prefix = "uv_") |>
  step_measure_output_wide(measures = ".ms_spectrum", prefix = "ms_")
If you don’t specify and multiple columns exist, you’ll get a helpful error message telling you which columns are available.