Skip to contents
library(measure)
#> Loading required package: recipes
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
library(recipes)
library(dplyr)

Introduction

Many analytical techniques produce multi-dimensional data. Examples include:

  • LC-DAD: Liquid chromatography with diode array detection (time × wavelength)
  • GC×GC: Comprehensive two-dimensional gas chromatography (time₁ × time₂)
  • EEM: Excitation-emission matrix fluorescence (excitation × emission wavelength)
  • 2D NMR: Two-dimensional nuclear magnetic resonance (chemical shift × chemical shift)

The measure package provides native support for n-dimensional measurement data through the measure_nd_tbl and measure_nd_list classes.

Creating 2D Measurement Data

Let’s create synthetic LC-DAD data with retention time and wavelength dimensions:

set.seed(42)

# Simulate 3 samples with LC-DAD measurements
# 10 time points × 4 wavelengths = 40 data points per sample
lc_dad_data <- tibble(
  sample_id = rep(1:3, each = 40),
  retention_time = rep(rep(seq(0, 9, by = 1), each = 4), 3),
  wavelength = rep(c(254, 280, 320, 350), 30),
  absorbance = rnorm(120, mean = 100, sd = 10),
  concentration = rep(c(10, 25, 50), each = 40)
)

head(lc_dad_data, 12)
#> # A tibble: 12 × 5
#>    sample_id retention_time wavelength absorbance concentration
#>        <int>          <dbl>      <dbl>      <dbl>         <dbl>
#>  1         1              0        254      114.             10
#>  2         1              0        280       94.4            10
#>  3         1              0        320      104.             10
#>  4         1              0        350      106.             10
#>  5         1              1        254      104.             10
#>  6         1              1        280       98.9            10
#>  7         1              1        320      115.             10
#>  8         1              1        350       99.1            10
#>  9         1              2        254      120.             10
#> 10         1              2        280       99.4            10
#> 11         1              2        320      113.             10
#> 12         1              2        350      123.             10

This is the typical “long format” for 2D analytical data, where each row represents a single measurement at a specific (time, wavelength) coordinate.

Ingesting 2D Data

Use step_measure_input_long() with multiple location columns to create a 2D measure column:

rec <- recipe(concentration ~ ., data = lc_dad_data) |>
  update_role(sample_id, new_role = "id") |>
  step_measure_input_long(
    absorbance,
    location = vars(retention_time, wavelength),
    dim_names = c("time", "wavelength"),
    dim_units = c("min", "nm")
  ) |>
  prep()

result <- bake(rec, new_data = NULL)
result
#> # A tibble: 3 × 3
#>   sample_id concentration .measures
#>       <int>         <dbl>  <meas2d>
#> 1         1            10  [40 × 3]
#> 2         2            25  [40 × 3]
#> 3         3            50  [40 × 3]

The .measures column now contains measure_nd_list objects - one 2D measurement per sample:

class(result$.measures)
#> [1] "measure_nd_list" "measure_list"    "vctrs_list_of"   "vctrs_vctr"     
#> [5] "list"
measure_ndim(result$.measures)
#> [1] 2

Inspecting 2D Measurements

Each element of the measure column is a measure_nd_tbl:

# First sample's measurement
m1 <- result$.measures[[1]]
class(m1)
#> [1] "measure_nd_tbl" "measure_tbl"    "tbl_df"         "tbl"           
#> [5] "data.frame"
m1
#> <measure_nd_tbl [40 x 3] time x wavelength>
#> <measure_tbl [40 x 3]>
#> # A tibble: 40 × 3
#>    location_1 location_2 value
#>         <dbl>      <dbl> <dbl>
#>  1          0        254 114. 
#>  2          0        280  94.4
#>  3          0        320 104. 
#>  4          0        350 106. 
#>  5          1        254 104. 
#>  6          1        280  98.9
#>  7          1        320 115. 
#>  8          1        350  99.1
#>  9          2        254 120. 
#> 10          2        280  99.4
#> # ℹ 30 more rows

Dimension metadata is preserved:

measure_dim_names(m1)
#> [1] "time"       "wavelength"
measure_dim_units(m1)
#> [1] "min" "nm"

Grid Information

The measure_grid_info() function provides detailed information about the measurement grid:

info <- measure_grid_info(m1)
info$ndim
#> [1] 2
info$shape
#> dim_1 dim_2 
#>    10     4
info$n_points
#> [1] 40
info$is_regular
#> [1] TRUE

A “regular” grid means all combinations of location values are present (complete rectangular grid).

Applying 1D Operations to 2D Data

The measure_apply() function enables existing 1D preprocessing operations to work on n-dimensional data by applying them along specified dimensions.

# Define a simple 1D smoothing function
smooth_1d <- function(x, window = 3) {
  if (nrow(x) < window) return(x)
  smoothed <- stats::filter(x$value, rep(1/window, window), sides = 2)
  valid <- !is.na(smoothed)
  new_measure_tbl(
    location = x$location[valid],
    value = as.numeric(smoothed[valid])
  )
}

Apply smoothing along the time dimension (dimension 1):

# Apply to a single 2D measurement
smoothed <- measure_apply(m1, smooth_1d, along = 1, window = 3)

# Original had 40 points (10 time × 4 wavelength)
nrow(m1)
#> [1] 40

# Smoothed has fewer points (edges removed by filter)
nrow(smoothed)
#> [1] 32

The function was applied independently to each wavelength slice, treating time as the 1D axis.

Converting Back to Long Format

Use step_measure_output_long() to convert the nested measure back to long format:

output_rec <- recipe(concentration ~ ., data = lc_dad_data) |>
  update_role(sample_id, new_role = "id") |>
  step_measure_input_long(
    absorbance,
    location = vars(retention_time, wavelength)
  ) |>
  step_measure_output_long(
    values_to = "absorbance",
    location_to = "loc"
  ) |>
  prep()

output_result <- bake(output_rec, new_data = NULL)
head(output_result)
#> # A tibble: 6 × 5
#>   sample_id concentration loc_1 loc_2 absorbance
#>       <int>         <dbl> <dbl> <dbl>      <dbl>
#> 1         1            10     0   254      114. 
#> 2         1            10     0   280       94.4
#> 3         1            10     0   320      104. 
#> 4         1            10     0   350      106. 
#> 5         1            10     1   254      104. 
#> 6         1            10     1   280       98.9

For 2D data, location columns are named with dimension suffixes (loc_1, loc_2).

Irregular Grids

Not all 2D data forms a regular rectangular grid. The package handles irregular grids gracefully:

# Create irregular data (different wavelengths sampled at different times)
irregular_data <- tibble(
  sample_id = rep(1, 7),
  time = c(0, 0, 0, 5, 5, 10, 10),
  wavelength = c(254, 280, 320, 254, 280, 254, 350),
  value = rnorm(7),
  outcome = 1
)

irr_rec <- recipe(outcome ~ ., data = irregular_data) |>
  update_role(sample_id, new_role = "id") |>
  step_measure_input_long(value, location = vars(time, wavelength)) |>
  prep()

irr_result <- bake(irr_rec, new_data = NULL)

# Check regularity
measure_is_regular(irr_result$.measures[[1]])
#> [1] FALSE

Dimension Reduction Operations

The package provides several operations for reducing dimensionality of nD data.

Unfolding and Folding

measure_unfold() converts nD data to 1D for use with modeling techniques that expect vectors:

# Unfold 2D to 1D
m1d <- measure_unfold(m1)
m1d
#> <measure_tbl [40 x 2]>
#> # A tibble: 40 × 2
#>    location value
#>       <int> <dbl>
#>  1        1 114. 
#>  2        2 104. 
#>  3        3 120. 
#>  4        4  86.1
#>  5        5  97.2
#>  6        6  96.9
#>  7        7 119. 
#>  8        8 105. 
#>  9        9 110. 
#> 10       10  92.2
#> # ℹ 30 more rows

# The fold metadata is preserved
attr(m1d, "fold_info")$ndim
#> [1] 2

measure_fold() reconstructs the original nD structure:

# Reconstruct 2D from 1D
m2d_restored <- measure_fold(m1d)
measure_ndim(m2d_restored)
#> [1] 2

Slicing

measure_slice() extracts subsets at specific coordinates:

# Extract data at wavelength = 254
slice_254 <- measure_slice(m1, wavelength = 254)
slice_254
#> <measure_tbl [10 x 2]>
#> # A tibble: 10 × 2
#>    location value
#>       <dbl> <dbl>
#>  1        0 114. 
#>  2        1 104. 
#>  3        2 120. 
#>  4        3  86.1
#>  5        4  97.2
#>  6        5  96.9
#>  7        6 119. 
#>  8        7 105. 
#>  9        8 110. 
#> 10        9  92.2

# Extract multiple wavelengths (keeps 2D structure)
slice_uv <- measure_slice(m1, wavelength = c(254, 280), drop = FALSE)
measure_ndim(slice_uv)
#> [1] 2

Projection

measure_project() aggregates across dimensions:

# Average across wavelengths to get time trace
time_trace <- measure_project(m1, along = "wavelength")
time_trace
#> <measure_tbl [10 x 2]>
#> # A tibble: 10 × 2
#>    location value
#>       <dbl> <dbl>
#>  1        0 105. 
#>  2        1 104. 
#>  3        2 114. 
#>  4        3  97.1
#>  5        4  89.8
#>  6        5  97.4
#>  7        6  98.6
#>  8        7 102. 
#>  9        8  98.0
#> 10        9  90.0

# Sum across time to get total absorbance per wavelength
wl_total <- measure_project(m1, along = "time", fn = sum)
wl_total
#> <measure_tbl [4 x 2]>
#> # A tibble: 4 × 2
#>   location value
#>      <dbl> <dbl>
#> 1      254 1044.
#> 2      280  920.
#> 3      320  987.
#> 4      350 1033.

Multi-Channel Operations

When working with multiple detector channels (e.g., UV + RI in SEC, or multiple wavelengths in LC-DAD), the package provides steps for aligning, combining, and computing ratios between channels.

Channel Alignment

step_measure_channel_align() aligns multiple measure columns to a common grid:

# Align UV and RI detector signals to the same time grid
rec <- recipe(outcome ~ ., data = sec_data) |>
 step_measure_input_wide(starts_with("uv_"), col_name = "uv") |>
 step_measure_input_wide(starts_with("ri_"), col_name = "ri") |>
 step_measure_channel_align(uv, ri, method = "intersection") |>
 prep()

Methods include: - "intersection": Keep only locations present in all channels - "union": Include all locations, interpolating missing values - "reference": Align all channels to a reference channel’s grid

Channel Combination

step_measure_channel_combine() merges multiple channels:

# Stack channels into a single 2D measure (location x channel)
rec <- recipe(outcome ~ ., data = multi_detector_data) |>
 step_measure_input_wide(starts_with("uv_"), col_name = "uv") |>
 step_measure_input_wide(starts_with("ri_"), col_name = "ri") |>
 step_measure_channel_align(uv, ri) |>
 step_measure_channel_combine(uv, ri, strategy = "stack") |>
 prep()

Strategies include: - "stack": Create a 2D measure with channel as a dimension - "concat": Concatenate into a single 1D vector - "mean" or "weighted_sum": Combine into a single channel

Channel Ratios

step_measure_channel_ratio() computes ratios between channels:

# Compute UV/RI ratio for each sample
rec <- recipe(outcome ~ ., data = sec_data) |>
 step_measure_input_wide(starts_with("uv_"), col_name = "uv") |>
 step_measure_input_wide(starts_with("ri_"), col_name = "ri") |>
 step_measure_channel_align(uv, ri) |>
 step_measure_channel_ratio(numerator = uv, denominator = ri) |>
 prep()

Multi-Way Analysis

For extracting interpretable components from 2D or 3D measurement data, the package provides multi-way decomposition methods.

PARAFAC Decomposition

step_measure_parafac() performs Parallel Factor Analysis, extracting trilinear components:

# Extract 3 PARAFAC components from EEM fluorescence data
rec <- recipe(concentration ~ ., data = eem_data) |>
 step_measure_input_long(
   fluorescence,
   location = vars(excitation, emission)
 ) |>
 step_measure_parafac(n_components = 3) |>
 prep()

# Result contains parafac_1, parafac_2, parafac_3 score columns
baked <- bake(rec, new_data = NULL)

PARAFAC is particularly useful for: - EEM fluorescence (excitation x emission matrices) - Resolving overlapping chromatographic peaks - Identifying underlying chemical species in mixtures

Tucker Decomposition

step_measure_tucker() provides more flexibility with independent ranks per mode:

# Tucker decomposition with different ranks for each dimension
rec <- recipe(concentration ~ ., data = lc_dad_data) |>
 step_measure_input_long(
   absorbance,
   location = vars(time, wavelength)
 ) |>
 step_measure_tucker(ranks = c(5, 3)) |>  # 5 time, 3 wavelength components
 prep()

MCR-ALS (Experimental)

step_measure_mcr_als() implements Multivariate Curve Resolution with Alternating Least Squares:

# MCR-ALS with non-negativity constraints
rec <- recipe(concentration ~ ., data = chrom_data) |>
 step_measure_input_long(
   absorbance,
   location = vars(time, wavelength)
 ) |>
 step_measure_mcr_als(
   n_components = 3,
   non_negativity = TRUE
 ) |>
 prep()

Note: MCR-ALS is marked as experimental. The implementation uses a simple ALS algorithm suitable for exploratory analysis.

Summary

Key functions for multi-dimensional measurement data:

Function Purpose
step_measure_input_long() Ingest nD data with multiple location columns
step_measure_output_long() Convert nD data back to long format
measure_ndim() Get number of dimensions
measure_dim_names() Get semantic dimension names
measure_dim_units() Get dimension units
measure_is_regular() Check if grid is regular/rectangular
measure_grid_info() Get detailed grid information
measure_apply() Apply 1D functions along specified dimensions
measure_unfold() Convert nD to 1D with fold metadata
measure_fold() Reconstruct nD from unfolded 1D
measure_slice() Extract slices at specific coordinates
measure_project() Aggregate across dimensions
step_measure_channel_align() Align channels to common grid
step_measure_channel_combine() Combine multiple channels
step_measure_channel_ratio() Compute ratios between channels
step_measure_parafac() PARAFAC decomposition
step_measure_tucker() Tucker decomposition
step_measure_mcr_als() MCR-ALS decomposition (experimental)