GPC/SEC Baseline Correction

step_measure_baseline_gpc() creates a specification of a recipe step that applies baseline correction optimized for Gel Permeation Chromatography (GPC) or Size Exclusion Chromatography (SEC) data. This method estimates the baseline by interpolating between baseline regions at the start and end of the chromatogram.

This step has been superseded by measure.sec::step_sec_baseline(). For new code, we recommend the measure.sec package, which provides more complete SEC/GPC analysis functionality.

Usage

step_measure_baseline_gpc(
  recipe,
  measures = NULL,
  left_frac = 0.05,
  right_frac = 0.05,
  method = "linear",
  role = NA,
  trained = FALSE,
  skip = FALSE,
  id = recipes::rand_id("measure_baseline_gpc")
)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

measures

An optional character vector of measure column names to process. If NULL (the default), all measure columns (columns with class measure_list) will be processed.

left_frac

Fraction of points from the beginning to use as the left baseline region. Default is 0.05 (first 5% of data points).

right_frac

Fraction of points from the end to use as the right baseline region. Default is 0.05 (last 5% of data points).

method

Method for baseline estimation. One of:

"linear" (default): Linear interpolation between left and right means
"median": Uses median of baseline regions (more robust to outliers)
"spline": Smooth spline through baseline regions

role

Not used by this step since no new variables are created.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

skip

A logical. Should the step be skipped when the recipe is baked?

id

A character string that is unique to this step to identify it.

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Details

GPC/SEC chromatograms typically have distinct baseline regions at the beginning and end where no polymer elutes. This step leverages this characteristic by:

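1. Identifying the baseline regions at the start and end of the chromatogram (sized by left_frac and right_frac)
2. Computing a representative baseline value for each region (mean or median)
3. Interpolating between these values to estimate the full baseline
4. Subtracting the estimated baseline from the signal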
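The steps above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name gpc_baseline_sketch(), the use of floor() to size the regions, and the choice to anchor each region at its mean index are all assumptions made for the example.

```r
# Hypothetical helper for illustration; not part of the measure package.
gpc_baseline_sketch <- function(y, left_frac = 0.05, right_frac = 0.05,
                                summarise = mean) {
  n <- length(y)
  stopifnot(n >= 2, left_frac + right_frac <= 1)

  # Baseline regions: first and last fraction of points (assumed: at least 1 point each).
  n_left <- max(1, floor(n * left_frac))
  n_right <- max(1, floor(n * right_frac))
  left_idx <- seq_len(n_left)
  right_idx <- seq(n - n_right + 1, n)

  # Representative position and value for each region.
  # Pass summarise = median for a more outlier-robust value.
  x_anchor <- c(mean(left_idx), mean(right_idx))
  y_anchor <- c(summarise(y[left_idx]), summarise(y[right_idx]))

  # Straight line through the two anchors, evaluated at every point.
  slope <- diff(y_anchor) / diff(x_anchor)
  baseline <- y_anchor[1] + slope * (seq_len(n) - x_anchor[1])

  # Subtract the estimated baseline from the signal.
  list(baseline = baseline, corrected = y - baseline)
}

# Synthetic chromatogram: linear drift plus one Gaussian peak in the middle.
x <- seq_len(200)
drift <- 0.5 + 0.01 * x
peak <- 10 * exp(-((x - 100)^2) / (2 * 10^2))
fit <- gpc_baseline_sketch(drift + peak, left_frac = 0.1, right_frac = 0.1)

# Largest difference between the estimated and the true baseline.
max(abs(fit$baseline - drift))
```

Because the synthetic drift is exactly linear and the peak is negligible in both end regions, the estimated baseline matches the drift to within floating-point error and fit$corrected recovers the peak.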

The left_frac and right_frac parameters control how much of the chromatogram is considered "baseline". Choose values that:

Include only the flat, signal-free regions
Exclude any polymer peaks or system peaks
Are large enough to average out noise
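One rough way to check these criteria is to look at how the spread of the signal in a candidate region changes as the fraction grows: it should stay near the noise level until the region starts to include a peak. The sketch below uses simulated data and a hypothetical helper, region_spread(), that is not part of the measure package.

```r
# Hypothetical diagnostic for illustration; not part of the measure package.
region_spread <- function(y, frac, side = c("left", "right")) {
  side <- match.arg(side)
  n <- length(y)
  k <- max(2, floor(n * frac))  # need at least 2 points for a standard deviation
  idx <- if (side == "left") seq_len(k) else seq(n - k + 1, n)
  stats::sd(y[idx])
}

# Simulated chromatogram: one broad peak plus low-level noise.
set.seed(1)
x <- seq_len(400)
signal <- 5 * exp(-((x - 200)^2) / (2 * 25^2)) + rnorm(400, sd = 0.02)

# Spread of the left region for increasing candidate fractions.
fracs <- c(0.05, 0.10, 0.20, 0.40)
spread <- vapply(fracs, function(f) region_spread(signal, f, "left"), numeric(1))
data.frame(left_frac = fracs, spread = spread)
```

In this simulation the spread jumps sharply at left_frac = 0.40, where the region reaches the rising edge of the peak, which suggests keeping the fraction below that point.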

Unlike general-purpose baseline methods such as asymmetric least squares (ALS) or polynomial fitting, this approach is designed for the characteristic shape of GPC/SEC chromatograms. It is also computationally cheap, since it only summarizes two regions and interpolates between them.

No selectors should be supplied to this step function. The data should be in the internal format produced by step_measure_input_wide() or step_measure_input_long().

Tidying

When you tidy() this step, a tibble with columns terms, left_frac, right_frac, method, and id is returned.

Examples

library(recipes)
library(measure)

# Using meats_long as an example (the step works on any measurement data)
rec <- recipe(water + fat + protein ~ ., data = meats_long) |>
  update_role(id, new_role = "id") |>
  step_measure_input_long(transmittance, location = vars(channel)) |>
  step_measure_baseline_gpc(left_frac = 0.1, right_frac = 0.1) |>
  prep()

bake(rec, new_data = NULL)
#> # A tibble: 215 × 6
#>       id water   fat protein .measures channel    
#>    <int> <dbl> <dbl>   <dbl>    <meas> <list>     
#>  1     1  60.5  22.5    16.7 [100 × 2] <int [100]>
#>  2     2  46    40.1    13.5 [100 × 2] <int [100]>
#>  3     3  71     8.4    20.5 [100 × 2] <int [100]>
#>  4     4  72.8   5.9    20.7 [100 × 2] <int [100]>
#>  5     5  58.3  25.5    15.5 [100 × 2] <int [100]>
#>  6     6  44    42.7    13.7 [100 × 2] <int [100]>
#>  7     7  44    42.7    13.7 [100 × 2] <int [100]>
#>  8     8  69.3  10.6    19.3 [100 × 2] <int [100]>
#>  9     9  61.4  19.9    17.7 [100 × 2] <int [100]>
#> 10    10  61.4  19.9    17.7 [100 × 2] <int [100]>
#> # ℹ 205 more rows