
step_measure_baseline_rf() creates a specification of a recipe step that applies robust fitting baseline correction to measurement data. This method uses local regression with iterative reweighting to fit a baseline that is resistant to peaks.

Usage

step_measure_baseline_rf(
  recipe,
  measures = NULL,
  span = 2/3,
  maxit = c(5L, 5L),
  role = NA,
  trained = FALSE,
  skip = FALSE,
  id = recipes::rand_id("measure_baseline_rf")
)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

measures

An optional character vector of measure column names to process. If NULL (the default), all measure columns (columns with class measure_list) will be processed.

span

Controls the amount of smoothing. This is the fraction of data used in computing each fitted value. Default is 2/3. Smaller values produce less smooth baselines that follow local features more closely.

maxit

A length-2 integer vector giving the number of iterations for the robust fit. The first value is the number of iterations that use the asymmetric weighting function; the second is the number that use symmetric weighting. Default is c(5L, 5L).

role

Not used by this step since no new variables are created.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

skip

A logical. Should the step be skipped when the recipe is baked?

id

A character string that is unique to this step to identify it.

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Details

Robust fitting baseline correction uses local polynomial regression (LOESS/LOWESS) with iterative reweighting to estimate the baseline. The algorithm uses asymmetric weights in initial iterations to down-weight peaks, then symmetric weights for final smoothing.
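The two-phase reweighting described above can be sketched in base R. This is a simplified illustration, not the code this step runs: the function name robust_baseline_sketch(), the tuning constant b, the bisquare weight form, and the small weight floor are all assumptions made for the example, and only stats::loess() is a real library call.

```r
# Simplified sketch of a two-phase robust baseline. NOT the package's internals:
# `robust_baseline_sketch`, `b`, and the weight floor are illustrative assumptions.
robust_baseline_sketch <- function(x, y, span = 2/3, maxit = c(5L, 5L), b = 3.5) {
  stopifnot(length(maxit) == 2, sum(maxit) >= 1)
  d <- data.frame(x = x, y = y, w = 1)
  for (i in seq_len(sum(maxit))) {
    # Weighted local linear regression with the current weights
    fit <- stats::loess(y ~ x, data = d, weights = w, span = span, degree = 1)
    r <- d$y - stats::fitted(fit)
    # Robust residual scale based on the median absolute residual
    s <- stats::median(abs(r)) / 0.6745
    if (s < .Machine$double.eps) break
    u <- r / (b * s)
    bisq <- ifelse(abs(u) < 1, (1 - u^2)^2, 0)
    if (i <= maxit[1]) {
      # Asymmetric phase: points at or below the fit keep full weight,
      # points above the fit (candidate peaks) are down-weighted
      w <- ifelse(u <= 0, 1, bisq)
    } else {
      # Symmetric phase: bisquare weights on both sides of the fit
      w <- bisq
    }
    # Floor the weights so no local neighbourhood loses all of its points
    d$w <- pmax(w, 1e-6)
  }
  as.numeric(stats::fitted(fit))
}

# Synthetic signal: linear baseline plus one Gaussian peak plus noise
set.seed(1)
x <- seq(0, 100, by = 0.5)
true_baseline <- 2 + 0.01 * x
peak <- 5 * exp(-(x - 50)^2 / (2 * 2^2))
y <- true_baseline + peak + rnorm(length(x), sd = 0.05)

est <- robust_baseline_sketch(x, y, span = 0.5)
corrected <- y - est
```

In this sketch the asymmetric iterations pull the fit underneath the peak, and the symmetric iterations then re-centre it on the remaining baseline points.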

This method is particularly effective for:

  • Spectra with peaks of varying widths

  • Data where the baseline shape is not well-described by a polynomial

  • Situations where peaks should not influence the baseline estimate

The span parameter controls the trade-off between smoothness and local adaptation:

  • Larger span (e.g., 0.8): Smoother baseline, may miss local variations

  • Smaller span (e.g., 0.3): More local adaptation, may overfit
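The trade-off can be seen on synthetic data with base R's stats::lowess(), whose f argument plays the same role as span. This is only an illustration of the smoothing fraction, assuming a slowly varying curve without peaks; lowess() uses symmetric robustness weights and is not the algorithm this step applies. The roughness() helper is hypothetical.

```r
# Illustration of the smoothing fraction using stats::lowess();
# this is not the asymmetric algorithm used by the step.
set.seed(42)
grid <- seq(0, 100, by = 0.5)
curve_true <- sin(grid / 15)
signal <- curve_true + rnorm(length(grid), sd = 0.05)

# delta = 0 forces a local fit at every point instead of interpolating
fit_small <- stats::lowess(grid, signal, f = 0.3, iter = 3, delta = 0)
fit_large <- stats::lowess(grid, signal, f = 0.8, iter = 3, delta = 0)

# Hypothetical helper: sum of squared second differences as a roughness measure
roughness <- function(v) sum(diff(v, differences = 2)^2)
rough_small <- roughness(fit_small$y)
rough_large <- roughness(fit_large$y)

# Mean absolute deviation from the underlying curve
err_small <- mean(abs(fit_small$y - curve_true))
err_large <- mean(abs(fit_large$y - curve_true))
```

Here the larger fraction should give the smoother fit (lower roughness) while the smaller fraction tracks the curve more closely (lower error), which mirrors the guidance for span.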

No selectors should be supplied to this step function. The data should be in the internal format produced by step_measure_input_wide() or step_measure_input_long().

Tidying

When you tidy() this step, a tibble with columns terms, span, and id is returned.

Examples

library(recipes)
library(measure)

rec <- recipe(water + fat + protein ~ ., data = meats_long) |>
  update_role(id, new_role = "id") |>
  step_measure_input_long(transmittance, location = vars(channel)) |>
  step_measure_baseline_rf(span = 0.5) |>
  prep()

bake(rec, new_data = NULL)
#> # A tibble: 215 × 5
#>       id water   fat protein .measures
#>    <int> <dbl> <dbl>   <dbl>    <meas>
#>  1     1  60.5  22.5    16.7 [100 × 2]
#>  2     2  46    40.1    13.5 [100 × 2]
#>  3     3  71     8.4    20.5 [100 × 2]
#>  4     4  72.8   5.9    20.7 [100 × 2]
#>  5     5  58.3  25.5    15.5 [100 × 2]
#>  6     6  44    42.7    13.7 [100 × 2]
#>  7     7  44    42.7    13.7 [100 × 2]
#>  8     8  69.3  10.6    19.3 [100 × 2]
#>  9     9  61.4  19.9    17.7 [100 × 2]
#> 10    10  61.4  19.9    17.7 [100 × 2]
#> # ℹ 205 more rows