step_measure_baseline_als() creates a specification of a recipe step that applies Asymmetric Least Squares (ALS) baseline correction to measurement data. ALS iteratively fits a smooth baseline, giving less weight to points above the baseline (peaks) than to points below it.

Usage

step_measure_baseline_als(
  recipe,
  measures = NULL,
  lambda = 1e+06,
  p = 0.01,
  max_iter = 20L,
  role = NA,
  trained = FALSE,
  skip = FALSE,
  id = recipes::rand_id("measure_baseline_als")
)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

measures

An optional character vector of measure column names to process. If NULL (the default), all measure columns (columns with class measure_list) will be processed.

lambda

Smoothness parameter (2nd derivative constraint). Higher values produce smoother baselines. Default is 1e6. Typical range is 1e3 to 1e9. Tunable via baseline_lambda().

p

Asymmetry parameter: the weight given to points with positive residuals (points above the baseline); points below the baseline receive weight 1 - p. Values near 0 (e.g., 0.001-0.05) work well for spectra with peaks above baseline. Default is 0.01. Tunable via baseline_asymmetry().

max_iter

Maximum number of iterations. Default is 20.

role

Not used by this step since no new variables are created.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

skip

A logical. Should the step be skipped when the recipe is baked?

id

A character string that is unique to this step to identify it.

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Details

Asymmetric Least Squares (ALS) baseline correction uses a Whittaker smoother with asymmetric weights to fit a baseline that follows the lower envelope of the spectrum. The algorithm iteratively:

  1. Fits a smooth baseline using penalized least squares

  2. Calculates residuals (spectrum - baseline)

  3. Assigns weights: p for positive residuals (peaks), 1-p for negative

  4. Repeats until convergence or max iterations
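The iteration can be sketched in a few lines of base R. This is an illustrative sketch only, not the code this step runs internally: the function name als_baseline_sketch(), the dense matrix solve, the synthetic signal, and the stopping rule (weights unchanged between iterations) are all assumptions made for the example.

```r
# Hypothetical helper for illustration; not part of the package.
als_baseline_sketch <- function(y, lambda = 1e6, p = 0.01, max_iter = 20L) {
  n <- length(y)
  # Second-order difference matrix, (n - 2) x n
  D <- diff(diag(n), differences = 2)
  penalty <- lambda * crossprod(D)
  w <- rep(1, n)  # start with equal weights
  z <- y
  for (i in seq_len(max_iter)) {
    # Step 1: penalized weighted least squares, (W + lambda * D'D) z = W y
    z <- solve(diag(w) + penalty, w * y)
    # Steps 2-3: residual sign decides the weight (p above, 1 - p below)
    w_new <- ifelse(y > z, p, 1 - p)
    # Step 4: stop early once the weights no longer change (assumed rule)
    if (all(w_new == w)) break
    w <- w_new
  }
  z
}

# Synthetic spectrum: linear drift plus one Gaussian peak (made-up data)
x <- 1:100
true_base <- 0.01 * x
y <- true_base + exp(-(x - 50)^2 / (2 * 5^2))

z <- als_baseline_sketch(y, lambda = 1e4, p = 0.01)
corrected <- y - z  # baseline-corrected signal
```

A dense solve keeps the sketch dependency-free but scales poorly with spectrum length; sparse matrices are the usual choice for long spectra.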

The smoothness is controlled by lambda, which penalizes the second derivative of the baseline. Larger lambda produces smoother baselines.
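For reference, the objective minimized at each iteration can be written as follows. The notation is an assumption introduced here, following the usual presentation of the method in Eilers and Boelens (2005): y is the measured signal, z the baseline, w the weights, and Δ² the second-order difference operator.

```latex
S(z) = \sum_{i} w_i \,(y_i - z_i)^2 \;+\; \lambda \sum_{i} \left(\Delta^2 z_i\right)^2,
\qquad
w_i =
\begin{cases}
p     & \text{if } y_i > z_i \\
1 - p & \text{otherwise}
\end{cases}
```

The first term rewards fidelity to the data and the second penalizes roughness, so increasing lambda shifts the balance toward a smoother baseline.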

ALS is particularly effective for:

  • NIR/IR spectroscopy with broad baseline drift

  • Raman spectroscopy with fluorescence background

  • UV-Vis spectroscopy with scattering effects

No selectors should be supplied to this step function. The data should be in the internal format produced by step_measure_input_wide() or step_measure_input_long().

Tidying

When you tidy() this step, a tibble with columns terms, lambda, p, and id is returned.

Tuning

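This step has parameters that can be tuned:

  • lambda: smoothness parameter, tunable via baseline_lambda()

  • p: asymmetry parameter, tunable via baseline_asymmetry()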
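As a hedged sketch of how tuning might be set up, the step's arguments can be marked with tune() placeholders in the usual tidymodels way. This assumes the tune, hardhat, and dials packages are installed and reuses the meats_long data from the Examples section; the object name rec_tune is hypothetical.

```r
library(recipes)
library(measure)
library(tune)  # provides the tune() placeholder

# Same recipe as in the Examples, but left unprepped and with
# lambda and p marked for tuning instead of fixed
rec_tune <- recipe(water + fat + protein ~ ., data = meats_long) |>
  update_role(id, new_role = "id") |>
  step_measure_input_long(transmittance, location = vars(channel)) |>
  step_measure_baseline_als(lambda = tune(), p = tune())

# List the parameters that were marked for tuning
hardhat::extract_parameter_set_dials(rec_tune)
```

The recipe is not passed to prep() here because placeholder values must be finalized, for example by a grid search, before the step can be estimated.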

References

Eilers, P.H.C. and Boelens, H.F.M. (2005). Baseline Correction with Asymmetric Least Squares Smoothing. Leiden University Medical Centre report.

Examples

library(recipes)
library(measure) # provides step_measure_*() and the meats_long data

rec <- recipe(water + fat + protein ~ ., data = meats_long) |>
  update_role(id, new_role = "id") |>
  step_measure_input_long(transmittance, location = vars(channel)) |>
  step_measure_baseline_als(lambda = 1e6, p = 0.01) |>
  prep()

bake(rec, new_data = NULL)
#> # A tibble: 215 × 5
#>       id water   fat protein .measures
#>    <int> <dbl> <dbl>   <dbl>    <meas>
#>  1     1  60.5  22.5    16.7 [100 × 2]
#>  2     2  46    40.1    13.5 [100 × 2]
#>  3     3  71     8.4    20.5 [100 × 2]
#>  4     4  72.8   5.9    20.7 [100 × 2]
#>  5     5  58.3  25.5    15.5 [100 × 2]
#>  6     6  44    42.7    13.7 [100 × 2]
#>  7     7  44    42.7    13.7 [100 × 2]
#>  8     8  69.3  10.6    19.3 [100 × 2]
#>  9     9  61.4  19.9    17.7 [100 × 2]
#> 10    10  61.4  19.9    17.7 [100 × 2]
#> # ℹ 205 more rows