
step_measure_baseline_poly() creates a specification of a recipe step that applies polynomial baseline correction to measurement data. The method fits a polynomial to each spectrum, optionally with iterative peak exclusion, and subtracts it.

Usage

step_measure_baseline_poly(
  recipe,
  measures = NULL,
  degree = 2L,
  max_iter = 0L,
  threshold = 1.5,
  role = NA,
  trained = FALSE,
  skip = FALSE,
  id = recipes::rand_id("measure_baseline_poly")
)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

measures

An optional character vector of measure column names to process. If NULL (the default), all measure columns (columns with class measure_list) will be processed.

degree

Polynomial degree for baseline fitting. Default is 2 (quadratic). Higher degrees can follow more complex baselines but risk overfitting, absorbing part of the peaks into the baseline. Tunable via baseline_degree().

max_iter

Maximum number of iterations for peak exclusion. Default is 0 (no iteration; the polynomial is fit to all points). Set to a positive integer to iteratively exclude points lying above the fitted baseline.

threshold

Number of residual standard deviations above the fitted baseline at which a point is excluded during iterative fitting. Default is 1.5. Only used when max_iter > 0.

role

Not used by this step since no new variables are created.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

skip

A logical. Should the step be skipped when the recipe is baked?

id

A character string that is unique to this step to identify it.

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Details

Polynomial baseline correction fits a polynomial function to the spectrum and subtracts it. This is effective for removing smooth, curved baselines caused by instrumental drift, scattering, or other slowly varying effects.
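The fit-and-subtract idea can be sketched in base R for a single spectrum. This is a minimal sketch, not the package's implementation: poly_baseline() is a hypothetical helper, and the simulated spectrum (a quadratic baseline plus one Gaussian peak) is made up for illustration.

```r
# Hypothetical helper (not part of the package): estimate a polynomial
# baseline for one spectrum by ordinary least squares.
poly_baseline <- function(x, y, degree = 2L) {
  # poly() uses orthogonal polynomials by default, which is numerically
  # more stable than raw powers of x.
  fit <- lm(y ~ poly(x, degree = degree))
  # The fitted values are the baseline estimate at every location.
  unname(fitted(fit))
}

# Simulated spectrum: quadratic baseline plus one Gaussian peak at x = 5.
x <- seq(0, 10, length.out = 200)
baseline_true <- 0.5 + 0.2 * x - 0.03 * x^2
peak <- 2 * exp(-((x - 5)^2) / (2 * 0.3^2))
y <- baseline_true + peak

# Baseline correction: subtract the fitted polynomial.
baseline_est <- poly_baseline(x, y, degree = 2L)
corrected <- y - baseline_est
```

Because every point, including the peak, enters the least-squares fit, the estimated baseline here sits above the true one by the mean height of the peak across all points. That bias is what the iterative option is meant to reduce.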

When max_iter > 0, the algorithm uses iterative peak exclusion:

  1. Fit the polynomial to all points

  2. Calculate residuals (spectrum - baseline)

  3. Exclude points where residual > threshold * SD(residuals)

  4. Refit the polynomial to the remaining points

  5. Repeat steps 2 to 4 until convergence or until max_iter iterations are reached

This iterative approach prevents peaks from pulling up the baseline estimate.
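The iterative steps can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package's code: poly_baseline_iter() is a hypothetical helper; residuals and their SD are computed over all points in each pass; only points above the baseline are excluded; and "convergence" is taken to mean that the set of retained points stops changing.

```r
# Hypothetical helper (not part of the package): polynomial baseline with
# iterative peak exclusion. Assumptions: residual SD is taken over all
# points, only positive residuals are excluded, and iteration stops when
# the retained set no longer changes.
poly_baseline_iter <- function(x, y, degree = 2L, max_iter = 5L,
                               threshold = 1.5) {
  d <- data.frame(x = x, y = y)
  keep <- rep(TRUE, nrow(d))
  # Step 1: fit the polynomial to all points.
  fit <- lm(y ~ poly(x, degree = degree), data = d)
  baseline <- unname(predict(fit, newdata = d))
  iter <- 0L
  while (iter < max_iter) {
    iter <- iter + 1L
    # Step 2: residuals (spectrum - baseline) for every point.
    r <- d$y - baseline
    # Step 3: exclude points more than threshold * SD above the baseline.
    new_keep <- r <= threshold * sd(r)
    # Stop on convergence, or if too few points remain to fit the degree.
    if (identical(new_keep, keep) || sum(new_keep) <= degree) break
    keep <- new_keep
    # Step 4: refit to the retained points, then evaluate at all points.
    fit <- lm(y ~ poly(x, degree = degree), data = d[keep, ])
    baseline <- unname(predict(fit, newdata = d))
  }
  list(baseline = baseline, keep = keep, iterations = iter)
}

# Simulated spectrum: quadratic baseline plus one Gaussian peak at x = 5.
x <- seq(0, 10, length.out = 200)
baseline_true <- 0.5 + 0.2 * x - 0.03 * x^2
y <- baseline_true + 2 * exp(-((x - 5)^2) / (2 * 0.3^2))

plain <- poly_baseline_iter(x, y, max_iter = 0L)      # no exclusion
iterative <- poly_baseline_iter(x, y, max_iter = 5L)  # peak excluded
```

With max_iter = 0 the helper reduces to a plain fit over all points; with exclusion enabled, the points at the top of the peak are dropped and the refitted baseline tracks the true baseline more closely on this simulated example.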

Degree selection:

  • degree = 1: Linear baseline (for simple drift)

  • degree = 2: Quadratic (most common, handles gentle curvature)

  • degree = 3-5: Higher-order (for complex baselines, use cautiously)
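The trade-off behind these recommendations can be illustrated with a small base R experiment on simulated data (made up for illustration; the package itself is not used). A degree-1 fit cannot remove a curved baseline, while higher degrees remove it but, without peak exclusion, let the polynomial bend further into the peak.

```r
# Simulated spectrum (illustration only): quadratic baseline plus a
# narrow Gaussian peak centred at x = 5.
x <- seq(0, 10, length.out = 200)
baseline_true <- 0.5 + 0.2 * x - 0.03 * x^2
peak <- 2 * exp(-((x - 5)^2) / (2 * 0.3^2))
y <- baseline_true + peak
center <- which.max(peak)

degrees <- 1:5

# Largest curvature left behind when the baseline alone is fitted:
# zero (up to rounding) once degree >= 2 for a quadratic baseline.
leftover <- sapply(degrees, function(d) {
  max(abs(residuals(lm(baseline_true ~ poly(x, degree = d)))))
})

# Peak height remaining after plain (non-iterative) correction: higher
# degrees absorb more of the peak into the baseline.
retained <- sapply(degrees, function(d) {
  corrected <- y - fitted(lm(y ~ poly(x, degree = d)))
  unname(corrected[center])
})

round(data.frame(degree = degrees, leftover, retained), 3)
```

This is one reason to use degrees 3 to 5 cautiously: the extra flexibility is only useful when the baseline itself is that complex.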

No selectors should be supplied to this step function. The data should be in the internal format produced by step_measure_input_wide() or step_measure_input_long().

Tidying

When you tidy() this step, a tibble with columns terms, degree, and id is returned.

Examples

library(recipes)
library(measure)

# Simple polynomial baseline (no iteration)
rec <- recipe(water + fat + protein ~ ., data = meats_long) |>
  update_role(id, new_role = "id") |>
  step_measure_input_long(transmittance, location = vars(channel)) |>
  step_measure_baseline_poly(degree = 2) |>
  prep()

bake(rec, new_data = NULL)
#> # A tibble: 215 × 5
#>       id water   fat protein .measures
#>    <int> <dbl> <dbl>   <dbl>    <meas>
#>  1     1  60.5  22.5    16.7 [100 × 2]
#>  2     2  46    40.1    13.5 [100 × 2]
#>  3     3  71     8.4    20.5 [100 × 2]
#>  4     4  72.8   5.9    20.7 [100 × 2]
#>  5     5  58.3  25.5    15.5 [100 × 2]
#>  6     6  44    42.7    13.7 [100 × 2]
#>  7     7  44    42.7    13.7 [100 × 2]
#>  8     8  69.3  10.6    19.3 [100 × 2]
#>  9     9  61.4  19.9    17.7 [100 × 2]
#> 10    10  61.4  19.9    17.7 [100 × 2]
#> # ℹ 205 more rows

# With iterative peak exclusion
rec2 <- recipe(water + fat + protein ~ ., data = meats_long) |>
  update_role(id, new_role = "id") |>
  step_measure_input_long(transmittance, location = vars(channel)) |>
  step_measure_baseline_poly(degree = 3, max_iter = 5, threshold = 2) |>
  prep()