step_measure_baseline_rf() creates a specification of a recipe step
that applies robust fitting baseline correction to measurement data. This
method uses local regression with iterative reweighting to fit a baseline
that is resistant to peaks.
Arguments
- recipe
A recipe object. The step will be added to the sequence of operations for this recipe.
- measures
An optional character vector of measure column names to process. If NULL (the default), all measure columns (columns with class measure_list) will be processed.
- span
Controls the amount of smoothing. This is the fraction of data used in computing each fitted value. Default is 2/3. Smaller values produce less smooth baselines that follow local features more closely.
- maxit
A length-2 integer vector specifying the number of iterations for the robust fit. The first value is for the asymmetric weighting function, the second for symmetric weighting. Default is c(5, 5).
- role
Not used by this step since no new variables are created.
- trained
A logical to indicate if the quantities for preprocessing have been estimated.
- skip
A logical. Should the step be skipped when the recipe is baked?
- id
A character string that is unique to this step to identify it.
Value
An updated version of recipe with the new step added to the
sequence of any existing operations.
Details
Robust fitting baseline correction uses local polynomial regression (LOESS/LOWESS) with iterative reweighting to estimate the baseline. The algorithm uses asymmetric weights in initial iterations to down-weight peaks, then symmetric weights for final smoothing.
This method is particularly effective for:
- Spectra with peaks of varying widths
- Data where the baseline shape is not well-described by a polynomial
- Situations where peaks should not influence the baseline estimate
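To make the reweighting idea described above concrete, here is a minimal sketch in base R. It is not the code behind step_measure_baseline_rf() or subtract_rf_baseline(). The helper name rf_baseline_sketch(), the bisquare weight function, the residual scale of 6 times the median absolute residual, and the small weight floor are all assumptions chosen for illustration. Only the overall shape (asymmetric weights first, then symmetric weights, with iteration counts taken from a length-2 maxit) follows the description.

```r
# Illustrative sketch only; NOT the package's implementation.
# rf_baseline_sketch() is a hypothetical helper name.
rf_baseline_sketch <- function(x, y, span = 2 / 3, maxit = c(5, 5)) {
  # Bisquare weight: 1 at a zero residual, falling to 0 once |u| >= 1.
  bisquare <- function(u) ifelse(abs(u) < 1, (1 - u^2)^2, 0)

  d <- data.frame(x = x, y = y, wt = 1)
  fit <- y
  for (i in seq_len(sum(maxit))) {
    # Weighted local linear regression using the current weights.
    fit <- as.numeric(
      predict(loess(y ~ x, data = d, weights = wt, span = span, degree = 1))
    )
    r <- y - fit
    s <- 6 * median(abs(r)) # robust residual scale (assumed choice)
    if (s == 0) break
    if (i <= maxit[1]) {
      # Asymmetric phase: only points ABOVE the fit (likely peaks) lose weight.
      w <- ifelse(r > 0, bisquare(r / s), 1)
    } else {
      # Symmetric phase: large residuals in either direction lose weight.
      w <- bisquare(r / s)
    }
    d$wt <- pmax(w, 1e-6) # keep weights strictly positive for loess()
  }
  fit
}

# Simulated spectrum: sloped baseline, one Gaussian peak, a little noise.
set.seed(1)
x <- 1:200
true_baseline <- 1 + 0.01 * x
peak <- 5 * exp(-(x - 100)^2 / (2 * 5^2))
y <- true_baseline + peak + rnorm(200, sd = 0.05)

est <- rf_baseline_sketch(x, y, span = 0.5)
corrected <- y - est # baseline-subtracted signal
```

Plotting y, est, and true_baseline together shows how far the peak pulls the estimate. Because peak points are down-weighted during the asymmetric phase, the estimate under the peak should sit much closer to the sloped baseline than to the peak maximum.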
The span parameter controls the trade-off between smoothness and local
adaptation:
- Larger span (e.g., 0.8): Smoother baseline, may miss local variations
- Smaller span (e.g., 0.3): More local adaptation, may overfit
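The trade-off can be seen with plain stats::loess(), which also takes a span argument. The sketch below is only an analogy on simulated data and does not call this step. The roughness() helper and the simulated signal are assumptions made for the illustration.

```r
# Simulated wavy signal with noise (illustrative data, not from the package).
set.seed(42)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.2)
d <- data.frame(x = x, y = y)

# Local linear fits using the two span values mentioned above.
fit_small <- as.numeric(predict(loess(y ~ x, data = d, span = 0.3, degree = 1)))
fit_large <- as.numeric(predict(loess(y ~ x, data = d, span = 0.8, degree = 1)))

# Hypothetical helper: sum of squared second differences as a roughness score.
roughness <- function(f) sum(diff(f, differences = 2)^2)

# Residual sum of squares: how closely each fit follows the data.
rss_small <- sum((y - fit_small)^2)
rss_large <- sum((y - fit_large)^2)

roughness(fit_small)
roughness(fit_large)
```

The larger span gives the lower roughness score but the larger residual sum of squares, because it flattens the local variation that the smaller span follows.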
No selectors should be supplied to this step function. The data should be
in the internal format produced by step_measure_input_wide() or
step_measure_input_long().
Tidying
When you tidy() this step, a tibble with columns
terms, span, and id is returned.
See also
subtract_rf_baseline() for the standalone function this step wraps.
Other measure-baseline:
step_measure_baseline_airpls(),
step_measure_baseline_als(),
step_measure_baseline_arpls(),
step_measure_baseline_auto(),
step_measure_baseline_custom(),
step_measure_baseline_gpc(),
step_measure_baseline_minima(),
step_measure_baseline_morph(),
step_measure_baseline_poly(),
step_measure_baseline_py(),
step_measure_baseline_rolling(),
step_measure_baseline_snip(),
step_measure_baseline_tophat(),
step_measure_detrend()
Examples
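library(recipes)
library(measure) # assumed to provide meats_long and the step_measure_*() functions

# Convert the long-format spectra to the internal measure format, then
# subtract a robust-fit baseline using a span of 0.5.
rec <- recipe(water + fat + protein ~ ., data = meats_long) |>
  update_role(id, new_role = "id") |>
  step_measure_input_long(transmittance, location = vars(channel)) |>
  step_measure_baseline_rf(span = 0.5) |>
  prep()

# Return the processed training data.
bake(rec, new_data = NULL)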
#> # A tibble: 215 × 5
#> id water fat protein .measures
#> <int> <dbl> <dbl> <dbl> <meas>
#> 1 1 60.5 22.5 16.7 [100 × 2]
#> 2 2 46 40.1 13.5 [100 × 2]
#> 3 3 71 8.4 20.5 [100 × 2]
#> 4 4 72.8 5.9 20.7 [100 × 2]
#> 5 5 58.3 25.5 15.5 [100 × 2]
#> 6 6 44 42.7 13.7 [100 × 2]
#> 7 7 44 42.7 13.7 [100 × 2]
#> 8 8 69.3 10.6 19.3 [100 × 2]
#> 9 9 61.4 19.9 17.7 [100 × 2]
#> 10 10 61.4 19.9 17.7 [100 × 2]
#> # ℹ 205 more rows