step_measure_normalize_peak() creates a specification of a recipe step that
divides each spectrum by a summary statistic computed from a specified region.
This is commonly used for internal standard normalization.
Usage
step_measure_normalize_peak(
  recipe,
  measures = NULL,
  role = NA,
  trained = FALSE,
  location_min = NULL,
  location_max = NULL,
  method = "mean",
  skip = FALSE,
  id = recipes::rand_id("measure_normalize_peak")
)
Arguments
- recipe
A recipe object. The step will be added to the sequence of operations for this recipe.
- measures
An optional character vector of measure column names to process. If
NULL (the default), all measure columns (columns with class measure_list) will be processed. Use this to limit processing to specific measure columns when working with multiple measurement types.
- role
Not used by this step since no new variables are created.
- trained
A logical to indicate if the quantities for preprocessing have been estimated.
- location_min
Numeric. The lower bound of the region to use for normalization. This parameter is tunable with
peak_location_min().
- location_max
Numeric. The upper bound of the region to use for normalization. This parameter is tunable with
peak_location_max().
- method
Character. The summary statistic to compute from the region. One of
"mean" (default), "max", or "integral".
- skip
A logical. Should the step be skipped when the recipe is baked by
recipes::bake()? While all operations are baked when recipes::prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.
- id
A character string that is unique to this step to identify it.
Value
An updated version of recipe with the new step added to the
sequence of any existing operations.
Details
For each spectrum, this step:
- Selects values in the region [location_min, location_max]
- Computes a summary statistic (mean, max, or integral) from that region
- Divides the entire spectrum by this value
This is useful when you have an internal standard peak at a known location and want to normalize all spectra to that peak.
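The three steps above can be sketched in base R for a single spectrum. This is only an illustration, not the package's implementation: normalize_peak_sketch() is a hypothetical helper, the region is assumed to be inclusive at both ends, and the "integral" method is assumed here to use the trapezoidal rule, which the package may or may not do.

```r
# Hypothetical helper (not part of the package): normalize one spectrum.
normalize_peak_sketch <- function(location, value,
                                  location_min, location_max,
                                  method = c("mean", "max", "integral")) {
  method <- match.arg(method)

  # 1. Select the values whose location falls inside the region
  #    (assumed inclusive at both ends).
  in_region <- location >= location_min & location <= location_max
  x <- location[in_region]
  y <- value[in_region]

  # 2. Compute the summary statistic for the region.
  normalizer <- switch(
    method,
    mean = mean(y),
    max = max(y),
    # Assumption: trapezoidal rule; the package may integrate differently.
    integral = sum(diff(x) * (y[-1] + y[-length(y)]) / 2)
  )

  # 3. Divide the entire spectrum by that value.
  value / normalizer
}

# Toy spectrum: baseline of 2 with a flat "internal standard" peak of 4.
location <- 1:100
value <- rep(2, 100)
value[40:60] <- 4

normalized <- normalize_peak_sketch(location, value, 40, 60, method = "mean")
range(normalized[40:60])  # both ends are 1: the peak region is scaled to 1
```

With method = "mean" the reference region averages to 1 after normalization, so intensities elsewhere in the spectrum read as multiples of the internal standard.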
The location_min and location_max parameters are tunable with
peak_location_min() and peak_location_max() for hyperparameter
optimization.
If no values fall within the specified region, an error is raised. If the computed normalizer is zero or NA, a warning is issued and the original values are returned unchanged.
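The edge-case behavior described above can be mimicked with a small base R sketch. normalize_with_guards() is a hypothetical helper written for illustration. It uses the mean only, and its error and warning messages are invented, so they will not match the package's wording.

```r
# Hypothetical helper (not the package's code): mean normalization with
# the two guards described above.
normalize_with_guards <- function(location, value,
                                  location_min, location_max) {
  in_region <- location >= location_min & location <= location_max

  # No measurement falls inside the region: stop with an error.
  if (!any(in_region)) {
    stop("No values found between ", location_min, " and ", location_max, ".")
  }

  normalizer <- mean(value[in_region])

  # A zero or NA normalizer cannot be used: warn and return the input.
  if (is.na(normalizer) || normalizer == 0) {
    warning("Normalizer is zero or NA; returning the original values.")
    return(value)
  }

  value / normalizer
}

location <- 1:10
flat <- rep(0, 10)

# Region outside the measured range: the call fails with an error.
out_of_range <- tryCatch(
  normalize_with_guards(location, flat, 20, 30),
  error = function(e) "error"
)

# All-zero region: a warning is raised and the values come back unchanged.
unchanged <- suppressWarnings(normalize_with_guards(location, flat, 2, 5))
```

Returning the original values on a zero or NA normalizer keeps the spectrum finite, at the cost of leaving it on a different scale from spectra that were normalized.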
No selectors should be supplied to this step function. The data should be
in the internal format produced by step_measure_input_wide() or
step_measure_input_long().
Examples
library(recipes)
library(measure)  # provides the step_measure_*() functions and meats_long

# Normalize to mean of region 40-60
rec <-
  recipe(water + fat + protein ~ ., data = meats_long) |>
  update_role(id, new_role = "id") |>
  step_measure_input_long(transmittance, location = vars(channel)) |>
  step_measure_normalize_peak(location_min = 40, location_max = 60) |>
  prep()

# Return the processed training data
bake(rec, new_data = NULL)
#> # A tibble: 215 × 5
#> id water fat protein .measures
#> <int> <dbl> <dbl> <dbl> <meas>
#> 1 1 60.5 22.5 16.7 [100 × 2]
#> 2 2 46 40.1 13.5 [100 × 2]
#> 3 3 71 8.4 20.5 [100 × 2]
#> 4 4 72.8 5.9 20.7 [100 × 2]
#> 5 5 58.3 25.5 15.5 [100 × 2]
#> 6 6 44 42.7 13.7 [100 × 2]
#> 7 7 44 42.7 13.7 [100 × 2]
#> 8 8 69.3 10.6 19.3 [100 × 2]
#> 9 9 61.4 19.9 17.7 [100 × 2]
#> 10 10 61.4 19.9 17.7 [100 × 2]
#> # ℹ 205 more rows