step_measure_normalize_peak() creates a specification of a recipe step that divides each spectrum by a summary statistic computed from a specified region. This is commonly used for internal standard normalization.

Usage

step_measure_normalize_peak(
  recipe,
  measures = NULL,
  role = NA,
  trained = FALSE,
  location_min = NULL,
  location_max = NULL,
  method = "mean",
  skip = FALSE,
  id = recipes::rand_id("measure_normalize_peak")
)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

measures

An optional character vector of measure column names to process. If NULL (the default), all measure columns (columns with class measure_list) will be processed. Use this to limit processing to specific measure columns when working with multiple measurement types.

role

Not used by this step since no new variables are created.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

location_min

Numeric. The lower bound of the region to use for normalization. This parameter is tunable with peak_location_min().

location_max

Numeric. The upper bound of the region to use for normalization. This parameter is tunable with peak_location_max().

method

Character. The summary statistic to compute from the region. One of "mean" (default), "max", or "integral".
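
To make the three method options concrete, the base R sketch below computes each statistic on a toy region. The variable names are illustrative only, and the trapezoidal rule used for "integral" is an assumption; the package may integrate differently.

```r
# Toy region: locations and the intensities that fall inside the window.
location <- c(40, 45, 50, 55, 60)
value    <- c(1.0, 2.0, 4.0, 2.0, 1.0)

region_mean <- mean(value)  # method = "mean": average intensity in the region
region_max  <- max(value)   # method = "max": peak height in the region

# method = "integral": area under the curve, here by the trapezoidal rule
# (assumed quadrature, not confirmed against the package source).
widths <- diff(location)
pair_means <- (head(value, -1) + tail(value, -1)) / 2
region_integral <- sum(widths * pair_means)
```

Unlike "mean" and "max", the integral depends on the spacing of the location axis as well as on the intensities, so the normalized values carry the inverse units of that axis.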

skip

A logical. Should the step be skipped when the recipe is baked by recipes::bake()? While all operations are baked when recipes::prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.

id

A character string that is unique to this step to identify it.

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Details

For each spectrum, this step:

  1. Selects values in the region [location_min, location_max]

  2. Computes a summary statistic (mean, max, or integral) from that region

  3. Divides the entire spectrum by this value
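
The three steps above can be sketched in base R for a single spectrum. This is a minimal illustration with made-up data and the default "mean" method, not the package's internal implementation; the inclusive bounds follow the closed-interval notation used in step 1.

```r
# One toy spectrum: measurement locations and intensities.
location <- seq(10, 100, by = 10)
value    <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)

location_min <- 40
location_max <- 60

# 1. Select values in the closed region [location_min, location_max].
in_region <- location >= location_min & location <= location_max

# 2. Compute the summary statistic (here the default, "mean").
normalizer <- mean(value[in_region])

# 3. Divide the entire spectrum by that value.
normalized <- value / normalizer
```

After normalization with the mean, the region itself averages to 1, which is a quick way to check that the step did what was intended.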

This is useful when you have an internal standard peak at a known location and want to normalize all spectra to that peak.

The location_min and location_max parameters are tunable with peak_location_min() and peak_location_max() for hyperparameter optimization.

If no values fall within the specified region, an error is raised. If the computed normalizer is zero or NA, a warning is issued and the original values are returned unchanged.
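
The edge-case behavior described above can be mimicked with a small base R helper. `normalize_to_region()` is a hypothetical function written for this illustration; it is not part of the package, and the message wording is invented.

```r
# Hypothetical helper that mirrors the documented error and warning behavior.
normalize_to_region <- function(location, value, location_min, location_max) {
  in_region <- location >= location_min & location <= location_max

  # An empty region is an error.
  if (!any(in_region)) {
    stop("No values fall within [", location_min, ", ", location_max, "].")
  }

  normalizer <- mean(value[in_region])

  # A zero or NA normalizer triggers a warning and leaves the values unchanged.
  if (is.na(normalizer) || normalizer == 0) {
    warning("Normalizer is zero or NA; returning the original values.")
    return(value)
  }

  value / normalizer
}

location <- c(10, 20, 30)

# The region mean is zero, so the values come back unchanged (with a warning).
unchanged <- suppressWarnings(
  normalize_to_region(location, c(-1, 0, 1), 10, 30)
)

# No location lies in [40, 60], so the call errors; capture the message.
empty_region <- tryCatch(
  normalize_to_region(location, c(1, 2, 3), 40, 60),
  error = function(e) conditionMessage(e)
)
```

Returning the original values rather than failing keeps a single degenerate spectrum from aborting the whole bake, at the cost of leaving that spectrum on a different scale from the rest.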

No selectors should be supplied to this step function. The data should be in the internal format produced by step_measure_input_wide() or step_measure_input_long().

Examples

library(recipes)
library(measure) # provides meats_long and the step_measure_*() functions

# Normalize to mean of region 40-60
rec <-
  recipe(water + fat + protein ~ ., data = meats_long) |>
  update_role(id, new_role = "id") |>
  step_measure_input_long(transmittance, location = vars(channel)) |>
  step_measure_normalize_peak(location_min = 40, location_max = 60) |>
  prep()

bake(rec, new_data = NULL)
#> # A tibble: 215 × 5
#>       id water   fat protein .measures
#>    <int> <dbl> <dbl>   <dbl>    <meas>
#>  1     1  60.5  22.5    16.7 [100 × 2]
#>  2     2  46    40.1    13.5 [100 × 2]
#>  3     3  71     8.4    20.5 [100 × 2]
#>  4     4  72.8   5.9    20.7 [100 × 2]
#>  5     5  58.3  25.5    15.5 [100 × 2]
#>  6     6  44    42.7    13.7 [100 × 2]
#>  7     7  44    42.7    13.7 [100 × 2]
#>  8     8  69.3  10.6    19.3 [100 × 2]
#>  9     9  61.4  19.9    17.7 [100 × 2]
#> 10    10  61.4  19.9    17.7 [100 × 2]
#> # ℹ 205 more rows