
step_measure_qc_saturated() creates a specification of a recipe step that detects saturated (clipped) regions in measurements and adds metadata columns indicating saturation status.

Usage

step_measure_qc_saturated(
  recipe,
  measures = NULL,
  upper_limit = NULL,
  lower_limit = NULL,
  tolerance = 0.01,
  new_col_flag = ".saturated",
  new_col_pct = ".sat_pct",
  role = "predictor",
  trained = FALSE,
  skip = FALSE,
  id = recipes::rand_id("measure_qc_saturated")
)

Arguments

recipe

A recipe object.

measures

An optional character vector of measure column names.

upper_limit

Upper saturation threshold. If NULL (the default), it is auto-detected from the data.

lower_limit

Lower saturation threshold. If NULL (the default), it is auto-detected from the data.

tolerance

How close a value must be to a limit to count as saturated. Default is 0.01.

new_col_flag

Name of the logical column that flags saturation. Default is ".saturated".

new_col_pct

Name of the column that holds the percentage of saturated points. Default is ".sat_pct".

role

Role assigned to the new columns. Default is "predictor".

trained

Logical indicating if the step has been trained.

skip

Logical. Should the step be skipped when baking?

id

Unique step identifier.

Value

An updated recipe with the new step added.

Details

Saturation occurs when a detector's response reaches its maximum (or minimum) capacity. Saturated data points lose quantitative information and may need special handling.

If a limit is not specified, it is auto-detected from the extreme values of the data (using min() and max()), where saturation shows up as flat regions.
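To make the detection rule concrete, here is a minimal base-R sketch of per-point flagging. It is not the package's implementation: `flag_saturated()` and its `min_run` argument are hypothetical, `tolerance` is assumed to be an absolute distance from each limit, and a "flat region" is assumed to mean at least `min_run` consecutive points near a limit, so a single extreme value is not flagged on its own.

```r
# Hypothetical helper, NOT the measure package's implementation.
# Assumptions: `tolerance` is an absolute distance from each limit, and
# saturation means a plateau of at least `min_run` consecutive points.
flag_saturated <- function(values, upper_limit = NULL, lower_limit = NULL,
                           tolerance = 0.01, min_run = 2) {
  # Fall back to the observed extremes when a limit is not supplied
  upper <- if (is.null(upper_limit)) max(values, na.rm = TRUE) else upper_limit
  lower <- if (is.null(lower_limit)) min(values, na.rm = TRUE) else lower_limit

  # A point is a candidate if it lies within `tolerance` of either limit
  near_limit <- values >= upper - tolerance | values <= lower + tolerance
  near_limit[is.na(near_limit)] <- FALSE

  # Keep only runs of at least `min_run` consecutive candidates, so a lone
  # extreme value is not treated as a flat (clipped) region
  runs <- rle(near_limit)
  runs$values <- runs$values & runs$lengths >= min_run
  inverse.rle(runs)
}

x <- c(0.2, 0.5, 1.0, 1.0, 1.0, 0.7)
flag_saturated(x)
#> [1] FALSE FALSE  TRUE  TRUE  TRUE FALSE
```

With auto-detected limits, the lone minimum (0.2) is not flagged, while the three-point plateau at 1.0 is.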

Two new columns are added (their names are set by new_col_flag and new_col_pct):

  • .saturated: Logical, TRUE if any saturation is detected

  • .sat_pct: Percentage of measurement points that are saturated
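The two summary columns can be sketched from a per-point logical flag vector as shown here. `summarise_saturation()` is a hypothetical helper, not part of the package; it assumes `.sat_pct` is on a 0 to 100 scale and that the flags contain no `NA` values.

```r
# Hypothetical helper: reduce per-point saturation flags for one sample
# to the two per-sample values described above.
summarise_saturation <- function(flags) {
  data.frame(
    .saturated = any(flags),      # TRUE if any point is saturated
    .sat_pct   = 100 * mean(flags) # share of saturated points, in percent
  )
}

# Illustrative per-point flags for three samples
flags_by_sample <- list(
  a = c(FALSE, FALSE, FALSE, FALSE),
  b = c(FALSE, TRUE, TRUE, FALSE),
  c = c(TRUE, TRUE, TRUE, TRUE)
)

# One row per sample
res <- do.call(rbind, lapply(flags_by_sample, summarise_saturation))
res
```

Sample `b`, with two of four points flagged, gets `.saturated = TRUE` and `.sat_pct = 50`.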

Examples

library(recipes)
library(measure)  # provides meats_long and the step_measure_*() steps

rec <- recipe(water + fat + protein ~ ., data = meats_long) |>
  update_role(id, new_role = "id") |>
  step_measure_input_long(transmittance, location = vars(channel)) |>
  step_measure_qc_saturated() |>
  prep()

bake(rec, new_data = NULL)
#> # A tibble: 215 × 7
#>       id water   fat protein .measures .saturated .sat_pct
#>    <int> <dbl> <dbl>   <dbl>    <meas> <lgl>         <dbl>
#>  1     1  60.5  22.5    16.7 [100 × 2] FALSE             0
#>  2     2  46    40.1    13.5 [100 × 2] FALSE             0
#>  3     3  71     8.4    20.5 [100 × 2] FALSE             0
#>  4     4  72.8   5.9    20.7 [100 × 2] FALSE             0
#>  5     5  58.3  25.5    15.5 [100 × 2] FALSE             0
#>  6     6  44    42.7    13.7 [100 × 2] FALSE             0
#>  7     7  44    42.7    13.7 [100 × 2] FALSE             0
#>  8     8  69.3  10.6    19.3 [100 × 2] FALSE             0
#>  9     9  61.4  19.9    17.7 [100 × 2] FALSE             0
#> 10    10  61.4  19.9    17.7 [100 × 2] FALSE             0
#> # ℹ 205 more rows