step_measure_qc_saturated() creates a specification of a recipe step
that detects saturated (clipped) regions in measurements and adds metadata
columns indicating saturation status.
Usage
step_measure_qc_saturated(
recipe,
measures = NULL,
upper_limit = NULL,
lower_limit = NULL,
tolerance = 0.01,
new_col_flag = ".saturated",
new_col_pct = ".sat_pct",
role = "predictor",
trained = FALSE,
skip = FALSE,
id = recipes::rand_id("measure_qc_saturated")
)Arguments
- recipe
A recipe object.
- measures
An optional character vector of measure column names.
- upper_limit
Upper saturation threshold. Default is
NULL(auto-detect).- lower_limit
Lower saturation threshold. Default is
NULL(auto-detect).- tolerance
How close to the limit counts as saturated. Default is 0.01.
- new_col_flag
Name of column for saturation flag. Default is
".saturated".- new_col_pct
Name of column for saturation percentage. Default is
".sat_pct".- role
Role for new columns. Default is
"predictor".- trained
Logical indicating if the step has been trained.
- skip
Logical. Should the step be skipped when baking?
- id
Unique step identifier.
Details
Saturation occurs when detector response reaches its maximum (or minimum) capacity. Saturated data points lose quantitative information and may need special handling.
If limits are not specified, they are auto-detected as values appearing
as flat regions at extreme values (using min() and max()).
Two new columns are added:
.saturated: Logical, TRUE if any saturation detected.sat_pct: Percentage of points that are saturated
See also
Other measure-qc:
step_measure_impute(),
step_measure_qc_outlier(),
step_measure_qc_snr()
Examples
library(recipes)
rec <- recipe(water + fat + protein ~ ., data = meats_long) |>
update_role(id, new_role = "id") |>
step_measure_input_long(transmittance, location = vars(channel)) |>
step_measure_qc_saturated() |>
prep()
bake(rec, new_data = NULL)
#> # A tibble: 215 × 7
#> id water fat protein .measures .saturated .sat_pct
#> <int> <dbl> <dbl> <dbl> <meas> <lgl> <dbl>
#> 1 1 60.5 22.5 16.7 [100 × 2] FALSE 0
#> 2 2 46 40.1 13.5 [100 × 2] FALSE 0
#> 3 3 71 8.4 20.5 [100 × 2] FALSE 0
#> 4 4 72.8 5.9 20.7 [100 × 2] FALSE 0
#> 5 5 58.3 25.5 15.5 [100 × 2] FALSE 0
#> 6 6 44 42.7 13.7 [100 × 2] FALSE 0
#> 7 7 44 42.7 13.7 [100 × 2] FALSE 0
#> 8 8 69.3 10.6 19.3 [100 × 2] FALSE 0
#> 9 9 61.4 19.9 17.7 [100 × 2] FALSE 0
#> 10 10 61.4 19.9 17.7 [100 × 2] FALSE 0
#> # ℹ 205 more rows