step_measure_augment_noise() creates a specification of a recipe step that
adds controlled random noise to spectral data for data augmentation.
Arguments
- recipe
A recipe object.
- sd
Standard deviation of noise. If
relative = TRUE(default), this is relative to the signal range (0.01 = 1% of range). Ifrelative = FALSE, this is the absolute noise level.- distribution
Noise distribution:
"gaussian"(default) or"uniform".- relative
Logical. If
TRUE(default),sdis relative to signal range.- measures
An optional character vector of measure column names.
- role
Not used.
- trained
Logical indicating if the step has been trained.
- skip
Logical. Should the step be skipped when baking? Default is
TRUE, meaning augmentation only applies during training.- id
Unique step identifier.
Details
Data augmentation adds variability to training data to improve model robustness. Adding noise simulates measurement uncertainty and helps models generalize better.
Default behavior (skip = TRUE):
The augmentation is only applied during prep() on training data.
When bake() is called on new data, the step is skipped.
Reproducibility: The noise is deterministic based on the row content, so the same input always produces the same augmented output within a session.
See also
Other measure-augmentation:
step_measure_augment_scale(),
step_measure_augment_shift()
Examples
library(recipes)
rec <- recipe(water + fat + protein ~ ., data = meats_long) |>
update_role(id, new_role = "id") |>
step_measure_input_long(transmittance, location = vars(channel)) |>
step_measure_augment_noise(sd = 0.02) |>
prep()
# Noise only applied to training data
bake(rec, new_data = NULL)
#> # A tibble: 215 × 5
#> id water fat protein .measures
#> <int> <dbl> <dbl> <dbl> <meas>
#> 1 1 60.5 22.5 16.7 [100 × 2]
#> 2 2 46 40.1 13.5 [100 × 2]
#> 3 3 71 8.4 20.5 [100 × 2]
#> 4 4 72.8 5.9 20.7 [100 × 2]
#> 5 5 58.3 25.5 15.5 [100 × 2]
#> 6 6 44 42.7 13.7 [100 × 2]
#> 7 7 44 42.7 13.7 [100 × 2]
#> 8 8 69.3 10.6 19.3 [100 × 2]
#> 9 9 61.4 19.9 17.7 [100 × 2]
#> 10 10 61.4 19.9 17.7 [100 × 2]
#> # ℹ 205 more rows