
step_measure_smooth_ma() creates a specification of a recipe step that applies moving average smoothing to measurement data. This is a simple and fast method for reducing high-frequency noise.
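How much noise a moving average removes depends on the window size. For independent noise with standard deviation sigma, the mean of w points has standard deviation sigma / sqrt(w). The base R sketch below illustrates this on simulated white noise using `stats::filter()` with a uniform kernel. It is an illustration of the general technique, not the code used by measure.

```r
# Simulated white noise with standard deviation 1
set.seed(123)
noise <- rnorm(1e5)

# Standard deviation after a centered moving average, for several window sizes
windows <- c(3, 11, 31)
smoothed_sd <- sapply(windows, function(w) {
  y <- stats::filter(noise, rep(1 / w, w), sides = 2)  # uniform kernel, weights 1/w
  sd(y, na.rm = TRUE)                                   # the k edge points are NA
})

# Compare the observed values with the theoretical 1 / sqrt(w)
round(cbind(window = windows, observed = smoothed_sd, expected = 1 / sqrt(windows)), 3)
```

Tripling the window roughly divides the remaining noise by sqrt(3). This is why larger windows smooth more, at the cost of also flattening narrow peaks.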

Usage

step_measure_smooth_ma(
  recipe,
  measures = NULL,
  window = 5L,
  edge_method = c("reflect", "constant", "NA"),
  role = NA,
  trained = FALSE,
  skip = FALSE,
  id = recipes::rand_id("measure_smooth_ma")
)

Arguments

recipe

A recipe object.

measures

An optional character vector of measure column names.

window

The window size for the moving average. Must be an odd integer of at least 3. Default is 5. Larger values produce more smoothing. Tunable via smooth_window().

edge_method

How to handle edges where the full window doesn't fit. One of "reflect" (default, reflects values at boundaries), "constant" (pads with edge values), or "NA" (returns NA for edge values).

role

Not used.

trained

Logical indicating if the step has been trained.

skip

Logical. Should the step be skipped when baking?

id

Unique step identifier.
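The sketch below shows what the three `edge_method` options could mean in practice. It pads a short vector by k = (window - 1) / 2 points on each side so that a centered window always fits. The helper `pad_signal()` is hypothetical and not part of measure. The "reflect" branch assumes values are mirrored about the end point without repeating it, because the exact reflection convention used by the package is not stated here.

```r
# Hypothetical helper (not part of measure): pad a numeric vector by k
# points on each side so that a centered window of size 2k + 1 always fits.
pad_signal <- function(x, k, edge_method = c("reflect", "constant", "NA")) {
  edge_method <- match.arg(edge_method)
  n <- length(x)
  stopifnot(k >= 1, n > k)
  switch(edge_method,
    # Assumed convention: mirror about the end points without repeating them
    reflect  = c(x[(k + 1):2], x, x[(n - 1):(n - k)]),
    # Repeat the first and last observed values
    constant = c(rep(x[1], k), x, rep(x[n], k)),
    # Pad with NA so every window that touches an edge averages to NA
    "NA"     = c(rep(NA_real_, k), x, rep(NA_real_, k))
  )
}

x <- c(1, 2, 4, 8, 16)
pad_signal(x, k = 2, "reflect")   # 4 2 1 2 4 8 16 8 4
pad_signal(x, k = 2, "constant")  # 1 1 1 2 4 8 16 16 16
pad_signal(x, k = 2, "NA")        # NA NA 1 2 4 8 16 NA NA
```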

Value

An updated recipe with the new step added.

Details

Moving average smoothing replaces each point with the unweighted mean of the values in a sliding window centered on that point, including the point itself. This is equivalent to convolution with a uniform kernel whose weights all equal 1/w.

For a window size of w, the smoothed value at position i is:

$$y_i = \frac{1}{w} \sum_{j=-k}^{k} x_{i+j}$$

where k = (w-1)/2 is the half-window size.
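The formula can be checked directly in base R. The sketch below uses a hypothetical `ma_smooth()` helper, which is not measure's implementation. For each interior point it computes y_i as an explicit sum over j = -k, ..., k and leaves the k points at each end as NA, which resembles `edge_method = "NA"`. It then confirms that `stats::filter()` with a uniform kernel of weights 1/w gives the same values.

```r
# Hypothetical illustration of the moving-average formula (not measure's code)
ma_smooth <- function(x, w = 5L) {
  stopifnot(w >= 3, w %% 2 == 1, length(x) >= w)
  k <- (w - 1) %/% 2                 # half-window size
  n <- length(x)
  y <- rep(NA_real_, n)              # edge points stay NA
  for (i in (k + 1):(n - k)) {
    # y_i = (1 / w) * sum_{j = -k}^{k} x_{i + j}
    y[i] <- sum(x[(i - k):(i + k)]) / w
  }
  y
}

set.seed(1)
x <- sin(seq(0, 2 * pi, length.out = 50)) + rnorm(50, sd = 0.2)
y_loop <- ma_smooth(x, w = 5)

# Same result as a convolution with a uniform kernel of width 5
y_conv <- as.numeric(stats::filter(x, rep(1 / 5, 5), sides = 2))
all.equal(y_loop, y_conv)
```

Calling `stats::filter()` with its namespace prefix avoids a clash with `dplyr::filter()` when the tidyverse is loaded.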

Examples

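library(recipes)
library(measure)

rec <- recipe(water + fat + protein ~ ., data = meats_long) |>
  # Keep the sample identifier out of the predictors
  update_role(id, new_role = "id") |>
  # Collect the long-format transmittance values into a measure column
  step_measure_input_long(transmittance, location = vars(channel)) |>
  # Smooth each spectrum with a 5-point centered moving average
  step_measure_smooth_ma(window = 5) |>
  prep()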

bake(rec, new_data = NULL)
#> # A tibble: 215 × 5
#>       id water   fat protein .measures
#>    <int> <dbl> <dbl>   <dbl>    <meas>
#>  1     1  60.5  22.5    16.7 [100 × 2]
#>  2     2  46    40.1    13.5 [100 × 2]
#>  3     3  71     8.4    20.5 [100 × 2]
#>  4     4  72.8   5.9    20.7 [100 × 2]
#>  5     5  58.3  25.5    15.5 [100 × 2]
#>  6     6  44    42.7    13.7 [100 × 2]
#>  7     7  44    42.7    13.7 [100 × 2]
#>  8     8  69.3  10.6    19.3 [100 × 2]
#>  9     9  61.4  19.9    17.7 [100 × 2]
#> 10    10  61.4  19.9    17.7 [100 × 2]
#> # ℹ 205 more rows