
step_measure_smooth_median() creates a specification of a recipe step that applies median filter smoothing. This is a robust method that is particularly effective at removing spike noise while preserving edges.

Usage

step_measure_smooth_median(
  recipe,
  measures = NULL,
  window = 5L,
  edge_method = c("reflect", "constant", "NA"),
  role = NA,
  trained = FALSE,
  skip = FALSE,
  id = recipes::rand_id("measure_smooth_median")
)

Arguments

recipe

A recipe object.

measures

An optional character vector of measure column names.

window

The window size (number of points) for the median filter. Must be an odd integer of at least 3. Default is 5. Larger values produce more smoothing. Tunable via smooth_window().

edge_method

How to handle points near either end of the signal, where the full window does not fit. One of "reflect" (the default; mirrors values at the boundaries), "constant" (pads with the first or last value), or "NA" (returns NA for those edge points).

role

Not used.

trained

Logical indicating if the step has been trained.

skip

Logical. Should the step be skipped when baking?

id

Unique step identifier.
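To make the window and edge_method arguments concrete, here is a minimal base-R sketch of how a signal can be padded with (window - 1) / 2 extra points on each side before a centered window slides over it. pad_signal() is a hypothetical helper, not part of the package, and the exact "reflect" convention used here (mirroring about the end points without repeating them) is an assumption; the package may handle edges differently.

```r
# Hypothetical helper (not the package's internal code): pad a signal so that
# a centered window of odd size `window` fits around every original point.
# The "reflect" convention below (mirror without repeating the end point)
# is an assumption.
pad_signal <- function(x, window = 5L,
                       edge_method = c("reflect", "constant", "NA")) {
  edge_method <- match.arg(edge_method)
  # As documented: window must be an odd integer of at least 3
  stopifnot(window >= 3, window %% 2 == 1)
  half <- (window - 1) %/% 2   # number of points on each side of the center
  n <- length(x)
  stopifnot(n > half)
  switch(edge_method,
    reflect  = c(x[(half + 1):2], x, x[(n - 1):(n - half)]),
    constant = c(rep(x[1], half), x, rep(x[n], half)),
    "NA"     = c(rep(NA, half), x, rep(NA, half))
  )
}

x <- 1:6
pad_signal(x, window = 5, edge_method = "reflect")   # 3 2 1 2 3 4 5 6 5 4
pad_signal(x, window = 5, edge_method = "constant")  # 1 1 1 2 3 4 5 6 6 6
pad_signal(x, window = 5, edge_method = "NA")        # NA NA 1 2 3 4 5 6 NA NA
```

With "NA" padding, any window that reaches past an end contains an NA, so median() (with its default na.rm = FALSE) returns NA for those points, which matches the documented behavior of that option.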

Value

An updated recipe with the new step added.

Details

Median filtering replaces each point with the median of the values in a window centered on that point. Unlike a moving average, median filtering is robust to outliers and spikes, making it well suited to:

  • Removing cosmic ray spikes in Raman spectroscopy

  • Cleaning detector artifacts

  • Preserving sharp edges while removing noise
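The contrast with a moving average can be seen on two toy signals. The sketch below is a minimal base-R illustration, not the package's implementation: sliding_filter() is a hypothetical helper that applies any summary function over a centered window, using the same assumed mirror-style "reflect" padding at the ends.

```r
# Hypothetical helper: centered sliding-window filter with assumed
# "reflect" padding (mirror about the end points). Not the package's code.
sliding_filter <- function(x, window = 5L, fun = median) {
  stopifnot(window >= 3, window %% 2 == 1, length(x) > window %/% 2)
  half <- window %/% 2
  n <- length(x)
  padded <- c(x[(half + 1):2], x, x[(n - 1):(n - half)])
  # Point i of x sits at padded[i + half]; its window is padded[i:(i + 2 * half)]
  vapply(seq_len(n), function(i) fun(padded[i:(i + 2 * half)]), numeric(1))
}

# A flat signal with a single spike, e.g. a cosmic-ray hit
spike <- c(1, 1, 1, 1, 50, 1, 1, 1, 1)
sliding_filter(spike, 5, median)  # every value is 1: the spike is removed
sliding_filter(spike, 5, mean)    # positions 3 to 7 become 10.8: the spike is smeared

# A sharp step edge
edge_signal <- c(0, 0, 0, 0, 10, 10, 10, 10)
sliding_filter(edge_signal, 5, median)  # 0 0 0 0 10 10 10 10: edge preserved
sliding_filter(edge_signal, 5, mean)    # 0 0 2 4 6 8 10 10: edge blurred into a ramp
```

A single spike never forms the majority of a window, so the median ignores it, whereas the mean spreads it across every window that contains it. For standalone use outside a recipe, base R's stats::runmed() also computes a running median, with its own end rules ("median", "keep", "constant").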

Examples

library(recipes)
library(measure)  # provides meats_long and the step_measure_*() functions

rec <- recipe(water + fat + protein ~ ., data = meats_long) |>
  update_role(id, new_role = "id") |>
  step_measure_input_long(transmittance, location = vars(channel)) |>
  step_measure_smooth_median(window = 5) |>
  prep()

bake(rec, new_data = NULL)
#> # A tibble: 215 × 6
#>       id water   fat protein .measures channel    
#>    <int> <dbl> <dbl>   <dbl>    <meas> <list>     
#>  1     1  60.5  22.5    16.7 [100 × 2] <int [100]>
#>  2     2  46    40.1    13.5 [100 × 2] <int [100]>
#>  3     3  71     8.4    20.5 [100 × 2] <int [100]>
#>  4     4  72.8   5.9    20.7 [100 × 2] <int [100]>
#>  5     5  58.3  25.5    15.5 [100 × 2] <int [100]>
#>  6     6  44    42.7    13.7 [100 × 2] <int [100]>
#>  7     7  44    42.7    13.7 [100 × 2] <int [100]>
#>  8     8  69.3  10.6    19.3 [100 × 2] <int [100]>
#>  9     9  61.4  19.9    17.7 [100 × 2] <int [100]>
#> 10    10  61.4  19.9    17.7 [100 × 2] <int [100]>
#> # ℹ 205 more rows