step_measure_bin() creates a specification of a recipe step that
reduces a spectrum to fewer points by aggregating values within bins.
Arguments
- recipe
A recipe object.
- n_bins
Number of bins (mutually exclusive with
bin_width).
- bin_width
Width of each bin in location units (mutually exclusive with
n_bins).
- method
Aggregation method:
"mean" (default), "sum", "median", or "max".
- measures
An optional character vector of measure column names.
- role
Not used by this step, because it modifies existing data in place.
- trained
Logical indicating if the step has been trained.
- bin_breaks
The computed bin breaks (after training).
- skip
Logical. Should the step be skipped when baking?
- id
Unique step identifier.
Details
This step reduces the number of points in each spectrum by dividing the
x-axis into bins and aggregating values within each bin. The result
replaces the .measures column with the binned data.
This is useful for:
- Reducing data dimensionality
- Decreasing noise through averaging
- Speeding up downstream processing
- Aligning data from different resolutions
The bin boundaries are determined during prep() from the training data
and stored for consistent application to new data.
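The idea of fixing the breaks once and reusing them can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: make_breaks() and bin_spectrum() are hypothetical helpers, the breaks are assumed to be equally spaced over the training range, and each bin's location is assumed to be the mean of the locations it contains.

```r
# Hypothetical helper: equally spaced breaks over the training locations.
make_breaks <- function(location, n_bins) {
  seq(min(location), max(location), length.out = n_bins + 1)
}

# Hypothetical helper: aggregate one spectrum using stored breaks.
bin_spectrum <- function(location, value, breaks, fun = mean) {
  # Assign each point to a bin; points outside the breaks would get NA.
  bin <- cut(location, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  data.frame(
    location = as.numeric(tapply(location, bin, mean)),
    value = as.numeric(tapply(value, bin, fun))
  )
}

# "Training": compute the breaks once from the training locations.
train_loc <- 1:100
breaks <- make_breaks(train_loc, n_bins = 20)

# Apply the stored breaks to a spectrum on the training grid ...
binned <- bin_spectrum(train_loc, sin(train_loc / 10), breaks)

# ... and to a spectrum recorded at a finer resolution.
fine_loc <- seq(1, 100, by = 0.5)
binned_fine <- bin_spectrum(fine_loc, sin(fine_loc / 10), breaks, fun = median)

nrow(binned)
nrow(binned_fine)
```

Because both spectra are cut with the same stored breaks, they end up with the same number of points even though they were recorded at different resolutions.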
See also
Other measure-features:
step_measure_integrals(),
step_measure_moments(),
step_measure_ratios()
Examples
library(recipes)
library(measure) # provides meats_long and the step_measure_*() functions
# Bin to 20 points
rec <- recipe(water + fat + protein ~ ., data = meats_long) |>
  update_role(id, new_role = "id") |>
  step_measure_input_long(transmittance, location = vars(channel)) |>
  step_measure_bin(n_bins = 20) |>
  prep()
bake(rec, new_data = NULL)
#> # A tibble: 215 × 5
#> id water fat protein .measures
#> <int> <dbl> <dbl> <dbl> <meas>
#> 1 1 60.5 22.5 16.7 [20 × 2]
#> 2 2 46 40.1 13.5 [20 × 2]
#> 3 3 71 8.4 20.5 [20 × 2]
#> 4 4 72.8 5.9 20.7 [20 × 2]
#> 5 5 58.3 25.5 15.5 [20 × 2]
#> 6 6 44 42.7 13.7 [20 × 2]
#> 7 7 44 42.7 13.7 [20 × 2]
#> 8 8 69.3 10.6 19.3 [20 × 2]
#> 9 9 61.4 19.9 17.7 [20 × 2]
#> 10 10 61.4 19.9 17.7 [20 × 2]
#> # ℹ 205 more rows