step_measure_input_wide creates a specification of a recipe
step that converts measures organized in multiple columns into an internal
format used by the package.
Usage
step_measure_input_wide(
recipe,
...,
role = "measure",
trained = FALSE,
columns = NULL,
location_values = NULL,
col_name = ".measures",
skip = FALSE,
id = rand_id("measure_input_wide")
)
Arguments
- recipe
A recipe object. The step will be added to the sequence of operations for this recipe.
- ...
One or more selector functions to choose variables for this step. See
selections() for more details.
- role
Not used by this step since no new variables are created.
- trained
A logical to indicate if the quantities for preprocessing have been estimated.
- columns
A character string of the selected variable names. This field is a placeholder and will be populated once
recipes::prep() is used.
- location_values
A numeric vector of values that specify the location of the measurements (e.g., wavelength) in the same order as the variables selected by
the ... argument. If not specified, a sequence of integers (starting at 1L) is used.
- col_name
A single character string specifying the name of the output column that will contain the measure data. Defaults to
".measures". Use different names when creating multiple measure columns (e.g., ".uv_spectrum" and ".ms_spectrum").
- skip
A logical. Should the step be skipped when the recipe is baked by
bake()? While all operations are baked when prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.
- id
A character string that is unique to this step to identify it.
Details
This step is designed for data in a format where the analytical measurements are in separate columns.
step_measure_input_wide() will collect those data and put them into a
format used internally by this package. The data structure has a row for
each independent experimental unit and a nested tibble with that sample's
measure (measurement and location). It assumes that there are unique
combinations of the other columns in the data that identify the individual
samples associated with each set of measurements. If this is not the case, the
measurement values might be inappropriately restructured.
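The shape of that structure can be illustrated with tidyr and dplyr. This is only a sketch of the layout described above, not the package's actual implementation: the toy data, the sample_id column, and the inner column names location and value are all assumptions made for illustration.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide data: one identifier column and three measurement columns
wide <- data.frame(
  sample_id = 1:2,
  x_001 = c(0.10, 0.12),
  x_002 = c(0.20, 0.22),
  x_003 = c(0.30, 0.32)
)

nested <- wide |>
  # Stack the measurement columns into one long column
  pivot_longer(starts_with("x_"), names_to = "column", values_to = "value") |>
  # Integer locations starting at 1 within each sample
  group_by(sample_id) |>
  mutate(location = seq_along(value)) |>
  ungroup() |>
  select(-column) |>
  # One row per sample, with a nested tibble of location/value pairs
  nest(.measures = c(location, value))

nested
nested$.measures[[1]]
```

Each row of nested corresponds to one sample, and each element of the .measures list column holds that sample's three location/value pairs.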
The best advice is to have a column of any type that indicates the unique sample number for each measure. For example, if there are 20 rows in the input data set, the columns that are not analytical measurements should have no duplicate combinations in the 20 rows.
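One way to check this condition before adding the step is sketched below in base R. The data frame, the sample_id column, and the "x_" prefix for measurement columns are hypothetical and stand in for your own data.

```r
# Hypothetical wide data where sample_id 3 appears twice by mistake
wide <- data.frame(
  sample_id = c(1, 2, 3, 3),
  x_001 = c(0.10, 0.12, 0.11, 0.13),
  x_002 = c(0.20, 0.22, 0.21, 0.23),
  x_003 = c(0.30, 0.32, 0.31, 0.33)
)

# Columns that are not analytical measurements
id_cols <- setdiff(names(wide), grep("^x_", names(wide), value = TRUE))

# TRUE if any combination of the non-measurement columns is repeated
has_duplicates <- anyDuplicated(wide[, id_cols, drop = FALSE]) > 0
has_duplicates

# Rows that repeat an earlier combination
dup_rows <- wide[duplicated(wide[, id_cols, drop = FALSE]), ]
dup_rows
```

If has_duplicates is TRUE, adding a column that uniquely identifies each sample (or correcting the existing one) before using the step avoids ambiguous restructuring.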
Tidying
When you tidy() this step, a tibble is returned indicating which of
the original columns were used to reformat the data.
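A minimal sketch of tidying the step, assuming the recipes, measure, and modeldata packages are installed and that this step is the first one in the recipe (hence number = 1). The exact columns of the returned tibble are not shown here.

```r
library(recipes)
library(measure)

data(meats, package = "modeldata")

rec <-
  recipe(water + fat + protein ~ ., data = meats) |>
  step_measure_input_wide(starts_with("x_")) |>
  prep()

# Inspect which original columns the first step collected
tidied <- tidy(rec, number = 1)
tidied
```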
See also
Other input/output steps:
step_measure_input_long(),
step_measure_output_long(),
step_measure_output_wide()
Examples
data(meats, package = "modeldata")
# Outcome data is to the right
names(meats) |> tail(10)
#> [1] "x_094" "x_095" "x_096" "x_097" "x_098" "x_099" "x_100"
#> [8] "water" "fat" "protein"
# ------------------------------------------------------------------------------
# Ingest data without adding the location (i.e. wave number) for the spectra
rec <-
recipe(water + fat + protein ~ ., data = meats) |>
step_measure_input_wide(starts_with("x_")) |>
prep()
summary(rec)
#> # A tibble: 4 × 4
#> variable type role source
#> <chr> <list> <chr> <chr>
#> 1 water <chr [2]> outcome original
#> 2 fat <chr [2]> outcome original
#> 3 protein <chr [2]> outcome original
#> 4 .measures <chr [1]> measure derived
# ------------------------------------------------------------------------------
# Ingest data and add the location (i.e. wave number) for the spectra
# Make up some locations for the spectra's x-axis
index <- seq(1, 2, length.out = 100)
rec <-
recipe(water + fat + protein ~ ., data = meats) |>
step_measure_input_wide(starts_with("x_"), location_values = index) |>
prep()
summary(rec)
#> # A tibble: 4 × 4
#> variable type role source
#> <chr> <list> <chr> <chr>
#> 1 water <chr [2]> outcome original
#> 2 fat <chr [2]> outcome original
#> 3 protein <chr [2]> outcome original
#> 4 .measures <chr [1]> measure derived