step_measure_output_wide creates a specification of a recipe
step that converts measures to multiple columns (i.e., "wide" format).
Usage
step_measure_output_wide(
recipe,
prefix = "measure_",
measures = NULL,
role = "predictor",
trained = FALSE,
skip = FALSE,
id = rand_id("measure_output_wide")
)Arguments
- recipe
A recipe object. The step will be added to the sequence of operations for this recipe.
- prefix
A character string used to name the new columns.
- measures
An optional single character string specifying which measure column to output. If
NULL(the default) and only one measure column exists, that column will be used. If multiple measure columns exist andmeasuresisNULL, an error will be thrown prompting you to specify which column to output.- role
Not used by this step since no new variables are created.
- trained
A logical to indicate if the quantities for preprocessing have been estimated.
- skip
A logical. Should the step be skipped when the recipe is baked by
bake()? While all operations are baked whenprep()is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when usingskip = TRUEas it may affect the computations for subsequent operations.- id
A character string that is unique to this step to identify it.
Details
This step is designed convert analytical measurements from their internal data structure to separate columns.
Wide outputs can be helpful when you want to use standard recipes steps with
the measuresments, such as recipes::step_pca(), recipes::step_pls(), and
so on.
See also
Other input/output steps:
step_measure_input_long(),
step_measure_input_wide(),
step_measure_output_long()
Examples
library(dplyr)
data(glucose_bioreactors)
bioreactors_small$batch_sample <- NULL
small_tr <- bioreactors_small[1:200, ]
small_te <- bioreactors_small[201:210, ]
small_rec <-
recipe(glucose ~ ., data = small_tr) |>
update_role(batch_id, day, new_role = "id columns") |>
step_measure_input_wide(`400`:`3050`) |>
prep()
# Before reformatting:
small_rec |> bake(new_data = small_te)
#> # A tibble: 10 × 4
#> batch_id day glucose .measures
#> <chr> <int> <dbl> <meas>
#> 1 S_15 5 2.04 [2,651 × 2]
#> 2 S_15 6 5.56 [2,651 × 2]
#> 3 S_15 7 4.65 [2,651 × 2]
#> 4 S_15 8 9.91 [2,651 × 2]
#> 5 S_15 9 4.96 [2,651 × 2]
#> 6 S_15 10 6.78 [2,651 × 2]
#> 7 S_15 11 6.72 [2,651 × 2]
#> 8 S_15 12 4.69 [2,651 × 2]
#> 9 S_15 13 6.30 [2,651 × 2]
#> 10 S_15 14 3.10 [2,651 × 2]
# After reformatting:
output_rec <-
small_rec |>
step_measure_output_wide() |>
prep()
output_rec |> bake(new_data = small_te)
#> # A tibble: 10 × 2,654
#> batch_id day glucose measure_0001 measure_0002 measure_0003 measure_0004
#> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 S_15 5 2.04 760094. 779053. 797154. 817226.
#> 2 S_15 6 5.56 854812. 874105. 893653. 912190.
#> 3 S_15 7 4.65 961786. 979449. 1000054. 1019400.
#> 4 S_15 8 9.91 1092681. 1110867. 1132371. 1151944.
#> 5 S_15 9 4.96 1243318. 1259961. 1281630. 1301011.
#> 6 S_15 10 6.78 1500414. 1517867. 1536436. 1557016.
#> 7 S_15 11 6.72 1818328. 1836446. 1856983. 1876906.
#> 8 S_15 12 4.69 2035750. 2053903. 2073901. 2093514.
#> 9 S_15 13 6.30 2192191. 2211502. 2230969. 2251089.
#> 10 S_15 14 3.10 2353638. 2371877. 2392865. 2412822.
#> # ℹ 2,647 more variables: measure_0005 <dbl>, measure_0006 <dbl>,
#> # measure_0007 <dbl>, measure_0008 <dbl>, measure_0009 <dbl>,
#> # measure_0010 <dbl>, measure_0011 <dbl>, measure_0012 <dbl>,
#> # measure_0013 <dbl>, measure_0014 <dbl>, measure_0015 <dbl>,
#> # measure_0016 <dbl>, measure_0017 <dbl>, measure_0018 <dbl>,
#> # measure_0019 <dbl>, measure_0020 <dbl>, measure_0021 <dbl>,
#> # measure_0022 <dbl>, measure_0023 <dbl>, measure_0024 <dbl>, …