library(tidymodels)
#> ── Attaching packages ────────────────────────────────────── tidymodels 1.0.0 ──
#> ✔ broom        1.0.2      ✔ recipes      1.0.3 
#> ✔ dials        1.1.0      ✔ rsample      1.1.1 
#> ✔ dplyr        1.0.10     ✔ tibble       3.1.8 
#> ✔ ggplot2      3.4.0      ✔ tidyr        1.2.1 
#> ✔ infer        1.0.4      ✔ tune         1.0.1 
#> ✔ modeldata    1.0.1      ✔ workflows    1.1.2 
#> ✔ parsnip      1.0.3      ✔ workflowsets 1.0.0 
#> ✔ purrr        1.0.0      ✔ yardstick    1.1.0
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()
#> • Use tidymodels_prefer() to resolve common conflicts.
library(measure)
data("credit_data")
set.seed(55)
train_test_split <- initial_split(credit_data)
credit_train <- training(train_test_split)
credit_test <- testing(train_test_split)
Creating a Recipe
We specify a recipe by providing formula and data arguments. See Tidy Modeling with R to learn more about specifying formulas in R.
rec_obj <- recipe(Status ~ ., data = credit_train)
The recipe function returns a recipe object. The formula
argument determines the role of each variable. Status is
assigned the role of outcome, and the 13 other variables
are assigned the role of predictor.
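The role assignment can also be checked programmatically. The following is a minimal sketch, assuming the recipes and modeldata packages are installed; it uses the full credit_data rather than the training split (roles depend only on the formula and column names), and the names rec_sketch and role_counts are illustrative, not part of the original walkthrough.

```r
library(recipes)
library(modeldata)

data("credit_data")

# Illustrative recipe: same formula as above, built on the full data set
rec_sketch <- recipe(Status ~ ., data = credit_data)

# summary() on a recipe returns one row per variable,
# with columns for its type, role, and source
role_counts <- table(summary(rec_sketch)$role)
role_counts
```

Tabulating the role column is a quick way to confirm that the formula produced one outcome and that every remaining column became a predictor.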
rec_obj
#> Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor         13
Diving a bit deeper, the recipe object is a list with 7 elements.
Within these elements, more details are saved about our
variables, including the type and source of each,
stored in rec_obj$var_info.
cat(names(rec_obj), sep = "\n")
#> var_info
#> term_info
#> steps
#> template
#> levels
#> retained
#> requirements
rec_obj$var_info
#> # A tibble: 14 × 4
#>    variable  type      role      source  
#>    <chr>     <list>    <chr>     <chr>   
#>  1 Seniority <chr [2]> predictor original
#>  2 Home      <chr [3]> predictor original
#>  3 Time      <chr [2]> predictor original
#>  4 Age       <chr [2]> predictor original
#>  5 Marital   <chr [3]> predictor original
#>  6 Records   <chr [3]> predictor original
#>  7 Job       <chr [3]> predictor original
#>  8 Expenses  <chr [2]> predictor original
#>  9 Income    <chr [2]> predictor original
#> 10 Assets    <chr [2]> predictor original
#> 11 Debt      <chr [2]> predictor original
#> 12 Amount    <chr [2]> predictor original
#> 13 Price     <chr [2]> predictor original
#> 14 Status    <chr [3]> outcome   original
Adding a Step
The recipe does not yet contain any steps.
rec_obj$steps
#> NULL
rec_obj_add_step <- rec_obj |> 
  step_impute_knn(all_predictors())
rec_obj_add_step$steps
#> [[1]]
#> K-nearest neighbor imputation for all_predictors()
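At this point the step has only been specified, not estimated from data. As a hedged sketch of what could come next (prep() and bake() are not used in the section above, and every object name ending in _sketch, along with rec_prepped and test_imputed, is illustrative), the step might be trained on the training set and then applied to the test set like this:

```r
library(recipes)
library(rsample)
library(modeldata)

data("credit_data")
set.seed(55)
split_sketch <- initial_split(credit_data)
train_sketch <- training(split_sketch)
test_sketch <- testing(split_sketch)

# Same recipe and step as above, rebuilt so this block runs on its own
rec_sketch <- recipe(Status ~ ., data = train_sketch) |>
  step_impute_knn(all_predictors())

# prep() estimates the step from the training data
rec_prepped <- prep(rec_sketch, training = train_sketch)

# bake() applies the trained step to new data
test_imputed <- bake(rec_prepped, new_data = test_sketch)

# Compare missing-value counts before and after imputation
c(before = sum(is.na(test_sketch)), after = sum(is.na(test_imputed)))
```

Training on the training set and baking the test set keeps the imputation from drawing on test-set rows as reference neighbors.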