library(tidymodels)
#> ── Attaching packages ────────────────────────────────────── tidymodels 1.0.0 ──
#>  broom        1.0.2       recipes      1.0.3 
#>  dials        1.1.0       rsample      1.1.1 
#>  dplyr        1.0.10      tibble       3.1.8 
#>  ggplot2      3.4.0       tidyr        1.2.1 
#>  infer        1.0.4       tune         1.0.1 
#>  modeldata    1.0.1       workflows    1.1.2 
#>  parsnip      1.0.3       workflowsets 1.0.0 
#>  purrr        1.0.0       yardstick    1.1.0
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#>  purrr::discard() masks scales::discard()
#>  dplyr::filter()  masks stats::filter()
#>  dplyr::lag()     masks stats::lag()
#>  recipes::step()  masks stats::step()
#>  Use tidymodels_prefer() to resolve common conflicts.
library(measure)
data("credit_data")

set.seed(55)
train_test_split <- initial_split(credit_data)

credit_train <- training(train_test_split)
credit_test <- testing(train_test_split)
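As a quick sanity check, you can compare the sizes of the two partitions. This sketch assumes the default `prop = 3/4` of `initial_split()`, because the call above passes no `prop` argument. The check is an illustration and not part of the original walkthrough.

```r
library(tidymodels)
data("credit_data", package = "modeldata")

set.seed(55)
train_test_split <- initial_split(credit_data)  # default prop = 3/4
credit_train <- training(train_test_split)
credit_test  <- testing(train_test_split)

# Every row lands in exactly one partition
nrow(credit_train) + nrow(credit_test) == nrow(credit_data)

# Share of rows in the training set (expected to be close to 0.75)
nrow(credit_train) / nrow(credit_data)
```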

Creating a Recipe

We create a recipe by providing formula and data arguments. Check out Tidy Modeling with R to learn more about specifying formulas in R.

rec_obj <- recipe(Status ~ ., data = credit_train)

The recipe() function returns a recipe object. The formula argument determines the role of each variable: Status is assigned the role of outcome, and the 13 other variables are assigned the role of predictor.

rec_obj
#> Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor         13
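You can also read the same role assignment programmatically. Calling `summary()` on a recipe returns one row per variable with its type, role, and source. Counting the `role` column reproduces the printout above. To keep the sketch self-contained, it builds the recipe on the full `credit_data` rather than `credit_train`, which does not change the roles.

```r
library(tidymodels)
data("credit_data", package = "modeldata")

rec_obj <- recipe(Status ~ ., data = credit_data)

# One row per variable: variable, type, role, source
role_counts <- summary(rec_obj) |>
  count(role)
role_counts
```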

Diving a bit deeper, the recipe object is a list with 7 elements that store more details about our variables. For example, rec_obj$var_info holds each variable's type, role, and source.

cat(names(rec_obj), sep = "\n")
#> var_info
#> term_info
#> steps
#> template
#> levels
#> retained
#> requirements
rec_obj$var_info
#> # A tibble: 14 × 4
#>    variable  type      role      source  
#>    <chr>     <list>    <chr>     <chr>   
#>  1 Seniority <chr [2]> predictor original
#>  2 Home      <chr [3]> predictor original
#>  3 Time      <chr [2]> predictor original
#>  4 Age       <chr [2]> predictor original
#>  5 Marital   <chr [3]> predictor original
#>  6 Records   <chr [3]> predictor original
#>  7 Job       <chr [3]> predictor original
#>  8 Expenses  <chr [2]> predictor original
#>  9 Income    <chr [2]> predictor original
#> 10 Assets    <chr [2]> predictor original
#> 11 Debt      <chr [2]> predictor original
#> 12 Amount    <chr [2]> predictor original
#> 13 Price     <chr [2]> predictor original
#> 14 Status    <chr [3]> outcome   original
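Because `type` is a list column, a single variable can carry several type labels. These are the `<chr [2]>` and `<chr [3]>` entries above. The following sketch selects variables by type. It assumes the recipes 1.0 type vocabulary, in which factor columns are tagged `"nominal"`.

```r
library(tidymodels)
data("credit_data", package = "modeldata")

rec_obj <- recipe(Status ~ ., data = credit_data)

# Inspect the full type vector of the first variable
rec_obj$var_info$type[[1]]

# Keep the variables whose type vector includes "nominal" (assumed label for factors)
nominal_vars <- rec_obj$var_info |>
  filter(purrr::map_lgl(type, \(t) "nominal" %in% t)) |>
  pull(variable)
nominal_vars
```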

Adding a Step

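The recipe does not yet contain any steps, so rec_obj$steps is NULL. Adding a step, here K-nearest neighbor imputation of all predictors with step_impute_knn(), appends it to this list.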

rec_obj$steps
#> NULL

rec_obj_add_step <- rec_obj |> 
  step_impute_knn(all_predictors())

rec_obj_add_step$steps
#> [[1]]
#> K-nearest neighbor imputation for all_predictors()
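`tidy()` gives a tabular view of the same steps list, with one row per step showing its type and whether it has been trained. The step stays untrained until `prep()` estimates it from the training data. The sketch below shows that transition and reuses the seed and split from above.

```r
library(tidymodels)
data("credit_data", package = "modeldata")

set.seed(55)
credit_split <- initial_split(credit_data)
credit_train <- training(credit_split)

rec_obj_add_step <- recipe(Status ~ ., data = credit_train) |>
  step_impute_knn(all_predictors())

# Before prep(): the step is listed but not yet trained
tidy(rec_obj_add_step)

# prep() estimates the imputation step from the training data
rec_prepped <- prep(rec_obj_add_step, training = credit_train)
tidy(rec_prepped)
```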