library(tidymodels)
#> ── Attaching packages ────────────────────────────────────── tidymodels 1.0.0 ──
#> ✔ broom 1.0.2 ✔ recipes 1.0.3
#> ✔ dials 1.1.0 ✔ rsample 1.1.1
#> ✔ dplyr 1.0.10 ✔ tibble 3.1.8
#> ✔ ggplot2 3.4.0 ✔ tidyr 1.2.1
#> ✔ infer 1.0.4 ✔ tune 1.0.1
#> ✔ modeldata 1.0.1 ✔ workflows 1.1.2
#> ✔ parsnip 1.0.3 ✔ workflowsets 1.0.0
#> ✔ purrr 1.0.0 ✔ yardstick 1.1.0
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ✖ recipes::step() masks stats::step()
#> • Use tidymodels_prefer() to resolve common conflicts.
library(measure)
data("credit_data")
set.seed(55)
train_test_split <- initial_split(credit_data)
credit_train <- training(train_test_split)
credit_test <- testing(train_test_split)
Creating a Recipe
We specify a recipe providing formula and data arguments. Check out Tidy Modeling with R to learn more about specifying formulas in R.
rec_obj <- recipe(Status ~ ., data = credit_train)
The recipe function returns a recipe object. The formula argument determines the role of each variable: Status is assigned the role of outcome, and the 13 other variables are assigned the role of predictor.
rec_obj
#> Recipe
#>
#> Inputs:
#>
#> role #variables
#> outcome 1
#> predictor 13
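The role assignments can also be read back programmatically rather than from the printed summary. The sketch below is one way to do that, assuming the recipes and modeldata packages are installed. To stay self-contained it builds the recipe on the full credit_data rather than on the training split, and the names rec and role_tbl are illustrative only.

```r
library(recipes)

# credit_data ships with the modeldata package
data("credit_data", package = "modeldata")

# Same formula as in the text, but on the full data set
# (assumption: no train/test split, to keep this block self-contained)
rec <- recipe(Status ~ ., data = credit_data)

# summary() on a recipe returns one row per variable,
# with its type, role, and source
role_tbl <- summary(rec)

# Count how many variables carry each role
table(role_tbl$role)

# Pull out the variable(s) assigned the outcome role
role_tbl$variable[role_tbl$role == "outcome"]
```

Reading roles from summary() avoids reaching into the internals of the recipe object, which is useful when the roles feed later code.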
Diving a bit deeper, the recipe object is a list with 7 elements. These elements store more detail about our variables, including the type and source of each one, which are kept in rec_obj$var_info.
cat(names(rec_obj), sep = "\n")
#> var_info
#> term_info
#> steps
#> template
#> levels
#> retained
#> requirements
rec_obj$var_info
#> # A tibble: 14 × 4
#> variable type role source
#> <chr> <list> <chr> <chr>
#> 1 Seniority <chr [2]> predictor original
#> 2 Home <chr [3]> predictor original
#> 3 Time <chr [2]> predictor original
#> 4 Age <chr [2]> predictor original
#> 5 Marital <chr [3]> predictor original
#> 6 Records <chr [3]> predictor original
#> 7 Job <chr [3]> predictor original
#> 8 Expenses <chr [2]> predictor original
#> 9 Income <chr [2]> predictor original
#> 10 Assets <chr [2]> predictor original
#> 11 Debt <chr [2]> predictor original
#> 12 Amount <chr [2]> predictor original
#> 13 Price <chr [2]> predictor original
#> 14 Status <chr [3]> outcome original
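Because the type column prints as a list column, its contents are hidden in the output above. A minimal sketch of one way to make it readable follows, again assuming recipes and modeldata are installed and using the full credit_data. The names type_labels and type_lookup are hypothetical helpers, not part of the package.

```r
library(recipes)

data("credit_data", package = "modeldata")
rec <- recipe(Status ~ ., data = credit_data)

# Collapse each variable's type entry into a single comma-separated string
type_labels <- vapply(rec$var_info$type, paste, character(1), collapse = ", ")

# Pair each variable name with its readable type label
type_lookup <- data.frame(
  variable = rec$var_info$variable,
  type = type_labels
)
type_lookup
```

Each entry holds more than one label because a variable is described at several levels of detail, which is why the tibble shows entries such as chr [2] and chr [3].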
Adding a Step
The recipe does not yet contain any steps.
rec_obj$steps
#> NULL
Adding a step, here K-nearest neighbor imputation of all predictors, appends it to this element.
rec_obj_add_step <- rec_obj |>
  step_impute_knn(all_predictors())
rec_obj_add_step$steps
#> [[1]]
#> K-nearest neighbor imputation for all_predictors()
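The stored step can be inspected like any other list element. The sketch below, which assumes recipes and modeldata are installed and uses the full credit_data to stay self-contained, looks at the class and a few stored arguments of the step before the recipe has been prepped. The names rec_with_step and knn_step are illustrative.

```r
library(recipes)

data("credit_data", package = "modeldata")

rec <- recipe(Status ~ ., data = credit_data)
rec_with_step <- rec |>
  step_impute_knn(all_predictors())

# Adding a step returns a new recipe; the original still has no steps
length(rec$steps)
length(rec_with_step$steps)

# Each step is an object with its own class and stored arguments
knn_step <- rec_with_step$steps[[1]]
class(knn_step)

# Number of neighbors stored in the step (the package default unless set)
knn_step$neighbors

# The step has not been trained because the recipe has not been prepped
knn_step$trained
```

At this stage the step only records what should be done. Nothing has been estimated from the data yet.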