Tidymodels
R
data science
Tidymodels is a collection of R packages for modeling and machine learning. It is built on tidyverse design principles and works with tidyverse data structures.
Load packages:
library(tidymodels)
Rsample (sampling & splitting)
- Training & test split
  data_split <- initial_split(dataset, prop = 0.75, strata = target_col)
  train_set <- data_split %>% training()
  test_set <- data_split %>% testing()
The strata argument generates a stratified split, so that the training and test sets contain approximately the same fraction of each target class.
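The effect of stratification can be checked by comparing class fractions in both sets. The following is a minimal sketch, assuming the built-in iris data as a stand-in for dataset and Species as the target column; the names iris_split, iris_train and iris_test are chosen for illustration only.

```r
library(tidymodels)

set.seed(123)  # make the random split reproducible

# iris ships with R; Species has three equally frequent classes
iris_split <- initial_split(iris, prop = 0.75, strata = Species)
iris_train <- training(iris_split)
iris_test  <- testing(iris_split)

# Class fractions should be close to 1/3 in both sets
iris_train %>% count(Species) %>% mutate(fraction = n / sum(n))
iris_test  %>% count(Species) %>% mutate(fraction = n / sum(n))
```

Setting a seed before initial_split() keeps the split identical between runs, which helps when comparing models later.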
Recipes (feature engineering)
Parsnip (model fitting)
- General model formula
  outcome_variable ~ predictor_1 + predictor_2 + ...
  outcome_variable ~ .  # to use all available predictors
- Create model object
  model <- linear_reg() %>% set_engine('lm') %>% set_mode('regression')
- Fit the model
  model_fit <- model %>% fit(target_col ~ ., data = train_set)
- Get model summary
  Returns parameter estimates, standard errors & p-values:
  tidy(model_fit)
- Predict on new values
  predictions <- model_fit %>% predict(new_data = test_set)
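The parsnip steps can be put together into one runnable example. This is a sketch under stated assumptions: the built-in mtcars data replaces the placeholder dataset, mpg plays the role of target_col, and the names car_split, lm_model, lm_fit and car_predictions are made up for illustration.

```r
library(tidymodels)

set.seed(42)  # reproducible split

# Split the built-in mtcars data (assumption: mpg is the outcome)
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# Create the model object: linear regression via the lm engine
lm_model <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")

# Fit the model on the training set with two predictors
lm_fit <- lm_model %>%
  fit(mpg ~ wt + hp, data = car_train)

# Model summary: one row per term with estimate, std.error and p.value
tidy(lm_fit)

# Predict on the test set and attach the true outcome for later evaluation
car_predictions <- lm_fit %>%
  predict(new_data = car_test) %>%
  bind_cols(car_test %>% select(mpg))
```

Binding the true mpg values next to .pred gives a tibble in the shape that the yardstick metric functions expect.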
Tune & Dials ((hyper-)parameter optimization)
Yardstick (performance evaluation)
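Requires a tibble/dataset with both the true and the predicted outcomes. predict() returns only the prediction column, so the true values have to be attached first, e.g. with bind_cols().
predictions %>%
  rmse(truth = label_col, estimate = .pred)  # .pred is the standard name of the prediction column

| Regression quality metrics | |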
|---|---|
| R squared | prediction_set %>% rsq(truth = ..., estimate = ...) |
| Root mean squared error | ... rmse() |

| Classification quality metrics | |
|---|---|
| accuracy | prediction_set %>% accuracy(truth = ..., estimate = ...) |
| balanced accuracy | ... bal_accuracy() |
| precision | ... precision() |
| recall | ... recall() |
| sensitivity | ... sensitivity() |
| specificity | ... specificity() |
| area under the ROC curve | ... roc_auc() (takes the class probability column(s) instead of estimate) |
More metrics are listed in the yardstick package documentation.
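Several metrics can be computed in one call by combining them with metric_set(). The sketch below uses small hand-made tibbles with toy values invented purely for illustration; the column names label_col, .pred and .pred_class follow the conventions used above, and reg_metrics and class_metrics are hypothetical names.

```r
library(tidymodels)

# Toy regression results (made-up values, for illustration only)
reg_results <- tibble(
  label_col = c(1, 2, 3, 4),
  .pred     = c(1, 2, 3, 6)
)

# Combine several regression metrics into one function
reg_metrics <- metric_set(rmse, rsq)
reg_results %>% reg_metrics(truth = label_col, estimate = .pred)

# Toy classification results; truth and estimate must be factors
# with identical levels (the first level is treated as the event)
class_results <- tibble(
  label_col   = factor(c("yes", "yes", "no", "no"), levels = c("yes", "no")),
  .pred_class = factor(c("yes", "no", "no", "no"), levels = c("yes", "no"))
)

class_metrics <- metric_set(accuracy, precision, recall)
class_results %>% class_metrics(truth = label_col, estimate = .pred_class)
```

Each call returns a tibble with one row per metric, which makes it easy to compare models by binding their metric tibbles together.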
- Streamlined approach
  # Fit model:
  model_last_fit <- model %>% last_fit(target_col ~ ., split = data_split)
  # Return standard quality metrics:
  model_last_fit %>% collect_metrics()
  # Return tibble with predictions and true target values:
  model_last_fit %>% collect_predictions()
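As a runnable illustration of the streamlined approach, the sketch below applies last_fit() to the built-in mtcars data. The data set, the formula mpg ~ wt + hp and the names car_split, lm_model and car_last_fit are assumptions made for this example. last_fit() fits the model on the training part of the split and evaluates it on the test part.

```r
library(tidymodels)

set.seed(42)  # reproducible split

car_split <- initial_split(mtcars, prop = 0.75)

lm_model <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")

# Fit on the training part, evaluate on the test part of the split
car_last_fit <- lm_model %>%
  last_fit(mpg ~ wt + hp, split = car_split)

# Default regression metrics computed on the test set
car_metrics <- car_last_fit %>% collect_metrics()

# Test-set predictions (.pred) alongside the true mpg values
car_preds <- car_last_fit %>% collect_predictions()
```

Because the split object carries both parts of the data, no separate calls to training(), testing(), predict() or bind_cols() are needed here.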