Tidymodels
Tidymodels is a collection of R packages for modeling and machine learning. It follows tidyverse design principles but is installed and loaded separately from the tidyverse.
Load the packages:
library(tidymodels)
Rsample (sampling & splitting)
- Training and test split
  data_split <- initial_split(dataset, prop = 0.75, strata = target_col)
  training_set <- data_split %>% training()
  testing_set <- data_split %>% testing()
  The strata argument generates a stratified split, so the training and test sets contain approximately the same fraction of each target class.
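The effect of strata can be checked by comparing class fractions in both partitions. The following is a minimal sketch on a made-up, imbalanced dataset; the column names x and target_col and the 80/20 class ratio are assumptions chosen for illustration.

```r
library(tidymodels)

set.seed(123)

# Hypothetical imbalanced dataset: 160 "no" rows and 40 "yes" rows
dataset <- tibble(
  x = rnorm(200),
  target_col = factor(rep(c("no", "yes"), times = c(160, 40)))
)

# Stratified 75/25 split on the target column
data_split <- initial_split(dataset, prop = 0.75, strata = target_col)
training_set <- training(data_split)
testing_set <- testing(data_split)

# Class fractions should be roughly equal in both partitions
training_set %>% count(target_col) %>% mutate(frac = n / sum(n))
testing_set %>% count(target_col) %>% mutate(frac = n / sum(n))
```

Without strata, a small dataset like this one could end up with noticeably different class fractions in the two partitions.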
Recipes (feature engineering)
Parsnip (model fitting)
- General model formula
  outcome_variable ~ predictor_1 + predictor_2 + ...
  outcome_variable ~ . # to use all available predictors
- Create model object
  model <- linear_reg() %>%
    set_engine('lm') %>%
    set_mode('regression')
- Fit the model
  model_fit <- model %>%
    fit(target_col ~ ., data = training_set)
- Get model summary
  Returns parameter estimates, standard errors and p-values:
  tidy(model_fit)
- Predict on new values
  predictions <- model_fit %>%
    predict(new_data = testing_set)
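The Parsnip steps above can be strung together as follows. This is a sketch under stated assumptions: the built-in mtcars data stands in for the dataset, mpg stands in for target_col, and the predictors wt and hp are an arbitrary illustrative choice.

```r
library(tidymodels)

set.seed(42)

# mtcars ships with R; mpg plays the role of target_col (illustrative choice)
data_split <- initial_split(mtcars, prop = 0.75)
training_set <- training(data_split)
testing_set <- testing(data_split)

# Model specification: linear regression using the lm engine
model <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")

# Fit on the training data with two predictors
model_fit <- model %>%
  fit(mpg ~ wt + hp, data = training_set)

# One row per coefficient, including estimate, std.error and p.value
tidy(model_fit)

# Tibble with one .pred value per row of new_data
predictions <- model_fit %>%
  predict(new_data = testing_set)
```

Swapping the engine or the model type only changes the specification step; the fit, tidy and predict calls stay the same.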
Tune & Dials (hyperparameter optimization)
Yardstick (performance evaluation)
Requires a tibble/dataset containing both the true and the predicted outcomes.
predictions %>% rmse(truth = label_col, estimate = .pred) # .pred is the default name of the prediction column
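Because predict() on a regression fit returns only the .pred column, the true outcome has to be attached before a metric can be computed. A minimal sketch, assuming mtcars as the data and mpg as the outcome column (both illustrative):

```r
library(tidymodels)

set.seed(42)
data_split <- initial_split(mtcars, prop = 0.75)
training_set <- training(data_split)
testing_set <- testing(data_split)

model_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ wt + hp, data = training_set)

# Attach the true outcome column to the predictions
prediction_set <- model_fit %>%
  predict(new_data = testing_set) %>%
  bind_cols(testing_set %>% select(mpg))

# Returns a one-row tibble with .metric, .estimator and .estimate
rmse_result <- prediction_set %>%
  rmse(truth = mpg, estimate = .pred)
rmse_result
```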
Regression quality metric | Usage |
---|---|
R squared | prediction_set %>% rsq(truth = ..., estimate = ...) |
Root mean squared error | ... rmse() |
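Several regression metrics can be computed in one call by combining them with metric_set(). The sketch below uses a small hand-made tibble; the values and the name reg_metrics are made up for illustration.

```r
library(tidymodels)

# Hypothetical true values and predictions
prediction_set <- tibble(
  label_col = c(3.0, 5.0, 2.5, 7.0),
  .pred     = c(2.5, 5.0, 3.0, 8.0)
)

# Bundle RMSE and R squared into a single metric function
reg_metrics <- metric_set(rmse, rsq)

# Returns one row per metric
reg_result <- prediction_set %>%
  reg_metrics(truth = label_col, estimate = .pred)
reg_result
```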
Classification quality metric | Usage |
---|---|
accuracy | prediction_set %>% accuracy(truth = ..., estimate = ...) |
balanced accuracy | ... bal_accuracy() |
precision | ... precision() |
recall | ... recall() |
sensitivity | ... sensitivity() |
specificity | ... specificity() |
area under the ROC curve | ... roc_auc() (takes the class-probability column instead of estimate) |
More metrics are listed in the yardstick documentation.
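For classification, truth and estimate must both be factors with the same levels, while roc_auc() expects the predicted probability of the event class. A minimal sketch on made-up predictions; the column names .pred_class and .pred_yes follow the usual parsnip naming but are assumptions here, and the first factor level ("yes") is treated as the event, which is the yardstick default.

```r
library(tidymodels)

# Hypothetical two-class predictions; "yes" is the first level and thus the event
prediction_set <- tibble(
  label_col   = factor(c("yes", "yes", "no", "no", "yes", "no"), levels = c("yes", "no")),
  .pred_class = factor(c("yes", "no", "no", "no", "yes", "yes"), levels = c("yes", "no")),
  .pred_yes   = c(0.9, 0.4, 0.2, 0.1, 0.8, 0.6)
)

# Hard-class metric: compares predicted class with the true class
acc <- prediction_set %>%
  accuracy(truth = label_col, estimate = .pred_class)

# Probability-based metric: pass the probability column of the event class
auc <- prediction_set %>%
  roc_auc(truth = label_col, .pred_yes)

acc
auc
```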
- Streamlined approach
  # Fit the model on the training set and evaluate it on the test set:
  model_last_fit <- model %>%
    last_fit(target_col ~ ., split = data_split)
  # Return standard quality metrics:
  model_last_fit %>% collect_metrics()
  # Return tibble with predictions and true target values:
  model_last_fit %>% collect_predictions()
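A runnable version of the streamlined approach might look like this. It is a sketch that assumes mtcars as the data, mpg as the target and wt and hp as predictors; none of these come from the notes above.

```r
library(tidymodels)

set.seed(1)
data_split <- initial_split(mtcars, prop = 0.75)

model <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")

# Trains on training(data_split) and evaluates on testing(data_split)
model_last_fit <- model %>%
  last_fit(mpg ~ wt + hp, split = data_split)

# Test-set metrics (rmse and rsq for a regression model)
metrics <- model_last_fit %>% collect_metrics()

# Test-set predictions together with the true mpg values
preds <- model_last_fit %>% collect_predictions()

metrics
preds
```

Since last_fit() takes the split object directly, no separate calls to training(), testing() or predict() are needed.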