Scientific seminar 05.12.24
| TARGET | FEATURES | | | | |
|---|---|---|---|---|---|
| species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
| Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 |
| Gentoo | Biscoe | 46.1 | 13.2 | 211 | 4500 |
| Chinstrap | Dream | 46.5 | 17.9 | 192 | 3500 |
(fig from app.datacamp.com/learn/courses/understanding-machine-learning)
Supervised learning
Unsupervised learning
Reinforcement learning (e.g. AlphaGo) is the third class, although it is outside our scope.
(fig from app.datacamp.com/learn/courses/understanding-machine-learning)
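As a quick contrast with the supervised examples below, a minimal unsupervised sketch on the same penguin data (assuming the palmerpenguins package is available): k-means looks for clusters in the measurements without ever seeing the species labels.

library(palmerpenguins)
library(dplyr)

# Numeric measurements only; the species label is never shown to the model
features <- penguins %>%
  select(bill_length_mm:body_mass_g) %>%
  na.omit() %>%
  scale()

set.seed(1)
kmeans(features, centers = 3)$size  # cluster sizes; roughly one cluster per species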
Random forest applied to penguin data
(fig by Jeremybeauchamp @wikipedia)
Rows: 344
Columns: 7
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
(fig from app.datacamp.com/learn/courses/understanding-machine-learning)
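The tuning code below uses penguin_train and penguin_boot, which are not defined on these slides. A minimal sketch of the likely preparation (the seed and the NA handling are assumptions; the 7-column overview above suggests the year column was removed):

library(tidymodels)
library(palmerpenguins)

penguins_df <- penguins %>% select(-year) %>% drop_na()  # assumed cleaning

set.seed(123)  # assumed seed
penguin_split <- initial_split(penguins_df, strata = species)
penguin_train <- training(penguin_split)
penguin_boot  <- bootstraps(penguin_train)  # resamples passed to tune_grid() below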
Set up the model and tune it: run it with different parameters and assess performance.
# Random forest: 1000 trees; mtry and min_n left open for tuning
tune_spec <- rand_forest(mtry = tune(), trees = 1000, min_n = tune()) %>%
  set_mode("classification") %>%
  set_engine("ranger")

# Predict species from all other columns of the training data
rf_recipe <- recipe(species ~ ., data = penguin_train)

tune_wf <- workflow() %>% add_recipe(rf_recipe) %>% add_model(tune_spec)

# Evaluate a 200-point grid on the bootstrap resamples, in parallel
doParallel::registerDoParallel(cores = 14)
tune_res <- tune_grid(tune_wf, resamples = penguin_boot, grid = 200)

tune_res %>% autoplot(metric = "accuracy")
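Instead of reading parameter values off the plot, the resampled metrics can also be inspected directly with functions from the tune package:

tune_res %>% collect_metrics()                   # full grid results
show_best(tune_res, metric = "accuracy", n = 5)  # top 5 mtry/min_n combinations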
We select the best-performing model and apply it:
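A sketch of that step with the tune package (penguin_split comes from the assumed preparation above):

best_acc  <- select_best(tune_res, metric = "accuracy")
final_rf  <- finalize_workflow(tune_wf, best_acc)  # plug the winning mtry/min_n into the workflow
final_fit <- last_fit(final_rf, penguin_split)     # fit on training data, evaluate on the test split
collect_metrics(final_fit)                         # accuracy and ROC AUC on held-out data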
We have 96.61 % accuracy. Not bad!
We can also get the variable importance:
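One way to do this with the ranger engine (an assumption, since the slides do not show it: ranger only records importance when the engine is re-specified with an importance mode, and vip::vi() then returns a Variable/Importance tibble in the same format as the one shown further below):

# Re-fit the finalized model with impurity-based importance recorded
imp_spec <- finalize_model(tune_spec, best_acc) %>%
  set_engine("ranger", importance = "impurity")

workflow() %>%
  add_recipe(rf_recipe) %>%
  add_model(imp_spec) %>%
  fit(penguin_train) %>%
  extract_fit_parsnip() %>%
  vip::vi()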
Ok. Great!
Accuracy = 82.43 %
# A tibble: 11 × 2
   Variable Importance
   <chr>         <dbl>
 1 pp_dde       0.127
 2 pp_ddd       0.106
 3 pcb_180      0.105
 4 sum_ddt      0.103
 5 pcb_153      0.101
 6 pcb_138      0.0390
 7 hcb          0.0363
 8 pcb_118      0.0350
 9 sum_6pcb     0.0298
10 sum_7pcb     0.0260
11 sum_hch      0.0207
Code and data are pushed to GitHub: https://github.com/arebruvold/FRES_machine_learning