AIMeetup #3: Machine learning - rocket science or bread and butter?
Uploaded by 2040io
• Understand business & data: read and explore the data
• Feature engineering: create new features based on the existing ones
• Feature selection: select only the useful features
• Model selection: find the best model(s) among the candidates (model A, model B, model C, model D, model E)
• Tuning hyperparameters: find the best hyperparameters for a given model
• Ensemble modeling: combine a few models into one better model (e.g. 0.6 × model B + 0.4 × model E)
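The last step on the diagram (0.6 × model B + 0.4 × model E) is a weighted average of the models' predictions. Below is a minimal sketch, assuming both models already return numeric predictions for the same rows. The function name `weighted_blend` and the prediction values are hypothetical.

```python
from typing import List, Sequence


def weighted_blend(predictions: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> List[float]:
    """Weighted average of several models' predictions, row by row."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n_rows = len(predictions[0])
    if any(len(p) != n_rows for p in predictions):
        raise ValueError("all models must predict the same rows")
    # Normalise the weights so they sum to 1 (0.6 + 0.4 already does).
    norm = [w / total for w in weights]
    return [sum(w * p[i] for w, p in zip(norm, predictions))
            for i in range(n_rows)]


# Hypothetical predictions of "model B" and "model E" for three rows.
pred_b = [5.0, 30.0, 16.0]
pred_e = [6.0, 35.0, 13.0]
print([round(v, 2) for v in weighted_blend([pred_b, pred_e], [0.6, 0.4])])
# [5.4, 32.0, 14.8]
```

The weights are fixed here to match the slide. In practice they are usually chosen by comparing blends on held-out data.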
Example: raw data
datetime             season  temp   count
2011-01-01 08:32:02  1       9.23   5
2012-04-02 12:10:00  2       18.78  32
2012-08-07 15:47:01  3       15.45  15
After feature engineering (hour, day and month extracted from datetime; count_log = ln(count))
datetime             season  temp   hour  day  month  …  count  count_log
2011-01-01 08:32:02  1       9.23   8     1    1      …  5      1.609
2012-04-02 12:10:00  2       18.78  12    2    4      …  32     3.466
2012-08-07 15:47:01  3       15.45  15    7    8      …  15     2.708
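The transformation between the two tables can be sketched with the Python standard library. This assumes count_log is the natural logarithm of count, which matches the values shown (ln 5 ≈ 1.609). The columns hidden behind "…" are not reproduced, and `engineer_row` is a hypothetical helper name.

```python
import math
from datetime import datetime


def engineer_row(raw: dict) -> dict:
    """Derive hour/day/month from 'datetime' and a log-transformed target."""
    ts = datetime.strptime(raw["datetime"], "%Y-%m-%d %H:%M:%S")
    row = dict(raw)  # keep the original columns
    row["hour"] = ts.hour
    row["day"] = ts.day
    row["month"] = ts.month
    # Natural log of the target, rounded like count_log in the table.
    row["count_log"] = round(math.log(raw["count"]), 3)
    return row


raw_rows = [
    {"datetime": "2011-01-01 08:32:02", "season": 1, "temp": 9.23, "count": 5},
    {"datetime": "2012-04-02 12:10:00", "season": 2, "temp": 18.78, "count": 32},
    {"datetime": "2012-08-07 15:47:01", "season": 3, "temp": 15.45, "count": 15},
]

for r in map(engineer_row, raw_rows):
    print(r["hour"], r["day"], r["month"], r["count_log"])
# 8 1 1 1.609
# 12 2 4 3.466
# 15 7 8 2.708
```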
Feature engineering
• quantitative => bins: 1 to 10, 11 to 20, …
• dates => day, month, year, hour, whether it is a weekend, …
• categorical/qualitative (red, green, white):
  • assign a numeric identifier (1, 2, 3)
  • create n binary columns (is it red? etc.)
  • replace each category with the probability of the target variable (target encoding)
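The quantitative and categorical bullets above can be sketched in plain Python. All function names, the colour list and the binary target `bought` are hypothetical. Binning assumes positive integers. Note that target encoding computed on the same rows the model is trained on leaks the target, so in practice it is usually computed on separate training folds.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def bin_value(x: int, width: int = 10) -> str:
    """Quantitative -> interval label (1-10, 11-20, ...); assumes x >= 1."""
    low = (x - 1) // width * width + 1
    return f"{low}-{low + width - 1}"


def label_encode(values: Sequence[str]) -> List[int]:
    """Assign a numeric id (1, 2, 3, ...) in order of first appearance."""
    ids: Dict[str, int] = {}
    return [ids.setdefault(v, len(ids) + 1) for v in values]


def one_hot(values: Sequence[str]) -> List[Dict[str, int]]:
    """Create one binary column per category ("is it red?" and so on)."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def target_encode(values: Sequence[str], target: Sequence[int]) -> List[float]:
    """Replace each category with the observed share of target == 1."""
    positives: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for v, y in zip(values, target):
        positives[v] += y
        counts[v] += 1
    return [positives[v] / counts[v] for v in values]


colors = ["red", "green", "white", "red", "green"]
bought = [1, 0, 1, 0, 0]  # hypothetical binary target
print([bin_value(x) for x in (3, 10, 11, 27)])  # ['1-10', '1-10', '11-20', '21-30']
print(label_encode(colors))                      # [1, 2, 3, 1, 2]
print(one_hot(colors)[0])                        # {'is_green': 0, 'is_red': 1, 'is_white': 0}
print(target_encode(colors, bought))             # [0.5, 0.0, 1.0, 0.5, 0.0]
```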
Feature selection
• The fewer, the better (simpler model)
• Keep only the most valuable ones (ideally just one :)
• Features are (usually) interdependent, so be careful… (verify empirically)
• Faster training and prediction
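The slide does not name a method for checking features empirically. Greedy forward selection is one common option, swapped in here as an illustration. It adds one feature at a time and keeps a feature only if it improves a validation score, so dependent features are evaluated together rather than in isolation. `toy_score` is a hypothetical stand-in for a real cross-validated model score, and its numbers are made up. The small per-feature penalty reflects "the fewer, the better".

```python
from typing import Callable, FrozenSet, List, Sequence

Score = Callable[[FrozenSet[str]], float]


def forward_selection(features: Sequence[str], score: Score,
                      min_gain: float = 0.0) -> List[str]:
    """Greedily add the feature that most improves `score`; stop when none helps."""
    selected: List[str] = []
    best = score(frozenset())
    remaining = list(features)
    while remaining:
        # Score each candidate together with the already chosen features,
        # because features are usually dependent on each other.
        candidates = [(score(frozenset(selected + [f])), f) for f in remaining]
        cand_score, cand = max(candidates)
        if cand_score - best <= min_gain:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected


def toy_score(subset: FrozenSet[str]) -> float:
    """Hypothetical validation score; 'season' overlaps with 'month'."""
    value = 0.5 if "hour" in subset else 0.0
    value += 0.1 if "temp" in subset else 0.0
    # Together, month and season add no more than month alone.
    value += max(0.2 if "month" in subset else 0.0,
                 0.15 if "season" in subset else 0.0)
    return value - 0.01 * len(subset)  # penalty: prefer fewer features


print(forward_selection(["season", "temp", "hour", "day", "month"], toy_score))
# ['hour', 'month', 'temp']
```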
Model selection
• Linear
• Decision Tree
• Random Forest
• Gradient Boosting
• Neural Network
Ensemble trees
• Bagging (bootstrap aggregation)
  • Random Forest
  • Extra Trees
• Boosting
  • Gradient Boosting
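Random Forest builds on the bagging idea. The sketch below shows plain bagging only: many depth-1 regression trees (stumps), each trained on a bootstrap resample of the rows, with their predictions averaged. Random Forest additionally samples features at each split, and boosting trains models sequentially; neither is shown. The function names and the toy hour/rentals data are hypothetical, and the inputs are 1-D for brevity.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Model = Callable[[float], float]


def fit_stump(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Depth-1 regression tree: one threshold, the mean target on each side."""
    best_sse, best_t = float("inf"), None
    best_left = best_right = mean(ys)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:  # the largest x leaves nothing on the right side
            continue
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best_sse:
            best_sse, best_t, best_left, best_right = sse, t, lm, rm
    if best_t is None:  # all x identical: predict the overall mean
        return lambda x: best_left
    return lambda x: best_left if x <= best_t else best_right


def bagging(xs: Sequence[float], ys: Sequence[float],
            n_models: int = 25, seed: int = 0) -> Model:
    """Bootstrap aggregation: fit each stump on a resample drawn with
    replacement, then average the stumps' predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models: List[Model] = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)


# Hypothetical toy data: hour of day vs. number of rentals.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
rentals = [2, 3, 2, 3, 10, 11, 10, 11]
predict = bagging(hours, rentals)
print(round(predict(2), 2), round(predict(7), 2))
```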
Cross-validation
http://blog.goldenhelix.com/bchristensen/cross-validation-for-genomic-prediction-in-svs/
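Below is a minimal sketch of k-fold cross-validation in plain Python. Each fold serves once as the validation set while the model is trained on the rest, and the validation errors are averaged. The helpers `fit_mean` and `fit_line` and the data are hypothetical. The folds are contiguous and unshuffled; rows sorted in some order (for example by datetime) may call for shuffling or a time-aware split instead. Comparing the averaged scores of candidate models, or of one model with different hyperparameters, is how cross-validation supports the model selection and tuning steps.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Fit = Callable[[Sequence[float], Sequence[float]], Callable[[float], float]]


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split row indices 0..n-1 into k contiguous folds: (train, validation)."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, val))
        start += size
    return splits


def cross_val_mse(xs: Sequence[float], ys: Sequence[float],
                  fit: Fit, k: int = 5) -> float:
    """Average validation mean squared error over k folds."""
    scores = []
    for train, val in k_fold_indices(len(xs), k):
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mean((predict(xs[i]) - ys[i]) ** 2 for i in val))
    return mean(scores)


def fit_mean(xs: Sequence[float], ys: Sequence[float]):
    """Baseline: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs: Sequence[float], ys: Sequence[float]):
    """Ordinary least-squares line y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


xs = list(range(10))
ys = [2 * x + 1 for x in xs]
print(cross_val_mse(xs, ys, fit_mean), cross_val_mse(xs, ys, fit_line))
```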