Krzysztof Karpio Piotr Łukasiewicz Arkadiusz Orłowski Tomasz Ząbkowski Warsaw University of Life...
Discovering and analyzing income determinants using decision trees.
Krzysztof Karpio, Piotr Łukasiewicz, Arkadiusz Orłowski, Tomasz Ząbkowski
Warsaw University of Life Sciences - SGGW

Data
Household incomes
Poland
Years: 2000-2010
Source: Budżety Gospodarstw Domowych (Household Budgets) - GUS
About 36 000 households in each year
Decision attribute: household income / number of earners
Real income (based on prices in 2008)

Conditional attributes
Sex of a family head
Education of a family head
Age of a family head
Economic group of a household
Family type
Number of persons in a household
Number of children
Number of earners
Class of place of residence
Voivodeship

Incomes 2008
Mean income (kPLN)
FEMALE 17.3    MALE 20.4
VILLAGE 16.6    CITY 26.3
PODKARPACKIE 14.7    MAZOWIECKIE 23.6

Income thresholds: 8 kPLN, 16 kPLN, 45 kPLN
Income classes: LOW 7%, AVERAGE 40%, MODERATE 48%, HIGH 5%

Method
Decision tree
Entropy
Gain
Rudolf Clausius (1822-1888)
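
The formulas on the Method slide did not survive extraction. As a minimal sketch, assuming the standard Shannon entropy H = -Σ p_i log2(p_i) and the usual information gain (entropy before a split minus the size-weighted entropy after it), the two quantities could be computed as below. The function names and the four toy households are hypothetical illustrations, not the GUS data or the authors' code; only the class shares 7%, 40%, 48%, 5% are taken from the slides.

```python
import math
from collections import Counter


def entropy_of_shares(shares):
    """Shannon entropy in bits of a list of class shares that sum to 1."""
    return -sum(p * math.log2(p) for p in shares if p > 0)


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return entropy_of_shares([n / total for n in Counter(labels).values()])


def information_gain(rows, attribute, target):
    """Entropy of `target` minus the weighted entropy after splitting on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Entropy of the 2008 income classes (LOW, AVERAGE, MODERATE, HIGH shares from the slides).
root_entropy = entropy_of_shares([0.07, 0.40, 0.48, 0.05])  # roughly 1.52 bits

# Made-up toy households: education separates the classes, residence does not.
households = [
    {"education": "higher", "residence": "city", "income": "HIGH"},
    {"education": "higher", "residence": "village", "income": "HIGH"},
    {"education": "basic", "residence": "city", "income": "LOW"},
    {"education": "basic", "residence": "village", "income": "LOW"},
]

gain_education = information_gain(households, "education", "income")
gain_residence = information_gain(households, "residence", "income")
print(root_entropy, gain_education, gain_residence)
```

In this toy case the attribute with the larger gain (education) would be chosen for the first split, which is the selection rule a gain-based tree builder applies at every node.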

Attributes tree 2008
[Figure: decision tree for 2008; surviving branch labels: at least a secondary, married couple, pensioners]

Attributes 2000-2010
Education
Family type
Economic group
Number of earners
Class of place of residence

Tree nodes and leaves
Nodes&Leaves   Low   Average   Moderate   High
0               7%     40%       48%       5%
1               3%     26%       63%       8%
1.1             1%     12%       71%      16%
1.2             4%     33%       59%       4%
1.2.1           2%     24%       67%       6%
1.2.2           6%     41%       51%       2%
2              12%     52%       34%       2%
2.1             7%     45%       46%       3%
2.1.1           6%     33%       56%       5%
2.1.2           7%     68%       25%       0%
2.2            15%     58%       26%       1%
2.2.1          32%     38%       27%       4%
2.2.2          12%     62%       25%       0%
Leaves: 1.1, 1.2.1, 1.2.2, 2.1.1, 2.1.2, 2.2.1, 2.2.2

Not relevant 2000-2010
Sex of a family head
Age
Number of persons
Number of children
Voivodeship

Information Gain
[Figure: information gain; axis label GAIN, tick 0.01]

2-classes (high income)
[Figure]

2-classes (low income)
[Figure; surviving label: ECONOGROUP: 10]

Efficiency of trees
[Figure; panels: High income, Low income]
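
The leaf shares in the nodes-and-leaves table can be read in two ways that connect to the two-class slides. The sketch below assumes that a leaf predicts its most frequent class (a common convention; the slides do not state the rule used) and that a two-class tree contrasts one income class with all the others (also an assumption, suggested by the one-against-all reference). The names `CLASSES`, `LEAVES`, `majority_class` and `one_vs_rest` are hypothetical; the percentages are copied from the table.

```python
CLASSES = ("Low", "Average", "Moderate", "High")

# Leaf class shares in percent, copied from the nodes-and-leaves table.
LEAVES = {
    "1.1": (1, 12, 71, 16),
    "1.2.1": (2, 24, 67, 6),
    "1.2.2": (6, 41, 51, 2),
    "2.1.1": (6, 33, 56, 5),
    "2.1.2": (7, 68, 25, 0),
    "2.2.1": (32, 38, 27, 4),
    "2.2.2": (12, 62, 25, 0),
}


def majority_class(shares):
    """Return the name of the class with the largest share in a node."""
    best = max(range(len(shares)), key=lambda i: shares[i])
    return CLASSES[best]


def one_vs_rest(shares, positive):
    """Collapse four class shares into (positive share, share of all other classes)."""
    pos = shares[CLASSES.index(positive)]
    return pos, sum(shares) - pos


for leaf, shares in LEAVES.items():
    print(leaf, majority_class(shares), one_vs_rest(shares, "High"), one_vs_rest(shares, "Low"))
```

Because the published percentages are rounded, some rows sum to 99 or 101, so the "rest" share computed here can be off by a point.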

Summary
The most important attribute: Education
Higher Education (BA & MA) preferred
Important attributes: Education, Family Type (marriage), Economic Group (pensioners), Residence (big cities), Number of Earners (1 or 2)
Evolution of attributes (2000-2010)
Education: stable, the most important
Number of Earners: decreasing importance
Economic Group: increasing importance
Family Type: the weakest but noticeable importance
Lack of relevance of: Sex, Age, Voivodeship
to be continued...

Thank You

REFERENCES
Quinlan, J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann, Los Altos (1993).
Kemal Polat, Salih Gunes, A novel hybrid intelligent method based on C4.5 decision tree classifier and one-against-all approach for multi-class classification problems, Expert Systems with Applications 36 (2009) 1587.