Building Decision Rules with Fuzzy Knowledge Granulation
Transcript of the slides "Budowa reguł decyzyjnych z rozmytą granulacją wiedzy"
Zenon A. Sosnowski
Faculty of Computer Science, Bialystok University of Technology
Wiejska 45A, 15-351 Bialystok
Agenda
• introduction
• decision trees (DT)
• fuzzy sets in the granulation of attributes
• an algorithm for generating contextual DTs
• an example
• conclusions
Fuzzy RETE Network
The inference mechanism realizes a generalized modus ponens rule:

if A then C   (CFr)
A'            (CFf)
----------------------
C'            (CFc)

where CFr is the uncertainty of the rule, CFf is the uncertainty of the fact, and CFc is the uncertainty of the conclusion:

CFc = CFr * CFf
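The certainty-factor propagation above can be sketched in a few lines; the function name `propagate_cf` and the example values are ours, not part of the original rule base:

```python
def propagate_cf(cf_rule: float, cf_fact: float) -> float:
    """Generalized modus ponens: CFc = CFr * CFf."""
    return cf_rule * cf_fact

# A rule with certainty CFr = 0.9 fired by a fact with CFf = 0.8
# yields a conclusion with certainty CFc = 0.72.
cfc = propagate_cf(0.9, 0.8)
```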
(Figure: Fuzzy_Fuzzy – a fuzzy rule pattern R matched against a fuzzy fact F)
(defrule r1 (speed very fast) => ( ... ))
(defrule r2 (speed slow) => ( ... ))

(Figure: the fuzzy RETE network for rules r1 and r2 – a SINGLE (LV speed) node, a MULTIFIELD node with the fuzzy sets M.(very fast) and M.(slow) attached, end-of-pattern nodes, and the WME (speed medium) leading to the activation of rule r2)
Decision Trees – An Overview
• used to solve classification problems
• structure of the problem:
  - attributes
  - each attribute assumes a finite number of values
  - a finite number of discrete classes
• entropy-based optimization criterion
• architecture of a decision tree: nodes – attributes, edges – values of attributes
Coping with Continuous Attributes
Decision trees require finite-valued attributes. What if attributes are continuous? Then the attributes need to be discretized.
Options:
- discretize each attribute separately (uniformly or nonuniformly)
- discretize all attributes jointly (clustering)
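As a minimal illustration of the first option, a uniform per-attribute discretizer; the helper `uniform_bins` is ours, not from the slides:

```python
def uniform_bins(values, n_bins):
    """Map each continuous value to the index of its equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # guard against a constant attribute
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Two equal-width bins over the range [0, 10]:
bins = uniform_bins([0.0, 2.5, 5.0, 7.5, 10.0], 2)   # [0, 0, 1, 1, 1]
```

Nonuniform variants differ only in how the bin boundaries are chosen (e.g. from quantiles instead of equal widths).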
Quantization of attributes through clustering
• Fuzzy Clustering
• Context-based fuzzy clustering
Fuzzy Clustering (FCM) versus Context-Based FCM (cFCM)
Fuzzy clustering: objective function and its iterative optimization
Context-based fuzzy clustering:
- objective function minimized iteratively
- continuous classification variable granulated with the use of linguistic labels
Context-Based Fuzzy Clustering
Given: data {x_k, y_k}, k = 1, 2, ..., N; the number of clusters (c); a distance function ||.||; a fuzzy set of context A defined over y_k.

Constraint-based optimization of the objective function

Q = Σ_{i=1}^{c} Σ_{k=1}^{N} (u_ik)^m ||x_k - v_i||^2

subject to

Σ_{i=1}^{c} u_ik = f_k = A(y_k),   k = 1, 2, ..., N
From the context fuzzy set A to the labeling of the data to be clustered:

f_k = A(y_k)

(Figure: the fuzzy set of context A, defined over the y_k axis, induces the labels f_k for the data set)
Context-Based Fuzzy Clustering: An Iterative Optimization Process
Given: the number of clusters (c). Select the distance function ||.||, the termination criterion e (> 0), and initialize the partition matrix U. Select the value of the fuzzification parameter m (the default is m = 2.0).
1. Calculate the centers (prototypes) of the clusters, i = 1, 2, ..., c.
2. Update the partition matrix, i = 1, 2, ..., c, k = 1, 2, ..., N.
3. Compare U' to U; if the termination criterion ||U' - U|| < e is satisfied then stop, else return to step (1) and proceed with computing by setting U equal to U'.
Result: partition matrix and prototypes.
v_i = ( Σ_{k=1}^{N} (u_ik)^m x_k ) / ( Σ_{k=1}^{N} (u_ik)^m )

u_ik = f_k / Σ_{j=1}^{c} ( ||x_k - v_i|| / ||x_k - v_j|| )^{2/(m-1)}
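The two update equations can be turned into a short NumPy sketch of cFCM. This is a minimal, illustrative implementation under our own naming (`cfcm`, `x`, `f`, `U`, `V`), not the authors' code:

```python
import numpy as np

def cfcm(x, f, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Context-based FCM. x: (N, d) data; f: (N,) context labels f_k = A(y_k)."""
    rng = np.random.default_rng(seed)
    N = len(x)
    # random partition matrix, rescaled so each column sums to f_k
    U = rng.random((c, N))
    U = U / U.sum(axis=0) * f
    for _ in range(max_iter):
        um = U ** m
        V = um @ x / um.sum(axis=1, keepdims=True)                 # prototypes v_i
        d = np.linalg.norm(x[None, :, :] - V[:, None, :], axis=2)  # ||x_k - v_i||, (c, N)
        d = np.fmax(d, 1e-12)                                      # avoid division by zero
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        U_new = f / ratio.sum(axis=1)                              # u_ik update
        if np.abs(U_new - U).max() < eps:
            U = U_new
            break
        U = U_new
    return U, V
```

With f_k = 1 for all k (context A identically 1), the scheme reduces to the standard FCM.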
Information Granules in the Development of Decision Trees
• define contexts (fuzzy sets) for the continuous classification variable
• cluster the data for each context
• project the prototypes on the individual axes – this leads to their discretization
• carry out the standard ID3 algorithm
W. Pedrycz, Z.A. Sosnowski, "The designing of decision trees in the framework of granular data and their application to software quality models", Fuzzy Sets and Systems, vol. 124 (2001), pp. 271-290.
Fuzzy Sets of Contexts: Two Approaches
• subjective selection depending on the classification problem
• supported by statistical relevance (σ-count of fuzzy contexts)
card(A) = Σ_{k=1}^{N} A(y_k)
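Computed directly, the σ-count is just the sum of the membership grades of the context over the data (function name `sigma_count` is ours):

```python
def sigma_count(memberships):
    """Sigma-count card(A) = sum over k of A(y_k)."""
    return sum(memberships)

# Four data points with membership grades A(y_k) in the context A:
support = sigma_count([0.2, 1.0, 0.5, 0.0])   # card(A) = 1.7
```

A context with a very small σ-count has little statistical support and is a poor candidate for clustering.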
Constructing linguistic terms – classes (thin line) and their induced interval-valued counterparts (solid line)
(Figure: membership vs. the context variable – the original fuzzy sets, the induced sets, and their intersections)
C - Fuzzy Decision Trees
(Figure: the C-fuzzy decision tree grows from the ALL DATA root node; nodes are split according to the splitting-criterion values V_1, ..., V_j)
W. Pedrycz, Z.A. Sosnowski, "C-Fuzzy Decision Trees", IEEE Transactions on Systems, Man, and Cybernetics, Part C, vol. 35, no. 4, 2005, pp. 498-511.
Architecture of the cluster-based decision tree
• cluster the whole data set X
• repeat
  - allocate the elements of X to each cluster
  - choose the node with the highest value of the splitting criterion
  - cluster the data at the selected node
  until the termination criterion is fulfilled
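The growth loop above can be sketched as follows; `cluster`, `variability`, and `should_stop` are assumed helpers standing in for the clustering step, the splitting criterion V_i, and the termination test:

```python
def grow_tree(X, c, cluster, variability, should_stop):
    """Grow a cluster-based decision tree over data X with c clusters per split."""
    root = {"data": X, "children": []}
    leaves = [root]
    while not should_stop(leaves):
        # pick the leaf with the highest splitting-criterion value ...
        node = max(leaves, key=lambda n: variability(n["data"]))
        leaves.remove(node)
        # ... and expand it into one child per cluster
        for subset in cluster(node["data"], c):
            child = {"data": subset, "children": []}
            node["children"].append(child)
            leaves.append(child)
    return root
```

Each pass expands the currently "worst" (most variable) leaf, which mirrors the criterion-driven growth described above.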
Node splitting criterion
A node of the tree: N_i = <X_i, Y_i, U_i>, where:

X_i = { x(k) | u_i(x(k)) > u_j(x(k)) for all j ≠ i }
Y_i = { y(k) | x(k) ∈ X_i }
U_i = [ u_i(x(1)) u_i(x(2)) ... u_i(x(N)) ]

The splitting criterion is the variability V_i of the target variable within the node, taken around its membership-weighted mean m_i:

m_i = ( Σ_{(x(k), y(k)) ∈ X_i × Y_i} u_i(x(k)) y(k) ) / ( Σ_{(x(k), y(k)) ∈ X_i × Y_i} u_i(x(k)) )

V_i = Σ_{(x(k), y(k)) ∈ X_i × Y_i} u_i(x(k)) (y(k) - m_i)^2
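For a single node, V_i can be computed directly from the membership grades and target values; `node_variability` is our own name for this sketch:

```python
def node_variability(u, y):
    """Splitting criterion V_i. u: grades u_i(x(k)); y: targets y(k) in the node."""
    m_i = sum(ui * yk for ui, yk in zip(u, y)) / sum(u)   # weighted mean
    return sum(ui * (yk - m_i) ** 2 for ui, yk in zip(u, y))

# u = [1, 1], y = [0, 2]  ->  m_i = 1, V_i = 1 + 1 = 2
v = node_variability([1.0, 1.0], [0.0, 2.0])
```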
Stopping criterion (structurability index)
ψ_k = (c/(c-1)) (1 - Σ_{i=1}^{c} (u_ik)^2),   k = 1, 2, ..., N

ψ = (1/N) Σ_{k=1}^{N} ψ_k
C-fuzzy tree in the classification (prediction) mode
u_i(x) = 1 / Σ_{j=1}^{c} ( ||x - v_i|| / ||x - v_j|| )^{2/(m-1)},   i = 1, 2, ..., c

Assign x to class w_i0 if u_i0(x) exceeds the values of the membership in all remaining clusters:

i0 = arg max_{i=1,...,c} u_i(x)
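A sketch of the classification mode: compute the membership of a new pattern x in every cluster and pick the winner. The prototypes V and fuzzification parameter m are assumed given; all names here are ours:

```python
import numpy as np

def memberships(x, V, m=2.0):
    """u_i(x) = 1 / sum_j (||x - v_i|| / ||x - v_j||)^(2/(m-1)) for each cluster i."""
    d = np.fmax(np.linalg.norm(V - x, axis=1), 1e-12)   # distances to prototypes
    ratio = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)

V = np.array([[0.0, 0.0], [4.0, 4.0]])     # two hypothetical prototypes
u = memberships(np.array([0.5, 0.5]), V)   # x lies near the first prototype
winner = u.argmax()                        # index i0 of the winning cluster/class
```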
Experiments
Data sets from the UCI repository of Machine Learning Databases (http://www.ics.uci.edu)
• Auto-Mpg
• Pima-diabetes
• Ionosphere
• Hepatitis
• Dermatology
Hepatitis data
Type of tree and its structural parameters   | Error: training data               | Error: testing data                | Number of nodes
C4.5 rev. 8                                  | 6.46% (average), 0.85% (st. dev.)  | 43.86% (average), 7.05% (st. dev.) | 45 (average), 7.87 (st. dev.)
C-decision tree, c=2 clusters, 6 iterations  | 17.58% (average), 3.34% (st. dev.) | 36.13% (average), 0.08% (st. dev.) | 12
C-decision tree, c=9 clusters, 3 iterations  | 24.84% (average), 5.21% (st. dev.) | 34.19% (average), 3.68% (st. dev.) | 27
Dermatology data
Type of tree and its structural parameters   | Error: training data               | Error: testing data                | Number of nodes
C4.5 rev. 8                                  | 1.52% (average), 0.61% (st. dev.)  | 5.98% (average), 3.50% (st. dev.)  | 18.6 (average), 4.34 (st. dev.)
C-decision tree, c=11 clusters, 1 iteration  | 7.0% (average), 1.68% (st. dev.)   | 4.9% (average), 3.56% (st. dev.)   | 11
C-decision tree, c=7 clusters, 1 iteration   | 6.1% (average), 1.15% (st. dev.)   | 5.7% (average), 2.47% (st. dev.)   | 7
Context-based Fuzzy Clustered-oriented Decision Trees (CFCDT)
(Figure: one cluster-based decision tree is built per context, from the 1st context to the c-th context)
Architecture of the Context-based Fuzzy Clustered-oriented Decision Tree
define contexts (fuzzy sets) for the classification variable
for each context do
  - cluster (cFCM) X_i (the data set of the i-th context)
  - repeat
    - allocate the elements of X_i to each cluster
    - choose the node with the highest value of the splitting criterion
    - cluster (cFCM) the data at the selected node
    until the termination criterion is fulfilled
enddo
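The outer per-context loop can be sketched as below; `grow_one_tree` is an assumed helper performing the whole repeat/until body for one context, and the names are ours:

```python
def build_cfcdt(contexts, data, grow_one_tree):
    """One cluster-based tree per context. contexts: membership functions A(y);
    data: (x, y) pairs; grow_one_tree(Xi, A): builds the tree for one context."""
    trees = []
    for A in contexts:
        # keep only the data the context actually covers (nonzero membership)
        Xi = [(x, y) for (x, y) in data if A(y) > 0.0]
        trees.append(grow_one_tree(Xi, A))
    return trees
```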
Problem
Implementation issues:
• high complexity -> grid or cluster computing
• aggregation -> testing of different approaches
Thank you for your attention