Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween...

49
Uplift modeling Szymon Jaroszewicz Institute of Computer Science Polish Academy of Sciences Warsaw, Poland National Institute of Telecommunications Warsaw, Poland Szymon Jaroszewicz Uplift modeling

Transcript of Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween...

Page 1: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Uplift modeling

Szymon Jaroszewicz

Institute of Computer SciencePolish Academy of Sciences

Warsaw, Poland

National Institute of TelecommunicationsWarsaw, Poland

Szymon Jaroszewicz Uplift modeling

Page 2: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Joint work with:

Piotr Rzepakowski

Maciej Jaskowski

Lukasz Zaniewicz

Micha l So ltys

Szymon Jaroszewicz Uplift modeling

Page 3: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

What is uplift modeling?

A typical marketing campaign

SamplePilot

campaignModel P(buy |X)

Selecttargets forcampaign

But this is not what we need!

We want people who bought because of the campaign

Not people who bought after the campaign

Szymon Jaroszewicz Uplift modeling

Page 4: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

What is uplift modeling?

A typical marketing campaign

SamplePilot

campaignModel P(buy |X)

Selecttargets forcampaign

But this is not what we need!

We want people who bought because of the campaign

Not people who bought after the campaign

Szymon Jaroszewicz Uplift modeling

Page 5: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

A typical marketing campaign

We can divide potential customers into four groups

1 Responded because of the action(the people we want)

2 Responded, but would have responded anyway(unnecessary costs)

3 Did not respond and the action had no impact(unnecessary costs)

4 Did not respond because the action had a(negative impact)

Szymon Jaroszewicz Uplift modeling

Page 6: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Solution: Uplift modeling

Solution: Uplift modeling

Two training sets:1 the treatment group

on which the action was taken2 the control group

on which no action was takenused as background

Build a model which predicts the difference between classprobabilities in the treatment and control groups

Szymon Jaroszewicz Uplift modeling

Page 7: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Difference with traditional classification

Notation:

PT probabilities in the treatment group

PC probabilities in the control group

Traditional models predict the conditional probability

PT (Y | X1, . . . ,Xm)

Uplift models predict change in behaviour resulting from the action

PT (Y | X1, . . . ,Xm)− PC (Y | X1, . . . ,Xm)

Szymon Jaroszewicz Uplift modeling

Page 8: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Marketing campaign (uplift modeling approach)

Marketing campaign (uplift modeling approach)

Treatmentsample

Controlsample

Pilotcampaign

ModelPT (buy |X ) −

PC (buy |X )

Selecttargets forcampaign

Szymon Jaroszewicz Uplift modeling

Page 9: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Applications in medicine

A typical medical trial:

treatment group: gets the treatmentcontrol group: gets placebo (or another treatment)do a statistical test to show that the treatment is better thanplacebo

With uplift modeling we can find out for whom the treatmentworks best

Personalized medicine

Szymon Jaroszewicz Uplift modeling

Page 10: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Main difficulty of uplift modeling

The fundamental problem of causal inference

Our knowledge is always incomplete

For each training case we know either

what happened after the treatment, orwhat happened if no treatment was given

Never both!

This makes designing uplift algorithms challenging

... and the intuitions are often hard to grasp

Szymon Jaroszewicz Uplift modeling

Page 11: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Evaluating uplift models

Szymon Jaroszewicz Uplift modeling

Page 12: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Evaluating uplift models

We have two separate test sets:

a treatment test seta control test set

Problem

To assess the gain for a customer we need to know both treatmentand control responses, but only one of them is known

Solution

Assess gains for groups of customers

Szymon Jaroszewicz Uplift modeling

Page 13: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Evaluating uplift classifiers

For example:

Gain for the 10% highest scoring customers =

% of successes for top 10% treated customers

− % of successes for top 10% control customers

Szymon Jaroszewicz Uplift modeling

Page 14: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Uplift curves

Uplift curves are a more convenient tool:

Draw separate lift curves on treatment and control data(TPR on the Y axis is replaced with percentage of successes inthe whole population)Uplift curve =

lift curve on treatment data – lift curve on control data

Interpretation: net gain in success rate if a given percentage ofthe population is treated

We can of course compute the Area Under the Uplift Curve(AUUC)

Szymon Jaroszewicz Uplift modeling

Page 15: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

An uplift curve for marketing data

0 20 40 60 80 100percent targeted

0.01

0.00

0.01

0.02

0.03

0.04

0.05

cum

ulat

ive

gain

[% to

tal]

Hillstrom.Hillstrom_T1C1_visit_w

MultiP.Logit: area = 0.7599Zet.Logit: area = 0.7573T.Logit: area = 0.4872

Szymon Jaroszewicz Uplift modeling

Page 16: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

An uplift curve for a medical trial dataset comparing twocancer treatments

Szymon Jaroszewicz Uplift modeling

Page 17: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Uplift modeling algorithms

Szymon Jaroszewicz Uplift modeling

Page 18: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

The two model approach

An obvious approach to uplift modeling:

1 Build a classifier MT modeling PT (Y |X) on the treatmentsample

2 Build a classifier MC modeling PC (Y |X) on the controlsample

3 The uplift model subtracts probabilities predicted by bothclassifiers

MU(Y |X) = MT (Y |X)−MC (Y |X)

Szymon Jaroszewicz Uplift modeling

Page 19: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Two model approach

Advantages:

Works with existing classification models

Good probability predictions ⇒ good uplift prediction

Disadvantages:

Differences between class probabilities can follow a differentpattern than the probabilities themselves

each classifier focuses on changes in class probabilities butignores the weaker ‘uplift signal’algorithms designed to focus directly on uplift can give betterresults

Szymon Jaroszewicz Uplift modeling

Page 20: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Decision trees for uplift modeling

Szymon Jaroszewicz Uplift modeling

Page 21: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Hansotia, Rukstales 2002

The ∆∆P criterion

PT (Y = 1) = 5%PC (Y = 1) = 3%

∆P = 2%

PT (Y = 1) = 8%PC (Y = 1) = 3.5%

∆P = 4.5%

X < a

PT (Y = 1) = 3.7%PC (Y = 1) = 2.8%

∆P = 0.9%

X >= a

∆∆P = 3.6%

Pick a test with highest ∆∆P

Szymon Jaroszewicz Uplift modeling

Page 22: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Hansotia, Rukstales 2002

It is not in line with modern decision tree learning such asC4.5

splitting criterion directly maximizes the difference betweenprobabilities (target criterion)no pruning

Rzepakowski, Jaroszewicz 2010, 2012

splitting criterion based on Information Theory, more in linewith modern decision treespruning designed for uplift modelingmulticlass problems and multiway splits possibleif the control group is empty, the algorithm reduces to classicaldecision tree learning

Szymon Jaroszewicz Uplift modeling

Page 23: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

KL divergence as a splitting criterion for uplift trees

Measure difference between treatment and control groupsusing KL divergence

KL(

PT (Y ) : PC (Y ))

=∑

y∈Dom(Y )

PT (y) logPT (y)

PC (y)

KL-divergence conditional on a test X

KL(PT (Y ) : PC (Y ) | X ) =∑x∈Dom(X )

NT (X = x) + NC (X = x)

NT + NCKL(PT (Y |X = x) : PC (Y |X = x)

)

note the weighting factorsNT and NC denote counts in the treatment and controldatasets

Szymon Jaroszewicz Uplift modeling

Page 24: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

KL divergence as a splitting criterion for uplift trees

Measure difference between treatment and control groupsusing KL divergence

KL(

PT (Y ) : PC (Y ))

=∑

y∈Dom(Y )

PT (y) logPT (y)

PC (y)

KL-divergence conditional on a test X

KL(PT (Y ) : PC (Y ) | X ) =∑x∈Dom(X )

NT (X = x) + NC (X = x)

NT + NCKL(PT (Y |X = x) : PC (Y |X = x)

)

note the weighting factorsNT and NC denote counts in the treatment and controldatasets

Szymon Jaroszewicz Uplift modeling

Page 25: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

The KLgain

How much larger does the difference between class distributions inT and C groups become after a split on X ?

KLgain(X ) =

KL(

PT (Y ) : PC (Y )|X)− KL

(PT (Y ) : PC (Y )

)

Szymon Jaroszewicz Uplift modeling

Page 26: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Properties of the KLgain

Properties:

If Y ⊥ X then KLgain(X ) = 0

If PT (Y |X ) = PC (Y |X ) then

KLgain(X ) = minimum

If the control group is empty, KLgain reduces to entropy gain(Laplace correction is used on P(Y ))

Szymon Jaroszewicz Uplift modeling

Page 27: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Negative values of KLgain

Classification decision trees: gain(X ) ≥ 0

KLgain(X ) can be negative:

PT (Y = 1) = 0.28PC (Y = 1) = 0.85

PT (Y = 1) = 0.7PC (Y = 1) = 0.9

X = 0

PT (Y = 1) = 0.1PC (Y = 1) = 0.4

X = 1

PT (X = 0) = 0.3PC (X = 0) = 0.9

Note the dependence of X on T /C group selection

Szymon Jaroszewicz Uplift modeling

Page 28: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Negative values of KLgain

Negative gain values are only possible when X depends ongroup selection

This a variant of the Simpson’s paradox

Theorem

If X is independent of the selection of the T and C groups then

KLgain(X ) ≥ 0

In practice we want X to be independent of the T /C groupselection

In medical research great care is taken to ensure this

Szymon Jaroszewicz Uplift modeling

Page 29: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

The KLgain ratio

In standard decision trees, the gain is divided by test’s entropyto punish tests with large number of outcomes

In our case:

KLratio(X ) =KLgain(X )

I (X )

where

I (X ) = H

(NT

N,NC

N

)KL(PT (X ) : PC (X )) +

NT

NH(PT (X )) +

NC

NH(PC (X )) +

1

2

Tests with large numbers of outcomes are punished

Tests for which PT (X ) and PC (X ) differ are punished

This prevents splits correlated with the division into treatmentand control groups

Szymon Jaroszewicz Uplift modeling

Page 30: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Uplift modeling through class variable

transformation

Szymon Jaroszewicz Uplift modeling

Page 31: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Uplift modeling through class variable transformation

Introduced in Jaskowski, Jaroszewicz, 2012

Allows for adapting an arbitrary classifier to uplift modeling

Let G ∈ {T ,C} denote the group membership (treatment orcontrol)

Define an r.v.

Z =

1 if G = T and Y = 1,

1 if G = C and Y = 0,

0 otherwise.

In plain English: flip the class in the control dataset

Szymon Jaroszewicz Uplift modeling

Page 32: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Uplift modeling through class variable transformation

Now

P(Z = 1|X1, . . . ,Xm)

= PT (Y = 1|X1, . . . ,Xm)P(G = T |X1, . . . ,Xm)

+ PC (Y = 0|X1, . . . ,Xm)P(G = C |X1, . . . ,Xm)

Assume that G is independent of X1, . . . ,Xm (otherwise thestudy is badly constructed):

P(Z = 1|X1, . . . ,Xm) = PT (Y = 1|X1, . . . ,Xm)P(G = T )

+ PC (Y = 0|X1, . . . ,Xm)P(G = C )

Szymon Jaroszewicz Uplift modeling

Page 33: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Uplift modeling through class variable transformation

Assume P(G = T ) = P(G = C ) = 12 (otherwise reweight one

of the datasets):

2P(Z = 1|X1, . . . ,Xm)

= PT (Y = 1|X1, . . . ,Xm) + PC (Y = 0|X1, . . . ,Xm)

= PT (Y = 1|X1, . . . ,Xm) + 1− PC (Y = 1|X1, . . . ,Xm)

Finally

PT (Y = 1|X1, . . . ,Xm)− PC (Y = 1|X1, . . . ,Xm)

= 2P(Z = 1|X1, . . . ,Xm)− 1

Szymon Jaroszewicz Uplift modeling

Page 34: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Uplift modeling through class variable transformation

Conclusion

Modeling P(Z = 1|X ) is equivalent to modeling the differencebetween class probabilities in the treatment and control groups

The algorithm:

1 Flip the class in DC

2 Concatenate D = DT ∪DC

3 Build any classifier on D

4 The classifier is actually an uplift model

Szymon Jaroszewicz Uplift modeling

Page 35: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Advantages

Any classifier can be turned into an uplift model

A single model is built

coefficients are easier to interpret than for the double modelthe model predicts uplift directly

(will not focus on predicting classes themselves)a single model is built on a large dataset

(double model method subtracts two models built on smalldatasets)

Disadvantage: the double model may represent a morecomplex decision surface

Szymon Jaroszewicz Uplift modeling

Page 36: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Uplift Support Vector Machines

Szymon Jaroszewicz Uplift modeling

Page 37: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Uplift Support Vector Machines

Introduced in Zaniewicz, Jaroszewicz, 2013

Recall that the outcome of an action can be

positivenegativeneutral

Main idea

Use two parallel hyperplanes dividing the sample space into threeareas:

positive (+1)

neutral (0)

negative (-1)

Szymon Jaroszewicz Uplift modeling

Page 38: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Uplift Support Vector Machines

Introduced in Zaniewicz, Jaroszewicz, 2013

Recall that the outcome of an action can be

positivenegativeneutral

Main idea

Use two parallel hyperplanes dividing the sample space into threeareas:

positive (+1)

neutral (0)

negative (-1)

Szymon Jaroszewicz Uplift modeling

Page 39: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Uplift Support Vector Machines

H1

H2

+1

0

−1H1 : 〈w, x〉+ b1 = 0H2 : 〈w, x〉+ b2 = 0

Szymon Jaroszewicz Uplift modeling

Page 40: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Uplift Support Vector Machines

Fundamental problem of causal inference⇒ We never know if a point was classified correctly!

Need to use as much information as possible

Four types of points: T+, T−, C+, C−

Positive area (+1):

T−, C+ definitely misclassifiedT+, C− may be correct and definitely not a loss

(true outcome may only be neutral)

Negative area (-1):

T+, C− definitely misclassifiedT−, C+ may be correct and definitely not a loss

(true outcome may only be neutral)

Neutral area (0):

all predictions may be correct or incorrect

Szymon Jaroszewicz Uplift modeling

Page 41: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Uplift Support Vector Machines – problem formulation

Penalize points for being on the wrong side of eachhyperplane separately

Points in the neutral area are penalized for crossing onehyperplane

this prevents all points from being classified as neutral

Points which are definitely misclassified are penalized forcrossing two hyperplanes

such points should be avoided, thus the higher penalty

Other points are not penalized

Szymon Jaroszewicz Uplift modeling

Page 42: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Uplift Support Vector Machines – problem formulation

H1

H2T+

ξi ,1C+

ξi ,2

T+C−

T+

ξi ,1ξi2

T−ξi ,2ξi ,1

C+

Szymon Jaroszewicz Uplift modeling

Page 43: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Optimization task – primal form

minw,b1,b2∈Rm+2

1

2〈w,w〉+ C1

∑DT

+∪DC−

ξi ,1 + C2

∑DT−∪DC

+

ξi ,1

+ C2

∑DT

+∪DC−

ξi ,2 + C1

∑DT−∪DC

+

ξi ,2,

subject to:

〈w, xi 〉+ b1 ≤ −1 + ξi ,1, for (xi , yi ) ∈ DT+ ∪DC

−,

〈w, xi 〉+ b1 ≥ +1− ξi ,1, for (xi , yi ) ∈ DT− ∪DC

+,

〈w, xi 〉+ b2 ≤ −1 + ξi ,2, for (xi , yi ) ∈ DT+ ∪DC

−,

〈w, xi 〉+ b2 ≥ +1− ξi ,2, for (xi , yi ) ∈ DT− ∪DC

+,

ξi ,j ≥ 0, dla i = 1, . . . , n, j ∈ {1, 2},

Szymon Jaroszewicz Uplift modeling

Page 44: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Optimization task – primal form

We have two penalty parameters:

C1 penalty coefficient for being on the wrong side of onehyperplane

C2 coefficient of additional penalty for crossing also thesecond hyperplane

All points classified as neutral are penalized with C1ξ

All definitely misclassified points are penalized with C1ξ andC2ξ

How do C1 and C2 influence the model?

Szymon Jaroszewicz Uplift modeling

Page 45: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Influence of penalty coefficients C1 and C2 on the model

Lemma

For a well defined model C2 ≥ C1. Otherwise the order of thehyperplanes would be reversed.

Lemma

If C2 = C1 then no points are classified as neutral.

Lemma

For sufficiently large ratio C2/C1 no point is penalized for crossingboth hyperplanes. (Almost all points are classified as neutral.)

Szymon Jaroszewicz Uplift modeling

Page 46: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Influence of penalty coefficients C1 and C2 on the model

The C1 coefficient plays the role of the penalty in classicalSVMs

The ratio C2/C1 decides on the proportion of cases classifiedas neutral

Szymon Jaroszewicz Uplift modeling

Page 47: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Example: the tamoxifen drug trial data

1.00 1.02 1.04 1.06 1.08 1.10 1.12 1.14 1.16C2 /C1

40

80

120

160

200

240

280

320nu

mbe

r of c

ases

tamoxifen

classified negativeclassified neutralclassified positive

Szymon Jaroszewicz Uplift modeling

Page 48: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Example: the tamoxifen drug trial data

1.00 1.02 1.04 1.06 1.08 1.10 1.12 1.14 1.16C2 /C1

16

12

8

4

0

4

8

12up

lift [

%]

tamoxifen

classified negativeclassified neutralclassified positive

Szymon Jaroszewicz Uplift modeling

Page 49: Uplift modeling - Instytut Informatyki Politechniki … a model which predicts thedi erencebetween class probabilities in the treatment and control groups Szymon Jaroszewicz Uplift

Literature

Nicholas J. Radcliffe and Patrick D. Surry. Real-World UpliftModelling with Significance-Based Uplift Trees, White paper,Stochastic solution ltd, 2011 (PDF) [a good overview paper]

P. Rzepakowski, S. Jaroszewicz. Decision Trees for UpliftModeling, ICDM’2010 (PDF)

M. Jaskowski, S. Jaroszewicz. Uplift Modeling for ClinicalTrial Data, In ICML 2012 Workshop on Machine Learning forClinical Data Analysis (PDF)

L. Zaniewicz, S. Jaroszewicz. Support Vector Machines forUplift Modeling, IEEE ICDM Workshop on Causal Discovery(CD 2013) at ICDM 2013

Szymon Jaroszewicz Uplift modeling