Lec 3 Sampling


  • 8/14/2019 Lec 3 Sampling

    Copyright 2004 by The McGraw-Hill Companies, Inc. All rights reserved.

    Sampling Methods & Central Limit Theorem

    We use sample information to make decisions or inferences about the population.

    Two KEY steps:

    1. Choice of a proper method for selecting sample data
    2. Proper analysis of the sample data (more later)

    KEY 1. If the proper choice of method for selecting the sample is NOT MADE, the SAMPLE will not be truly representative of the TOTAL population, and wrong conclusions can be drawn!

    Why Sample the Population?

    Because:

    * of the physical impossibility of checking all items in the population, and, also, it would be too time-consuming
    * studying all the items in a population would NOT be cost effective
    * the sample results are usually sufficient
    * of the destructive nature of certain tests

    Techniques

    Probability Sampling: each data unit in the population has a known likelihood of being included in the sample.

    * with Replacement: each data unit in the population is allowed to appear in the sample more than once
    * without Replacement: each data unit in the population is allowed to appear in the sample no more than once

    Non-Probability Sampling: does not involve random selection; inclusion of an item is based on convenience.
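
The difference between sampling with and without replacement can be sketched in a few lines of Python. This is only an illustration: the population reuses the five partner names from the example in this lecture, and the seed and the sample size of 3 are arbitrary assumptions.

```python
import random

# Population: the five partner names used in the lecture's example.
population = ["Dunn", "Hardy", "Kiers", "Malinowski", "Tillman"]

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable

# With replacement: the same unit may be drawn more than once.
with_replacement = rng.choices(population, k=3)

# Without replacement: each unit appears at most once.
without_replacement = rng.sample(population, k=3)

print(with_replacement)
print(without_replacement)
```

`random.choices` draws independently each time, so repeats are possible; `random.sample` never returns the same unit twice.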

    Methods

    Simple Random: each item (person) in the population has an equal chance of being included.

    Systematic Random: items (people) of the population are arranged in some order. A random starting point is selected, and then every kth member of the population is selected for the sample.

    Stratified Random: a population is first divided into subgroups, called strata, and a sample is selected from each stratum.

    Cluster: a population is first divided into primary units, and samples are selected from the primary units.
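
The four methods can be sketched as small Python functions. This is a minimal sketch under stated assumptions: the population of 20 numbered items and the two groups "A" and "B" are hypothetical, the function names are invented for illustration, and the cluster function assumes the common two-stage reading (pick some primary units at random, then sample inside them).

```python
import random

def simple_random(items, n, rng):
    """Every item has an equal chance of being included."""
    return rng.sample(items, n)

def systematic(items, k, rng):
    """Random start within the first k items, then every kth item."""
    start = rng.randrange(k)
    return items[start::k]

def stratified(strata, n_per_stratum, rng):
    """Draw a simple random sample from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster(units, n_units, n_per_unit, rng):
    """Pick primary units at random, then sample within each chosen unit."""
    chosen = rng.sample(sorted(units), n_units)
    return {name: rng.sample(units[name], n_per_unit) for name in chosen}

rng = random.Random(1)                      # arbitrary seed
population = list(range(1, 21))             # hypothetical items labelled 1..20
groups = {"A": population[:10], "B": population[10:]}  # hypothetical strata / units

print(simple_random(population, 4, rng))
print(systematic(population, 5, rng))
print(stratified(groups, 2, rng))
print(cluster(groups, 1, 3, rng))
```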

    Example

    The law firm of Hoya and Associates has five partners. At their weekly partners meeting each reported the number of hours they billed their clients last week:

    Partner       Hours
    Dunn          22
    Hardy         26
    Kiers         30
    Malinowski    26
    Tillman       22

    If two partners are selected randomly, how many different samples are possible?

    There are 5 objects taken 2 at a time. Using 5C2:

    $$ {}_5C_2 = \frac{5!}{2!\,(5-2)!} = 10 \text{ samples} $$

    for a total of 10 samples!
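
As a quick check of the count, the same number can be obtained in Python, both from the combination formula and by listing every pair. The variable names are illustrative only.

```python
import math
from itertools import combinations

partners = ["Dunn", "Hardy", "Kiers", "Malinowski", "Tillman"]

n_samples = math.comb(5, 2)                 # 5! / (2! * 3!)
samples = list(combinations(partners, 2))   # every unordered pair, no repeats

print(n_samples)      # 10
print(len(samples))   # 10
```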

    Partners    Samples of 2     Mean
    1 & 2       (22+26)/2 =      24
    1 & 3       (22+30)/2 =      26
    1 & 4       (22+26)/2 =      24
    1 & 5       (22+22)/2 =      22
    2 & 3       (26+30)/2 =      28
    2 & 4       (26+26)/2 =      26
    2 & 5       (26+22)/2 =      24
    3 & 4       (30+26)/2 =      28
    3 & 5       (30+22)/2 =      26
    4 & 5       (26+22)/2 =      24
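
The ten sample means can be reproduced by enumerating every pair of partners. This is a small sketch; the dictionary name `hours` is an illustrative choice, and the pairs come out in the same order as partners 1 & 2 through 4 & 5.

```python
from itertools import combinations

hours = {"Dunn": 22, "Hardy": 26, "Kiers": 30, "Malinowski": 26, "Tillman": 22}

# One sample mean for each of the 10 possible pairs of partners.
sample_means = [(a, b, (hours[a] + hours[b]) / 2)
                for a, b in combinations(hours, 2)]

for a, b, mean in sample_means:
    print(f"{a} & {b}: {mean}")
```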

    Terminology

    Sampling error is the difference between a sample statistic and its corresponding population parameter.

    The sampling distribution of the sample mean is a probability distribution consisting of all possible sample means of a given sample size selected from a population.

    Example continued

    Organize the sample means into a Sampling Distribution:

    Sample Mean    Frequency    Relative Frequency (Probability)
    22             1            1/10
    24             4            4/10
    26             3            3/10
    28             2            2/10

    Example continued

    Compute the mean of the sample means. Compare it with the population mean.

    $$ \mu_{\bar{X}} = \frac{22(1) + 24(4) + 26(3) + 28(2)}{10} = 25.2 $$

    Example continued

    $$ \mu = \frac{22 + 26 + 30 + 26 + 22}{5} = 25.2 $$

    Note: the population mean is the same as the mean of the sample means, 25.2 hours!
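
The sampling distribution and the comparison of the two means can be checked with a short script. It is a sketch only; exact fractions are used (a design choice, not something from the slides) so that the equality of the two means is not blurred by floating-point rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

hours = [22, 26, 30, 26, 22]   # Dunn, Hardy, Kiers, Malinowski, Tillman

# All 10 sample means for samples of size 2.
means = [Fraction(a + b, 2) for a, b in combinations(hours, 2)]
distribution = Counter(means)  # sample mean -> frequency

for value in sorted(distribution):
    freq = distribution[value]
    print(value, freq, f"{freq}/{len(means)}")

mean_of_sample_means = sum(means) / len(means)
population_mean = Fraction(sum(hours), len(hours))
print(float(mean_of_sample_means), float(population_mean))  # 25.2 25.2
```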

    Central Limit Theorem

    The sampling distribution of the means of all possible samples of size n generated from the population will be approximately normally distributed, provided n is large enough (or the population itself is normal)!

    Sampling Distributions:

    Mean:
    $$ \mu_{\bar{X}} = \mu $$

    Variance:
    $$ \sigma^2_{\bar{X}} = \sigma^2 / n $$

    Standard Deviation (standard error of the mean):
    $$ \sigma_{\bar{X}} = \sigma / \sqrt{n} $$
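
A small simulation can make the theorem concrete. The sketch below is an assumption-laden illustration, not part of the lecture: it draws repeated samples from an exponential population (right-skewed, with mean 1 and standard deviation 1), and the seed, sample size, and number of repetitions are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(42)   # arbitrary seed for repeatability
n = 30                    # size of each sample
num_samples = 2000        # how many samples to draw

# Exponential population with rate 1: mean 1, standard deviation 1, right-skewed.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

observed_mean = statistics.fmean(sample_means)   # should be near the population mean, 1
observed_se = statistics.stdev(sample_means)     # should be near sigma / sqrt(n)
theoretical_se = 1.0 / math.sqrt(n)

print(round(observed_mean, 3), round(observed_se, 3), round(theoretical_se, 3))
```

Even though the population is skewed, the spread of the simulated sample means should land close to the standard error formula.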

    Point Estimates

    A point estimate is one value (a single point) that is used to estimate a population parameter.

    Examples: sample mean, sample standard deviation, sample variance, sample proportion.

    Point Estimates

    Population follows the normal distribution

    The sampling distribution of the sample means also follows the normal distribution.

    To find the probability of a sample mean falling within a particular region, use:

    $$ z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} $$

    Population does NOT follow the normal distribution

    If the sample is of at least 30 observations, the sampling distribution of the sample means WILL approximately follow the normal distribution.

    To find the probability of a sample mean falling within a particular region, use:

    $$ z = \frac{\bar{X} - \mu}{s / \sqrt{n}} $$

    Central Limit Theorem

    [Chart 8-6: Results for Several Populations]

    Using the Sampling Distribution of the Sample Mean

    Suppose it takes an average of 330 minutes for taxpayers to prepare, copy, and mail an income tax return form. A consumer watchdog agency selects a random sample of 40 taxpayers and finds the standard deviation of the time needed is 80 minutes.

    What is the standard error of the mean?

    Formula:

    $$ s_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{80}{\sqrt{40}} = 12.6 $$

    What is the likelihood the sample mean is greater than 320 minutes?

    Data:

    * average of 330 minutes
    * random sample of 40
    * standard deviation is 80 minutes

    Formula:

    $$ z = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{320 - 330}{80 / \sqrt{40}} = -0.79 $$

    Look up 0.79 in the table: a1 = 0.2852, the area between 320 and the mean of 330.

    Required Area = 0.2852 + .5 = 0.7852
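
The taxpayer calculation can be checked without a printed table by using the standard normal distribution in Python's `statistics` module. This is a sketch with illustrative variable names; because it does not round z to two decimals first, the probability differs from the table answer of 0.7852 only in the fourth decimal place.

```python
import math
from statistics import NormalDist

mu = 330      # population mean, minutes
s = 80        # sample standard deviation, minutes
n = 40        # sample size

standard_error = s / math.sqrt(n)
z = (320 - mu) / standard_error

# Area to the right of z under the standard normal curve.
p_greater = 1 - NormalDist().cdf(z)

print(round(standard_error, 1))  # 12.6
print(round(z, 2))               # -0.79
print(round(p_greater, 3))
```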

    Sampling Distribution of Proportion

    The normal distribution (a continuous distribution) yields a good approximation of the binomial distribution (a discrete distribution) for large values of n.

    Use when np and n(1-p) are both greater than 5!

    Mean and Variance of a Binomial Probability Distribution

    Formula:

    $$ \mu = np $$

    Formula:

    $$ \sigma^2 = np(1 - p) $$

    Sampling Distribution of Proportion

    A multinational company claims that 55% of its employees are bilingual. To verify this claim, a statistician selected a sample of 60 employees of the company using simple random sampling and found 48% to be bilingual.

    np = 60(.55) = 33
    n(1-p) = 60(.45) = 27

    The sample size is big enough to use the normal approximation with a mean of .55 and a standard deviation of

    $$ \sqrt{(.55)(.45)/60} = 0.064 $$

    Based on this information, what can we say about the company's claim?

    Sampling Distribution of Proportion, continued

    Formula:

    $$ z = \frac{X - \mu}{s} = \frac{0.48 - 0.55}{0.064} = -1.09 $$

    Look up 1.09 in the table: a1 = 0.3621, the area between .48 and .55.

    Required Area = .5 - 0.3621 = 0.1379 or 14%

    Conclusion

    If the company's claim is true, there is approximately a 14% chance of observing a sample proportion of 48% or less in a sample of this size.
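
The bilingual-employee example can be reproduced in Python as a rough check. The variable names are illustrative; the script keeps full precision rather than rounding the standard deviation to 0.064, which leaves the two-decimal results unchanged.

```python
import math
from statistics import NormalDist

p_claimed = 0.55   # company's claimed proportion
p_sample = 0.48    # observed sample proportion
n = 60

# Rule of thumb from the lecture: np and n(1-p) should both exceed 5.
assert n * p_claimed > 5 and n * (1 - p_claimed) > 5

std_error = math.sqrt(p_claimed * (1 - p_claimed) / n)
z = (p_sample - p_claimed) / std_error

# Area to the left of z: chance of a sample proportion of 48% or less
# if the true proportion really is 55%.
p_lower = NormalDist().cdf(z)

print(round(std_error, 3))  # 0.064
print(round(z, 2))          # -1.09
print(round(p_lower, 2))    # 0.14
```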

    Sampling Distribution of Mean

    Suppose the mean selling price of a litre of gasoline in Canada is $.659. Further, assume the distribution is positively skewed, with a standard deviation of $0.08. What is the probability of selecting a sample of 35 gasoline stations and finding the sample mean within $.03 of the population mean?

    Data:

    * mean selling price is $.659
    * SD of $0.08
    * sample of 35 gasoline stations
    * probability of sample mean within $.03?

    Find the z-scores for .659 +/- .03, i.e. .629 and .689:

    $$ z_1 = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{.629 - .659}{0.08 / \sqrt{35}} = -2.22 $$

    $$ z_2 = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{.689 - .659}{0.08 / \sqrt{35}} = 2.22 $$

    Find areas from the table for z = -2.22 and z = 2.22: a1 = .4868, a2 = .4868

    Required Area = .4868 + .4868 = .9736

    We would expect about 97% of the sample means to be within $0.03 of the population mean.
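
The gasoline example can be verified the same way. This is a sketch with illustrative names; it relies on the sample of 35 being large enough for the normal approximation, as the lecture assumes for a skewed population.

```python
import math
from statistics import NormalDist

mu = 0.659      # population mean price, dollars per litre
sigma = 0.08    # standard deviation, dollars
n = 35          # number of gasoline stations sampled
margin = 0.03   # "within $.03 of the population mean"

standard_error = sigma / math.sqrt(n)
z_low = ((mu - margin) - mu) / standard_error
z_high = ((mu + margin) - mu) / standard_error

# Area between the two z-scores under the standard normal curve.
std_normal = NormalDist()
p_within = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(z_low, 2), round(z_high, 2))  # -2.22 2.22
print(round(p_within, 2))                 # 0.97
```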

    This completes Chapter 8