Calculating p Values
Here we look at some examples of calculating p values. The examples
are for both normal and t distributions. We assume that you can enter
data and know the basic probability commands used below, such as
pnorm and pt.
We look at the steps necessary to calculate the p-value for a
particular test. In the interest of simplicity we only look at a
two sided test, and we focus on one example. Here we want to show
that the mean is not close to a fixed value, a.
$H_0: \mu_x = a$,
$H_a: \mu_x \neq a$.
The p value is calculated for a particular sample mean. Here we
assume that we obtained a sample mean, xbar, and want to find its
p value. It is the probability, assuming that the null hypothesis is
true, of obtaining a sample mean whose Z-score is greater than the
absolute value of the observed Z-score or less than the negative of
that absolute value.
For the special case of a normal distribution we also need the
standard deviation. We will assume that we are given the standard
deviation and call it s. The calculation for the p value can be
done in several ways. We will look at two ways here. The first
way is to convert the sample mean to its associated Z-score. The
other way is to simply specify the mean and standard deviation and
let the computer do the conversion. At first glance it may seem like
a no-brainer that we should just use the second method.
Unfortunately, when using the t-distribution we need to convert to
the t-score ourselves, so it is a good idea to know both ways.
We first look at how to calculate the p-value using the Z-score. The
Z-score is found by assuming that the null hypothesis is true,
subtracting the assumed mean from the sample mean, and dividing by
the theoretical standard deviation of the sample mean, s/sqrt(n),
where n is the sample size. Once the Z-score is found, the
probability that a value could be less than the Z-score is found
using the pnorm command.
This is not enough to get the p-value. If the Z-score that is found
is positive then we need to take one minus the associated
probability. Also, for a two sided test we need to multiply the
result by two. Here we sidestep the first issue by taking the
negative of the absolute value, which ensures that the Z-score is
negative, and then multiply the result by two.
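Written as a formula, the calculation described above looks as follows. This is a sketch under the assumptions of this section: $\bar{x}$ is the sample mean, $a$ the mean under the null hypothesis, $s$ the known standard deviation, $n$ the sample size, and $\Phi$ the standard normal cumulative distribution function (what pnorm computes with its default arguments).

```latex
z = \frac{\bar{x} - a}{s / \sqrt{n}},
\qquad
p = P\bigl(|Z| \ge |z|\bigr) = 2\,\Phi\bigl(-|z|\bigr)
```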
We now look at a specific example. In the example below we will use
a value of a of 5, a
standard deviation of 2, and a sample size of 20. We then find
the p-value for a sample mean of 7:
> a <- 5
> s <- 2
> n <- 20
> xbar <- 7
> z <- (xbar-a)/(s/sqrt(n))
> z
[1] 4.472136
> 2*pnorm(-abs(z))
[1] 7.744216e-06
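The steps in this example can be collected into a small function. The following is only a minimal sketch: the name z_p_value and its argument names are our own choices for illustration and are not part of R.

```r
# Two-sided p-value from a Z-score.
# z_p_value is a hypothetical helper, not a built-in R function.
#   xbar: sample mean          a: mean assumed under H0
#   s:    known std. deviation n: sample size
z_p_value <- function(xbar, a, s, n) {
  z <- (xbar - a) / (s / sqrt(n))  # standardize under the null hypothesis
  2 * pnorm(-abs(z))               # double the left-tail probability
}

# Same numbers as the example above; the result is about 7.744216e-06
p <- z_p_value(xbar = 7, a = 5, s = 2, n = 20)
p

# A sample mean the same distance below a gives the same p-value
z_p_value(xbar = 3, a = 5, s = 2, n = 20)
```

Because the function takes the absolute value of the Z-score, it does not matter whether the sample mean falls above or below the assumed mean.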
We now look at the same problem only specifying the mean and
standard deviation within the pnorm command. Note
that for this case we cannot so easily force the use of the left
tail. Since the sample mean is more than the assumed mean we have
to take two times one minus the probability:
> a <- 5
> s <- 2
> n <- 20
> xbar <- 7
> 2*(1-pnorm(xbar,mean=a,sd=s/sqrt(n)))
[1] 7.744216e-06
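As a quick check, the two approaches can be compared directly. The sketch below also shows one possible way to make the direct method independent of the tail, by reflecting the sample mean to the left of the assumed mean. The variable names p_z, p_direct and p_left are ours and used only for illustration.

```r
a <- 5; s <- 2; n <- 20; xbar <- 7
se <- s / sqrt(n)   # standard deviation of the sample mean

# Method 1: convert to a Z-score first
p_z <- 2 * pnorm(-abs((xbar - a) / se))

# Method 2: give pnorm the mean and standard deviation (right tail here)
p_direct <- 2 * (1 - pnorm(xbar, mean = a, sd = se))

# Method 2 without checking the tail: reflect xbar to the left of a
p_left <- 2 * pnorm(a - abs(xbar - a), mean = a, sd = se)

all.equal(p_z, p_direct)  # TRUE
all.equal(p_z, p_left)    # TRUE
```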
Finding the p-value using
a t distribution is very
similar to using the Z-score as demonstrated above. The only
difference is that you have to specify the number of degrees of
freedom. Here we look at the same example as above but use
the t distribution
instead:
> a <- 5
> s <- 2
> n <- 20
> xbar <- 7
> t <- (xbar-a)/(s/sqrt(n))
> t
[1] 4.472136
> 2*pt(-abs(t),df=n-1)
[1] 0.0002611934
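The t version can be wrapped up in the same way as the Z-score calculation. In this sketch, t_p_value is a hypothetical helper name of our own, and the comparison with the normal result is included only to illustrate that the heavier tails of the t distribution give a larger p-value for the same score.

```r
# Two-sided p-value from a t-score with n - 1 degrees of freedom.
# t_p_value is a hypothetical helper, not a built-in R function.
t_p_value <- function(xbar, a, s, n) {
  t_score <- (xbar - a) / (s / sqrt(n))
  2 * pt(-abs(t_score), df = n - 1)
}

p_t <- t_p_value(xbar = 7, a = 5, s = 2, n = 20)       # about 0.0002611934
p_z <- 2 * pnorm(-abs((7 - 5) / (2 / sqrt(20))))       # about 7.744216e-06

p_t > p_z  # TRUE: same score, heavier tails, larger p-value
```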
We now look at an example where we have a univariate data set and
want to find the p-value. In this example
we use one of the data sets given in the data
input chapter. We use
the w1 data set:
> w1
> summary(w1)
vals
Min. :0.130
1st Qu.:0.480
Median :0.720
Mean :0.765
3rd Qu.:1.008
Max. :1.760
> length(w1$vals)
[1] 54
Here we use a two sided hypothesis test,
$H_0: \mu_1 = 0.7$,
$H_a: \mu_1 \neq 0.7$.
So we calculate the sample mean and sample standard deviation in
order to calculate the p-value:
> t <- (mean(w1$vals)-0.7)/(sd(w1$vals)/sqrt(length(w1$vals)))
> t
[1] 1.263217
> 2*pt(-abs(t),df=length(w1$vals)-1)
[1] 0.21204
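The same calculation can be cross-checked against R's built-in t.test function. Since the w1 data are not reproduced here, the sketch below uses a short made-up vector in their place; the numbers in vals are invented for illustration and are not the w1 data.

```r
# Made-up sample standing in for w1$vals (NOT the actual w1 data)
vals <- c(0.43, 1.06, 0.72, 0.55, 0.98, 1.31, 0.62, 0.85, 0.49, 0.91)
mu0  <- 0.7   # mean assumed under the null hypothesis

# Manual calculation, following the steps in the text
t_score  <- (mean(vals) - mu0) / (sd(vals) / sqrt(length(vals)))
p_manual <- 2 * pt(-abs(t_score), df = length(vals) - 1)

# Built-in one-sample t test (two-sided by default)
p_builtin <- as.numeric(t.test(vals, mu = mu0)$p.value)

all.equal(p_manual, p_builtin)  # TRUE
```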
Suppose that you want to find
the p-values for many tests. This is a
common task and most software packages will allow you to do this.
Here we see how it can be done in R.
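Here we assume that we want to do a hypothesis test for a number of
comparisons. In particular we will look at three hypothesis tests.
All are of the following form:
$H_0: \mu_1 - \mu_2 = 0$,
$H_a: \mu_1 - \mu_2 \neq 0$.
The alternative is stated in its two-sided form. The calculations in
this section find the one-tailed probability for each comparison.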
We have three different sets of comparisons to make:
Comparison 1
            Mean    Std. Dev.   Number (pop.)
Group I     10      3           300
Group II    10.5    2.5         230

Comparison 2
            Mean    Std. Dev.   Number (pop.)
Group I     12      4           210
Group II    13      5.3         340

Comparison 3
            Mean    Std. Dev.   Number (pop.)
Group I     30      4.5         420
Group II    28.5    3           400
For each of these comparisons we want to calculate a p-value. For
each comparison there are two groups. We will refer to group one as
the group whose results are in the first row of each comparison
above, and to group two as the group whose results are in the second
row. Before we can calculate the p-values we must first compute a
standard error and a t-score. We will find general formulae, which
are necessary in order to do all three calculations at once.
We assume that the means for the first group are defined in a
variable called m1. The means for the second group are defined in a
variable called m2. The standard deviations for the first group are
in a variable called sd1. The standard deviations for the second
group are in a variable called sd2. The numbers of samples for the
first group are in a variable called num1. Finally, the numbers of
samples for the second group are in a variable called num2.
With these definitions the standard error is the square root of
(sd1^2)/num1+(sd2^2)/num2. The associated t-score is m1 minus m2,
all divided by the standard error. The R commands to do this can be
found below:
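> m1 <- c(10,12,30)
> m2 <- c(10.5,13,28.5)
> sd1 <- c(3,4,4.5)
> sd2 <- c(2.5,5.3,3)
> num1 <- c(300,210,420)
> num2 <- c(230,340,400)
> se <- sqrt(sd1*sd1/num1+sd2*sd2/num2)
> t <- (m1-m2)/se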
To see the values just type in the variable name on a line
alone:
> m1
[1] 10 12 30
> m2
[1] 10.5 13.0 28.5
> sd1
[1] 3.0 4.0 4.5
> sd2
[1] 2.5 5.3 3.0
> num1
[1] 300 210 420
> num2
[1] 230 340 400
> se
[1] 0.2391107 0.3985074 0.2659216
> t
[1] -2.091082 -2.509364 5.640761
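Because R's arithmetic works element by element on vectors, the standard error and t-score formulas above handle all three comparisons in one pass. The sketch below packages them into a function that returns one row per comparison. The function name two_sample_t is our own and is used only for illustration.

```r
m1   <- c(10, 12, 30);     m2   <- c(10.5, 13, 28.5)
sd1  <- c(3, 4, 4.5);      sd2  <- c(2.5, 5.3, 3)
num1 <- c(300, 210, 420);  num2 <- c(230, 340, 400)

# Hypothetical helper: standard error and t-score for each comparison.
# Every argument is a vector with one entry per comparison.
two_sample_t <- function(m1, m2, sd1, sd2, num1, num2) {
  se <- sqrt(sd1^2 / num1 + sd2^2 / num2)  # element-wise standard error
  data.frame(se = se, t = (m1 - m2) / se)  # one row per comparison
}

res <- two_sample_t(m1, m2, sd1, sd2, num1, num2)
res
```

The se and t columns should agree with the values printed above.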
To use the pt command we need to specify the number of degrees of
freedom. This can be done using the pmin command, which returns the
element-wise minimum of its arguments. Note that there is also a
command called min, but it does not work the same way: it returns a
single minimum over all of the values. You need to use pmin to get
the correct results. The numbers of degrees of freedom are
pmin(num1,num2)-1. So the p-values can be found using the following
R command:
> pt(t,df=pmin(num1,num2)-1)
[1] 0.01881168 0.00642689 0.99999998
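As a brief aside on the choice of pmin in the command above, the following sketch shows how min and pmin differ on the sample sizes used in this example.

```r
num1 <- c(300, 210, 420)
num2 <- c(230, 340, 400)

min(num1, num2)        # one number, the smallest of all six values: 210
pmin(num1, num2)       # element-wise minimum: 230 210 400
pmin(num1, num2) - 1   # degrees of freedom per comparison: 229 209 399
```

Using min here would apply 209 degrees of freedom to every comparison rather than one value per comparison.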
If you enter all of these commands into R you should notice that
the last p-value is not correct. The pt command gives the
probability that a score is less than the specified t. The t-score
for the last entry is positive, and we want the probability that
a t-score is bigger. One way around this is to make sure that all of
the t-scores are negative. You can do this by taking the negative of
the absolute value of the t-scores:
> pt(-abs(t),df=pmin(num1,num2)-1)
[1] 1.881168e-02 6.426890e-03 1.605968e-08
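Another way to get the same tail probabilities is to pick the tail explicitly with the lower.tail argument of pt. The sketch below is one possible alternative rather than the method used in this tutorial. It uses the name t_score instead of t so that the variable does not share a name with R's t function, and p_abs and p_tail are our own names.

```r
m1   <- c(10, 12, 30);     m2   <- c(10.5, 13, 28.5)
sd1  <- c(3, 4, 4.5);      sd2  <- c(2.5, 5.3, 3)
num1 <- c(300, 210, 420);  num2 <- c(230, 340, 400)

se      <- sqrt(sd1^2 / num1 + sd2^2 / num2)
t_score <- (m1 - m2) / se
df      <- pmin(num1, num2) - 1

# Approach from the text: force every score into the left tail
p_abs <- pt(-abs(t_score), df = df)

# Alternative: left tail for negative scores, right tail for positive ones
p_tail <- ifelse(t_score < 0,
                 pt(t_score, df = df),
                 pt(t_score, df = df, lower.tail = FALSE))

all.equal(p_abs, p_tail)  # TRUE
```

Asking for the upper tail directly avoids computing one minus a probability that is very close to one.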
The results from the command above should give you
the p-values for a one-sided test. It is
left as an exercise how to find
the p-values for a two-sided test.