Calculating p Values

Here we look at some examples of calculating p values. The examples cover both the normal and t distributions. We assume that you can enter data and know the basic commands associated with these distributions.

We look at the steps necessary to calculate the p-value for a particular test. In the interest of simplicity we only look at a two-sided test, and we focus on one example. Here we want to show that the mean is not equal to a fixed value, a.

H0: mu_x = a

Ha: mu_x ≠ a

The p value is calculated for a particular sample mean. Here we assume that we obtained a sample mean, xbar, and want to find its p value. It is the probability, assuming the null hypothesis is true, of obtaining a sample mean whose Z-score is greater than the absolute value of the observed Z-score or less than the negative of that absolute value.

For the special case of a normal distribution we also need the standard deviation. We will assume that we are given the standard deviation and call it s. The calculation for the p value can be done in several ways. We will look at two ways here. The first way is to convert the sample mean to its associated Z-score. The other way is to simply specify the standard deviation and let the computer do the conversion. At first glance the second method may seem like the obvious choice. Unfortunately, when using the t-distribution we need to convert to the t-score ourselves, so it is a good idea to know both ways.

We first look at how to calculate the p-value using the Z-score. The Z-score is found by assuming that the null hypothesis is true, subtracting the assumed mean from the sample mean, and dividing by the theoretical standard deviation of the sample mean. Once the Z-score is found, the probability that a value could be less than the Z-score is found using the pnorm command.

This is not enough to get the p-value. If the Z-score that is found is positive then we need to take one minus the associated probability. Also, for a two-sided test we need to multiply the result by two. Here we avoid these issues and ensure that the Z-score is negative by taking the negative of the absolute value.
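The equivalence between the two approaches described above can be checked numerically. The following is a minimal sketch; the Z-scores in z_values are illustrative values chosen here, not data from the text.

```r
# Illustrative Z-scores (hypothetical values chosen for this sketch)
z_values <- c(-2.5, -1, 0.5, 1.96, 4.472136)

# Approach used in this section: force the left tail, then double
p_left <- 2 * pnorm(-abs(z_values))

# Case-by-case approach: lower tail for negative z,
# one minus the lower tail for positive z, doubled in both cases
p_cases <- ifelse(z_values < 0,
                  2 * pnorm(z_values),
                  2 * (1 - pnorm(z_values)))

# Show the two results side by side and confirm they agree
print(cbind(z = z_values, p_left = p_left, p_cases = p_cases))
print(all.equal(p_left, p_cases))
```

Taking the negative of the absolute value gives the same answer without having to check the sign of each Z-score first.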

We now look at a specific example. In the example below we use a = 5, a standard deviation of 2, and a sample size of 20. We then find the p-value for a sample mean of 7:

> a <- 5
> s <- 2
> n <- 20
> xbar <- 7
> z <- (xbar-a)/(s/sqrt(n))
> z
[1] 4.472136
> 2*pnorm(-abs(z))
[1] 7.744216e-06

We now look at the same problem, this time specifying the mean and standard deviation within the pnorm command. Note that for this case we cannot so easily force the use of the left tail. Since the sample mean is more than the assumed mean we have to take two times one minus the probability:

> a <- 5
> s <- 2
> n <- 20
> xbar <- 7
> 2*(1-pnorm(xbar,mean=a,sd=s/sqrt(n)))
[1] 7.744216e-06
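As a possible alternative to subtracting from one, pnorm accepts a lower.tail argument that returns the upper tail directly. The sketch below uses the same numbers as the example; the variable names se, p_upper, and p_any are introduced here for illustration only.

```r
a <- 5       # assumed mean under the null hypothesis
s <- 2       # given standard deviation
n <- 20      # sample size
xbar <- 7    # observed sample mean

se <- s / sqrt(n)  # standard deviation of the sample mean

# Upper tail requested directly instead of computing 1 - pnorm(...)
p_upper <- 2 * pnorm(xbar, mean = a, sd = se, lower.tail = FALSE)

# A form that works whichever side of a the sample mean falls on:
# reflect xbar to the left of a, then use the lower tail
p_any <- 2 * pnorm(a - abs(xbar - a), mean = a, sd = se)

print(p_upper)
print(p_any)
```

Both expressions should reproduce the p-value found above; the second one avoids having to know in advance whether the sample mean is above or below the assumed mean.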

Finding the p-value using a t distribution is very similar to using the Z-score as demonstrated above. The only difference is that you have to specify the number of degrees of freedom. Here we look at the same example as above but use the t distribution instead:

> a <- 5
> s <- 2
> n <- 20
> xbar <- 7
> t <- (xbar-a)/(s/sqrt(n))
> t
[1] 4.472136
> 2*pt(-abs(t),df=n-1)
[1] 0.0002611934
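The two calculations differ only in the distribution used for the tail probability, so they can be wrapped in one small helper. This is a sketch only; the function name two_sided_p and its use_t argument are hypothetical and not part of R or of the original text.

```r
# Hypothetical helper: two-sided p-value for a sample mean.
#   xbar: sample mean, a: mean under H0,
#   s: standard deviation, n: sample size,
#   use_t: TRUE for the t distribution (df = n - 1), FALSE for the normal
two_sided_p <- function(xbar, a, s, n, use_t = TRUE) {
  score <- (xbar - a) / (s / sqrt(n))
  if (use_t) {
    2 * pt(-abs(score), df = n - 1)
  } else {
    2 * pnorm(-abs(score))
  }
}

p_t <- two_sided_p(7, 5, 2, 20)                 # t distribution
p_z <- two_sided_p(7, 5, 2, 20, use_t = FALSE)  # normal distribution
print(c(t = p_t, normal = p_z))
```

With only 19 degrees of freedom the t distribution has heavier tails than the normal, which is why the t-based p-value is the larger of the two.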

We now look at an example where we have a univariate data set and want to find the p-value. In this example we use one of the data sets given in the data input chapter. We use the w1 data set:

> w1

> summary(w1)
      vals
 Min.   :0.130
 1st Qu.:0.480
 Median :0.720
 Mean   :0.765
 3rd Qu.:1.008
 Max.   :1.760
> length(w1$vals)
[1] 54

Here we use a two-sided hypothesis test:

H0: mu_1 = 0.7

Ha: mu_1 ≠ 0.7

So we calculate the sample mean and sample standard deviation in order to calculate the t-score and then the p-value:

> t <- (mean(w1$vals)-0.7)/(sd(w1$vals)/sqrt(length(w1$vals)))
> t
[1] 1.263217
> 2*pt(-abs(t),df=length(w1$vals)-1)
[1] 0.21204
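A hand calculation like this can be cross-checked against R's built-in t.test function. The sketch below assumes simulated data, since the w1 values themselves are not reproduced here; the vector x and the seed are made up for illustration.

```r
set.seed(1)
# Simulated stand-in for a univariate data set (NOT the w1 data)
x <- rnorm(54, mean = 0.75, sd = 0.4)
mu0 <- 0.7  # mean under the null hypothesis

# Manual calculation, following the steps in this section
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
p_manual <- 2 * pt(-abs(t_manual), df = length(x) - 1)

# Built-in one-sample t test (two-sided by default)
res <- t.test(x, mu = mu0)

print(c(manual = p_manual, t.test = res$p.value))
```

The two p-values should agree, which is a quick way to confirm that the t-score and degrees of freedom were set up correctly.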

Suppose that you want to find the p-values for many tests. This is a common task and most software packages will allow you to do this. Here we see how it can be done in R.

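Here we assume that we want to carry out hypothesis tests for a number of comparisons and, as a first step, find a one-sided p-value for each. In particular we will look at three hypothesis tests. All are of the following form:

H0: mu_1 - mu_2 = 0

Ha: mu_1 - mu_2 ≠ 0

The alternative as written is two-sided; the calculations that follow find the probability in a single tail.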

We have three different sets of comparisons to make:

Comparison 1
            Mean    Std. Dev.   Number (pop.)
Group I     10      3           300
Group II    10.5    2.5         230

Comparison 2
            Mean    Std. Dev.   Number (pop.)
Group I     12      4           210
Group II    13      5.3         340

Comparison 3
            Mean    Std. Dev.   Number (pop.)
Group I     30      4.5         420
Group II    28.5    3           400

For each of these comparisons we want to calculate a p-value. For each comparison there are two groups. We will refer to group one as the group whose results are in the first row of each comparison above, and to group two as the group whose results are in the second row. Before we can calculate the p-values we must first compute a standard error and a t-score. We will write these as general formulae, which is necessary in order to do all three calculations at once.

We assume that the means for the first group are defined in a variable called m1. The means for the second group are defined in a variable called m2. The standard deviations for the first group are in a variable called sd1. The standard deviations for the second group are in a variable called sd2. The sample sizes for the first group are in a variable called num1. Finally, the sample sizes for the second group are in a variable called num2.

With these definitions the standard error is the square root of (sd1^2)/num1+(sd2^2)/num2. The associated t-score is m1 minus m2, all divided by the standard error. The R commands to do this can be found below:

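> m1 <- c(10,12,30)
> m2 <- c(10.5,13,28.5)
> sd1 <- c(3,4,4.5)
> sd2 <- c(2.5,5.3,3)
> num1 <- c(300,210,420)
> num2 <- c(230,340,400)
> se <- sqrt(sd1*sd1/num1+sd2*sd2/num2)
> t <- (m1-m2)/se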

To see the values just type in the variable name on a line alone:

> m1
[1] 10 12 30
> m2
[1] 10.5 13.0 28.5
> sd1
[1] 3.0 4.0 4.5
> sd2
[1] 2.5 5.3 3.0
> num1
[1] 300 210 420
> num2
[1] 230 340 400
> se
[1] 0.2391107 0.3985074 0.2659216
> t
[1] -2.091082 -2.509364  5.640761

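To use the pt command we need to specify the number of degrees of freedom. Here we use the smaller of the two sample sizes minus one for each comparison, which can be found using the pmin command. Note that there is also a command called min, but it does not work the same way: min returns the single smallest value across all of its arguments, while pmin returns the element-by-element minimum. You need to use pmin to get the correct results. The numbers of degrees of freedom are pmin(num1,num2)-1. So the p-values can be found using the following R command: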

> pt(t,df=pmin(num1,num2)-1)

[1] 0.01881168 0.00642689 0.99999998
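As an aside on the difference between pmin and min noted above, the following short sketch applies both to the sample sizes from the three comparisons. The names df_each and df_single are introduced here for illustration.

```r
num1 <- c(300, 210, 420)
num2 <- c(230, 340, 400)

# pmin compares the vectors element by element: one value per comparison
df_each <- pmin(num1, num2) - 1
print(df_each)    # 229 209 399

# min collapses everything to one overall minimum, so every comparison
# would wrongly share the same degrees of freedom
df_single <- min(num1, num2) - 1
print(df_single)  # 209
```

Only the pmin version yields a separate number of degrees of freedom for each of the three comparisons.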

If you enter all of these commands into R you will notice that the last p-value is not correct. The pt command gives the probability that a score is less than the specified t. The t-score for the last entry is positive, and we want the probability that a t-score is bigger. One way around this is to make sure that all of the t-scores are negative. You can do this by taking the negative of the absolute value of the t-scores:

> pt(-abs(t),df=pmin(num1,num2)-1)

[1] 1.881168e-02 6.426890e-03 1.605968e-08

The results from the command above should give you the p-values for a one-sided test. It is left as an exercise how to find the p-values for a two-sided test.
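One possible approach to the exercise, following the same doubling used for the single-sample tests earlier, is sketched below. It rebuilds the inputs from the three comparison tables so that it runs on its own; the names t_scores, df, p_one_sided, and p_two_sided are chosen here for illustration.

```r
# Summary statistics from the three comparisons
m1   <- c(10, 12, 30)
m2   <- c(10.5, 13, 28.5)
sd1  <- c(3, 4, 4.5)
sd2  <- c(2.5, 5.3, 3)
num1 <- c(300, 210, 420)
num2 <- c(230, 340, 400)

# Standard error and t-score for each comparison
se <- sqrt(sd1^2 / num1 + sd2^2 / num2)
t_scores <- (m1 - m2) / se

# Degrees of freedom: smaller sample size minus one, per comparison
df <- pmin(num1, num2) - 1

# One tail, then doubled to cover both tails
p_one_sided <- pt(-abs(t_scores), df = df)
p_two_sided <- 2 * p_one_sided

print(p_two_sided)
```

Because the t distribution is symmetric, doubling the single-tail probability accounts for differences of the same size in either direction.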
