How to Determine a P Value When Testing a Null Hypothesis
A small P value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so you reject the null hypothesis.
A small P value means the value of the statistic we observed in the sample is unlikely to have occurred when the null hypothesis is true. Hence, a 0.04 P value means it is even more unlikely that the observed statistic would have occurred when the null hypothesis is true than a 0.08 P value. The smaller the P value, the stronger the evidence against the null hypothesis.
Here are three experiments to illustrate when the different approaches to statistics are appropriate. In the first experiment, you are testing a plant extract on rabbits to see if it will lower their blood pressure. You already know that the plant extract is a diuretic (makes the rabbits pee more) and you already know that diuretics tend to lower blood pressure, so you think there's a good chance it will work. If it does work, you'll do more low-cost animal tests on it before you do expensive, potentially risky human trials. Your prior expectation is that the null hypothesis (that the plant extract has no effect) has a good chance of being false, and the cost of a false positive is fairly low. So you should do frequentist hypothesis testing, with a significance level of 0.05.
A Bayesian would insist that you put in numbers just how likely you think the null hypothesis and various values of the alternative hypothesis are, before you do the experiment, and I'm not sure how that is supposed to work in practice for most experimental biology. But the general concept is a valuable one: as Carl Sagan summarized it, "Extraordinary claims require extraordinary evidence."
In the second experiment, you are going to put human volunteers with high blood pressure on a strict low-salt diet and see how much their blood pressure goes down. Everyone will be confined to a hospital for a month and fed either a normal diet, or the same foods with half as much salt. For this experiment, you wouldn't be very interested in the P value, as based on prior research in animals and humans, you are already quite certain that reducing salt intake will lower blood pressure; you're pretty sure that the null hypothesis that "Salt intake has no effect on blood pressure" is false. Instead, you are very interested to know how much the blood pressure goes down. Reducing salt intake by half is a big deal, and if it only reduces blood pressure by 1 mm Hg, the tiny gain in life expectancy wouldn't be worth a lifetime of bland food and obsessive label-reading. If it reduces blood pressure by 20 mm with a confidence interval of ±5 mm, it might be worth it. So you should estimate the effect size (the difference in blood pressure between the diets) and the confidence interval on the difference.
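As a rough illustration of what "an effect size with a confidence interval" looks like in practice, here is a minimal sketch in Python (using NumPy and SciPy) that estimates the difference in mean blood pressure between two diets and a 95% confidence interval on that difference. The sample sizes and readings below are invented for the example, not taken from any real study.

    # Minimal sketch: effect size (difference in mean blood pressure) with a
    # 95% confidence interval.  All numbers below are hypothetical.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    normal_diet = rng.normal(loc=150, scale=12, size=30)   # hypothetical mm Hg readings
    low_salt = rng.normal(loc=130, scale=12, size=30)

    diff = normal_diet.mean() - low_salt.mean()            # effect size: drop in blood pressure
    se = np.sqrt(normal_diet.var(ddof=1) / 30 + low_salt.var(ddof=1) / 30)
    df = 2 * 30 - 2                                        # simple pooled-sample approximation
    t_crit = stats.t.ppf(0.975, df)                        # two-sided 95% critical value

    print(f"effect size: {diff:.1f} mm Hg")
    print(f"95% CI: {diff - t_crit * se:.1f} to {diff + t_crit * se:.1f} mm Hg")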
Now instead of testing 1000 plant extracts, imagine that you are testing just one. If you are testing it to see if it kills beetle larvae, you know (based on everything you know about plant and beetle biology) there's a pretty good chance it will work, so you can be pretty sure that a P value less than 0.05 is a true positive. But if you are testing that one plant extract to see if it grows hair, which you know is very unlikely (based on everything you know about plants and hair), a P value less than 0.05 is almost certainly a false positive. In other words, if you expect that the null hypothesis is probably true, a statistically significant result is probably a false positive. This is sad; the most exciting, amazing, unexpected results in your experiments are probably just your data trying to make you jump to ridiculous conclusions. You should require a much lower P value to reject a null hypothesis that you think is probably true.
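A back-of-the-envelope calculation makes this concrete. The numbers below (1000 tests, 80% power, and two different guesses at how many extracts truly work) are invented purely for illustration, but they show how the prior probability that the null hypothesis is false determines what fraction of P<0.05 results are false positives.

    # Rough illustration (invented numbers): how the prior probability that the
    # null hypothesis is false changes the meaning of a "significant" result.
    alpha, power, n_tests = 0.05, 0.80, 1000

    for prior_false_null in (0.5, 0.01):       # fraction of extracts that truly work
        truly_work = n_tests * prior_false_null
        truly_inert = n_tests - truly_work
        true_pos = truly_work * power          # real effects detected at P < alpha
        false_pos = truly_inert * alpha        # inert extracts that pass anyway
        frac_false = false_pos / (true_pos + false_pos)
        print(f"prior {prior_false_null:.0%} true effects: "
              f"{false_pos:.0f} false vs {true_pos:.0f} true positives "
              f"({frac_false:.0%} of significant results are false)")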
A related criticism is that a significant rejection of a null hypothesis might not be biologically meaningful, if the difference is too small to matter. For example, in the chicken-sex experiment, having a treatment that produced 49.9% male chicks might be significantly different from 50%, but it wouldn't be enough to make farmers want to buy your treatment. These critics say you should estimate the effect size and put a confidence interval on it, not estimate a P value. So the goal of your chicken-sex experiment should not be to say "Chocolate gives a proportion of males that is significantly less than 50% (P=0.015)" but to say "Chocolate produced 36.1% males with a 95% confidence interval of 25.9 to 47.4%." For the chicken-feet experiment, you would say something like "The difference between males and females in mean foot size is 2.45 mm, with a confidence interval on the difference of ±1.98 mm."
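For the proportion case, here is a minimal sketch of one way to put a confidence interval on a proportion, using the simple normal-approximation (Wald) interval and hypothetical counts of 17 males out of 48 chicks; the interval quoted above (25.9 to 47.4%) presumably came from the actual experimental counts and possibly a different interval method.

    # Minimal sketch: 95% confidence interval on a proportion, using the
    # normal-approximation (Wald) interval and hypothetical counts.
    import numpy as np
    from scipy import stats

    males, n = 17, 48                     # hypothetical counts
    p_hat = males / n                     # observed proportion of males
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    z = stats.norm.ppf(0.975)             # two-sided 95% critical value
    print(f"{p_hat:.1%} males, 95% CI {p_hat - z * se:.1%} to {p_hat + z * se:.1%}")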

A large P value (typically > 0.05) indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis.

Note that a P value of 0.13 does not mean that the null hypothesis is true in 13% of all samples; it means that, if the null hypothesis were true, you would see a result at least as extreme as yours in 13% of samples.

The critical value is the cutoff that the test statistic must pass before we can reject the null hypothesis.
This criticism only applies to two-tailed tests, where the null hypothesis is "Things are exactly the same" and the alternative is "Things are different." Presumably these critics think it would be okay to do a one-tailed test with a null hypothesis like "Foot length of male chickens is the same as, or less than, that of females," because the null hypothesis that male chickens have smaller feet than females could be true. So if you're worried about this issue, you could think of a two-tailed test, where the null hypothesis is that things are the same, as shorthand for doing two one-tailed tests. A significant rejection of the null hypothesis in a two-tailed test would then be the equivalent of rejecting one of the two one-tailed null hypotheses.
Failing to reject the null hypothesis when it is false is a Type II error.
In the olden days, when people looked up P values in printed tables, they would report the results of a statistical test as "P<0.05", "P<0.01", "P>0.10", etc. Nowadays, almost all computer statistics programs give the exact P value resulting from a statistical test, such as P=0.029, and that's what you should report in your publications. You will conclude that the results are either significant or they're not significant; they either reject the null hypothesis (if P is below your predetermined significance level) or don't reject the null hypothesis (if P is above your significance level). But other people will want to know if your results are "strongly" significant (P much less than 0.05), which will give them more confidence in your results than if they were "barely" significant (P=0.043, for example). In addition, other researchers will need the exact P value if they want to combine your results with others into a meta-analysis.
Failing to reject the null hypothesis when it is true is a correct decision, not an error.
Because the test statistic Z = −1.92 is greater than the critical value of −2.33, we do not reject the null hypothesis. There is insufficient evidence at the α = 0.01 level to conclude that the rate has been reduced.
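As a quick check of where those numbers come from, here is a minimal sketch of the left-tailed Z test calculation: SciPy is used to find the critical value for α = 0.01 and the P value corresponding to an observed statistic of Z = −1.92.

    # Minimal sketch: left-tailed Z test at alpha = 0.01 with an observed Z of -1.92.
    from scipy import stats

    alpha = 0.01
    z_obs = -1.92
    z_crit = stats.norm.ppf(alpha)       # critical value, about -2.33
    p_value = stats.norm.cdf(z_obs)      # left-tail P value, about 0.027

    print(f"critical value: {z_crit:.2f}")
    print(f"P value: {p_value:.3f}")
    print("reject H0" if z_obs < z_crit else "do not reject H0")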
Rejecting the null hypothesis when it is true is a Type I error.
The probability that was calculated above, 0.030, is the probability of getting 17 or fewer males out of 48. It would be significant, using the conventional P<0.05 criterion; the P=0.03 value was found by adding the probabilities of getting 17 or fewer males. This is called a one-tailed probability, because you are adding the probabilities in only one tail of the distribution shown in the figure. However, if your null hypothesis is "The proportion of males is 0.5", then your alternative hypothesis is "The proportion of males is different from 0.5." In that case, you should add the probability of getting 17 or fewer females to the probability of getting 17 or fewer males. This is called a two-tailed probability. If you do that with the chicken result, you get P=0.06, which is not quite significant.
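Here is a minimal sketch of that binomial calculation in Python with SciPy: the one-tailed probability of 17 or fewer males out of 48, and the two-tailed probability obtained by also counting the matching tail (17 or fewer females), which for a symmetric 0.5 null simply doubles the one-tailed value.

    # Minimal sketch: one- and two-tailed binomial P values for 17 males out of
    # 48 eggs under the null hypothesis of a 0.5 proportion of males.
    from scipy import stats

    males, n = 17, 48
    one_tailed = stats.binom.cdf(males, n, 0.5)   # P(17 or fewer males), about 0.030
    two_tailed = 2 * one_tailed                   # add the other tail (17 or fewer females)

    print(f"one-tailed P = {one_tailed:.3f}")
    print(f"two-tailed P = {two_tailed:.3f}")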