# The American Statistical Association issues a statement on p-values: context, process and purpose

For the first time, the American Statistical Association (ASA) has issued a statement regarding p-values.

In this post I will attempt to present the salient points of that statement.

## Background:

In recent times, several members of the scientific community and a few journals have reignited the debate surrounding p-values. The ASA felt it necessary to take a stand and issue a statement in the interest of the wider scientific and research community.

## Key Messages:

What is a p-value?

A p-value is the probability that (under a specified statistical model) a statistical summary of the data (for example, the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.
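To make the definition concrete, here is a minimal sketch of one way to compute a p-value: a permutation test for the difference in sample means. The "specified statistical model" assumed here is that group labels are exchangeable, meaning the grouping has no effect on the outcome. The function name, the measurements, and the number of permutations are my own illustrative choices and do not come from the ASA statement.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in sample means.

    Assumed model: group labels are exchangeable (no group effect).
    """
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    as_extreme = 0
    for _ in range(n_permutations):
        # Reassign the group labels at random, as the assumed model permits.
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            as_extreme += 1

    # Count the observed labelling as well, so the estimate is never exactly 0.
    return (as_extreme + 1) / (n_permutations + 1)


# Hypothetical measurements, for illustration only.
treatment = [5.1, 6.3, 5.8, 6.9, 6.1, 5.7]
control = [4.8, 5.2, 5.0, 5.9, 4.7, 5.3]
print(permutation_p_value(treatment, control))
```

The returned value is the share of random relabellings whose mean difference is equal to or more extreme than the observed one, which is the definition above applied to this particular model.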

Principles:

1. P-values can indicate how incompatible the data are with a specified statistical model.

Often the null hypothesis postulates the absence of an effect, such as no difference between two groups, or the absence of a relationship between a factor and an outcome. The smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis, if the underlying assumptions used to calculate the p-value hold. This incompatibility can be interpreted as casting doubt on or providing evidence against the null hypothesis or the underlying assumptions.

2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

The p-value is NOT a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. It is merely a statement about data in relation to a specified hypothetical explanation, and is not a statement about the explanation itself.
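A short calculation with Bayes' rule shows why the two quantities differ. This is a sketch under assumptions of my own (the share of tested hypotheses for which the null is true, and the power of the test); none of these numbers appear in the ASA statement. It also works with the event "p ≤ alpha" rather than an exact p-value.

```python
def prob_null_given_significant(prior_null, power, alpha=0.05):
    """P(null is true | result is significant), by Bayes' rule.

    prior_null: assumed share of tested hypotheses where the null is true.
    power: assumed probability of a significant result when the null is false.
    alpha: significance threshold, i.e. P(significant | null is true).
    """
    false_positives = alpha * prior_null
    true_positives = power * (1 - prior_null)
    return false_positives / (false_positives + true_positives)


# Hypothetical scenario: 90% of tested nulls are true, and the power is 80%.
print(prob_null_given_significant(prior_null=0.9, power=0.8))
```

Under these assumed inputs, roughly 36% of significant results come from true nulls, far more than the 5% that a naive reading of "p ≤ 0.05" would suggest. The answer changes with the assumptions, which is the point: the p-value alone cannot supply it.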

3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.

The widespread use of “statistical significance” (generally interpreted as “p ≤ 0.05”) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process. Decisions should be based upon a detailed examination of ALL the evidence. Researchers should bring many contextual factors into play to derive scientific inferences, including the design of a study, the quality of the measurements, the external evidence for the phenomenon under study, and the validity of assumptions that underlie the data analysis.

4. Proper inference requires full reporting and transparency.

P-values and related analyses should not be reported selectively. Conducting multiple analyses of the data and reporting only those with certain p-values (typically those passing a significance threshold) renders the reported p-values essentially uninterpretable. Cherry-picking promising findings, also known by such terms as data dredging, significance chasing, significance questing, selective inference and “p-hacking,” leads to a spurious excess of statistically significant results in the published literature and should be vigorously avoided.
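The cost of selective reporting can be sketched numerically. The example below assumes that every null hypothesis is true, that the analyses are independent, and that each p-value is therefore uniformly distributed between 0 and 1. These are simplifying assumptions of mine, and the function names are hypothetical.

```python
import random


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance that at least one of n independent null tests has p <= alpha."""
    return 1 - (1 - alpha) ** n_tests


def simulate_selective_reporting(n_tests, alpha=0.05, n_studies=20_000, seed=0):
    """Share of simulated studies whose smallest p-value passes the threshold.

    Each study runs n_tests analyses on pure noise and keeps only the best one.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Under a true null, a p-value from a continuous test is uniform on [0, 1].
        smallest_p = min(rng.random() for _ in range(n_tests))
        if smallest_p <= alpha:
            hits += 1
    return hits / n_studies


for k in (1, 5, 20):
    print(k, familywise_error_rate(k), simulate_selective_reporting(k))
```

With 20 analyses and no real effect anywhere, the analytic rate is about 0.64, so a researcher who reports only the best result would find something "significant" in roughly two studies out of three.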

5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.

Smaller p-values do not necessarily imply the presence of larger or more important effects, and larger p-values do not imply a lack of importance or even lack of effect. Any effect, no matter how tiny, can produce a small p-value if the sample size or measurement precision is high enough, and large effects may produce unimpressive p-values if the sample size is small or measurements are imprecise.
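A stylized calculation illustrates the point. The sketch assumes a two-sample z-test with a known, common standard deviation and equal group sizes, and it treats the observed mean difference as given. The scenarios are hypothetical and chosen only to contrast a tiny effect in a huge sample with a large effect in a small one.

```python
import math


def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a two-sample z-test with known common sd."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Tiny effect (0.01 standard deviations) with one million subjects per group.
tiny_effect_p = two_sample_z_p_value(mean_diff=0.01, sd=1.0, n_per_group=1_000_000)

# Large effect (0.8 standard deviations) with five subjects per group.
large_effect_p = two_sample_z_p_value(mean_diff=0.8, sd=1.0, n_per_group=5)

print(tiny_effect_p, large_effect_p)
```

The tiny effect yields a p-value far below 0.05, while the large effect does not reach it, even though the second effect is 80 times bigger. Reporting the effect size and its uncertainty alongside the p-value avoids confusing the two.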

6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

A p-value near 0.05 taken by itself offers only weak evidence against the null hypothesis. Likewise, a relatively large p-value does not imply evidence in favor of the null hypothesis; many other hypotheses may be equally or more consistent with the observed data. For these reasons, data analysis should not end with the calculation of a p-value when other approaches are appropriate and feasible.

NO SINGLE INDEX SHOULD SUBSTITUTE FOR SCIENTIFIC REASONING.