Evidence Based Physical Therapy

Basic Statistics for EBPT

Baseline Characteristics
Most comparative clinical trials include either a table or a paragraph in the text showing the baseline characteristics of the groups being studied. Such a table should show that the intervention and control groups are similar in terms of age and sex distribution and key prognostic variables. Important differences in these characteristics, even if due to chance, can pose a challenge to your interpretation of results. In this situation, adjustments can be made to allow for these differences and hence strengthen the argument.

Data Distribution and Statistical Tests
Numbers are often used to label the properties of things. We can assign a number to represent our height, weight, and so on. For properties like these, the measurements can be treated as actual numbers. We can, for example, calculate the average weight and height of a group of people by averaging the measurements. But consider an example in which we use numbers to label the property "city of origin," where 1=London, 2=Manchester, 3=Birmingham, and so on. We could still calculate the average of these numbers for a particular sample of cases, but we would be completely unable to interpret the result. The same would apply if we labelled the property "liking for x" with 1=not at all, 2=a bit, and 3=a lot. Again, we could calculate the "average liking," but the numerical result would be uninterpretable unless we knew that the difference between "not at all" and "a bit" was exactly the same as the difference between "a bit" and "a lot."
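As a small illustration of the point about labels, the sketch below averages some invented "city of origin" codes. The data and variable names are made up for this example; the code numbering follows the text above.

```python
from statistics import mean, mode

# Codes from the text: numbers used only as labels for cities.
city_labels = {1: "London", 2: "Manchester", 3: "Birmingham"}

# Invented sample: two people from London, three from Birmingham.
city_codes = [1, 1, 3, 3, 3]

# The arithmetic works, but 2.2 does not correspond to any city.
average_code = mean(city_codes)

# The mode (most frequent label) is a summary that can be interpreted.
most_common_city = city_labels[mode(city_codes)]

print(average_code)
print(most_common_city)
```

The mean here is a valid calculation with no sensible interpretation, whereas the most common category is a meaningful summary for data of this kind.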

Figure 1. Normal distribution.

All statistical tests are either parametric (that is, they assume that the data were sampled from a particular form of distribution, such as a normal distribution) or non-parametric (they make no such assumption). In general, parametric tests are more powerful than non-parametric ones and so should be used if possible.

Non-parametric tests look at the rank order of the values (which one is the smallest, which one comes next, and so on) and ignore the absolute differences between them. Statistical significance is more difficult to show with non-parametric tests.
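To make the idea of rank order concrete, here is a minimal sketch that converts values to ranks. The function name average_ranks and the sample values are assumptions made for this illustration; tied values are given the average of the ranks they span, which is a common convention.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


# Invented measurements: sample_b differs only by one extreme value.
sample_a = [12.0, 15.0, 11.0, 14.0]
sample_b = [12.0, 150.0, 11.0, 14.0]

ranks_a = average_ranks(sample_a)
ranks_b = average_ranks(sample_b)
print(ranks_a)
print(ranks_b)
```

Both samples produce the same ranks even though one contains an extreme value. A rank-based test therefore treats them identically, which is why such tests are robust to outliers but discard information about the size of differences.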

Figure 2. Skewed distribution.

Another consideration is the shape of the distribution from which the data were sampled.

A "normal" distribution is shown in Figure 1. (The term "normal" refers to the shape of the graph and is used because many biological phenomena show this pattern of distribution.) Some biological variables, such as body weight, show a skewed distribution, as shown in Figure 2.


Some Commonly Used Statistical Tests

Parametric test | Example of equivalent non-parametric test | Purpose of test | Example
Two sample (unpaired) t test | Mann-Whitney U test | Compares two independent samples drawn from the same population | To compare girls' heights with boys' heights
One sample (paired) t test | Wilcoxon matched pairs test | Compares two sets of observations on a single sample | To compare weight of infants before and after a feed
One way analysis of variance (F test) using total sum of squares | Kruskal-Wallis analysis of variance by ranks | Effectively, a generalization of the paired t or Wilcoxon matched pairs test where three or more sets of observations are made on a single sample | To determine whether plasma glucose level is higher one hour, two hours, or three hours after a meal
Two way analysis of variance | Two way analysis of variance by ranks | As above, but tests the influence (and interaction) of two different covariates | In the above example, to determine if the results differ in male and female subjects
χ² test | Fisher's exact test | Tests the null hypothesis that the distribution of a discontinuous variable is the same in two (or more) independent samples | To assess whether acceptance into medical school is more likely if the applicant was born in Britain
Product moment correlation coefficient (Pearson's r) | Spearman's rank correlation coefficient (r_s) | Assesses the strength of the straight line association between two continuous variables | To assess whether and to what extent plasma HbA1 concentration is related to plasma triglyceride concentration in diabetic patients
Regression by least squares method | Non-parametric regression (various tests) | Describes the numerical relation between two quantitative variables, allowing one value to be predicted from the other | To see how peak expiratory flow rate varies with height
Multiple regression by least squares method | Non-parametric regression (various tests) | Describes the numerical relation between a dependent variable and several predictor variables (covariates) | To determine whether and to what extent a person's age, body fat, and sodium intake determine their blood pressure

The r value (Pearson's product-moment correlation coefficient) is not valid unless the following criteria are fulfilled:

  • The data (or, more accurately, the population from which the data are drawn) should be normally distributed. If they are not, non-parametric tests of correlation should be used instead.
  • The two datasets should be independent (one should not automatically vary with the other). If they are not, a paired t test or other paired test should be used.
  • Only a single pair of measurements should be made on each subject. If repeated measurements are made, analysis of variance should be used instead.
  • Every r value should be accompanied by a P value, which expresses how likely an association of this strength would be to have arisen by chance, or a confidence interval, which expresses the range within which the "true" r value is likely to lie.
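The difference between Pearson's r and its rank-based alternative can be sketched as follows. The function names and the data are invented for this illustration, the ranking helper assumes there are no tied values, and no P value or confidence interval is computed here, although any reported r should be accompanied by one.

```python
import math


def pearson_r(x, y):
    """Product moment correlation coefficient for paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_ranks(values):
    """1-based ranks; assumes no tied values, to keep the sketch short."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(simple_ranks(x), simple_ranks(y))


# Invented data: y always rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]

r = pearson_r(x, y)
rs = spearman_rs(x, y)
print(round(r, 3), round(rs, 3))
```

For these values the rank correlation is perfect because the ordering of x and y agrees exactly, while Pearson's r is lower because the relationship is not a straight line.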


The term "regression" refers to a mathematical equation that allows one variable (the target variable) to be predicted from another (the independent variable). Regression, then, implies a direction of influence. In the case of multiple regression, a far more complex mathematical equation allows the target variable to be predicted from two or more independent variables (often known as covariables).
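A minimal sketch of simple regression by the least squares method is shown below. It borrows the peak expiratory flow rate and height example from the table of tests, but the numbers are invented (and deliberately exactly linear) and the function name least_squares_fit is an assumption of this example.

```python
def least_squares_fit(x, y):
    """Return (slope, intercept) of the least squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: height (cm) as the independent variable,
# peak expiratory flow rate (L/min) as the target variable.
heights_cm = [150, 160, 170, 180, 190]
pefr_l_min = [400, 450, 500, 550, 600]

slope, intercept = least_squares_fit(heights_cm, pefr_l_min)

# Predict the target variable for a height not in the sample.
predicted_pefr = intercept + slope * 175
print(slope, intercept, predicted_pefr)
```

The fitted equation runs in one direction only: it predicts flow rate from height. Predicting height from flow rate would need a separate fit with the variables swapped.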


Probability and Confidence

The P value is the probability that any particular outcome would have arisen by chance. Standard scientific practice usually deems a P value of less than 1 in 20 (expressed as P<0.05, and equivalent to betting odds of 20 to 1) as "statistically significant" and a P value of less than 1 in 100 (P<0.01) as "statistically highly significant."

A result in the statistically significant range (P<0.05 or P<0.01, depending on what is chosen as the cut off) suggests that the authors should reject the null hypothesis (the hypothesis that there is no real difference between two groups). But a P value in the non-significant range tells you that either there is no difference between the groups or that there were too few subjects to demonstrate such a difference if it existed.

A confidence interval allows you to estimate for both "positive" trials (those that show a statistically significant difference between two arms of the trial) and "negative" ones (those that seem to show no difference), whether the strength of the evidence is strong or weak, and whether the study is definitive (obviates the need for further similar studies).

If you repeated the same clinical trial hundreds of times, you would not get exactly the same result each time. But, on average, you would establish a particular level of difference (or lack of difference) between the two arms of the trial. In 90% of the trials the difference between two arms would lie within certain broad limits, and in 95% of the trials it would lie between certain, even broader, limits.

Now, if you conducted only one trial, how do you know how close the result is to the "real" difference between the groups? The answer is you don't. But by calculating, for example, the 95% confidence interval around your result, you will be able to say that there is a 95% chance that the "real" difference lies between these two limits.

Note that the larger the trial (or the larger the pooled results of several trials), the narrower the confidence interval - and, therefore, the more likely the result is to be definitive.
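The effect of trial size on the width of a confidence interval can be sketched as follows. This example assumes a simple normal approximation for the difference between two event rates, which is one of several possible methods and is only reasonable for fairly large groups; the function name and the trial figures are invented.

```python
import math
from statistics import NormalDist


def risk_difference_ci(events_control, n_control, events_treated, n_treated, level=0.95):
    """Approximate confidence interval for (control rate - treated rate)."""
    p_control = events_control / n_control
    p_treated = events_treated / n_treated
    difference = p_control - p_treated
    # Standard error of the difference between two independent proportions.
    standard_error = math.sqrt(
        p_control * (1 - p_control) / n_control
        + p_treated * (1 - p_treated) / n_treated
    )
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return difference - z * standard_error, difference + z * standard_error


# Invented trials with the same event rates (20% v 10%) but different sizes.
small_trial_95 = risk_difference_ci(20, 100, 10, 100)
large_trial_95 = risk_difference_ci(200, 1000, 100, 1000)
small_trial_90 = risk_difference_ci(20, 100, 10, 100, level=0.90)

print(small_trial_95)
print(large_trial_95)
print(small_trial_90)
```

With the same observed difference, the larger trial gives a narrower interval, and the 90% interval is narrower than the 95% interval for the same trial.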

Calculating the "bottom line" effects of an intervention.

Group | Outcome event: Yes | Outcome event: No | Total
Control group | a | b | a + b
Experimental group | c | d | c + d

Control event rate (CER) = risk of outcome event in control group = a/(a + b)

Experimental event rate (EER) = risk of outcome event in experimental group = c/(c + d)

Relative risk reduction (RRR) = (CER - EER)/CER

Absolute risk reduction (ARR) = CER - EER

Number needed to treat (NNT) = 1/ARR = 1/(CER - EER)


Odds ratio (OR) = (odds of outcome event in experimental group)/(odds of outcome event in control group) = (c/d)/(a/b), where the odds in each group are the number with the outcome event divided by the number without it
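These "bottom line" measures can be worked through with a short sketch. The function name treatment_effects and the counts are invented for illustration; a, b, c, and d follow the layout of the two-by-two table above. The sketch does not guard against division by zero, for example when the two event rates are equal and the ARR is 0.

```python
def treatment_effects(a, b, c, d):
    """Effect measures from a two-by-two table.

    a, b: control group with and without the outcome event
    c, d: experimental group with and without the outcome event
    """
    cer = a / (a + b)            # control event rate
    eer = c / (c + d)            # experimental event rate
    arr = cer - eer              # absolute risk reduction
    rrr = arr / cer              # relative risk reduction
    nnt = 1 / arr                # number needed to treat
    odds_ratio = (c / d) / (a / b)
    return {"CER": cer, "EER": eer, "ARR": arr, "RRR": rrr,
            "NNT": nnt, "OR": odds_ratio}


# Invented trial: 20 of 100 controls and 10 of 100 treated have the event.
effects = treatment_effects(a=20, b=80, c=10, d=90)
for name, value in effects.items():
    print(name, round(value, 3))
```

In this invented trial the event rate falls from 20% to 10%, so the ARR is 0.1, the RRR is 0.5, and about 10 people need to be treated to prevent one event. The odds ratio is about 0.44, which is not the same as the ratio of the two risks (0.5).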



Greenhalgh T. Statistics for the non-statistician. I: Different types of data need different statistical tests. BMJ 1997;315(7104):364-366.

Greenhalgh T. Statistics for the non-statistician. II: "Significant" relations and their pitfalls. BMJ 1997;315(7105):422-425.

Guyatt G, Jaeschke R, Heddle N, Cook D, Shannon H, Walter S. Basic statistics for clinicians. 1. Hypothesis testing. Can Med Assoc J 1995;152:27-32.

Guyatt G, Jaeschke R, Heddle N, Cook D, Shannon H, Walter S. Basic statistics for clinicians. 2. Interpreting study results: confidence intervals. Can Med Assoc J 1995;152:169-73.

Jaeschke R, Guyatt G, Shannon H, Walter S, Cook D, Heddle N. Basic statistics for clinicians. 3. Assessing the effects of treatment: measures of association. Can Med Assoc J 1995;152:351-7.

Guyatt G, Walter S, Shannon H, Cook D, Jaeschke R, Heddle N. Basic statistics for clinicians. 4. Correlation and regression. Can Med Assoc J 1995;152:497-504.
