The Pearson product-moment correlation (often called Pearson’s *r*) measures the strength and direction of the *linear* relationship between two variables, and the associated significance test is parametric. In brief, Pearson’s correlation fits a straight line through the data points; the coefficient tells you how closely the points are scattered around that line. It ranges from -1 (perfect decreasing line) through 0 (no linear relationship) to 1 (perfect increasing line).
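To make the idea concrete, here is a minimal sketch of how the coefficient can be computed by hand. The vectors `x` and `y` are made-up illustrative values (they are not the data used later in this post), and `r_manual` is just a name chosen for this example.

```r
# Illustrative, made-up values (NOT the data analysed in this post)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Deviations of each value from its own mean
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson's r: joint variation of x and y, scaled by their individual variation
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual

# The hand-made value should agree with R's built-in cor()
all.equal(r_manual, cor(x, y))
```

Because the made-up points lie almost exactly on an increasing straight line, the coefficient comes out close to 1.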

This test comes with assumptions, which should be checked before going further:

- this is a parametric test, so both variables must be normally distributed (run the Shapiro-Wilk test),
- the variables are continuous,
- the observations come in pairs (one value of each variable per individual),
- there are no outliers,
- the variances of these variables are “relatively” similar (run Fisher’s *F*-test).

Let’s see this with an example. Here, we consider the weight and height of 16 individuals. Both weight and height are continuous variables, arranged in pairs (one weight entry and one height entry per individual).

We need to check that both variables are normally distributed:

    weight <- c(84, 64, 73, 78, 70, 79, 74, 68, 73, 63, 62, 69, 54, 64, 66, 70)
    height <- c(183, 174, 179, 174, 164, 184, 179, 154, 167, 170, 168, 164, 166, 163, 154, 174)
    par(mfrow = c(1, 2))
    hist(weight, col = "red", prob = TRUE)
    hist(height, col = "green", prob = TRUE)
    shapiro.test(weight)
    shapiro.test(height)
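If you prefer to read the Shapiro-Wilk p-values side by side rather than in two separate printouts, something along these lines should work. This is only a sketch; `p_normality` is a name introduced here for illustration.

```r
weight <- c(84, 64, 73, 78, 70, 79, 74, 68, 73, 63, 62, 69, 54, 64, 66, 70)
height <- c(183, 174, 179, 174, 164, 184, 179, 154, 167, 170, 168, 164, 166, 163, 154, 174)

# Run the Shapiro-Wilk test on each variable and keep only the p-value
p_normality <- sapply(list(weight = weight, height = height),
                      function(v) shapiro.test(v)$p.value)
p_normality

# TRUE means the test found no significant departure from normality at the 5% level
p_normality > 0.05
```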

As the histograms and the Shapiro-Wilk tests suggest, neither variable departs significantly from a normal distribution. Let’s draw the boxplots and check for similar variance:

    par(mfrow = c(1, 2))
    boxplot(weight, main = "weight")
    boxplot(height, main = "height")
    var.test(weight, height)

According to Fisher’s *F*-test, the variances are not significantly different, and no outlier shows up on the boxplots. We can proceed…
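The same two checks can also be read numerically instead of by eye. The sketch below relies on `boxplot.stats()`, which applies the usual boxplot rule (points more than 1.5 times the box length beyond the hinges are flagged); the names `out_weight`, `out_height` and `f_res` are introduced here for illustration only.

```r
weight <- c(84, 64, 73, 78, 70, 79, 74, 68, 73, 63, 62, 69, 54, 64, 66, 70)
height <- c(183, 174, 179, 174, 164, 184, 179, 154, 167, 170, 168, 164, 166, 163, 154, 174)

# Values that a boxplot would draw as individual outlier points
out_weight <- boxplot.stats(weight)$out
out_height <- boxplot.stats(height)$out
length(out_weight)  # 0 means no outlier flagged
length(out_height)

# Fisher's F-test, stored so that its parts can be inspected
f_res <- var.test(weight, height)
f_res$estimate  # ratio of the two sample variances
f_res$p.value   # above 0.05: no significant difference in variance
```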

We may now visualize these two variables in a scatter plot to which we add a line of best fit:

    plot(weight ~ height)
    abline(lm(weight ~ height))
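The line of best fit and Pearson’s *r* are closely linked: for a simple linear regression, the slope equals *r* multiplied by the ratio of the standard deviations, and the regression’s R-squared equals *r* squared. A minimal sketch to verify this on our data (`fit`, `slope` and `r` are names chosen for this example):

```r
weight <- c(84, 64, 73, 78, 70, 79, 74, 68, 73, 63, 62, 69, 54, 64, 66, 70)
height <- c(183, 174, 179, 174, 164, 184, 179, 154, 167, 170, 168, 164, 166, 163, 154, 174)

# Same model as the one used for the line of best fit
fit   <- lm(weight ~ height)
slope <- unname(coef(fit)["height"])

# Pearson's correlation coefficient (without the significance test)
r <- cor(weight, height)

# Slope of the fitted line = r * sd(y) / sd(x)
all.equal(slope, r * sd(weight) / sd(height))

# R-squared of the simple regression = r squared
all.equal(summary(fit)$r.squared, r^2)
```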

Now that the assumptions are checked and that we have a quick idea of the linear relationship, let’s check Pearson’s product-moment correlation. The function is `cor.test()`. Note that the function is the same as for Spearman’s *rho* and Kendall’s *tau*. The extra parameter `method=" "` defines which correlation coefficient is to be considered in the test (choose between `"pearson"`, `"spearman"` and `"kendall"`; if the parameter `method` is omitted, the default test will be Pearson’s *r*).
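As a quick illustration of the `method` parameter, the sketch below runs the three variants on our data and collects the coefficients. Two choices here are mine rather than part of the original walkthrough: `exact = FALSE` is passed because the data contain ties, which would otherwise trigger warnings for the rank-based methods, and `methods` and `estimates` are names made up for this example.

```r
weight <- c(84, 64, 73, 78, 70, 79, 74, 68, 73, 63, 62, 69, 54, 64, 66, 70)
height <- c(183, 174, 179, 174, 164, 184, 179, 154, 167, 170, 168, 164, 166, 163, 154, 174)

methods <- c("pearson", "spearman", "kendall")

# One coefficient per method: r, rho and tau respectively
estimates <- sapply(methods, function(m) {
  unname(cor.test(height, weight, method = m, exact = FALSE)$estimate)
})
estimates

# Omitting `method` gives the same coefficient as method = "pearson"
all.equal(unname(cor.test(height, weight)$estimate),
          unname(estimates["pearson"]))
```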

In this test, the null hypothesis H₀ states that there is no linear relationship between the variables (the true correlation is 0).

    cor.test(height, weight, method = "pearson")

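The p-value is under 0.05, so the test rejects the null hypothesis: data like these would be unlikely if there were no relationship between the variables. We therefore conclude in favor of the alternative hypothesis (there is a relationship), and the positive coefficient indicates that weight tends to increase with height.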
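To see where the p-value comes from, the test statistic can be rebuilt by hand: under H₀, `r * sqrt(n - 2) / sqrt(1 - r^2)` follows a Student *t* distribution with `n - 2` degrees of freedom. The sketch below stores the result of `cor.test()` and checks this; `res`, `t_manual` and `p_manual` are names introduced for illustration.

```r
weight <- c(84, 64, 73, 78, 70, 79, 74, 68, 73, 63, 62, 69, 54, 64, 66, 70)
height <- c(183, 174, 179, 174, 164, 184, 179, 154, 167, 170, 168, 164, 166, 163, 154, 174)

# Store the test result so that its components can be extracted
res <- cor.test(height, weight, method = "pearson")
r   <- unname(res$estimate)  # Pearson's r
n   <- length(weight)        # number of pairs

# t statistic and two-sided p-value rebuilt from r and n
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

# Both should match what cor.test() reports
all.equal(t_manual, unname(res$statistic))
all.equal(p_manual, res$p.value)

# 95% confidence interval for the correlation coefficient
res$conf.int
```

Reporting the coefficient and its confidence interval alongside the p-value gives a better sense of how strong the relationship is, not only whether it is significant.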