The Pearson product-moment correlation coefficient (often called Pearson’s r, among other names) measures the strength and direction of the linear relationship between two variables; the significance test associated with it is a parametric test. In brief, Pearson’s correlation virtually draws a line of best fit through the data points; the coefficient, which ranges from -1 to +1, tells you how closely the data are scattered around that line.
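For reference, the coefficient can be written as follows, where x_i and y_i denote the n paired observations and x-bar, y-bar their sample means (this notation is introduced here for illustration only):

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

A value close to +1 or -1 means that the points lie almost on a straight line (rising or falling, respectively), while a value close to 0 indicates little or no linear association.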
This test comes with assumptions, which must be checked before going further:
- this is a parametric test: the samples/variables must be normally distributed (run the Shapiro-Wilk test),
- the variables are continuous,
- the observations come in pairs (one value of each variable per individual),
- there are no marked outliers,
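- the variances of these variables are “relatively” similar (run Fisher’s F-test); more precisely, the scatter of one variable should stay roughly constant across the range of the other (homoscedasticity).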
Let’s see this with an example. Here, we consider the weight and height of 16 individuals. Both weight and height are continuous variables, arranged in pairs (one weight entry and one height entry per individual).
We need to check that both variables are normally distributed:
weight <- c(84, 64, 73, 78, 70, 79, 74, 68, 73, 63, 62, 69, 54, 64, 66, 70)
height <- c(183, 174, 179, 174, 164, 184, 179, 154, 167, 170, 168, 164, 166, 163, 154, 174)
par(mfrow = c(1, 2))
hist(weight, col = "red", prob = TRUE)
hist(height, col = "green", prob = TRUE)
shapiro.test(weight)
shapiro.test(height)
As you may see, the histograms show no marked departure from normality, and the Shapiro-Wilk test does not reject normality for either variable. We can thus treat both sets as normally distributed. Let’s draw the boxplots and check for similar variance:
par(mfrow = c(1, 2))
boxplot(weight, main = "weight")
boxplot(height, main = "height")
var.test(weight, height)
According to Fisher’s F-test, the variances are not significantly different, and no outlier shows up on the boxplots. We can proceed…
We may now visualise these two variables in a scatter plot to which we add a line of best fit:
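One way to draw such a plot is sketched below with base R graphics. The choice of height on the x-axis, the use of lm() for the line of best fit, the name fit and the plotting options are assumptions made for illustration; the data are repeated so that the snippet runs on its own.

```r
# Same data as above, repeated so that this snippet is self-contained
weight <- c(84, 64, 73, 78, 70, 79, 74, 68, 73, 63, 62, 69, 54, 64, 66, 70)
height <- c(183, 174, 179, 174, 164, 184, 179, 154, 167, 170, 168, 164, 166, 163, 154, 174)

# Least-squares regression of weight on height: this is the line of best fit
fit <- lm(weight ~ height)

par(mfrow = c(1, 1))                 # back to a single plotting panel
plot(height, weight, pch = 19,
     xlab = "height", ylab = "weight",
     main = "weight vs. height")
abline(fit, col = "blue", lwd = 2)   # add the fitted line to the scatter plot
```

If the points follow the line without obvious curvature, a linear relationship is a reasonable description of the data, which is what Pearson’s r quantifies.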
Now that the assumptions are checked and that we have a quick idea of the linear relationship, let’s check Pearson’s product-moment correlation. The function is cor.test(). Note that the same function is used for Spearman’s rho and Kendall’s tau. The extra parameter method=" " defines which correlation coefficient is to be considered in the test (choose between "pearson", "spearman" and "kendall"; if the parameter method is omitted, the default test will be Pearson’s r).
In this test, the null hypothesis H0 states that there is no linear relationship between the variables, i.e. that the true correlation is equal to 0.
cor.test(height, weight, method="pearson")
The p-value is under 0.05: data like these would be very unlikely if there were no relationship between the variables. H0 is thus rejected in favour of the alternative hypothesis (there is a relationship…).
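The printed output can also be inspected piece by piece. The sketch below stores the result of cor.test() in an object and recomputes the coefficient and the t statistic by hand as a cross-check; the names res, r_manual, t_manual and n are introduced here for illustration only.

```r
# Same data as above, repeated so that this snippet is self-contained
weight <- c(84, 64, 73, 78, 70, 79, 74, 68, 73, 63, 62, 69, 54, 64, 66, 70)
height <- c(183, 174, 179, 174, 164, 184, 179, 154, 167, 170, 168, 164, 166, 163, 154, 174)

# cor.test() returns a list of class "htest"
res <- cor.test(height, weight, method = "pearson")

res$estimate    # sample correlation coefficient r
res$statistic   # t statistic of the test
res$parameter   # degrees of freedom (n - 2)
res$p.value     # p-value for H0: true correlation equal to 0
res$conf.int    # confidence interval for the correlation (95% by default)

# Cross-check: r is the covariance divided by the product of the standard deviations
r_manual <- cov(height, weight) / (sd(height) * sd(weight))

# Cross-check: the t statistic follows from r and the sample size
n <- length(weight)
t_manual <- r_manual * sqrt(n - 2) / sqrt(1 - r_manual^2)

all.equal(unname(res$estimate), r_manual)
all.equal(unname(res$statistic), t_manual)
```

Reporting the coefficient and its confidence interval alongside the p-value gives an idea of the strength of the relationship, not only of its statistical significance.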