Linear regression helps you simplify a dataset by modelling it with a straight line. It is often used to find a relationship between a continuous response variable and a continuous independent/predictor variable. One example among many is the relationship between bodyweight and height.
Let’s use the following dataset as an example:
bodyweight <- c(70, 75, 72, 58, 80, 80, 48, 56, 103, 51)
size <- c(177, 178, 167, 153, 174, 177, 152, 134, 191, 136)
dataset.df <- data.frame(bodyweight, size)
Everything starts with a plot. A scatter plot of the dataset is usually a good beginning.
plot(bodyweight~size, ylab="bodyweight (kg)", xlab="size (cm)", col="blue", pch = 10)
Then we try to fit a linear model with the function lm(), which we have already encountered when performing analysis of variance (ANOVA).
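As a minimal sketch of that call, assuming the two vectors defined earlier and a hypothetical object name `fit` (not used elsewhere in this tutorial), the model can be fitted and printed like this:

```r
# Data from the example above
bodyweight <- c(70, 75, 72, 58, 80, 80, 48, 56, 103, 51)
size <- c(177, 178, 167, 153, 174, 177, 152, 134, 191, 136)

# Fit bodyweight as a linear function of size;
# 'fit' is an illustrative name chosen for this sketch
fit <- lm(bodyweight ~ size)

# Printing the fitted model shows the call and the two
# coefficients: the intercept first, then the slope for 'size'
print(fit)
```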
Note that the output of lm() contains everything you need to draw the expected line: the intercept is clearly indicated (-56.2716) and is followed by the value of the slope (0.7661). Let’s add these values to the function abline() with the syntax abline(intercept, slope), which will create the regression line on the existing plot:
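A possible way to write this, assuming the scatter plot from above is redrawn first and using the hypothetical variable names `intercept` and `slope` for the two rounded values, is:

```r
# Data from the example above
bodyweight <- c(70, 75, 72, 58, 80, 80, 48, 56, 103, 51)
size <- c(177, 178, 167, 153, 174, 177, 152, 134, 191, 136)

# Redraw the scatter plot so that abline() has a plot to draw on
plot(bodyweight ~ size, ylab = "bodyweight (kg)", xlab = "size (cm)",
     col = "blue", pch = 10)

# Rounded values read from the lm() output;
# the names 'intercept' and 'slope' are illustrative
intercept <- -56.2716
slope <- 0.7661

# In abline(), 'a' is the intercept and 'b' is the slope
abline(a = intercept, b = slope)
```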
Note also that we can directly pass the result of lm() to the function abline() to obtain the exact same graph:
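A short sketch of this shortcut, again assuming the scatter plot is redrawn first, could look like this:

```r
# Data from the example above
bodyweight <- c(70, 75, 72, 58, 80, 80, 48, 56, 103, 51)
size <- c(177, 178, 167, 153, 174, 177, 152, 134, 191, 136)

# Redraw the scatter plot
plot(bodyweight ~ size, ylab = "bodyweight (kg)", xlab = "size (cm)",
     col = "blue", pch = 10)

# abline() accepts a fitted model and reads the intercept
# and slope from its coefficients
model <- lm(bodyweight ~ size)   # 'model' is an illustrative name
abline(model)
```

This avoids retyping rounded coefficients by hand, so the line is drawn from the unrounded estimates.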
At any time, it is of course possible to store the result of lm() in an object for later use. Here we’ll simply call it lin.mod. Using the function summary(), more information about the model may be obtained:
lin.mod <- lm(bodyweight~size)
summary(lin.mod)
This output provides you with several interesting values: the quartiles, median, minimum and maximum of the residuals at the top, and the (adjusted) R-squared (R²) at the bottom, which describes how well the model matches the data (NB: be careful when interpreting R-squared, see this blogpost for some info).
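If you need these values individually rather than in the printed summary, they can be extracted from the summary object. The sketch below assumes the same data and uses a hypothetical name, `mod.summary`, for the stored summary:

```r
# Data and model from the example above
bodyweight <- c(70, 75, 72, 58, 80, 80, 48, 56, 103, 51)
size <- c(177, 178, 167, 153, 174, 177, 152, 134, 191, 136)
lin.mod <- lm(bodyweight ~ size)

# Store the summary; 'mod.summary' is an illustrative name
mod.summary <- summary(lin.mod)

# R-squared and adjusted R-squared are components of the summary
r2 <- mod.summary$r.squared
adj.r2 <- mod.summary$adj.r.squared

# Minimum, quartiles, median and maximum of the residuals
res.quantiles <- quantile(residuals(lin.mod))

print(r2)
print(adj.r2)
print(res.quantiles)
```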
Finally, it is good practice to check the model by plotting it in the following manner, which shows in a few plots how well your model fits the actual data:
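The original code for this step is not shown here. One common way to do it, offered as an assumption rather than as the author's exact code, is to call plot() on the fitted model with the plotting area split into a 2 x 2 grid:

```r
# Data and model from the example above
bodyweight <- c(70, 75, 72, 58, 80, 80, 48, 56, 103, 51)
size <- c(177, 178, 167, 153, 174, 177, 152, 134, 191, 136)
lin.mod <- lm(bodyweight ~ size)

# Split the plotting area into 2 rows and 2 columns,
# remembering the previous settings
old.par <- par(mfrow = c(2, 2))

# plot() on an lm object draws the standard diagnostic plots
# (residuals vs fitted, normal Q-Q, scale-location,
# residuals vs leverage)
plot(lin.mod)

# Restore the previous plotting settings
par(old.par)
```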