**Linear regression** simplifies a dataset by modelling it with a **straight line**. It is often used to describe the relationship between a continuous response variable and a continuous predictor (independent) variable. One example among many is the relationship between bodyweight and height.

Let’s use the following dataset as an example:

bodyweight <- c(70, 75, 72, 58, 80, 80, 48, 56, 103, 51)
size <- c(177, 178, 167, 153, 174, 177, 152, 134, 191, 136)
dataset.df <- data.frame(bodyweight, size)

Everything starts with a plot. A scatter plot of the dataset is usually a good beginning.

plot(bodyweight~size, ylab="bodyweight (kg)", xlab="size (cm)", col="blue", pch = 10)
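Since the two variables are also stored in `dataset.df`, the same scatter plot can be drawn by pointing the formula at the data frame through the `data` argument. The lines below are a minimal sketch of that variant; it assumes nothing beyond the dataset defined above.

```r
# Same data as in the example above
bodyweight <- c(70, 75, 72, 58, 80, 80, 48, 56, 103, 51)
size <- c(177, 178, 167, 153, 174, 177, 152, 134, 191, 136)
dataset.df <- data.frame(bodyweight, size)

# The formula looks up bodyweight and size inside dataset.df
plot(bodyweight ~ size, data = dataset.df,
     ylab = "bodyweight (kg)", xlab = "size (cm)",
     col = "blue", pch = 10)
```

Working from the data frame keeps the variables together, which becomes convenient once a workspace holds more than one dataset.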

Then we try to fit a linear model with the function `lm()`, which we have already encountered when performing analysis of variance (ANOVA).

lm(bodyweight~size)

Note that this output contains everything you need to **draw the expected line**: the **intercept** (-56.2716), followed by the **slope** (0.7661). Let’s pass these values to the function `abline()` with the syntax `abline(intercept, slope)`, which will draw the regression line on the *existing* plot:

abline(-56.2716, 0.7661)
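Rather than retyping rounded values, the coefficients can be read from the fitted model with `coef()`. The sketch below also recomputes them with the closed-form least-squares formulas for a single predictor, to show where the two numbers come from. The names `fit`, `slope.manual` and `intercept.manual` are illustrative and do not appear elsewhere in this tutorial.

```r
# Same data as in the example above
bodyweight <- c(70, 75, 72, 58, 80, 80, 48, 56, 103, 51)
size <- c(177, 178, 167, 153, 174, 177, 152, 134, 191, 136)

fit <- lm(bodyweight ~ size)

# coef() returns a named vector with "(Intercept)" and "size"
coefs <- coef(fit)
intercept <- coefs[["(Intercept)"]]
slope <- coefs[["size"]]

# Closed-form least-squares estimates for one predictor:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
slope.manual <- cov(size, bodyweight) / var(size)
intercept.manual <- mean(bodyweight) - slope.manual * mean(size)

# Both approaches should agree up to floating-point error
all.equal(c(intercept, slope), c(intercept.manual, slope.manual))
```

With the coefficients stored this way, `abline(intercept, slope)` draws the line from the unrounded estimates.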

Note also that we can pass the result of `lm()` directly to the function `abline()` to obtain the exact same graph:

abline(lm(bodyweight~size))

At any time, it is of course possible to store the result of `lm()` in an object for later use. Here we’ll simply call it `lin.mod`. Using the function `summary()`, more information about the model may be obtained:

lin.mod <- lm(bodyweight~size)
summary(lin.mod)

This output provides you with several interesting values: the distribution of the residuals (minimum, quartiles, median and maximum) at the top, the table of coefficients in the middle, and the (adjusted) R-squared (R²) at the bottom, which describes how well the model matches the data (NB: be careful when interpreting R-squared, see this blogpost for some info).
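The values printed by `summary()` can also be extracted as plain numbers, which is handy for reporting or reusing them. The following is a minimal sketch; the names `mod.sum`, `r2` and `adj.r2` are illustrative choices, not part of the original example.

```r
# Same data and model as in the example above
bodyweight <- c(70, 75, 72, 58, 80, 80, 48, 56, 103, 51)
size <- c(177, 178, 167, 153, 174, 177, 152, 134, 191, 136)
lin.mod <- lm(bodyweight ~ size)

mod.sum <- summary(lin.mod)

# Minimum, quartiles, median and maximum of the residuals
quantile(residuals(lin.mod))

# Coefficient table: estimate, standard error, t value, p-value
mod.sum$coefficients

# R-squared and adjusted R-squared as single numbers
r2 <- mod.sum$r.squared
adj.r2 <- mod.sum$adj.r.squared

# With a single predictor, R-squared equals the squared correlation
all.equal(r2, cor(bodyweight, size)^2)
```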

Finally, it is good practice to check the model with the diagnostic plots that `plot()` produces for a fitted model. These few plots help you visualize how well your model fits the actual data:

plot(lin.mod)
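By default, `plot()` on an `lm` object draws four diagnostic plots one after the other (Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage). One possible way to see them side by side is to split the graphics device into a 2 x 2 grid, as sketched below; the name `old.par` is only illustrative.

```r
# Same data and model as in the example above
bodyweight <- c(70, 75, 72, 58, 80, 80, 48, 56, 103, 51)
size <- c(177, 178, 167, 153, 174, 177, 152, 134, 191, 136)
lin.mod <- lm(bodyweight ~ size)

# Split the device into 2 rows and 2 columns, keeping the old settings
old.par <- par(mfrow = c(2, 2))

# Draw the four default diagnostic plots into the grid
plot(lin.mod)

# Restore the previous layout for later plots
par(old.par)
```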