When plotting your dataset, you will often realize (or at least suspect) that a simple, straight line cannot represent the data. You might think that there is a curved relationship between the continuous response variable and the continuous predictor variable, in which case polynomial regression may help you.
Let’s take an example. We follow the growth of a rat (bodyweight in grams) between the 4th and the 20th week after birth. Here is the code for the dataframe:
rat.bodyweight <- c(65, 99, 123, 148, 172, 194, 212, 230, 248, 276, 288, 296, 307, 321, 325, 337, 345)
week <- seq(from = 4, to = 20)
rat.life.df <- data.frame(week, rat.bodyweight)
Of course, we shall start by plotting the data:
plot(rat.bodyweight~week, col="blue", pch=2, ylab="rat bodyweight (g)", xlab="week")
A quick look at this chart makes you realize that growth has not been very linear…
Let’s try to fit a linear model to see whether that fits anyway:
lm(rat.bodyweight ~ week)
abline(lm(rat.bodyweight ~ week), col = "green", lwd = 3)
The model does not appear to be a great fit… The regression line (green line) underpredicts the data in the central part of the range while it overpredicts the data at both the start and the end of the range. You might thus be interested in running a polynomial regression to find a better curve to fit the data.
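The lack of fit can also be checked numerically by plotting the residuals of the linear model: an arch-shaped pattern (negative residuals at both ends, positive in the middle) confirms the curvature visible in the scatterplot. A minimal sketch, with the data vectors re-created so it runs on its own (`linear.lm` is simply a name chosen for this example):

```r
# Re-create the data and fit the straight-line model
rat.bodyweight <- c(65, 99, 123, 148, 172, 194, 212, 230, 248, 276,
                    288, 296, 307, 321, 325, 337, 345)
week <- seq(from = 4, to = 20)
linear.lm <- lm(rat.bodyweight ~ week)

# Plot residuals against week: negative at both ends, positive in the
# middle -- the arch-shaped pattern of a missed curvature
plot(week, resid(linear.lm), ylab = "residuals", xlab = "week")
abline(h = 0, lty = 2)
```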
The way to go is to use lm() to fit the model, wrapping the predictor in the poly() function to set the degree of the polynomial. Assuming that we wish to fit a second-order polynomial model, we can run the following code:
polynomial2.lm <- lm(rat.bodyweight ~ poly(week, 2))
polynomial2.lm
This output gives you the intercept as well as the coefficients for the model.
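One caveat worth knowing: poly() fits orthogonal polynomials by default, so the printed coefficients cannot be read directly as the intercept-plus-week-plus-week² terms. If you want coefficients on the raw scale, pass raw = TRUE; the fitted curve is identical either way. A sketch, with the data re-created so it runs standalone:

```r
rat.bodyweight <- c(65, 99, 123, 148, 172, 194, 212, 230, 248, 276,
                    288, 296, 307, 321, 325, 337, 345)
week <- seq(from = 4, to = 20)

# Default: orthogonal polynomial terms
orth.lm <- lm(rat.bodyweight ~ poly(week, 2))
# raw = TRUE: coefficients of week and week^2 on their natural scale
raw.lm <- lm(rat.bodyweight ~ poly(week, 2, raw = TRUE))

coef(raw.lm)  # intercept, week term, week^2 term
# Both parameterizations describe the same curve:
all.equal(fitted(orth.lm), fitted(raw.lm))
```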
To plot it on the current graph, follow these steps:
week.range <- seq(min(week), max(week), 1)
fit.predict <- predict(polynomial2.lm, list(week = week.range), se.fit = TRUE)
lines(week.range, fit.predict$fit, col = "red", lwd = 3)
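Note that the predict() call requests standard errors (se.fit) that the plotting step above does not use; they can serve to draw approximate 95% confidence bands around the curve (fitted value ± 1.96 standard errors). A sketch that re-creates the earlier objects so it runs on its own:

```r
# Re-create the data, model and predictions from the steps above
rat.bodyweight <- c(65, 99, 123, 148, 172, 194, 212, 230, 248, 276,
                    288, 296, 307, 321, 325, 337, 345)
week <- seq(from = 4, to = 20)
polynomial2.lm <- lm(rat.bodyweight ~ poly(week, 2))
week.range <- seq(min(week), max(week), 1)
fit.predict <- predict(polynomial2.lm, list(week = week.range), se.fit = TRUE)

# Plot the data, the fitted curve, and approximate 95% confidence bands
plot(rat.bodyweight ~ week, col = "blue", pch = 2,
     ylab = "rat bodyweight (g)", xlab = "week")
lines(week.range, fit.predict$fit, col = "red", lwd = 3)
upper <- fit.predict$fit + 1.96 * fit.predict$se.fit
lower <- fit.predict$fit - 1.96 * fit.predict$se.fit
lines(week.range, upper, col = "red", lty = 2)
lines(week.range, lower, col = "red", lty = 2)
```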
As you can see, the new red curve fits the data much better.
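To back the visual impression with a formal test, the two nested models can be compared with anova(); the F-test asks whether adding the quadratic term significantly reduces the residual sum of squares. A sketch, with the data and both models re-created so it runs standalone:

```r
rat.bodyweight <- c(65, 99, 123, 148, 172, 194, 212, 230, 248, 276,
                    288, 296, 307, 321, 325, 337, 345)
week <- seq(from = 4, to = 20)

linear.lm <- lm(rat.bodyweight ~ week)
polynomial2.lm <- lm(rat.bodyweight ~ poly(week, 2))

# F-test comparing the nested models; a small p-value means the
# quadratic term is worth keeping
anova(linear.lm, polynomial2.lm)
```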