Summarize Cases


The function summarise() (also spelled summarize()) returns a table containing the result(s) of the summary function(s) you apply to a data frame. Each summary function reduces a variable to a single value. Summary functions include:

  • mean(): returns the mean of a variable,
  • sd(): returns the standard deviation of a variable,
  • median(): returns the median of a variable,
  • min(): returns the minimum value of a variable,
  • max(): returns the maximum value of a variable,
  • var(): returns the variance of a variable,
  • sum(): returns the sum of a variable,
  • etc.
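
As a quick illustration of what "summary function" means here, the sketch below calls a few of the functions above directly on one column of Orange, a data frame that ships with R. The object names circ, circ_mean, circ_sd and circ_range are only illustrative.

```r
# Orange is part of R's built-in 'datasets' package, so nothing needs loading.
circ <- Orange$circumference

# Each summary function takes a whole vector and returns a single value.
circ_mean  <- mean(circ)
circ_sd    <- sd(circ)
circ_range <- c(min(circ), max(circ))

# The variance is the square of the standard deviation.
all.equal(var(circ), circ_sd^2)
```

Any function that reduces a vector to one value in this way can be used inside summarise().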

To apply one or more of these summary functions to a data frame, you just have to indicate in summarise() which function(s) you want to apply and on which variable of the data frame. The syntax is:

summarise(dataframe, function1(variable), function2(variable), ...) 

Alternatively, using the pipe operator %>% (available once dplyr is loaded), the syntax is:

dataframe %>%
  summarise(function1(variable), function2(variable), ...) 

 
 
Let’s use the data frame Orange as an example. The top of the data frame looks like this:

head(Orange)

To calculate the mean and the standard deviation of the variable circumference, we write either

summarise(Orange, mean(circumference), sd(circumference))

OR

Orange %>%
  summarise(mean(circumference), sd(circumference))

which both result in:

 
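By default, the columns of the result are named after the expressions themselves, e.g. mean(circumference). A minimal sketch of how the same summary could be written with explicit column names is shown here; the names mean_circ, sd_circ and overall are our own choice, not something dplyr requires.

```r
library(dplyr)

# Same summary as above, but each result column gets an explicit name
# (mean_circ and sd_circ are illustrative labels).
overall <- Orange %>%
  summarise(
    mean_circ = mean(circumference),
    sd_circ   = sd(circumference)
  )

overall
```

Named columns are easier to refer to in later steps than the default names.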
This example does not make much biological sense: we have averaged circumference over all trees while pooling measurements taken at 7 different time points. Instead, we can calculate the mean and standard deviation of circumference at each time point recorded in age by applying group_by() to the variable age (read more about group_by here).

To calculate the group means and standard deviations of the variable circumference, we write:

Orange %>% 
  group_by(age) %>% 
  summarise(mean(circumference), sd(circumference))

which results in:

 
Each row of the result table now shows the mean and standard deviation of circumference for one of the 7 distinct values of age, which are listed in the first column.
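
The grouped summary can also be given explicit column names, and dplyr's n() can be added to count the rows that go into each group. The following is a sketch under those assumptions; by_age, n_trees, mean_circ and sd_circ are illustrative names.

```r
library(dplyr)

# One row per value of age; n() counts the measurements in each group.
by_age <- Orange %>%
  group_by(age) %>%
  summarise(
    n_trees   = n(),
    mean_circ = mean(circumference),
    sd_circ   = sd(circumference)
  )

by_age
```

Because Orange holds one measurement per tree at each age, n_trees should be the same in every row, which is a quick check that the grouping did what we intended.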
