Group Cases


group_by() is a function that groups the cases (rows) of the table according to the different factors of a chosen categorical variable. When used alone, it transforms a data frame into a new table where the factors of the chosen variable are registered as grouped. The output table is then very similar to the original dataframe. When used in combination with a second function in a pipe (read about pipes here), group_by() splits the data frame by factor, applies the second function to each of the corresponding groups, and finally reassembles the data into a new table.

 
Let’s use the data frame Orange as an example. The top of the data frame looks like this:

head(Orange)


 
Here, for example, we group Orange by age and store the result in the object Orange_grouped_by_age:

Orange_grouped_by_age <- Orange %>% group_by(age)


As you see above here, the data look unchanged, but R says that there exist 7 groups for the variable age (yellow box).

If we then decide to calculate the mean of circumference for each factor of age, we may do so by applying summarise(mean(circumference)) directly on Orange_grouped_by_age:

Orange_grouped_by_age %>% 
  summarise(mean(circumference))


We thus obtain a new table where the 7 rows show the mean of circumference for each factor of age.
 
For comparison, this is what the same code does when applied to Orange (the original data frame without grouping):

Orange %>%
  summarise(mean(circumference))


 
Note that grouping is reversible, and that you may ungroup data in a table by using the function ungroup(). In our example, simply type:

ungroup(Orange_grouped_by_age)


As you may see, the line that used to show the groups is now gone.

  Fant du det du lette etter? Did you find this helpful?
[Average: 0]