# Group Cases

`group_by()` is a function from the dplyr package that groups the cases (rows) of a table according to the distinct values (levels) of a chosen categorical variable. When used alone, it returns a new table in which those values are registered as groups; the data themselves are not changed, so the output looks very similar to the original data frame. When used in combination with a second function in a pipe (`%>%`), `group_by()` splits the data frame by group, applies the second function to each group, and finally reassembles the results into a new table.

Let’s use the data frame `Orange` as an example. The top of the data frame looks like this:

`head(Orange)`

Here, for example, we group `Orange` by `age` and store the result in the object `Orange_grouped_by_age`:

`Orange_grouped_by_age <- Orange %>% group_by(age)`

When the result is printed, the data look unchanged, but the header of the output now contains a `Groups:` line reporting that there are 7 groups for the variable `age`.
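The grouping can also be checked programmatically rather than by reading the printed header. The following is a minimal sketch, assuming dplyr is installed and loaded; it uses the dplyr helpers `group_vars()`, `n_groups()` and `group_size()`, which are not part of the original example.

```r
library(dplyr)

# Recreate the grouped table from the example above
Orange_grouped_by_age <- Orange %>% group_by(age)

# Name of the grouping variable
group_vars(Orange_grouped_by_age)

# Number of groups (one per distinct value of age)
n_groups(Orange_grouped_by_age)

# Number of rows in each group
group_size(Orange_grouped_by_age)
```

These calls only inspect the grouping; they do not change the table.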

If we then decide to calculate the mean of `circumference` for each value of `age`, we may do so by applying `summarise(mean(circumference))` directly to `Orange_grouped_by_age`:

```
Orange_grouped_by_age %>%
  summarise(mean(circumference))
```

We thus obtain a new table with 7 rows, each showing the mean of `circumference` for one value of `age`.
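In practice it is often convenient to give the summary column an explicit name and to check the result against base R. The sketch below assumes dplyr is loaded; the names `mean_by_age`, `mean_circumference` and `base_means` are illustrative and do not appear in the original example.

```r
library(dplyr)

# Group by age, then compute a named summary column for each group
mean_by_age <- Orange %>%
  group_by(age) %>%
  summarise(mean_circumference = mean(circumference))

mean_by_age

# Cross-check with base R: split circumference by age and take the mean
base_means <- tapply(Orange$circumference, Orange$age, mean)

# TRUE if both approaches give the same means in the same order
all.equal(mean_by_age$mean_circumference, as.numeric(base_means))
```

The `tapply()` call performs the same split, apply and reassemble steps in a single base R function, which makes it a handy sanity check for the grouped summary.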

For comparison, this is what the same code does when applied to `Orange` (the original data frame without grouping):

```
Orange %>%
  summarise(mean(circumference))
```

Note that grouping is reversible, and that you may ungroup data in a table by using the function `ungroup()`. In our example, simply type:

`ungroup(Orange_grouped_by_age)`

As you may see, the line that used to show the groups is now gone.
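Note that `ungroup()` returns a new table and leaves its input untouched, so the result has to be assigned to an object if you want to keep it. A minimal sketch, assuming dplyr is loaded; the object name `Orange_ungrouped` is illustrative, and `is_grouped_df()` is a dplyr helper not used in the original example.

```r
library(dplyr)

Orange_grouped_by_age <- Orange %>% group_by(age)

# Store the ungrouped version in a new object
Orange_ungrouped <- ungroup(Orange_grouped_by_age)

# The original object is still grouped; the new one is not
is_grouped_df(Orange_grouped_by_age)
is_grouped_df(Orange_ungrouped)

# Without groups, summarise() collapses the whole table into a single row
Orange_ungrouped %>%
  summarise(mean(circumference))
```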
