Combine Entire Data Frames – Beside Each Other


Entire data frames may be put together either beside each other (thus increasing the number of variables) or below each other (thus increasing the number of cases) into a single, large table. Here we focus on combining data frames beside each other.

One function that performs this operation is bind_cols(). Because bind_cols() matches rows purely by position, it can be applied only to data frames with the same number of rows (cases). If the data frames differ in length, you will have to use another function such as left_join() or right_join() (among others), which match rows on a key variable and add NA wherever a match is missing.
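As a minimal sketch of that difference, assuming dplyr is installed, consider two small data frames of unequal length. The names trees and sizes and their values are hypothetical and only serve as an illustration; they are not part of the Orange example.

```r
library(dplyr)

# Hypothetical toy data frames with 3 and 2 rows (illustration only)
trees <- data.frame(Tree = c(1, 2, 3), age = c(118, 484, 664))
sizes <- data.frame(Tree = c(1, 2), circumference = c(30, 58))

# bind_cols() cannot line up 3 rows with 2 rows, so the call fails;
# tryCatch() turns the error into NULL so the script keeps running.
bound <- tryCatch(bind_cols(trees, sizes), error = function(e) NULL)
is.null(bound)

# left_join() matches rows on the shared key Tree and fills in NA
# for the tree that has no entry in sizes.
joined <- left_join(trees, sizes, by = "Tree")
joined
```

The join keeps all three rows of trees, and the third tree gets NA for circumference because sizes has no matching row.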
To illustrate how bind_cols() works, we will use the data frames Orange and Orange2 as examples. Orange2 is a data frame similar to Orange, with the difference that the values of the variable circumference have been multiplied by 5 using the following line of code:

library(dplyr)
Orange2 <- Orange %>% mutate(circumference = circumference * 5)  # mutate_at() is superseded in current dplyr

We thus have two data frames, Orange and Orange2, which differ only in the values of circumference.

Note that the two data frames have three identically named variables: Tree, age, and circumference. We can combine the two data frames with the following code:

bind_cols(Orange, Orange2)

which gives us a table with 35 rows (like the original data frames), but 6 columns instead of 3.

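As you can see, bind_cols() makes no attempt to recognize identical variables. It simply places all the variables of Orange2 to the right of the variables of Orange and then makes the duplicated names unique by adding a number to them. The exact form of the new names depends on the dplyr version: recent versions (1.0.0 and later) append the column position to every duplicated name (for example Tree...1 and Tree...4), while older versions added a digit only to the names that had been encountered before.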
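To check this on your own installation, the following sketch rebuilds Orange2, binds the two data frames, and inspects the result. The object name combined is introduced here for illustration. No expected names are shown because they depend on the installed dplyr version.

```r
library(dplyr)

# Rebuild Orange2 as described above: circumference multiplied by 5
Orange2 <- Orange %>% mutate(circumference = circumference * 5)

# Place the columns of Orange2 to the right of those of Orange
combined <- bind_cols(Orange, Orange2)

dim(combined)    # number of rows and columns
names(combined)  # the repaired (unique) column names

# Columns can always be addressed by position, whatever they are called:
# column 3 is the original circumference, column 6 the multiplied one.
head(combined[[6]] / combined[[3]])
```

Addressing columns by position, as in the last line, avoids relying on the automatically generated names.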

bind_cols() renames only the variables whose names occur more than once; names that are unique are left unchanged. To see this, let us look at what happens when the data frames have (at least) one variable that is not common to both. We use Orange3, which is similar to Orange, with the difference that the variable circumference has been renamed to circumferenceNEW using the following code:

Orange3 <- Orange %>% rename(circumferenceNEW = circumference)
head(Orange3)


Let’s put them together:

bind_cols(Orange, Orange3)


We end up with a table of 35 observations and 6 variables. Neither circumference nor circumferenceNEW has been renamed, because each of these names occurs only once. Tree and age, however, still appear twice and have therefore been renamed.
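If the duplicated copies of Tree and age are not wanted, one possible approach (a sketch, not the only option) is to keep only the new variable from Orange3 before binding. The object name tidy is introduced here for illustration.

```r
library(dplyr)

# Rebuild Orange3 as above: circumference renamed to circumferenceNEW
Orange3 <- Orange %>% rename(circumferenceNEW = circumference)

# Keep only the column that Orange does not already contain,
# then bind it to the right of Orange.
tidy <- bind_cols(Orange, Orange3 %>% select(circumferenceNEW))

names(tidy)
dim(tidy)
```

Since no name occurs twice in this case, bind_cols() has nothing to rename, and the result has 35 rows and 4 columns.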
