Transform Variables


Some variants of the functions mutate() and transmute() modify the content of a selection of variables in place. Instead of adding new columns that hold the results of a given transformation or operation, they replace the contents of the original variables with these results.

These variants are mutate_if(), mutate_at(), transmute_if() and transmute_at().

 

Modify variables based on a condition with mutate_if()

mutate_if() modifies variables only if a given condition is fulfilled. The condition is a function that is applied to each variable and whose result is TRUE or FALSE. Examples are is.numeric (which checks whether the variable is a number rather than a factor or text), is.integer (which checks whether the variable is stored as an integer rather than a decimal number), is.factor, etc. In the following example, we multiply by 10 all the variables which are defined as numeric. Arithmetic operations, unlike named functions, must be written as a formula starting with ~, in which the dot stands for the variable being transformed. Here the formula is wrapped in list(), so it is written list(~.*10):

Orange %>% mutate_if(is.numeric, list(~.*10))


Note that the values in Tree have not been multiplied by 10, because the variable is defined in the data frame as an ordered factor, not a number. This is confirmed by the command str(Orange), which gives details about the nature of the variables and reports Tree as an ordered factor.
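As a minimal sketch of that check, the commands below inspect the built-in Orange data set and apply the two predicates to every column. The object names numeric_cols and factor_cols are chosen here for illustration only.

```r
# Show the structure of the built-in Orange data set
str(Orange)

# Apply each predicate to every column; mutate_if() runs the same kind of test
numeric_cols <- sapply(Orange, is.numeric)
factor_cols  <- sapply(Orange, is.factor)

print(numeric_cols)  # TRUE for age and circumference only
print(factor_cols)   # TRUE for Tree only
```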
We can also use mutate_if() to convert Tree from an ordered factor to a numeric variable:

Orange %>% mutate_if(is.factor, as.numeric) %>% str
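One point worth checking when converting a factor: as.numeric() returns the position of each level, not the label printed in the table. In Orange the levels of Tree are not stored in numerical order, so the two differ. The sketch below compares both conversions. The names by_code and by_label are illustrative, and going through as.character() is one common way to keep the labels, not the only one.

```r
library(dplyr)

# The levels of Tree are ordered differently from their numeric labels
levels(Orange$Tree)

# as.numeric() on a factor returns the internal level positions
by_code <- Orange %>% mutate_if(is.factor, as.numeric)

# Converting to character first keeps the original tree labels
by_label <- Orange %>% mutate_if(is.factor, list(~ as.numeric(as.character(.))))

# Compare the original factor with both conversions
head(data.frame(Tree = Orange$Tree, by_code = by_code$Tree, by_label = by_label$Tree))
```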


 

Modify a selection of variables with mutate_at()

mutate_at() may be used to transform the content of a predefined selection of variables. In this function, the variables to be modified must be listed inside vars(). Then, as in mutate_if(), arithmetic operations are written as a formula in list(~), while named functions may simply be passed as is. Here we selectively transform age by multiplying it by 10:

Orange %>% mutate_at(vars(age), list(~.*10))


 
We may also use the function to transform age into a factor variable with as.factor. In this example, we use str again to reveal the nature of the variables:

Orange %>% mutate_at(vars(age), as.factor) %>% str
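vars() can also hold more than one variable. As a small sketch (the object name scaled is chosen here for illustration), the following multiplies both age and circumference by 10 and leaves Tree untouched:

```r
library(dplyr)

# List several variables inside vars(); each one is transformed in place
scaled <- Orange %>% mutate_at(vars(age, circumference), list(~ . * 10))

head(scaled)
```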

 

Modify variables based on a condition and then drop the unmodified variables with transmute_if()

Like mutate(), transmute() modifies the content of variables, but it drops the variables which are not modified from the output table. This also applies to transmute_if(): it conditionally modifies the content of variables like mutate_if(), but drops the rest. In the following example, we multiply by 10 all the variables which are defined as numeric:

Orange %>% transmute_if(is.numeric, list(~.*10))


As expected, age and circumference have been transformed and kept, while the variable Tree was discarded.
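A quick way to confirm which variables survive is to compare the column names before and after. This is a minimal sketch in which kept is an illustrative object name:

```r
library(dplyr)

# Keep and transform only the numeric variables
kept <- Orange %>% transmute_if(is.numeric, list(~ . * 10))

names(Orange)  # all three variables
names(kept)    # only the variables that passed is.numeric
```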
 

Modify a selection of variables and then drop the unmodified variables with transmute_at()

Like mutate_at(), transmute_at() modifies the content of a selection of variables, but in this case the function discards unmodified variables in the output table. Here is an example where age is multiplied by 10, and is the only variable that shows up in the output:

Orange %>% transmute_at(vars(age), list(~.*10))
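In dplyr 1.0.0 and later, the _if and _at variants are marked as superseded in favour of across(). The sketch below assumes such a version is installed and shows one possible way to express the examples from this page with across(). The object names are illustrative.

```r
library(dplyr)

# Counterpart of mutate_if(is.numeric, list(~.*10))
via_if <- Orange %>% mutate(across(where(is.numeric), ~ .x * 10))

# Counterpart of mutate_at(vars(age), list(~.*10))
via_at <- Orange %>% mutate(across(age, ~ .x * 10))

# Counterpart of transmute_at(vars(age), list(~.*10)): only age is kept
only_age <- Orange %>% transmute(across(age, ~ .x * 10))

head(via_if)
head(via_at)
head(only_age)
```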
