It is possible to overlay two or more histograms showing the frequency of values from a single numerical variable (a single column in a dataframe) if these values are associated with a categorical/nominal variable (stored in another column of the dataframe). Let’s take the following example, where 200 random, normally distributed values in the column `values` of the dataframe `my.dataframe` are associated with the label “first” in the column `class`, and 200 random, normally distributed values in that same column `values` are associated with “second”.

Here is the code for the dataframe:

ID <- 1:400
values <- c(rnorm(200, mean=80, sd=12), rnorm(200, mean=20, sd=12))
class <- c(rep("first", 200), rep("second", 200))
my.dataframe <- data.frame(ID, values, class)

We wish to build a single figure where one histogram displays the frequency distribution of the “first” values, while a second histogram displays the frequency distribution of the “second” values. The strategy is to use `fill=class` inside `aes()` to separate the values according to the variable `class`. Additionally, we use `bins=25` to set the number of bars in the histograms. The code is as follows:

ggplot(my.dataframe, aes(values, fill=class)) + geom_histogram(bins= 25)

Interestingly, separating classes with `fill=` automatically creates a legend on the right side of the chart.

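We can play a bit more with the look of the histograms by making them slightly transparent. To do so, we use `alpha=.5` inside `geom_histogram()` (placing it inside `aes()` would treat it as a mapped variable and add a spurious entry to the legend):

ggplot(my.dataframe, aes(values, fill=class)) + geom_histogram(bins= 25, alpha=.5)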

Looking at the graph in the range of values between 40 and 60, we also see that the counts for “first” and “second” are stacked on top of each other rather than overlapping: the pink and turquoise colors are clearly separated and not mixed.
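The stacking described here comes from the default `position = "stack"` of `geom_histogram()`. As a minimal sketch, assuming the same kind of dataframe as above (rebuilt here with an arbitrary seed so that the block runs on its own), setting `position = "identity"` draws the two histograms over each other instead, and a fixed `alpha` set outside `aes()` lets the overlap show through. The object name `p_overlay` is only illustrative.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
my.dataframe <- data.frame(
  ID = 1:400,
  values = c(rnorm(200, mean = 80, sd = 12), rnorm(200, mean = 20, sd = 12)),
  class = c(rep("first", 200), rep("second", 200))
)

# position = "identity" overlays the bars instead of stacking them;
# alpha is a fixed setting here, so it goes outside aes()
p_overlay <- ggplot(my.dataframe, aes(values, fill = class)) +
  geom_histogram(bins = 25, alpha = 0.5, position = "identity")

p_overlay
```

With this setting, every bar starts at zero, so the region where the two classes share bins appears as a blend of the two fill colors.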

First, let’s create the dataframe that contains the dataset:

ID <- 1:200
values <- rnorm(200, mean=65, sd=15)
my.dataframe <- data.frame(ID, values)

Now, let’s map the data from the variable `values` by typing `ggplot(my.dataframe, aes(values))` and use `geom_histogram()` to draw the bars:

ggplot(my.dataframe, aes(values)) + geom_histogram()

The result is:

Note that the console displays the following warning:

This warning tells you that ggplot2 has split your data into 30 bars (or bins) *by default*. It is up to you to decide whether this is a good way to represent your data. You are strongly advised to try several options, with fewer or more bins. To define a specific number of bins, use `bins=` inside `geom_histogram()`; you may also set a specific bin width using `binwidth=`. Here is an example with 60 bins:

ggplot(my.dataframe, aes(values)) + geom_histogram(bins=60)

And here is an example with a binwidth of 10:

ggplot(my.dataframe, aes(values)) + geom_histogram(binwidth=10)
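To see what `bins=` and `binwidth=` actually do to the data, one option is to inspect the bars that ggplot2 computed. The following is a minimal sketch, assuming a dataframe built like the one above (with an arbitrary seed); it uses `layer_data()` from ggplot2, and the object names `p_bins`, `p_width`, `bars_bins` and `bars_width` are only illustrative.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
my.dataframe <- data.frame(ID = 1:200, values = rnorm(200, mean = 65, sd = 15))

p_bins  <- ggplot(my.dataframe, aes(values)) + geom_histogram(bins = 60)
p_width <- ggplot(my.dataframe, aes(values)) + geom_histogram(binwidth = 10)

# layer_data() returns one row per bar, with its limits (xmin, xmax) and count
bars_bins  <- layer_data(p_bins)
bars_width <- layer_data(p_width)

nrow(bars_bins)                       # number of bars drawn with bins = 60
bars_width$xmax - bars_width$xmin     # width of each bar with binwidth = 10
sum(bars_width$count)                 # all 200 values are counted exactly once
```

Comparing the two tables for a few settings is a quick way to judge whether a given choice hides or exaggerates features of the distribution.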

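Based on the same dataset, one can use `qplot()` to draw the same histograms (note that `qplot()` has been deprecated since ggplot2 3.4.0, so recent versions print a deprecation warning when it is called).

In `qplot()`, we first indicate that we want to use the variable `values` from the dataframe `my.dataframe`, and we indicate with `geom` that we want a histogram: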

qplot(values, data = my.dataframe, geom = "histogram")

Again, there comes a warning about bins/binwidth:

And again you can play with `binwidth=` or `bins=` to draw the histogram the way you wish:

qplot(values, data = my.dataframe, geom = "histogram", binwidth=10)

ggplot2 is an R package for producing statistical, or data, graphics, but it is unlike most other graphics packages because it has a deep underlying grammar. This grammar, based on the Grammar of Graphics (Wilkinson, 2005), is made up of a set of independent components that can be composed in many different ways. This makes ggplot2 very powerful because you are not limited to a set of pre-specified graphics, but you can create new graphics that are precisely tailored for your problem.

**– Hadley Wickham, ggplot2, 2016**

As you may understand from the author’s words, ggplot2 is an R package written specifically to help you make graphs and charts in R, in a way that makes the coding more comprehensible. That being said, let’s be fair and say that you do not NEED to use ggplot2 to create proper graphs in R. The built-in R base graphics are good enough, and if you have learned R base and understood the code and its details, arguments, parameters, you will certainly get the graph you want with not too much effort.

So why should I convince you to use ggplot2 instead of R base graphics? Well, I’m not going to (try to) convince you. There is already a long debate about why to use ggplot2 vs. R base, and you may find good or bad arguments here and there. Instead, we are going to look at the basics of ggplot2 and learn how to use it to create simple graphs and charts. Then you will decide on your own whether you like it or prefer R base graphics.

ggplot2 is installed with the `install.packages()` command. Simply type the following code in your console:

install.packages("ggplot2")

The following “victory screen” should appear:

Do not forget to activate the package using `library()`:

library(ggplot2)

Despite the warning, you are now ready to play with ggplot2.
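If the same script is meant to run on several machines, a common pattern is to install the package only when it is missing. This is a sketch and not part of the steps above; it relies only on the base R functions `requireNamespace()` and `packageVersion()`.

```r
# Install ggplot2 only if it is not already available, then attach it
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)

# Report which version of ggplot2 is attached
packageVersion("ggplot2")
```

This avoids downloading and reinstalling the package every time the script is run.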

ggplot2 offers two main plotting functions: `ggplot()` and `qplot()` (for quick plot). While `ggplot()` is quite a powerful, flexible tool to create advanced graphs by means of multiple layers and arguments, `qplot()` is a simpler function that can be useful to produce a quick graph with minimalist code and syntax.
There are, of course, plenty of differences between the two functions. For instance, `qplot()` accepts simple vectors for loading the dataset, while `ggplot()` needs dataframes; `qplot()` will pick a chart type (for example scatter plot instead of bar chart) based on the variables you feed it, while `ggplot()` will depend on a specific parameter (geometry) you have indicated in your code.
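To illustrate the role of the geometry, here is a minimal sketch with a small made-up dataframe `df` (hypothetical toy data, not taken from the text). `ggplot()` alone only sets up the data and the mapping; the chart type is decided by the geometry that is added to it.

```r
library(ggplot2)

df <- data.frame(x = 1:5, y = c(3, 7, 2, 9, 4))  # hypothetical toy data

base_plot <- ggplot(df, aes(x, y))        # data and mapping only: an empty panel
scatter   <- base_plot + geom_point()     # same mapping drawn as a scatter plot
bars      <- base_plot + geom_col()       # same mapping drawn as a bar chart

scatter
bars
```

The same `base_plot` object is reused for both charts, which is the practical meaning of "layers" in ggplot2.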

Let’s take an example to see how these functions work practically. Here, we take a dataset that we have used earlier in the seventh chapter of our Introduction to R in order to draw a simple plot.

y <- c(45,12,48,79,65,32,78,95,12,75)
x <- c(1,2,3,4,5,6,7,8,9,10)
my.dataframe <- data.frame(y,x)
my.dataframe

If we want to use `qplot()`, we shall write `qplot(x, y)` to obtain this:

“Amazingly”, the resulting plot looks rather good, with a grey background, a grid, ticks and labels on the axes, and a series of black dots representing your data. And here is a quick comparison with the result that you would have obtained if you had used R base graphics and the function `plot()` as described here (coded as `plot(y~x)`):

Aesthetically speaking, the graph generated by `qplot()` looks much better and more presentable than the one produced by R base graphics. So there is not much doubt that you will prefer ggplot2 for its `qplot()` function.

Now, what about `ggplot()`? Well, in order to code for the same graph, we shall write `ggplot(my.dataframe, aes(x,y)) + geom_point()`. This is a bit more complex but the result is identical:

OK. So, same result, but much more to write with `ggplot()`? And possibly a more complex syntax? What is the point? Why choose `ggplot()` over `qplot()`? Well, it essentially depends on what you will use the plot for. If you just need to plot your data to get a quick overview, go ahead with `qplot()`. If you plan to plot something simple (a scatter plot, a single line plot, a simple bar graph), go ahead with `qplot()` as well. But if you plan to make a plot ready for publication, an exam or a report, and you need something finer that you can tune exactly as you wish (and not just the color of the dots), then start immediately with `ggplot()`, as it will certainly be easier to tweak all parameters at any time without too much effort. This is where its renowned flexibility will be a bonus.
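As an illustration of that flexibility, the following sketch starts from the same `x` and `y` values as above and adds one element at a time. The colour, labels and theme are arbitrary placeholders chosen for the example, and `p_tuned` is only an illustrative name.

```r
library(ggplot2)

y <- c(45, 12, 48, 79, 65, 32, 78, 95, 12, 75)
x <- 1:10
my.dataframe <- data.frame(y, x)

p_tuned <- ggplot(my.dataframe, aes(x, y)) +
  geom_point(colour = "steelblue", size = 3) +  # fixed colour and size for the dots
  geom_line(linetype = "dashed") +              # a second layer using the same mapping
  labs(title = "Example plot",                  # placeholder title and axis labels
       x = "Measurement", y = "Value") +
  theme_bw()                                    # one of the built-in ggplot2 themes

p_tuned
```

Each line after the first can be removed or changed independently, which makes it easy to adjust a figure step by step.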