Thursday December 18th, 2014 by Heiko Tietze
R is an open source programming language for statistical computing and graphics (#12 in TIOBE index 2014). Along with its analytical capabilities and the extensive collection of user packages, it is known for beautiful graphics. However, since R implies the use of a command line interface, a lot of people are afraid of writing the code (some graphical user interfaces are available, but they do not spare the user from entering commands).
Lillis’ book ‘R Graph Essentials’ is an attempt to introduce R to beginners, in particular how to create graphics. He starts with the simple plot() command and explains the most relevant parameters like:
plot(var1, var2, type="o", pch=17, cex=1.2, col="darkgreen", ylim=c(0, 20))
lines(var2, type="o", pch=16, lty=2, col="blue")
Formula 1: Plot var1 vs var2 as overplotted points and lines (type="o"), with filled triangles (pch=17), slightly enlarged symbols (cex=1.2), dark green color, and a predefined range on the y-axis; then overlay var2 as a dashed line (lty=2) with filled circles (pch=16) in blue.
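The plot() and lines() calls above can be tried with a few invented numbers. This is a minimal sketch: the values of var1 and var2 are hypothetical (the review gives none), and the plot is written to a temporary PDF so it also runs without a display.

```r
# Hypothetical sample data; the review does not give values for var1 or var2.
var1 <- 1:10
var2 <- c(3, 5, 4, 8, 7, 11, 10, 14, 13, 17)

# Write to a temporary PDF so the sketch also runs in a non-interactive session.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# type = "o": points joined by lines; pch = 17: filled triangles;
# cex = 1.2: symbols scaled to 120%; ylim fixes the y-axis range.
plot(var1, var2, type = "o", pch = 17, cex = 1.2,
     col = "darkgreen", ylim = c(0, 20))

# Given a single vector, lines() plots the values against their index 1..n.
# Because var1 is 1..10 here, the overlay falls on the same x positions.
# pch = 16: filled circles; lty = 2: dashed line.
lines(var2, type = "o", pch = 16, lty = 2, col = "blue")

dev.off()
```

Leaving out the pdf() and dev.off() lines sends the plot to the screen in an interactive session.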
Other types of graphics presented in this book are barplots and histograms, boxplots, pie charts, and dotcharts. You will learn how to change colors, styles, axes, and legends, and how to include textual descriptions, smoothers, or regression lines in your plots.
After familiarizing the reader with R’s generic way of plotting data, the author introduces the advanced functions qplot() (‘quick plot’) and ggplot() (‘grammar of graphics’), both from the package ggplot2. Graphics are built from a certain geometry (points, lines, bars…) and aesthetics for visual attributes (color, shape, size…). The package makes it easy to control graphics through layers that are added one after another. For instance:
myPlot <- qplot(var1,var2,data=myData,geom="line",size=var3,color=var4)
myPlot + scale_size_manual(values = c(5,7))
Formula 2: Store the plot object returned by qplot() in the variable myPlot, then draw it with an additional option (a manual size scale that sets the two line sizes to 5 and 7).
In the case of ggplot() it looks even trickier at first glance, since the function itself has a complex syntax. On the other hand, this complexity offers full customization. And you do not need to write obfuscated code:
myPlot <- ggplot(myData, aes(x = var1, y = var2))
myPlot <- myPlot + labs(x = "Independent", y = "Dependent", title = "Test")
myPlot <- myPlot + geom_line(aes(color = var3, size = var4))
myPlot <- myPlot + scale_size_manual(values = c(5,7))
myPlot <- myPlot + theme(panel.background = element_rect(fill = "ivory"))
myPlot <- myPlot + theme(legend.position="none")
myPlot #do plot
Formula 3: A plot similar to Formula 2, built with ggplot() plus additional tweaks (axis labels and title, ivory panel background, no legend).
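To see the layering idea in action, here is a minimal, self-contained sketch with an invented data frame. The column names follow the review, but all values are hypothetical. It uses geom_point() rather than a line geometry, because recent ggplot2 releases control line thickness with linewidth instead of size, whereas size still applies to points. It assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Hypothetical data; column names follow the review, values are invented.
myData <- data.frame(
  var1 = rep(1:5, times = 2),
  var2 = c(2, 4, 5, 7, 9, 1, 3, 6, 8, 12),
  var3 = rep(c("a", "b"), each = 5),
  var4 = factor(rep(c("small", "large"), each = 5),
                levels = c("small", "large"))
)

# Each "+" adds one layer or setting to the plot object.
myPlot <- ggplot(myData, aes(x = var1, y = var2)) +
  labs(x = "Independent", y = "Dependent", title = "Test") +
  geom_point(aes(color = var3, size = var4)) +
  # One size per level of var4, in level order: small = 5, large = 7.
  scale_size_manual(values = c(5, 7)) +
  theme(panel.background = element_rect(fill = "ivory"),
        legend.position = "none")

# Save to a temporary file; typing myPlot at the console would draw it on screen.
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = myPlot, width = 5, height = 4)
```

Chaining the layers with "+" in one expression is equivalent to the repeated reassignment of myPlot shown in Formula 3; which form reads better is a matter of taste.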
The book ‘R Graph Essentials’ is written in a straightforward manner, easy to read and easy to understand. Lillis starts by explaining a certain function and then repeats it several times within a chapter. This approach supports learning but fails when, for instance, color is being modified again and again. Some functions and properties are introduced “on the fly”. That makes it hard to find particular information again, like ‘writing functions to create graphs’, which is located in the chapter about scatterplot matrices (p. 91). And topics such as reading data sets, calculating linear fits (p. 34), ‘subsetting data before graphing’ (p. 98), and formatting data (p. 131) should be separated from the graphical lessons. In general, the organization of the book in terms of its table of contents is odd. For example, boxplot() and pie() are discussed under the section ‘bar charts’.
Although the book gives a good overview, some core features of R are missing. Examples are symbols() to draw shapes, image() and contour(), which are useful for heatmaps, and the overlay of external images (which can be done with the package ‘png’). And a quite important but only casually discussed function is par(), which allows you to configure several plot properties on a low level. The book obviously does not claim to be encyclopedic. For almost all functions, good descriptions and examples can be found on the Internet, and the author refers to those pages in several places.
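For readers curious about the functions named above, the following sketch shows par(), image(), contour(), and symbols() from base R in one small script. It is not taken from the book; it uses the volcano elevation matrix that ships with R, and the coordinates and radii for symbols() are invented.

```r
# Write to a temporary PDF so the sketch also runs in a non-interactive session.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 8, height = 4)

# par() sets low-level options (here: two panels side by side and the margins).
# It returns the previous values so they can be restored afterwards.
old_par <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))

# Heatmap of the built-in 'volcano' matrix with contour lines drawn on top.
image(volcano, col = heat.colors(12), main = "image() + contour()")
contour(volcano, add = TRUE)

# symbols() draws shapes at given coordinates; here circles with
# hypothetical positions and radii.
x <- c(1, 2, 3, 4)
y <- c(2, 4, 1, 3)
symbols(x, y, circles = c(0.2, 0.4, 0.6, 0.8), inches = 0.3,
        bg = "lightblue", main = "symbols()")

par(old_par)
dev.off()
```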
R takes time. Not to learn the language, but to play with a myriad of options in order to plot the hell out of your data. You only need to start. And to clear the first hurdle, it might be helpful to read Lillis’ book.