The free and popular statistical programming language R offers a powerful graphing library called ggplot2. The ‘gg’ stands for the grammar of graphics, which is based on the data visualization work of Leland Wilkinson. Under the grammar of graphics, the aesthetics and geometric proportions of a plot are generated directly from the data, so they are representative of the actual values that underlie the graph.
The objective of this post is to demonstrate some of the simple code one can write in R to create simple and elegant graphs. Best of all, the code can be reused multiple times to create high-quality representations of your data. Because graphs are created in code, R automates many visualizations that are often painstakingly built by hand in programs like PowerPoint and Excel. R also has many free libraries that can generate a vast array of graphs, such as maps, circle graphs, network graphs, and heat maps, that are not readily available in Microsoft products. This post focuses on the most common and simple graphs used for data analysis; a subsequent post will contain more detailed and elaborate graphs.
CUSTOMIZED THEMES
In R you can create customized templates, called themes, for graphs built with ggplot2. Themes control the appearance of text, axes, font type, font size, gridlines, and so on; basically anything you can imagine controlling in the look and feel of a graph can be controlled via a theme. I find the following theme simple and visually appealing, and it is used in all the graphs in this post:
library(ggplot2)
my_theme <- theme(
panel.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
axis.title.x = element_text(colour = "black", size = 12),
axis.text.x = element_text(colour = "black", size = 12),
axis.title.y = element_text(colour = "black", size = 12),
# hide the y-axis tick labels; values are printed on the plots instead
axis.text.y = element_blank(),
plot.title = element_text(colour = "black", size = 16, face = "bold"),
axis.ticks = element_blank() )
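Because a theme is an ordinary R object, it can be defined once and added to any number of plots with `+`. The following is a minimal sketch of that reuse, not the theme used in this post: `demo_theme` is a hypothetical shortened theme, and `mtcars` and `pressure` are data sets that ship with R.

```r
library(ggplot2)

# Hypothetical shortened theme, for illustration only
demo_theme <- theme(
  panel.background = element_blank(),
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  plot.title = element_text(colour = "black", size = 16, face = "bold")
)

# The same theme object is added to two unrelated plots
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  ggtitle("Scatter") +
  demo_theme

p_line <- ggplot(pressure, aes(x = temperature, y = pressure)) +
  geom_line() +
  ggtitle("Line") +
  demo_theme

print(p_scatter)
print(p_line)
```

Changing a setting in the theme definition then changes every plot that uses it the next time the code is run.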
BAR CHARTS
Bar charts are among the most fundamental charts for data analysis. This is the code that created the simple and elegant bar chart above:
library(gcookbook) # assumed source of the pg_mean data set
ggplot(pg_mean, aes(x = group, y = weight)) +
geom_bar(stat = "identity", fill = "salmon1", colour = "black") +
ggtitle("Title") +
xlab("x label") + ylab("y label") +
geom_text(aes(y = weight + 0.3, label = weight)) +
my_theme
Notice the structure of the syntax in ggplot code. The ggplot() call first creates an object that tells R what is to be plotted. Then geom_bar tells R to draw a bar chart. Once the chart is created, ggtitle adds the title, and xlab and ylab add the x and y labels in case the variable names are cryptic and not end-user friendly. The geom_text call adds the value labels at the top of the bars, and finally my predefined theme, my_theme, cleans up the appearance (removing gridlines, backgrounds, tick marks on axes, etc.).
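The layering described here can be made explicit by adding one piece at a time to a stored plot object. This is a rough sketch under stated assumptions: `pg_demo` is a hypothetical stand-in for the data, computed as group means of the built-in `PlantGrowth` data set, and the theme is left out so the snippet runs on its own.

```r
library(ggplot2)

# Stand-in data (assumption): mean weight per group from built-in PlantGrowth
pg_demo <- aggregate(weight ~ group, data = PlantGrowth, FUN = mean)

# Step 1: create the plot object that says what is to be plotted
p <- ggplot(pg_demo, aes(x = group, y = weight))

# Step 2: add the bars; stat = "identity" uses the y values as given
p <- p + geom_bar(stat = "identity", fill = "salmon1", colour = "black")

# Step 3: add the title and axis labels
p <- p + ggtitle("Title") + xlab("x label") + ylab("y label")

# Step 4: add value labels slightly above each bar
p <- p + geom_text(aes(y = weight + 0.3, label = round(weight, 2)))

print(p)
```

Nothing is drawn until the object is printed, so each intermediate `p` can be printed on its own to see what a given layer contributes.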
#stacked bar graph.
library(plyr) # provides arrange(), desc() and ddply()
# ggplot2 >= 2.2.0 stacks bars in reverse factor order, so sort Cultivar descending
ce <- arrange(cabbage_exp, Date, desc(Cultivar))
# running total within each Date gives the label position for each segment
ce <- ddply(ce, "Date", transform, label_y = cumsum(Weight) - 0.1 * Weight)
ggplot(ce, aes(x = Date, y = Weight, fill = Cultivar)) +
geom_bar(stat = "identity", colour = "black") +
ggtitle("Title") + xlab("x label") + ylab("y label") +
geom_text(aes(y = label_y + 0.8, label = Weight), colour = "black") +
my_theme +
scale_fill_brewer(palette = "Pastel1")
#grouped bar graph.
ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
geom_bar(position = "dodge", colour = "black", stat = "identity") +
ggtitle("Title") + xlab("x label") + ylab("y label") +
# dodge the labels by the same width as the bars so each sits over its own bar
geom_text(aes(y = Weight + 0.1, label = Weight), position = position_dodge(width = 0.9)) +
my_theme +
scale_fill_brewer(palette = "Pastel1")
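Since the bar chart code changes very little from one data set to the next, one way to reuse it is to wrap it in a function. The sketch below is only an illustration: `labelled_bar` and its arguments are hypothetical names, the `.data` pronoun assumes ggplot2 3.0.0 or later, and the example data are group means of the built-in `PlantGrowth` and `ToothGrowth` data sets.

```r
library(ggplot2)

# Hypothetical helper: draws a labelled bar chart from any data frame.
# x_var and y_var are column names passed as strings.
labelled_bar <- function(data, x_var, y_var, title = "Title",
                         plot_theme = theme()) {
  ggplot(data, aes(x = .data[[x_var]], y = .data[[y_var]])) +
    geom_bar(stat = "identity", fill = "salmon1", colour = "black") +
    geom_text(aes(label = round(.data[[y_var]], 2)), vjust = -0.5) +
    ggtitle(title) + xlab(x_var) + ylab(y_var) +
    plot_theme
}

# The same function applied to two different built-in data sets
plant_means <- aggregate(weight ~ group, data = PlantGrowth, FUN = mean)
tooth_means <- aggregate(len ~ supp, data = ToothGrowth, FUN = mean)

p1 <- labelled_bar(plant_means, "group", "weight", title = "Plant weight")
p2 <- labelled_bar(tooth_means, "supp", "len", title = "Tooth length")

print(p1)
print(p2)
```

A theme object can be passed through the `plot_theme` argument; the default, an empty `theme()`, leaves the standard ggplot2 appearance unchanged.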
LINE CHARTS
The next set of graphs covers line graphs. Line graphs are useful for showing how a value changes over time or across another ordered variable, such as dose.
#multiple line graph.
library(plyr) # provides ddply()
# mean tooth length for each supplement and dose
tg <- ddply(ToothGrowth, c("supp", "dose"), summarise, length = mean(len))
ggplot(tg, aes(x = dose, y = length, fill = supp)) +
geom_line() +
geom_point(size = 6, shape = 21) +
geom_text(aes(y = length + 4, label = length)) +
ggtitle("Title") +
xlab("x label") + ylab("y label") +
my_theme
#shaded area line graph.
sunspotyear <- data.frame(
year = as.numeric(time(sunspot.year)),
sunspots = as.numeric(sunspot.year)
)
ggplot(sunspotyear, aes (x = year, y = sunspots)) +
geom_area(fill = "salmon1") +
geom_line() +
ggtitle(“Title”) +
xlab(“x label”) + ylab(“y label”) +
my_theme
SCATTER PLOTS
The scatter plot is one of the workhorses of statistical graphics and a great way of showing the relationship between two continuous variables.
#basic scatter plot.
ggplot(heightweight, aes (x = ageYear, y = heightIn)) +
geom_point(size = 4, shape = 21, colour = "black", fill = "salmon1") +
ggtitle("Title") +
xlab("x label") + ylab("y label") +
my_theme
#scatter plot – mapping continuous variable to size of dot.
ggplot(heightweight, aes(x = ageYear, y = heightIn, size = weightLb)) +
# shape 21 is a fillable circle; the default point shape ignores fill
geom_point(shape = 21, colour = "black", fill = "salmon1") +
ggtitle("Title") +
xlab("x label") + ylab("y label") +
my_theme
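For readers who do not have the heightweight data at hand, the same size-mapping pattern can be tried on a data set that ships with R. This is only a sketch under that assumption: it uses `mtcars`, maps horsepower (`hp`) to point size, and leaves out the custom theme so it runs on its own.

```r
library(ggplot2)

# Map a third continuous variable (hp) to the size of each point.
# Shape 21 is a circle with separate outline (colour) and interior (fill).
p_size <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point(shape = 21, colour = "black", fill = "salmon1") +
  ggtitle("Title") +
  xlab("x label") + ylab("y label")

print(p_size)
```

ggplot2 adds a size legend automatically, so the reader can translate point size back into approximate values.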
#scatter plot matrix.
c2009 <- subset(countries , Year == 2009,
select = c(Name, GDP, laborrate, healthexp, infmortality))
pairs(c2009[, 2:5])