Exploring the highcharter package.
My favorite part of statistical coding is hands-down making visualizations. It’s so rewarding to watch R spit out a cool graph at the end of all your code. Data visualizations are often what draw people into the story. I’m certainly guilty of scrolling through a long news article, ignoring the text, just to play around with the interactive graph they included.
Over the course of the past year, I’ve worked a lot with ggplot2, an R package that lets you create highly customizable data visualizations. I love ggplot2, and the entire tidyverse, but I figure it’s high time I explored new ways of creating data visualizations.
Enter: the highcharter package.
Highcharts is, in the company’s own words, “the premier and fastest growing charting-tool in a competitive market.” It’s free for non-commercial use, but otherwise the license costs a pretty penny. highcharter is an R wrapper for the Highcharts JS API (which is written in JavaScript).
The main building block of highcharter is the hchart() function. Let’s explore hchart() using the lovely pdxTrees package:
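library(dplyr)        # filter() and the pipe
library(highcharter)  # hchart(), hcaes(), and the hc_*() functions
library(pdxTrees)     # get_pdxTrees_parks()

pdxTrees <- get_pdxTrees_parks()
crystalSprings <- pdxTrees %>%
  # We have to subset the data or else the server becomes angered
  filter(Park == "Crystal Springs Rhododendron Garden")
hchart(crystalSprings,
       "point",
       hcaes(x = Tree_Height,
             y = DBH,
             group = Condition))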
So, very similar to ggplot2, highcharter allows you to input the data, define your choice of geometric object (in this case, we opted for points, but here is a list of all the possibilities), and define which “aesthetics” you want your variables to be mapped to (in this case, tree height is mapped to the x-position, diameter at breast height (DBH) is mapped to the y-position, and the tree’s condition is mapped to the color of the points).
What’s super cool about highcharter is that you don’t have to do anything special to make the charts interactive! Go on, try hovering over the points in the above plot. People have grown to expect that data visualizations will be interactive, so it’s very useful to have a package that makes your charts interactive by default.
Here’s another example:
Obviously, pie charts aren’t beloved in the statistics community, but in defense of this pie chart, when you can hover over the sections and see the exact proportion, it becomes less of a big deal that it’s difficult to visually perceive the precise differences.
However, this example demonstrates something important about highcharts: highcharter will not do any work on the back end for you! ggplot2 is kind, and will often compute counts and proportions and whatnot without you having to explicitly do it yourself. highcharter is not so helpful. In this case, I had to manually compute the proportions in the pie chart.
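For the curious, here is a minimal sketch of what computing those proportions by hand might look like. The tiny trees data frame is a made-up stand-in for pdxTrees, and the column name percent is my own choice rather than anything highcharter requires; this is not necessarily the exact code behind the chart above.

```r
library(dplyr)
library(highcharter)

# Hypothetical stand-in for the pdxTrees data: one row per tree
trees <- data.frame(
  Condition = c("Good", "Good", "Fair", "Good", "Poor", "Fair", "Good", "Dead")
)

# Count the trees in each condition, then turn the counts into percentages
condition_props <- trees %>%
  count(Condition, name = "n") %>%
  mutate(percent = round(100 * n / sum(n), 1))

# For a pie series, map the slice label to `name` and the slice size to `y`
pie <- hchart(condition_props, "pie", hcaes(name = Condition, y = percent))
pie
```

The point is simply that the summarizing happens in dplyr before hchart() ever sees the data.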
We can add additional information to our highchart, like we would add layers to a ggplot2 graph, using the hc_add_series() function.
library(broom)  # augment()

mtTabor <- pdxTrees %>%
  # Again, subsetting the data to make it easier for the servers to handle
  filter(Park == "Mt Tabor Park",
         Condition == "Good")
lobf <- loess(Carbon_Sequestration_lb ~ Carbon_Storage_lb, data = mtTabor)
# Predict once; the standard errors come back in the model's original row order
pred <- predict(lobf, se = TRUE)
fit <- augment(lobf) %>%
  # Attach the standard errors before sorting so each row keeps its own value
  mutate(.se = pred$se.fit) %>%
  arrange(Carbon_Storage_lb)
t <- qt(0.975, pred$df)
carbon <- hchart(mtTabor,
                 "point",
                 hcaes(x = Carbon_Storage_lb,
                       y = Carbon_Sequestration_lb)) %>%
  # The `highcharter` package also uses the pipe!
  hc_add_series(fit,
                "spline",
                hcaes(x = Carbon_Storage_lb,
                      y = .fitted)) %>%
  hc_add_series(fit,
                "arearange",
                hcaes(x = Carbon_Storage_lb,
                      low = .fitted - t*.se,
                      high = .fitted + t*.se), # Creates the confidence interval
                color = "lavender",
                zIndex = -3) # This makes the points overlay the shaded area
carbon
In this example, we overlaid our scatterplot with a line of best fit and shaded the confidence interval. Because highcharter is not here to hold our hand, we had to fit the regression line ourselves using the loess() function, and then add it to our data frame using the augment() function (which lives in broom). Then, we had to use the qt() function (which we learned in Math 141!) to find the critical value for a 95% confidence interval, which we then used to determine the upper and lower boundaries of our arearange series. (This website, under the “A more advanced example” heading, shows you how to do this process if you want another demonstration.) Clearly, this is a further illustration of the fact that highcharter will categorically NOT compute anything for you on the back end.
Really makes you long for good old geom_smooth().
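If you want to check the confidence-band arithmetic without touching highcharter at all, here is a small base-R sketch. It assumes the built-in cars dataset as a stand-in for the Mt. Tabor trees, and the names band and t_crit are just ones I picked for illustration.

```r
# Built-in `cars` data stands in for the tree data (an assumption for illustration)
lo <- loess(dist ~ speed, data = cars)

# se = TRUE returns the fitted values, their standard errors, and the degrees of freedom
pred <- predict(lo, se = TRUE)

# Critical value for a 95% interval: fitted value +/- t_crit * standard error
t_crit <- qt(0.975, pred$df)

band <- data.frame(
  speed  = cars$speed,
  fitted = pred$fit,
  low    = pred$fit - t_crit * pred$se.fit,
  high   = pred$fit + t_crit * pred$se.fit
)

# Sort by x so a line or area series is drawn left to right
band <- band[order(band$speed), ]
head(band)
```

The low and high columns are what would feed an arearange series, and fitted is what would feed the spline.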
So far, we’ve been skipping out on a critically important part of creating an effective data visualization. Let’s figure out how to add a title, subtitle and axes labels.
carbon %>%
  hc_title(
    text = "How do existing carbon stores affect trees' sequestration rate?"
  ) %>%
  hc_subtitle(
    # paste() keeps the long subtitle readable without embedding line breaks
    text = paste(
      "Data cover trees in good condition in Mt. Tabor Park. Carbon",
      "storage is how much carbon (in pounds) is bound up in the tree; carbon",
      "sequestration rate is the amount of carbon (in pounds) the tree removes",
      "from the atmosphere annually."
    )
  ) %>%
  hc_xAxis( # You have to input text as a list here!
    title = list(text = "Carbon storage (lbs)")
  ) %>%
  hc_yAxis(
    title = list(text = "Carbon sequestration rate (lbs per year)")
  )
(Here is a CSV file with more info on the variables of this dataset.)
Basically, put hc_ in front of a feature on a graph, and it’s probably a highcharter function. In this case, hc_title() and hc_subtitle() allow you to input your title and subtitle, and hc_xAxis() and hc_yAxis() allow you to edit your axes.
So far, we’ve explored:
How to create a highchart using the hchart() function
How to map variables to aesthetics using the hcaes() function
How to add additional layers using the hc_add_series() function
And how to add titles, subtitles and axes labels using the hc_title(), hc_subtitle(), hc_xAxis() and hc_yAxis() functions
I would consider these to be the four most fundamental things you need to know in order to make an effective graph (not including theoretical knowledge like which type of chart is the best to display your data, etc.).
Before I go over my closing thoughts on highcharter, I want to take a moment to make some cool graphs utilizing the interesting features the package has to offer. Also just for fun!
treevalues <- pdxTrees %>%
  filter(!is.na(Native)) %>%
  group_by(Condition, Native) %>%
  summarize(mean = mean(Structural_Value, na.rm = TRUE)) %>%
  # summarize() leaves the result grouped by Condition, and mutate() cannot
  # modify a grouping column, so drop the grouping first
  ungroup() %>%
  # The following line led me to realize that `hchart()` ignores factors
  mutate(Condition = factor(Condition, levels = c("Good", "Fair", "Poor")))
treevalues$mean <- round(treevalues$mean, digits = 0)
hchart(treevalues,
       "column",
       hcaes(x = Condition,
             y = mean,
             group = Native)) %>%
  hc_chart(options3d = list(enabled = TRUE,
                            beta = 10,
                            alpha = 10)) %>%
  hc_title(
    text = "The structural value of PDX trees"
  ) %>%
  hc_subtitle(
    text = "Grouped by the tree's condition (good, fair, poor), and whether it is a native species"
  ) %>%
  hc_xAxis(
    title = list(text = "Condition of tree")
  ) %>%
  hc_yAxis(
    title = list(text = "Average structural value (in dollars)")
  ) %>%
  hc_legend(align = "center",
            verticalAlign = "top",
            layout = "horizontal")
Rendering this as 3D makes a fairly banal bar graph seem more visually interesting. You can make your graph 3D using the hc_chart() function; the alpha and beta arguments both alter the perspective of the graph.
In this example, we computed correlations between the relevant quantitative variables in the pdxTrees data, so you can easily compare them. (Some of these are pretty interesting!) Being able to hover over each of the tiles in the correlation matrix to view the correlation coefficient makes this chart especially effective.
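Here is a rough sketch of how a correlation heatmap like this can be put together. It assumes the built-in mtcars data rather than pdxTrees, and it relies on hchart() accepting a plain matrix, so treat it as one possible route rather than the exact code behind the chart above.

```r
library(highcharter)

# Built-in `mtcars` stands in for the quantitative pdxTrees columns (an assumption)
cor_mat <- cor(mtcars[, c("mpg", "disp", "hp", "wt")])

# Passing a matrix to hchart() draws it as a heatmap, one tile per cell
heat <- hchart(cor_mat)
heat
```

The correlations themselves still come from base R’s cor(); the chart only displays what it is handed.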
OK, before things get too long, I want to take a moment to reflect on some of the benefits and drawbacks of highcharter, especially in comparison to ggplot2.
In terms of the charts that it makes, I really see only benefits:
The default interactivity is very handy.
The charts are very elegant and visually appealing; I found myself not being too concerned with changing the color palettes or themes or anything, whereas the default settings for ggplot2 tend to look a little clunky (salmon and teal...).
There are a ton of options for different plot types, way more than ggplot2.
In terms of user-friendliness, the major benefit is that the function names are intuitive (for instance, hc_title() allows you to make a title). There are two main drawbacks, however:
There is not a lot of information online to help you navigate the package, outside of the resources the company itself provides. I think this is partially attributable to the fact that you have to pay for commercial use.
You are at a pretty significant disadvantage if you don’t understand JavaScript well enough to be able to translate highcharts code from JavaScript to R. highcharter is a wrapper for the Highcharts JS API, and a lot of the highcharts examples that you can find online are written in JavaScript or other programming languages, but rarely in R. As such, it was difficult to teach myself this package, because there are limited examples to work from, and the API reference guide was very technical and hard to understand.
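To give a flavor of that translation work, here is a small sketch of how a JavaScript configuration tends to map onto highcharter calls: each top-level option becomes an hc_*() function, and each nested object becomes a list(). The JavaScript in the comment and the data values are made up for illustration.

```r
library(highcharter)

# A made-up JavaScript example of the kind you find online:
#
#   Highcharts.chart('container', {
#     chart:  { type: 'column' },
#     title:  { text: 'Trees by condition' },
#     xAxis:  { categories: ['Good', 'Fair', 'Poor'],
#               title: { text: 'Condition of tree' } },
#     series: [{ name: 'Trees', data: [12, 7, 3] }]
#   });

# The same chart in R: top-level keys become hc_*() calls, nested objects become list()
hc <- highchart() %>%
  hc_chart(type = "column") %>%
  hc_title(text = "Trees by condition") %>%
  hc_xAxis(categories = c("Good", "Fair", "Poor"),
           title = list(text = "Condition of tree")) %>%
  hc_add_series(data = c(12, 7, 3), name = "Trees")
hc
```

Once that pattern clicks, the JavaScript API reference becomes a lot more usable from R, though it still takes some trial and error.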
Overall, is it better than ggplot2? It could be, depending on what you’re looking for. I would recommend highcharter for someone who needs to make simple, interactive, visually appealing charts. For more complex charts, unless you are willing to spend a lot of time sleuthing around and/or you understand JavaScript, I would stick with ggplot2 or whatever other data visualization package you are most comfortable with.
(As for me, I am definitely loyal to ggplot2.)
Here are some useful websites if you want some more info on highcharter or highcharts in general.
Thank you for reading!
(Final word count: approx. 1250, text only)
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".