Math 241 Blog: Better than `ggplot2`?

library(highcharter)
library(dplyr)
library(pdxTrees)
library(broom)
library(forcats)

My favorite part of statistical coding is hands-down making visualizations. It’s so rewarding to watch R spit out a cool graph at the end of all your code. Data visualizations are often what draw people into the story. I’m certainly guilty of scrolling through a long news article, ignoring the text, just to play around with the interactive graph they included.

Over the course of the past year, I’ve worked a lot with ggplot2, an R package which allows you to create highly customizable data visualizations. I love ggplot2, and the entire tidyverse, but I figure it’s high time I explore new ways of creating data visualizations.

Enter: the highcharter package.

Highcharts is, in the company’s own words, “the premier and fastest growing charting-tool in a competitive market.” It’s free for non-commercial use, but otherwise the license costs a pretty penny. highcharter is an R wrapper for the Highcharts JS library (which is written in JavaScript).

The main building block of highcharter is the hchart() function. Let’s explore hchart() using the lovely pdxTrees package:

Note: I will not be analyzing the charts that I make, because that would make this blog post way too long and detract from what we are really here to do: learn how to make highcharts!

pdxTrees <- get_pdxTrees_parks()

crystalSprings <- pdxTrees %>%
  # We have to subset the data or else the server becomes angered
  filter(Park == "Crystal Springs Rhododendron Garden")

hchart(crystalSprings, 
       "point", 
       hcaes(x = Tree_Height,
             y = DBH,
             group = Condition))

So, very similar to ggplot2, highcharter allows you to input the data, define your choice of geometric object (in this case, we opted for points, but here is a list of all the possibilities), and define which “aesthetics” you want your variables to be matched to (in this case, tree height is mapped to the x-position, diameter at breast height is mapped to the y-position, and the tree’s condition is mapped to the color of the points).
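To make the parallel with ggplot2 concrete, here is a minimal side-by-side sketch. The `toy` data frame is made up for illustration (its column names just mirror the pdxTrees columns used above), so it runs without downloading the real data.

```r
library(ggplot2)
library(highcharter)

# Hypothetical toy data; column names mirror the pdxTrees columns used above
toy <- data.frame(
  Tree_Height = c(20, 35, 50, 65, 80, 95),
  DBH = c(5, 9, 14, 20, 27, 33),
  Condition = c("Good", "Fair", "Good", "Poor", "Fair", "Good")
)

# ggplot2: data, then aes(), then a geom layer
p <- ggplot(toy, aes(x = Tree_Height, y = DBH, color = Condition)) +
  geom_point()

# highcharter: data, then chart type, then hcaes()
hc <- hchart(toy, "point", hcaes(x = Tree_Height, y = DBH, group = Condition))
```

Printing `p` draws a static plot, while printing `hc` opens an interactive widget. The three ingredients (data, chart type, aesthetic mapping) line up one-to-one.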

What’s super cool about highcharter is that you don’t have to do anything special to make the charts interactive! Go on, try hovering over the points in the above plot. People have grown to expect that data visualizations will be interactive, so it’s very useful to have a package that will make your charts interactive by default.

Here’s another example:

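species <- pdxTrees %>%
  # Count the trees of each species
  group_by(Common_Name) %>%
  summarize(n = n()) %>%
  # Keep the ten most common species
  arrange(desc(n)) %>%
  slice(1:10) %>%
  # Proportions are shares of these ten species, not of all park trees
  mutate(prop = n/sum(n))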

hchart(species, 
       "pie", 
       hcaes(x = Common_Name, 
             y = prop))

Obviously, pie charts aren’t beloved in the statistics community, but in defense of this pie chart, when you can hover over the sections and see the exact proportion (here, each species’ share of the ten most common species), it becomes less of a big deal that it’s difficult to visually perceive the precise differences.

However, this example demonstrates something important about highcharts: highcharter will not do any work on the back end for you! ggplot2 is kind, and will often compute counts and proportions and whatnot without you having to explicitly do it yourself. highcharter is not so helpful. In this case, I had to manually compute the proportions in the pie chart.
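Here is a small sketch of that manual step on made-up data, so you can see exactly what gets handed to the chart. The `trees` tibble is hypothetical, and `count()` is just dplyr shorthand for the group-summarize-arrange steps.

```r
library(dplyr)

# Hypothetical toy data standing in for pdxTrees: one row per tree
trees <- tibble(
  Common_Name = c("Douglas-Fir", "Douglas-Fir", "Douglas-Fir",
                  "Norway Maple", "Norway Maple", "Western Redcedar")
)

species_share <- trees %>%
  # Shorthand for group_by() + summarize(n = n()) + arrange(desc(n))
  count(Common_Name, sort = TRUE) %>%
  # Each species' share of all rows, computed by hand
  mutate(prop = round(n / sum(n), 3))

species_share
```

The resulting data frame already holds one row per slice, which is the shape hchart() expects for a "pie" chart with hcaes(x = Common_Name, y = prop).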

We can add additional information to our highchart, like we would add layers to a ggplot2 graph, using the hc_add_series() function.

mtTabor <- pdxTrees %>%
  # Again, subsetting the data to make it easier for the servers to handle
  filter(Park == "Mt Tabor Park",
         Condition == "Good")

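lobf <- loess(Carbon_Sequestration_lb ~ Carbon_Storage_lb, data = mtTabor)

# Standard errors and degrees of freedom, in the original row order
lobf_pred <- predict(lobf, se = TRUE)

fit <- 
  augment(lobf) %>%
  # Attach the standard errors before sorting so each row keeps its own value
  mutate(.se = lobf_pred$se.fit) %>%
  arrange(Carbon_Storage_lb)

# Critical value for a 95% confidence interval
t <- qt(0.975, lobf_pred$df)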

carbon <- hchart(mtTabor,
       "point",
       hcaes(x = Carbon_Storage_lb,
             y = Carbon_Sequestration_lb)) %>%
  # The `highcharter` package also uses the pipe!
  hc_add_series(fit,
                "spline", 
                hcaes(x = Carbon_Storage_lb,
                      y = .fitted)) %>%
  hc_add_series(fit,
                "arearange",
                hcaes(x = Carbon_Storage_lb, 
                      low = .fitted - t*.se, 
                      high = .fitted + t*.se), # Creates the confidence interval
                color = "lavender",
                zIndex = -3) # This makes the points overlay the shaded area

carbon

In this example, we overlaid our scatterplot with a smoothed curve of best fit and shaded its 95% confidence band. Because highcharter is not here to hold our hand, we had to fit the curve ourselves using the loess() function, and then pull the fitted values into a data frame using the augment() function (which lives in broom). The standard errors come from predict() with se = TRUE. Then, we had to use the qt() function (which we learned in Math 141!) to find the critical value for a 95% confidence interval, which we then used to determine the upper and lower boundaries of our arearange series. (This website, under the “A more advanced example” heading, shows you how to do this process if you want another demonstration.) Clearly, this is a further illustration of the fact that highcharter will categorically NOT compute anything for you on the back end.
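If you want to see the confidence-band arithmetic on its own, here is a minimal base-R sketch. It uses the built-in `cars` data as a stand-in for the Mt Tabor trees, and the names `lo`, `pred`, `crit` and `band` are just for illustration.

```r
# Built-in `cars` data (speed, dist) stands in for the Mt Tabor trees
lo <- loess(dist ~ speed, data = cars)

# One predict() call returns fitted values, standard errors and degrees of freedom
pred <- predict(lo, se = TRUE)

# Two-sided 95% critical value from the t distribution
crit <- qt(0.975, df = pred$df)

band <- data.frame(
  speed = cars$speed,
  fitted = pred$fit,
  low = pred$fit - crit * pred$se.fit,
  high = pred$fit + crit * pred$se.fit
)

# Sort only after every column is attached, so rows stay aligned
band <- band[order(band$speed), ]
head(band)
```

The `low` and `high` columns are what an "arearange" series needs, and `fitted` is what the "spline" series draws.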

Really makes you long for good old geom_smooth().

So far, we’ve been skipping out on a critically important part of creating an effective data visualization. Let’s figure out how to add a title, subtitle and axis labels.

carbon %>%
  hc_title(
    text = "How do existing carbon stores affect trees' sequestration rate?"
    ) %>%
  hc_subtitle(
    text = "Data cover trees in good condition in Mt. Tabor Park. Carbon 
    storage is how much carbon (in pounds) is bound up in the tree; carbon 
    sequestration rate is the amount of carbon (in pounds) the tree removes 
    from the atmosphere annually."
  ) %>%
  hc_xAxis( # You have to input text as a list here!
    title = list(text = "Carbon storage (lbs)")
    ) %>%
  hc_yAxis(
    title = list(text = "Carbon sequestration rate (lbs per year)")
  )

(Here is a CSV file with more info on the variables of this dataset.)

Basically, put hc_ in front of a feature of a graph, and it’s probably a highcharter function. In this case, hc_title() and hc_subtitle() allow you to input your title and subtitle, and hc_xAxis() and hc_yAxis() allow you to edit your axes.

So far, we’ve explored:

How to create a highchart using the hchart() function
The different options for chart type
How to map variables to aesthetics using the hcaes() function
How to add additional layers using the hc_add_series() function
And how to add titles, subtitles and axis labels using the hc_title(), hc_subtitle(), hc_xAxis() and hc_yAxis() functions

I would consider these to be the five most fundamental things you need to know in order to make an effective graph (not including theoretical knowledge like which type of chart is the best to display your data, etc.).

Before I go over my closing thoughts on highcharter, I want to take a moment to make some cool graphs utilizing the interesting features the package has to offer. Also just for fun!

treevalues <- pdxTrees %>%
  filter(!is.na(Native)) %>%
  group_by(Condition, Native) %>%
  # Average structural value per group, rounded to whole dollars
  summarize(mean = round(mean(Structural_Value, na.rm = TRUE)),
            .groups = "drop") %>%
  # The following line led me to realize that `hchart()` ignores factors
  mutate(Condition = factor(Condition, levels = c("Good", "Fair", "Poor")))

hchart(treevalues,
       "column",
       hcaes(x = Condition,
             y = mean,
             group = Native)) %>%
  hc_chart(options3d = list(enabled = TRUE, 
                             beta = 10,
                             alpha = 10)) %>%
  hc_title(
    text = "The structural value of PDX trees"
    ) %>%
  hc_subtitle(
    text = "Grouped by the tree's condition (good, fair, poor), and whether it is a native species"
    ) %>%
  hc_xAxis(
    title = list(text = "Condition of tree")
    ) %>%
  hc_yAxis(
    title = list(text = "Average structural value (in dollars)")
  ) %>%
  hc_legend(align = "center",
            verticalAlign = "top",
            layout = "horizontal")

Rendering this as 3D makes a fairly banal bar graph seem more visually interesting. You can make your graph 3D by passing an options3d list to the hc_chart() function; within that list, alpha and beta are the two rotation angles that set the viewing perspective of the graph.

quantVars <- pdxTrees %>%
  select(DBH, Tree_Height, Crown_Width_NS, Crown_Width_EW, Structural_Value, 
         Carbon_Storage_value, Carbon_Sequestration_value, Pollution_Removal_value)

correlation <- cor(quantVars, use = "complete.obs")

hchart(correlation)

In this example, we computed correlations between the relevant quantitative variables in the pdxTrees dataset, so you can easily compare them. (Some of these are pretty interesting!) Being able to easily hover over each of the tiles in the correlation matrix to view the correlation coefficient makes this chart especially effective.
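One detail worth seeing in isolation is what use = "complete.obs" does. Here is a small base-R sketch on made-up measurements (the `toy` data frame is hypothetical; only the column names echo pdxTrees).

```r
# Hypothetical toy measurements with one missing value
toy <- data.frame(
  DBH = c(5, 9, 14, 20, NA),
  Tree_Height = c(20, 35, 50, 65, 80),
  Crown_Width_NS = c(10, 14, 22, 25, 30)
)

# Default behavior: any pair involving the column with an NA comes back as NA
corr_default <- cor(toy)

# "complete.obs" drops every row that has an NA in any column before computing
corr <- cor(toy, use = "complete.obs")
round(corr, 2)
```

Without that argument the matrix would contain NA entries, so this is another bit of prep work to do before handing the matrix to hchart().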

OK, before things get too long, I want to take a moment to reflect on some of the benefits and drawbacks of highcharter, especially in comparison to ggplot2.

In terms of the charts that it makes, I really see only benefits:

The default interactivity is very handy.
The charts are very elegant and visually appealing; I found myself not being too concerned with changing the color palettes or themes or anything, whereas the default settings for ggplot2 tend to look a little clunky (salmon and teal……).
There are a ton of options for different plot types – way more than ggplot2.

In terms of user-friendliness, the major benefit is that the function names are intuitive (for instance, hc_title() allows you to make a title). There are two main drawbacks, however:

There is not a lot of information online to help you navigate the package, outside of the resources the company itself provides. I think this is partially attributable to the fact that you have to pay for commercial use.
You are at a pretty significant disadvantage if you don’t understand JavaScript well enough to be able to reverse-engineer highcharts from JavaScript to R. highcharter is a wrapper for the Highcharts JS library, and a lot of the examples of highchart code that you can find online are written in JavaScript or other programming languages, but rarely in R. As such, it was difficult to teach myself this package, because there are limited examples to work from, and the API reference guide was very technical and hard to understand.

Overall, is it better than ggplot2? It could be, depending on what you’re looking for. I would recommend highcharter for someone who needs to make simple, interactive, visually appealing charts. For more complex charts, unless you are willing to spend a lot of time sleuthing around and/or you understand JavaScript, I would stick with ggplot2 or whatever other data visualization package you are most comfortable with.

(As for me, I am definitely loyal to ggplot2.)

Here are some useful websites if you want some more info on highcharter or highcharts in general.

Thank you for reading!

(Final word count: approx. 1250, text only)
