Felix Schönbrodt

Dr. Dipl.-Psych.

Visually weighted/ Watercolor Plots, new variants: Please vote!

Update Oct-23: Added a new parameter

add

to the function. Now multiple groups can be plotted in a single plot (see example in my comment)

As a follow-up on my R implementation of Solomon’s watercolor plots, I made some improvements to the function. I fine-tuned the graphical parameters (the median smoother line now diminishes faster with increasing CIs, and the shaded watercolors look more pretty). Furthermore, the function is faster and has more features:

  • You can define any standard regression function for the bootstrap procedure.
    • vwReg(y ~ x, df, method=lm)
    • vwReg(y ~ x + I(x^2), df, method=lm)
  • Provide parameters for the fitting function.
    • You can make the smoother’s span larger. Then it takes more points into account when doing the local fitting. Per default, the smoother fits a polynomial of degree two – that means as you increase span you will approach the overall quadratic fit: vwReg(y ~ x, df, span=2)
    • You can also make the smoother’s span smaller, then it takes less points for local fitting. If it is too small, it will overfit and approach each single data point. The default span (.75) seemed to be the best choice for me for a variety of data sets: vwReg(y ~ x, df, span=0.5)
    • Use a robust M-estimator for the smoother; see ?loess for details: vwReg(y ~ x, df, family=”symmetric”)
  • Provide your own color scheme (or, for example, a black-and-white scheme). Examples see pictures below.
  • Quantize the color ramp, so that regions for 1, 2, and 3 SD have the same color (an idea proposed by John Mashey).

At the end of this post is the source code for the R function.

Some picture – please vote!

Here are some variants of the watercolor plots – at the end, you can vote for your favorite (or write something into the comments). I am still fine-tuning the default parameters, and I am interested in your opinions what would be the best default.

Plot 1: The current default

 

Plot 2: Using an M-estimator for bootstrap smoothers. Usually you get wider confidence intervalls.

 

Plot 3:Increasing the span of the smoothers

 

Plot 4: Decreasing the span of the smoothers

Plot 5: Changing the color scheme, using a predefined ColorBrewer palette. You can see all available palettes by using this command: library(RColorBrewer); display.brewer.all()

 

Plot 6: Using a custom-made palette

 

Plot 7: Using a custom-made palette; with the parameter bias you can shift the color ramp to the “higher” colors:

 

Plot 8: A black and white version of the plot

 

Plot 9: The anti-Tufte-plot: Using as much ink as possible by reversing black and white (a.k.a. “the Milky-Way Plot“)

Plot 10: The Northern Light Plot/ fMRI plot. This plotting technique already has been used by a suspicious company (called IRET – never heard of that). I hurried to publish the R code under a FreeBSD license before they can patent it! Feel free to use, share, or change the code for whatever purpose you need. Isn’t that beautiful?

Plot 11: The 1-2-3-SD plot. You can use your own color schemes as well, e.g.: vwReg(y~x, df, bw=TRUE, quantize=”SD”)

 

Any comments or ideas? Or just a vote? If you produce some nice plots with your data, you can send it to me, and I will post a gallery of the most impressive “data art”!

Cheers,

Felix

Which water color plot do you like most?

View Results

Loading ... Loading ...
# Copyright 2012 Felix Schönbrodt
# All rights reserved.
#
# FreeBSD License
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#      
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#      
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER `AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# The views and conclusions contained in the software and documentation
# are those of the authors and should not be interpreted as representing
# official policies, either expressed or implied, of the copyright
# holder.

# Version history:
# 0.1: original code
# 0.1.1: changed license to FreeBSD; re-established compability to ggplot2 (new version 0.9.2)

## Visually weighted regression / Watercolor plots
## Idea: Solomon Hsiang, with additional ideas from many blog commenters


# B = number bootstrapped smoothers
# shade: plot the shaded confidence region?
# shade.alpha: should the CI shading fade out at the edges? (by reducing alpha; 0 = no alpha decrease, 0.1 = medium alpha decrease, 0.5 = strong alpha decrease)
# spag: plot spaghetti lines?
# spag.color: color of spaghetti lines
# mweight: should the median smoother be visually weighted?
# show.lm: should the linear regresison line be plotted?
# show.CI: should the 95% CI limits be plotted?
# show.median: should the median smoother be plotted?
# median.col: color of the median smoother
# shape: shape of points
# method: the fitting function for the spaghettis; default: loess
# bw = TRUE: define a default b&w-palette
# slices: number of slices in x and y direction for the shaded region. Higher numbers make a smoother plot, but takes longer to draw. I wouldn'T go beyond 500
# palette: provide a custom color palette for the watercolors
# ylim: restrict range of the watercoloring
# quantize: either "continuous", or "SD". In the latter case, we get three color regions for 1, 2, and 3 SD (an idea of John Mashey)
# add: if add == FALSE, a new ggplot is returned. If add == TRUE, only the elements are returned, which can be added to an existing ggplot (with the '+' operator)
# ...: further parameters passed to the fitting function, in the case of loess, for example, "span = .9", or "family = 'symmetric'"
vwReg <- function(formula, data, title="", B=1000, shade=TRUE, shade.alpha=.1, spag=FALSE, spag.color="darkblue", mweight=TRUE, show.lm=FALSE, show.median = TRUE, median.col = "white", shape = 21, show.CI=FALSE, method=loess, bw=FALSE, slices=200, palette=colorRampPalette(c("#FFEDA0", "#DD0000"), bias=2)(20), ylim=NULL, quantize = "continuous",  add=FALSE, ...) {
    IV <- all.vars(formula)[2]
    DV <- all.vars(formula)[1]
    data <- na.omit(data[order(data[, IV]), c(IV, DV)])
   
    if (bw == TRUE) {
        palette <- colorRampPalette(c("#EEEEEE", "#999999", "#333333"), bias=2)(20)
    }

    print("Computing boostrapped smoothers ...")
    newx <- data.frame(seq(min(data[, IV]), max(data[, IV]), length=slices))
    colnames(newx) <- IV
    l0.boot <- matrix(NA, nrow=nrow(newx), ncol=B)
   
    l0 <- method(formula, data)
    for (i in 1:B) {
        data2 <- data[sample(nrow(data), replace=TRUE), ]
        data2 <- data2[order(data2[, IV]), ]
        if (class(l0)=="loess") {
            m1 <- method(formula, data2, control = loess.control(surface = "i", statistics="a", trace.hat="a"), ...)
        } else {
            m1 <- method(formula, data2, ...)
        }
        l0.boot[, i] <- predict(m1, newdata=newx)
    }

    # compute median and CI limits of bootstrap
    library(plyr)
    library(reshape2)
    CI.boot <- adply(l0.boot, 1, function(x) quantile(x, prob=c(.025, .5, .975, pnorm(c(-3, -2, -1, 0, 1, 2, 3))), na.rm=TRUE))[, -1]
    colnames(CI.boot)[1:10] <- c("LL", "M", "UL", paste0("SD", 1:7))
    CI.boot$x <- newx[, 1]
    CI.boot$width <- CI.boot$UL - CI.boot$LL
   
    # scale the CI width to the range 0 to 1 and flip it (bigger numbers = narrower CI)
    CI.boot$w2 <- (CI.boot$width - min(CI.boot$width))
    CI.boot$w3 <- 1-(CI.boot$w2/max(CI.boot$w2))

   
    # convert bootstrapped spaghettis to long format
    b2 <- melt(l0.boot)
    b2$x <- newx[,1]
    colnames(b2) <- c("index", "B", "value", "x")

    library(ggplot2)
    library(RColorBrewer)
   
    # Construct ggplot
    # All plot elements are constructed as a list, so they can be added to an existing ggplot
   
    # if add == FALSE: provide the basic ggplot object
    p0 <- ggplot(data, aes_string(x=IV, y=DV)) + theme_bw()
   
    # initialize elements with NULL (if they are defined, they are overwritten with something meaningful)
    gg.tiles <- gg.poly <- gg.spag <- gg.median <- gg.CI1 <- gg.CI2 <- gg.lm <- gg.points <- gg.title <- NULL

    if (shade == TRUE) {
        quantize <- match.arg(quantize, c("continuous", "SD"))
        if (quantize == "continuous") {
            print("Computing density estimates for each vertical cut ...")
            flush.console()
       
            if (is.null(ylim)) {
                min_value <- min(min(l0.boot, na.rm=TRUE), min(data[, DV], na.rm=TRUE))
                max_value <- max(max(l0.boot, na.rm=TRUE), max(data[, DV], na.rm=TRUE))
                ylim <- c(min_value, max_value)
            }

            # vertical cross-sectional density estimate
            d2 <- ddply(b2[, c("x", "value")], .(x), function(df) {
                res <- data.frame(density(df$value, na.rm=TRUE, n=slices, from=ylim[1], to=ylim[2])[c("x", "y")])
                #res <- data.frame(density(df$value, na.rm=TRUE, n=slices)[c("x", "y")])
                colnames(res) <- c("y", "dens")
                return(res)
            }, .progress="text")

            maxdens <- max(d2$dens)
            mindens <- min(d2$dens)
            d2$dens.scaled <- (d2$dens - mindens)/maxdens  
       
            ## Tile approach
            d2$alpha.factor <- d2$dens.scaled^shade.alpha
            gg.tiles <-  list(geom_tile(data=d2, aes(x=x, y=y, fill=dens.scaled, alpha=alpha.factor)), scale_fill_gradientn("dens.scaled", colours=palette), scale_alpha_continuous(range=c(0.001, 1)))
        }
        if (quantize == "SD") {
            ## Polygon approach
           
            SDs <- melt(CI.boot[, c("x", paste0("SD", 1:7))], id.vars="x")
            count <- 0
            d3 <- data.frame()
            col <- c(1,2,3,3,2,1)
            for (i in 1:6) {
                seg1 <- SDs[SDs$variable == paste0("SD", i), ]
                seg2 <- SDs[SDs$variable == paste0("SD", i+1), ]
                seg <- rbind(seg1, seg2[nrow(seg2):1, ])
                seg$group <- count
                seg$col <- col[i]
                count <- count + 1
                d3 <- rbind(d3, seg)
            }
           
            gg.poly <-  list(geom_polygon(data=d3, aes(x=x, y=value, color=NULL, fill=col, group=group)), scale_fill_gradientn("dens.scaled", colours=palette, values=seq(-1, 3, 1)))
        }
    }
   
    print("Build ggplot figure ...")
    flush.console()
   
   
    if (spag==TRUE) {
        gg.spag <-  geom_path(data=b2, aes(x=x, y=value, group=B), size=0.7, alpha=10/B, color=spag.color)
    }
   
    if (show.median == TRUE) {
        if (mweight == TRUE) {
            gg.median <-  geom_path(data=CI.boot, aes(x=x, y=M, alpha=w3^3), size=.6, linejoin="mitre", color=median.col)
        } else {
            gg.median <-  geom_path(data=CI.boot, aes(x=x, y=M), size = 0.6, linejoin="mitre", color=median.col)
        }
    }
   
    # Confidence limits
    if (show.CI == TRUE) {
        gg.CI1 <- geom_path(data=CI.boot, aes(x=x, y=UL), size=1, color="red")
        gg.CI2 <- geom_path(data=CI.boot, aes(x=x, y=LL), size=1, color="red")
    }
   
    # plain linear regression line
    if (show.lm==TRUE) {gg.lm <- geom_smooth(method="lm", color="darkgreen", se=FALSE)}
       
    gg.points <- geom_point(data=data, aes_string(x=IV, y=DV), size=1, shape=shape, fill="white", color="black")       
       
    if (title != "") {
        gg.title <- theme(title=title)
    }
   

    gg.elements <- list(gg.tiles, gg.poly, gg.spag, gg.median, gg.CI1, gg.CI2, gg.lm, gg.points, gg.title, theme(legend.position="none"))
   
    if (add == FALSE) {
        return(p0 + gg.elements)
    } else {
        return(gg.elements)
    }
}
Comments (36) | Trackback

36 Responses to “Visually weighted/ Watercolor Plots, new variants: Please vote!”

  1. Start says:

    [...] [Update 2: Sep 6, 2012: See new improved plots, and new R code! [...]

  2. Felix, these are great. Thanks for the terrific implementation.

    Going back to original idea of Visually-Weighted Regression, i.e. to focus the viewer on important parts of the graph, I would comment that some of these (very creative) schemes might distract the viewer from the central portion of the regression. Eg. When I look at Plots 6, 7, 10 and 11, I feel like my eyes are drawn toward the edges of the graph where there’s a lot of bright yellow/orange. That’s why we originally were fading some of the plots as the sample got thinner. I think that your plots which conserve ink (1, 8 and 9) do this quite nicely. But when the color for zero-mass is something other than the background color, I think it creates visual weight in places of the graph where there is not much information.

    • Manuel Gonzalez Canche says:

      Point very well taken. Although I love the colors, they are a little bit distracting me from the original idea. SUPERB job dear Felix, simply SUPERB!!!

    • FelixS says:

      Solomon, I agree 100% with your remarks. The more colorful plots were more meant as a show-off for different color schemes. It’s fun to see how the overall impression changes with new colors!

  3. Chris Beeley says:

    Not only are the plots awesome, but I love the “anti Tufte” name. Use as much ink as possible! Write in white space on bits of paper covered in black! Use all the ink!

  4. hbaghishani says:

    When I tried to run the code, I got following error:
    Error in get(x, envir = this, inherits = inh)(this, …) :
    unused argument(s) (range = c(0.001, 1))

    • FelixS says:

      Hi, can you send me a data set to replicate that error (by email)? Then I can look for the error.

      • GraemeHill says:

        I had the same problem.

        I think it’s from an older version of ggplot2. I changed “scale_alpha_continuous(range=c(0.001, 1))” to “scale_alpha_continuous(to=c(0.001, 1))” and it worked fine. I then updated everything and the original version worked.

      • hbaghishani says:

        Thank you so much Felix for such a nice work. Good job!

  5. Magnus says:

    When I do these graphs I get the following warning:

    Warning message:
    ‘opts’ is deprecated.
    Use ‘theme’ instead.
    See help(“Deprecated”)

    The graphs still are produced though and look good. Is that something to worry about or can I assume the plots are ok anyways?

    • FelixS says:

      The plots are OK – you have a somewhat outdated version of the ggplot2 package. When you update to 0.9.2, the warnings disappear.

  6. John Mashey says:

    Good stuff. I liked 11, but many of these are interesting.

  7. [...] Watercolor your scatterplots. Yammy. Here as well. [...]

  8. Andre says:

    Hi,

    first of all, great work man ;) but I need to draw multiple regressions with different colors within one plot. Maybe this is possible by adding another function parameter, like newplot=TRUE/FALSE. Is this somehow possible, because I am not familiar with ggplot2?

    Regards …

    • FelixS says:

      Hi Andre,

      sorry for my late reply. I was thinking quite a while on how to do this in a clever way, and I think I found a way.

      Now there’s a new parameter

      add

      . You can construct a first plot (with add = FALSE), construct a second plot (with add = TRUE), and then you can merge both with the ‘+’ operator. Here’s an example:

      # build a demo data set
      set.seed(1)
      x <- rnorm(200, 0.8, 1.2)
      e <- rnorm(200, 0, 3)*(abs(x)^1.5 + .5) + rnorm(200, 0, 4)
      e2 <- rnorm(200, 0, 5)*(abs(x)^1.5 + .8) + rnorm(200, 0, 5)
      y <- 8*x - x^3 + e
      y2 <- 20 + 3*x + 0.6*x^3 + e2
      df <- data.frame(x, y, y2)

      p1 <- vwReg(y~x, df, spag=TRUE, shade=FALSE)
      p2 <- vwReg(y2~x, df, add=TRUE, spag=TRUE, shade=FALSE, spag.color="red", shape=3)
      p3 <- p1 + p2
      p3

      vwReg: Two groups

      You can control the appearance of each group with the parameters of the function, and add as many plots as you like. You can also provide different data frames for each plot.

      It works well with spaghetti plots; if you do shaded plots, at the moment all groups have the same palette. I still have to find a solution for that.

      Cheers,
      Felix

      • André says:

        Hi Felix,

        and thank you very much, for the upgrade of your function ;) I will try to use this visualization in my next publication …

        Regards,

        André

  9. Derek Jones says:

    Thanks for a great visualization function.

    When “show.CI=TRUE” is specified as an argument I get the message:
    Error in eval(expr, envir, enclos) : object ‘B’ not found

    It is generated by the “return(gg.elements)” at the end.

    It looks like the call to aes_string might be the problem.

    • Dave Auty says:

      Fantastic visualization. Aside from the same error as Derek Jones when calling ‘CI=TRUE’, calling this line:

      p3 <- p1 + p2

      Also throws up the error:

      Error in p + o : non-numeric argument to binary operator
      In addition: Warning message:
      Incompatible methods ("+.gg", "Ops.data.frame") for "+"

      Anyone got any thoughts as to why?

      Thanks,

      Dave

      • FelixS says:

        The show.CI bug should be removed now!

        Concerning the error message about “non-numeric argument”: which version of ggplot do you use? It should be >= 0.9.2. But I think that either p1 or p2 are not a ggplot-object (maybe one of them was overwritten with something else in your script)?

        • Dave Auty says:

          Many thanks- CI bug is removed and the plots work perfectly. As for the other error, I am using 0.9.2.1, but I think your hunch was right as I re-ran the code in a new session and it worked fine. Apologies for not checking that.

          Thanks again for your sharing this superb method.

          Dave

        • Dave Auty says:

          One other thought: it would be nice if this method worked with faceting too, is this something that would be easy to implement?

          Dave

  10. Rogier says:

    Dear Felix,
    Fantastic work- These look spectacular. Do you know of, or do you have any plans to, apply these visualization techniques to Kaplan-Meier survival curves? (e.g. survfit objects using the package survival). The package denstrip aims to do so but seems to have trouble with the changing (over time) and assymmetric nature of confidence intervals in survival curves. I think it would be fantastic to visualize survival curves in this manner also. Kindest regards,
    Rogier

  11. Rogier says:

    ‘assymmetric’? -s :)

  12. Conrad Zygmont says:

    Hi Felix,

    Thank you for your great work and sharing it on the blog.

    The value of visually appealing, and informative, graphics for communicating information in publications and presentations is described well in your previous post on the watercolor plots. The real value of these appealing graphs for me is in motivating my psychology statistics students to get exited about using R.

    Once again thank you for your contribution to the community.

    Kind Regards,
    Conrad Zygmont

  13. Daniele says:

    Hi Felix,

    Great work, man!

    I have tried to use your function with method=lm (to get a straight line), but this message appeared:

    [1] “Computing boostrapped smoothers …”
    Error in vwReg(Y ~ X, data = mydata, method=lm, quantize = “SD”) :
    could not find function “method”

    Could you help me with this?

    Cheers,

    Daniele

    • FelixS says:

      Try

      method="lm"

      , that should work!

      • Daniele says:

        Thanks, for the quick response!

        I have tried this too. But the message is the same:

        > vwReg(y~x, data=mydata,method=”lm”,quantize=”SD”)
        [1] “Computing boostrapped smoothers …”
        Error in vwReg(BGR ~ WGR, data = football, method = “lm”, quantize = “SD”) :
        could not find function “method”
        Cheers,D.

        • FelixS says:

          Interesting – it works for me (without the quotes – it’s

          method=lm

          ). Did you try it with an empty environment and the current version of the function?

  14. [...] ‘watercolor‘ plot (aka à la Solomon [...]

  15. mareviv says:

    Hi Felix, thanks for the code! I’ve just posted my two graphs here: http://talesofr.wordpress.com/2013/04/28/watercolor-plots/

  16. [...] my prospectus, and I have a few more SPSS blog posts I have in mind (restricted cubic splines, visually weighted regression, and using Ripley’s K to analyze temporal crime [...]

  17. Nils says:

    Im interested in visualizing uncertainty for time series data. I can easily simulate hundreds of points along the x-axis. My experience from using the vwReg function is that (a) it’s very slow (bootstrapping) and (b) the density along the y-axis looks a lot different from what I expected when I look at box plots of the same data.

    I did not take a deep dive into the code, but have you considered simply custering the points along the y-axsis and use the cluster centers plus weights instead of smoothing?

  18. TJ says:

    Felix,

    Thanks for sharing!
    TJ

Leave a Reply