Visually weighted regression in R (à la Solomon Hsiang)

Authornicebread / Posted on2012-08-3030 Comments

[Update 1: Sep 5, 2012: Explore the Magical Data Enhancer by IRES, using this visualization technique]
[Update 2: Sep 6, 2012: See new improved plots, and new R code!
Solomon Hsiang proposed an appealing method for visually displaying the uncertainty in regressions (see his blog [1][2], and also the discussions on the Statistical Modeling, Causal Inference, and Social Science Blog [1][2]).
I implemented the method in R (using ggplot2), and used an additional method of determining the shading (especially concerning Andrew Gelman’s comment that traditional statistical summaries (such as 95% intervals) give too much weight to the edges. In the following I will show how to produce plots like that:

I used following procedure:

Compute smoothers from 1000 bootstrap samples of the original sample (this results in a spaghetti plot)
Calculate a density estimate for each vertical cut through the bootstrapped smoothers. The area under the density curve always is 1, so the ink is constant for each y-slice.
Shade the figure according to these density estimates.

Now let’s construct some plots!

The basic scatter plot:

No we show the bootstrapped smoothers (a “spaghetti plot”). Each spaghetti has a low alpha. That means that overlapping spaghettis produce a darker color and already give weight to highly populated regions.

Here is the shading according to the smoother’s density:

Now, we can overplot the median smoother estimate for each x value (the “median smoother”):

Or, a visually weighted smoother:

Finally, we can add the plain linear regression line (which obviously does not refelct the data points very well):

At the end of this post is the function that produces all of these plots. The function returns a ggplot object, so you can modify it afterwards, e.g.:

vwReg(y~x, df, shade=FALSE, spag=TRUE) + xlab("Implicit power motive") + ylab("Corrugator activity during preparation")[/cc]

 
Here are two plots with actual data I am working on:
The correlation of both variables is .22 (p = .003).

A) As a heat map (note: the vertical breaks at the left and right end occur due to single data points that get either sampled or not during the bootstrap):

 



 

B) As a spaghetti plot:

 



 

 

Finally, here's the code (sometimes the code box is collapsed - click the arrow on the top right of the box to open it). Comments and additions are welcome.

[cc lang="rsplus" escaped="true"][Update: I removed the code, as an updated version has been published here (see at the end of the post)][/cc]

30 thoughts on “Visually weighted regression in R (à la Solomon Hsiang)”

Matt says:

2012-08-31 at 15:16

Great work! However, the actual code isn’t visible: just some comments and the name of the function.

Reply
1. FelixS says:
  
  2012-09-01 at 07:14
  
  There seems to be some bug in the syntax highlighter – sometimes it’s there, sometimes not. Now the code is visible – I hope it stays like that!
  
  Reply
Manuel S Gonzalez Canche says:

2012-08-31 at 18:41

Simply wonderful ideas and implementation. I am most amazed with how brilliant people like you is willing to share its wisdom!
Thank you!

Reply
1. Manuel S Gonzalez Canche says:
  
  2012-08-31 at 18:47
  
  in order to add a second predictors it would require to have a tri-dimensional plot, correct?
  
  Reply
  1. FelixS says:
    
    2012-09-01 at 07:17
    
    Thanks for your friendly comment – this blog post stands on the shoulders of Solomon’s idea and many comments on Andrew Gelman’s blog. I think that’s the power of open source!
    Concerning two predictors: yes, that would require a surface plane, or some sort moderation plot where two 2-dim plots are plotted at some selected values of one of the predictors (e.g. +/- 1 SD).
    F*
    
    Reply
    1. Manuel Gonzalez Canche says:
      
      2012-09-02 at 04:18
      
      I know that it is Salomon’s idea, but you brilliantly implemented in R and thanks to you I visited Salomon’s website and he is giving you credit for the implementation. It is amazing to see how academic networks emerge. Thank you, again FelixS
      
      Reply
Einar says:

2012-08-31 at 19:32

the code was available earlier today, now not 🙁
einar

Reply
Pete says:

2012-09-01 at 07:26

Thanks for this, looks much more intuitive than simply plotting error bounds. A random suggestion which might make sense when the focus is more the visuals is to always include the first and last points in the bootstrap sampling so that all lines start and end at the same place. I know this would somewhat bias the estimates. Thoughts?

Reply
1. FelixS says:
  
  2012-09-03 at 11:19
  
  Interesting idea. But I think it would indeed introduce too much bias, particularly in data sets with few data points, as all smoothers would go through that one point …
  
  Reply
Chris Beeley says:

2012-09-01 at 20:18

The code’s visible for me.
Totally awesome! One of those functions you’re just dying to try out [roots around in cupboard to find some regression lines to try it out on]

Reply
Marcelo Lugo says:

2012-09-02 at 00:32

Hi, the code has an error, the program can not find the function “melt”.

Reply
1. FelixS says:
  
  2012-09-03 at 11:20
  
  One line was missing: “library(reshape2)”. It’s included now (I always have that package in my environment, so I forgot to put the library call into the function). Should work now!
  
  Reply
Tyler Cheung says:

2012-09-02 at 15:34

Hi,
I really like those figures! I tried to run the sample code in R and it seems to execute fine, but ggplot2 doesn’t seem to generate any actual figures – am I doing something wrong?

Reply
1. FelixS says:
  
  2012-09-03 at 11:21
  
  hard to say – it works for me. Do you start R from the console? Maybe you have to open a graphic device before you execute the function?
  
  Reply
antagomir says:

2012-09-04 at 15:28

Hi, where to get the source code to reproduce the plots?

Reply
1. antagomir says:
  
  2012-09-04 at 15:29
  
  Oh, no I managed to get it. Somehow hidden at first.
  
  Reply
Pingback: Start
Isaac says:

2012-09-05 at 17:43

What would be the best way of doing this in black and white (including grayscale) for publication plots?

Reply
1. FelixS says:
  
  2012-09-06 at 09:22
  
  See here! Cheers, F
  
  Reply
Pingback: Start
Rob says:

2012-09-11 at 15:47

Thanks for this very useful code! This is a lovely implementation, and very effective and displaying uncertainty.
I did have one suggestion.
Because the majority of the computational effort is in bootstrapping the smoothers, it would be ideal if this did not have to be done for each attempt to draw the graph. It would be nice if one could capture these estimates as one step, and display them as a second step. This would encourage tinkinering with the image, and testing various formats. The central idea here being that one might well want to play with the display without having to re-estimate the smoothers.
Many thanks,

Reply
Thom says:

2012-09-14 at 12:45

Hi,
Thanks for this “reproducible research”, everything works well, apart from the spaghetti plot. All lines are darkblue, without any weight on the color.
I wonder if someone also have this issue
I have a Warning message:
“Removed 20842 rows containing missing values (geom_path).”
Thomas

Reply
1. FelixS says:
  
  2012-09-24 at 08:21
  
  Hi Thomas, sorry for my late reply. The spaghettis have a higher alpha transparency, and are not weighted by color. Overlapping lines produce a darker shading. The error message can be ignored (it’s due to the bootstrap resampling). HTH Felix
  
  Reply
Pingback: Visually weighted regression in R (à la Solomon Hsiang) | Roberts Lab
Pingback: Viz. weighted regression in SPSS and some discussion | Andrew Wheeler
Eugene says:

2014-11-11 at 21:47

Hi Felix,
first of all. THX for sharing that. Its very very cool. If you dont mind to answer me a short question regarding labeling. Is there a way to skip y and x labels as well as the vertical and horizontal lines on the main plot area. I just tried the usual suspects but that doesnt work out:
vwReg(prestige~income,Prestige,family=”symmetric”,palette=brewer.pal(9,”YlOrRd”),xlab=” ,ylab=”,main=”)
vwReg(prestige ~ income, Prestige,family=”symmetric”,palette=brewer.pal(9,”YlOrRd”),xlab=NULL,ylab=NULL,main=NULL)

Reply
1. FelixS says:
  
  2014-11-12 at 07:41
  
  The function returns a ggplot-object. Hence, you can modify it using the ggplot syntax, such as:
  p1 <- vwReg(y~x, df) p1 + xlab("") + ylab("YLAB")
  
  Reply
David says:

2015-02-23 at 19:45

Hi Felix,
Really nice work. I have a family of spaghetti lines (that is, I don’t have a regular sample of points but instead their, so to say, estimated bootstraped representaion) and want to create a shaded plot ‘à la watercolor’ from them. Any advice on how to proceed?

Reply
Erik says:

2015-02-25 at 23:16

Hello Felix,
Great visualization. It runs well with my data, but I have a quick question about using this function to regress logistic growth. Instead of using loess, I’m attempting to use method=glm but receive the error: Error in method(formula, data) : ‘family’ not recognized.
Is it possible to indicate that the regression should take a logistic form instead of loess or lm?

Reply
Pingback: Viz. weighted regression in SPSS and some discussion | Andrew Wheeler

Now let’s construct some plots!

Here are two plots with actual data I am working on:

30 thoughts on “Visually weighted regression in R (à la Solomon Hsiang)”

Leave a Reply to Erik Cancel reply