# Installation of WRS package (Wilcox’ Robust Statistics)

Some users had trouble installing the WRS package from R-Forge. Here’s a method that should work automatically and fail-safe:

    # first: install dependent packages
    install.packages(c("MASS", "akima", "robustbase"))

    # second: install suggested packages
    install.packages(c("cobs", "robust", "mgcv", "scatterplot3d", "quantreg", "rrcov", "lars", "pwr", "trimcluster", "parallel", "mc2d", "psych", "Rfit"))

    # third: install WRS
    install.packages("WRS", repos="http://R-Forge.R-project.org", type="source")

WRS cannot be hosted on CRAN, as CRAN demands help files for every user-visible function. This has not been done for WRS (yet). For the time being, this somewhat more complicated installation routine has to be used.
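If you run the routine repeatedly, it can be convenient to install only what is still missing. The following is a minimal sketch of that idea, not part of the original routine: `missing_pkgs` and `install_wrs` are hypothetical helper names, and the package lists are simply copied from the three steps above. Base packages such as parallel are already present in every R installation, so the helper filters them out.

```r
# Return the subset of `pkgs` that is not installed in any library yet.
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical wrapper around the three installation steps.
install_wrs <- function() {
  deps <- c("MASS", "akima", "robustbase")
  suggested <- c("cobs", "robust", "mgcv", "scatterplot3d", "quantreg",
                 "rrcov", "lars", "pwr", "trimcluster", "parallel",
                 "mc2d", "psych", "Rfit")

  # first and second step: install only the packages that are missing
  need <- missing_pkgs(c(deps, suggested))
  if (length(need) > 0) install.packages(need)

  # third step: install WRS from R-Forge
  install.packages("WRS", repos = "http://R-Forge.R-project.org", type = "source")
}

# install_wrs()   # uncomment to run the installation
```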

# Further thoughts on post-publication peer review (PPPR)

Sanjay Srivastava blogged some interesting thoughts about the process of post-publication peer review (PPPR), reflecting on his own comment on a PLOS ONE publication. I agree that open peer commentaries after publication are one important part of the future of scientific publishing. There were many cases where I wished I had the opportunity to publish such a commentary. In one case, I actually wrote a commentary on a paper published in Management Science – a strange story about managers, age, and testosterone, which received a lot of press coverage. I submitted it as a commentary to the journal, but it was rejected because of “lack of new results”. Now my commentary rests on SSRN and has been downloaded 5 times in 10 months – yippee-yeah! (probably 3 of these 5 are by myself …). But as SSRN does not allow peer commentaries, I could not set a link from the original paper to my comment, and nobody finds it.

Other fields of science have additionally established a pre-publication open peer review (also called the “pre-print culture”). Many researchers in mathematics or physics publish their preprints on arXiv and harvest open peer commentaries before submitting the manuscript to a peer-reviewed journal.

I believe devoutly that open PrePPR and PostPPR can significantly improve the quality of scientific output. But one crucial requirement indeed is etiquette, as Sanjay pointed out. I don’t want to see shitstorms coming over scientific articles, especially in the case of young scholars who worked hard to get their first paper published. Comments should be written in the spirit of a collaborative enhancement of research, and less in terms of “debunking”. We all are humans and mistakes can occur. Problems should be pointed out in order to strengthen scientific research, but in a friendly and constructive manner.

Researchers who conceive of science as a highly competitive business where claims have to be fortified and defended might have problems with open peer reviews (e.g., the escalation of the “Bargh rampage” [1][2][3]). But if we see science as a collaborative endeavour in search of knowledge, where no model is “right” but only “less wrong”, open peer reviews can be a very helpful tool.

# The first CREDAM Award for creative data management goes to … the German government!

“If you torture the data long enough, it will confess.”
This aphorism, attributed to Ronald Coase, has sometimes been used in a disrespectful manner, as if it were wrong to do creative data analysis. This view obviously is misleading. In contrast, we at IRET have a much more positive and humanistic view of data management, and therefore we have made this aphorism our guiding principle in difficult times.

We at IRET have made it our mission to promote and foster creative ways of data analysis. Therefore, we proudly introduce an award in recognition of outstanding data creativity: the CREDAM Award. CREDAM is both an acronym (CREative DAta Management) and a statement: credam (lat.) means “I will believe”, or “I will trust”.

## This year’s CREDAM Award goes to …….. the German government!

A new report on poverty in Germany is going to be published soon. What does the data say?

| Year | Overall property in possession of rich households | Overall property in possession of complete lower half |
|------|------|------|
| 1998 | 45% | 3% |
| 2008 | 53% | 1% |

Seems like a pretty clear picture, and in a previous version of the report, the authors concluded (based on this and other data), that “income disparity increased” (see Süddeutsche Zeitung). But that is wrong!! But why is it wrong? Well, that interpretation “does not reflect the opinion of the German government”.

Under pressure from Philipp Rösler, the leader of the minor coalition partner (a party which currently would be elected by 4% of Germans), this conclusion was re-interpreted. Now, the report comes to the completely opposite conclusion: “income disparity decreases”!

### As this is a great example of creative data analysis, which liberates us from restrictive and anally retentive “scientific” procedures, we are happy to award the first CREDAM trophy to the German government, especially Philipp Rösler. Congratulations!

(Maybe we should think about adopting this strategy for scientific reports as well. Given highly flexible approaches of data analysis, conclusions should rather be based on a majority vote of all (co-)authors and reviewers, not on empirical evidence.)

# Improved evolution of correlations

As an update of this post: here’s an improved version of “The evolution of correlations”.

From the original post:

This is the evolution of a bivariate correlation between two questionnaire scales, “hope of power” and “fear of losing control”. Both scales were administered in an open online study. The video shows how the correlation evolves from r = .69*** (n=20) to r = .26*** (n=271). It does not stabilize until n = 150.

The data have not been rearranged – they are in the order in which participants happened to drop into the study. This was a rather extreme case of an unstable correlation – other scales in this study were stable right from the beginning. Maybe this video could help as an anecdotal caveat for a careful interpretation of correlations with small n’s (and by ‘small’ I mean n < 100) …

The right panel now displays the correlation at each step. The horizontal green line is the final correlation that is approached; the curved dotted line shows the smallest correlation that would be significant at that sample size. As the empirical curve is always above this dotted line, the correlation is significantly different from zero at each step.
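The two curves in the right panel can be sketched in a few lines of R. This is not the code behind the video: the data are simulated, `critical_r` and `evolving_cor` are hypothetical helper names, and a two-sided test at alpha = .05 is assumed for the dotted line. The critical correlation follows from the t-test for a correlation: with df = n - 2, r_crit = t_crit / sqrt(t_crit^2 + df).

```r
# Smallest |r| that is significant at sample size n
# (assumption: two-sided test, alpha = .05).
critical_r <- function(n, alpha = 0.05) {
  df <- n - 2
  t_crit <- qt(1 - alpha / 2, df)
  t_crit / sqrt(t_crit^2 + df)
}

# Correlation of the first i cases, for i = start, ..., length(x).
evolving_cor <- function(x, y, start = 15) {
  n <- seq(start, length(x))
  data.frame(
    n      = n,
    r      = vapply(n, function(i) cor(x[1:i], y[1:i]), numeric(1)),
    r_crit = critical_r(n)
  )
}

# Simulated stand-in for the two questionnaire scales
set.seed(42)
x <- rnorm(271)
y <- 0.26 * x + sqrt(1 - 0.26^2) * rnorm(271)
evo <- evolving_cor(x, y)

if (interactive()) {
  plot(evo$n, evo$r, type = "l", ylim = c(-1, 1), xlab = "n", ylab = "r")
  lines(evo$n, evo$r_crit, lty = "dotted")           # significance boundary
  abline(h = evo$r[nrow(evo)], col = "darkgreen")    # final correlation
}
```

Wherever the solid curve lies above the dotted one, the correlation at that sample size is significantly larger than zero.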

Here is the code that created the movie. It’s not fully self-contained: the function plotReg plots the dual-panel display, and dat0, A, and B are parameters passed to this function. You can insert any other plotting function here. The function loops through the rows of a data frame and saves a plot at every step into a subfolder. Finally, the function needs the command line version of ffmpeg, which combines the pictures into a movie.

    makeMovie <- function(fname, dat0, A, B, fps=15) {

        # create a new directory for the pictures
        dir.create(fname)

        # create the picture sequence
        picName <- paste0(fname, "/", fname, "_%03d.jpg")
        jpeg(picName, width=800, height=450, quality=95)
        for (i in 15:nrow(dat0)) {
            print(i)
            plotReg(A, B, i, keep=15)
        }
        dev.off()

        # delete any existing movie file
        unlink(paste0(fname, ".avi"))

        # build the ffmpeg command that glues the pictures together
        # (system() already runs in R's working directory, so no "cd" is needed;
        # -sameq is not accepted by newer ffmpeg versions, where a bitrate
        # option such as -b:v 5000K can be used instead)
        cmd <- paste0("ffmpeg -r ", fps, " -i ", picName, " -sameq -r 25 ", fname, ".avi")

        # show & execute the command
        print(cmd)
        system(cmd)
    }

# Optimizing parameters for an oscillator – Video

Here’s a video of how the modFit function from the FME package optimizes parameters for an oscillation. A Nelder-Mead optimizer (R function optim) finds the best-fitting parameters for an undamped oscillator. The minimum was found after 72 iterations; the true parameter eta was -.05:

More on estimating parameters of differential equations is coming later on this blog!
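Until then, here is a rough sketch of the idea in base R. It is not the code behind the video: instead of modFit and a numerical ODE solver, it calls optim directly and uses the closed-form solution of an undamped oscillator, assuming the model x'' = eta * x with x(0) = x0 and x'(0) = 0. The simulated data, the noise level, and the starting values are made up for illustration.

```r
# Closed-form solution of x'' = eta * x with x(0) = x0, x'(0) = 0 (eta < 0)
oscillator <- function(t, eta, x0) x0 * cos(sqrt(-eta) * t)

# Simulated observations with true eta = -.05 and some measurement noise
set.seed(1)
t_obs <- seq(0, 50, by = 0.5)
x_obs <- oscillator(t_obs, eta = -0.05, x0 = 1) + rnorm(length(t_obs), sd = 0.05)

# Sum of squared residuals; par[1] is eta, par[2] is x0
cost <- function(par) {
  eta <- par[1]
  x0  <- par[2]
  if (eta >= 0) return(1e10)   # no oscillation for eta >= 0: penalize
  sum((x_obs - oscillator(t_obs, eta, x0))^2)
}

# Nelder-Mead search; parscale tells optim that eta lives on a much
# smaller scale than x0
fit <- optim(par = c(eta = -0.04, x0 = 0.8), fn = cost,
             method = "Nelder-Mead",
             control = list(parscale = c(0.01, 1)))

print(round(fit$par, 3))   # estimated eta and x0
print(fit$counts)          # number of cost function evaluations
```

Saving one plot of data and fitted curve per cost function evaluation would give the picture sequence for a video like the one above.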

Things I’ve learned:

• ffmpeg does not like pngs. They are internally converted to jpg in a very low quality, and I could not find a way to improve this quality. Lesson learned: export high-quality jpgs from your R function.
• Use a standard frame rate for the output file (e.g., 24, 25, or 30 fps).
• My final ffmpeg command: ffmpeg -r 10 -i modFit%03d.jpg -r 25 -b:v 5000K modelFit.avi
  • -r 10: use 10 pictures / second as input
  • -i modFit%03d.jpg: defines the names of the input files, modFit001.jpg, modFit002.jpg, …
  • -r 25: set the frame rate of the output file to 25 fps
  • -b:v 5000K: set the bitrate of the video to a high value
  • modelFit.avi: name of the output video; the file extension determines the container format
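The same command can also be assembled and run from within R. A minimal sketch, assuming ffmpeg is on the PATH; `ffmpeg_args` and `render_movie` are hypothetical helper names, and only the flags listed above are used:

```r
# Build the argument vector for ffmpeg from the settings described above.
ffmpeg_args <- function(pattern, out, fps_in = 10, fps_out = 25, bitrate = "5000K") {
  c("-r", fps_in,      # input: pictures per second
    "-i", pattern,     # input file pattern, e.g. modFit%03d.jpg
    "-r", fps_out,     # output frame rate
    "-b:v", bitrate,   # video bitrate
    out)               # output file
}

# Run ffmpeg if it can be found; returns its exit status.
render_movie <- function(pattern, out, ...) {
  if (Sys.which("ffmpeg") == "") stop("ffmpeg not found on the PATH")
  system2("ffmpeg", shQuote(ffmpeg_args(pattern, out, ...)))
}

# render_movie("modFit%03d.jpg", "modelFit.avi")
```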

# R-package: Wilcox’ Robust Statistics updated (WRS v0.20)

Rand Wilcox constantly updates the functions accompanying his books on robust statistics. Recently, they have been updated to version 20. The functions are available in the WRS package for R – for installation simply type:

    install.packages("WRS", repos="http://R-Forge.R-project.org")

In version 0.20, a number of functions dealing with ANCOVA have been added and some others improved. Unfortunately, only very few help files exist for the functions. I recommend checking the source code, as most functions have a comment section roughly explaining the parameters. Alternatively, consult Wilcox’ books for descriptions of the functions.

# Parse pdf files with R (on a Mac)

Inspired by this blog post from theBioBucket, I created a script to parse all pdf files in a directory. Due to its reliance on the Terminal, it’s Mac specific, but modifications for other systems shouldn’t be too hard (as a start for Windows, see BioBucket’s script).

First, you have to install the command line tool pdftotext (a binary can be found on Carsten Blüm’s website). Then, run the following script within a directory with pdfs:
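A minimal sketch of such a script could look like this. It is an illustration of the approach rather than a verbatim copy of the original script, and it assumes that pdftotext is on the PATH. The helper names `txt_name` and `parse_pdfs` are made up for this example.

```r
# Name of the text file that belongs to a pdf file
txt_name <- function(pdf) sub("\\.pdf$", ".txt", pdf, ignore.case = TRUE)

# Convert every pdf in `dir` with pdftotext and read the resulting text.
# Returns a named list with one character vector (lines of text) per pdf.
parse_pdfs <- function(dir = ".") {
  pdfs <- list.files(dir, pattern = "\\.pdf$", ignore.case = TRUE, full.names = TRUE)
  out <- list()
  for (pdf in pdfs) {
    txt <- txt_name(pdf)
    # pdftotext <pdf file> <text file>; shQuote protects blanks in file names
    status <- system2("pdftotext", c(shQuote(pdf), shQuote(txt)))
    if (status == 0 && file.exists(txt)) {
      out[[basename(pdf)]] <- readLines(txt, warn = FALSE)
    }
  }
  out
}

# texts <- parse_pdfs()   # parse all pdfs in the working directory
```

The returned list can then be searched with the usual tools, for example `grep("robust", texts[[1]])`.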

# Amazing fMRI plots for everybody!

Dear valued customer,

It is a well-known scientific truth that research results which are accompanied by a fancy, colorful fMRI scan are perceived as more believable and more persuasive than simple bar graphs or text results (McCabe & Castel, 2007; Weisberg, Keil, Goodstein, Rawson, & Gray, 2008). Readers even agree more with fictitious and unsubstantiated claims, as long as you provide a colorful brain image, and it works even when the subject is a dead salmon.

## The power of brain images for everybody

What are the consequences of these troubling findings? The answer is clear. Everybody should be equipped with these powerful tools of research communication! We at IRET made it our mission to provide the latest, cutting-edge tools for your research analysis. In this case we adopted a new technology called “visually weighted regression” or “watercolor plots” (see here, here, or here), and simply applied a new color scheme.

But now, let’s get our hands on it!

## The example

Imagine you invested a lot of effort in collecting the data of 41 participants. Now you find the following pattern in 2 of your 87 variables:

You could show that plain scatterplot. But should you do it? Nay. Of course everybody would spot the outliers on the top right. But, much more importantly: it is b-o-r-i-n-g!

What is the alternative? Reporting the correlation as text? “We found a correlation of r = .38 (p = .014)”. Yawn.

Or maybe: “We chose to use a correlation technique that is robust against outliers and violations of normality, the Spearman rank coefficient. It turned out that the correlation broke down and was not significant any more (r = .06, p = .708).”

Don’t be silly! With that style of scientific reporting, there would be nothing to write home about. But you can be sure: we have the right tools for you. Finally, the power of pictures is not limited to brain research – now you can turn any data into a magical fMRI plot like this:

Isn’t that beautiful? We recommend accompanying the figure with an elaborate description: “For local fitting, we used spline smoothers from 10000 bootstrap replications. For a robust estimation of vertical confidence densities, a re-descending M-estimator with Tukey’s biweight function was employed. As one can clearly see in the plot, there is significant confidence in the prediction of the x=0, y=0 region, as well as a minor hot spot in the x=15, y=60 region (also known as the supra-dextral data region).”

## Magical Data Enhancer Tool

With the Magical Data Enhancer Tool (MDET) you can …

• … turn boring, marginally significant, or just crappy results into a stunning research experience
• … publish in scientific journals with higher impact factors
• … receive the media coverage that you and your research deserve
• … achieve higher acceptance rates from funding agencies
• … impress young women at the bar (you wouldn’t show a plain scatterplot, dude?!)

## FAQ

Q: But – isn’t that approach unethical?
A: No, not at all. On the contrary, we at IRET think that it is unethical that only some researchers are allowed to exploit the cognitive biases of their readers. We design our products with great respect for humanity, and we believe that every researcher who can afford our products should have the same powerful tools at hand.

Q: How much does your product cost?
A: The standard version of the Magical Data Enhancer ships for 12’998 \$. We are aware that this is a significant investment. But, come on: You deserve it! Furthermore, we will soon publish a free trial version, including the full R code on this blog. So stay tuned!

Best regards,

Lexis “Lex” Brycenet (CEO & CTO Research Communication)
International Research Enhancement Technology (IRET)