# At what sample size do correlations stabilize?

Maybe you have encountered this situation: you run a large-scale study over the internet, and out of curiosity, you frequently check the correlation between two variables.

My experience with this practice is usually frustrating: in small samples (and we will see what “small” means in this context), correlations go up and down, change sign, and move from “significant” to “non-significant” and back. As an example, see Figure 1, which shows the actual trajectory of a correlation plotted against sample size (I also posted a video of this evolution).

The trajectory simply follows the order in which participants dropped into the study (i.e., the data have not been rearranged). In this case, the correlation started really strong (*r* = .69) and continuously decayed to its final *r* of .26. The light gray lines show some exemplary bootstrapped alternative trajectories.

In this particular case, at least the sign was stable (“There is a positive relationship in the population”, see also “Type-S errors”). Other trajectories in this data set, however, changed signs or their significance status. One correlation even changed from “negative significant” to “positive significant”!
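To get a feeling for how such a trajectory behaves, here is a minimal sketch that simulates one. It is not the code behind Figure 1: it draws artificial participants from a bivariate normal population with a known correlation `rho` and recomputes *r* after each new participant. The function names, the starting point `n_min = 20`, and the seed are assumptions made for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_trajectory(rho, n_max, n_min=20, seed=None):
    """Return [(n, r_n)] for n = n_min..n_max, adding one simulated
    participant at a time (hypothetical helper, not the original code)."""
    rng = random.Random(seed)
    xs, ys = [], []
    trajectory = []
    for n in range(1, n_max + 1):
        x = rng.gauss(0, 1)
        # y is built so that the population correlation with x equals rho
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
        if n >= n_min:
            trajectory.append((n, pearson_r(xs, ys)))
    return trajectory


# One simulated trajectory for a true correlation of .3
trajectory = simulate_trajectory(rho=0.3, n_max=500, seed=1)
for n, r in trajectory[::80]:
    print(n, round(r, 2))
```

Running this with different seeds typically shows the same pattern as in the real data: wild swings early on, and a slow settling towards the true value later.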

Obviously, the estimate of a correlation stabilizes with increasing sample size. Now I wanted to know: At which sample size exactly can I expect a correlation to be stable? An informal query amongst colleagues revealed estimates between *n* = 80 and *n* = 150.

Together with Marco Perugini, I did a systematic analysis of this question. The results of this simulation study are reported in Schönbrodt & Perugini (2013) [PDF, 0.39 MB]. In this paper, a “corridor of stability” (*COS*) is used: deviations from the true value are defined as tolerable as long as they stay within that corridor (see also Figure 1 for a COS of +/- .1). The point of stability (*POS*) is the sample size from which on a specific trajectory does not leave the COS anymore.
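The POS definition can be sketched in a few lines. This is an illustration of the definition as stated above, not the code from the paper; the function name and the example values are made up, and the corridor boundaries are passed in explicitly.

```python
def point_of_stability(trajectory, lower, upper):
    """Smallest n from which on every later estimate stays inside
    [lower, upper]. `trajectory` is a list of (n, r) pairs sorted by n.
    Returns None if the last estimate is still outside the corridor."""
    pos = None
    for n, r in trajectory:
        if lower <= r <= upper:
            if pos is None:
                pos = n  # candidate: the trajectory (re-)entered the corridor
        else:
            pos = None  # the trajectory left the corridor, so start over
    return pos


# Made-up trajectory; corridor of .2 to .4 around a true value of .3
example = [(20, 0.69), (21, 0.45), (22, 0.38), (23, 0.41), (24, 0.35), (25, 0.31)]
print(point_of_stability(example, lower=0.2, upper=0.4))  # 24
```

Note that the trajectory is inside the corridor at *n* = 22 but leaves it again at *n* = 23, so the POS is 24 and not 22.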

The point of stability depends on the effect size (How strong is the true correlation?), the width of the corridor of stability (How much deviation from the true value am I willing to accept?), and the confidence in the decision (How confident do I want to be that the trajectory does not leave the COS any more?). If you’re interested in the details: read the paper. It’s not long.
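How the three ingredients interact can be sketched with a small Monte Carlo simulation: simulate many trajectories, compute each POS, and take the quantile that corresponds to the desired confidence. This is a simplified stand-in under stated assumptions, not the procedure from the paper: it samples from a bivariate normal population, places the corridor at `rho - w` to `rho + w`, and counts trajectories that are still outside at `n_max` as `n_max`. All names and default values are hypothetical.

```python
import math
import random


def simulate_pos(rho, w, n_min, n_max, rng):
    """Simulate one trajectory and return its point of stability
    (None if the estimate at n_max is still outside the corridor)."""
    lower, upper = rho - w, rho + w
    sx = sy = sxx = syy = sxy = 0.0  # running sums for an incremental r
    pos = None
    for n in range(1, n_max + 1):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
        if n < n_min:
            continue
        cov = sxy - sx * sy / n
        var_x = sxx - sx * sx / n
        var_y = syy - sy * sy / n
        r = cov / math.sqrt(var_x * var_y)
        if lower <= r <= upper:
            if pos is None:
                pos = n
        else:
            pos = None
    return pos


def pos_quantile(rho, w, confidence, n_sims=500, n_min=20, n_max=1000, seed=1):
    """Sample size below which `confidence` of the simulated POS values fall."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        pos = simulate_pos(rho, w, n_min, n_max, rng)
        # Assumption: trajectories not yet stable at n_max are counted as n_max
        results.append(n_max if pos is None else pos)
    results.sort()
    index = min(len(results) - 1, math.ceil(confidence * len(results)) - 1)
    return results[index]


for confidence in (0.80, 0.90, 0.95):
    print(confidence, pos_quantile(rho=0.3, w=0.1, confidence=confidence))
```

Because of the simplifications, the numbers will not reproduce the table in the paper exactly, but the qualitative pattern should be visible: a narrower corridor or a higher confidence level pushes the required sample size up.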

The bottom line is: For scenarios in psychology, **correlations stabilize when n approaches 250**. That means, estimates with n > 250 are not only significant, they are also fairly *accurate* (see also Kelley & Maxwell, 2003, and Maxwell, Kelley, & Rausch, 2008, for elaborated discussions on parameter accuracy).

# Additional analyses (not reported in the publication)

Figure 2 shows the distribution of POS values, depending on the half-width of the COS and on effect size rho. The horizontal axis is cut at n = 300, although several POS were > 300. It can be seen that all distributions have a very long tail. This makes the estimation of the 95th quantile very unstable. Therefore we used a larger number of 100’000 bootstrap replications in each experimental condition in order to get fairly stable estimates for the extreme quantiles.

Finally, Figure 3 shows the probability that a trajectory leaves the COS with increasing sample size.

The dotted lines mark the confidence levels of 80%, 90%, and 95% which were used in the publication. The *n* where a curve intersects one of these dotted lines indicates the corresponding value reported in Table 1 of the publication. For example, if the true correlation is .3 (which is already more than the average effect size in psychology) and you collect 100 participants, there’s still a chance of 50% that your correlation will leave the corridor between .21 and .39 (which are the boundaries for w=.1).
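The boundaries .21 and .39 are not simply .3 ± .1. One way to arrive at them, and this is an assumption about the derivation rather than something stated in this post, is to apply the half-width *w* on Fisher's *z* scale and transform back to *r*. A minimal sketch (the function name is made up):

```python
import math


def corridor_bounds(rho, w):
    """Corridor around rho with half-width w applied on Fisher's z scale
    (assumed derivation), transformed back to the correlation metric."""
    z = math.atanh(rho)  # Fisher z-transformation of the true correlation
    return math.tanh(z - w), math.tanh(z + w)


lower, upper = corridor_bounds(0.3, 0.1)
print(round(lower, 2), round(upper, 2))  # 0.21 0.39
```

On this scale the corridor is slightly asymmetric around .3 in the *r* metric, and it gets narrower for larger true correlations.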

What is the conclusion? Significance tests determine the sign of a correlation. This conclusion can be made with much lower sample sizes. However, when we want to make an accurate conclusion about the *size* of an effect with some confidence (and we do not want to make a “Type M” error), we need much larger samples.

The full R source code for the simulations can be downloaded here.

*References:*

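Kelley, K., & Maxwell, S. E. (2003). Sample size for multiple regression: Obtaining regression coefficients that are accurate, not simply significant. *Psychological Methods, 8*, 305–321. [PDF]

Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample size planning for statistical power and accuracy in parameter estimation. *Annual Review of Psychology, 59*, 537–563. doi:10.1146/annurev.psych.59.103006.093735 [PDF]

Schönbrodt, F. D., & Perugini, M. (2013). At what sample size do correlations stabilize? *Journal of Research in Personality, 47*, 609–612. doi:10.1016/j.jrp.2013.05.009 [PDF]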

This is an awesome article! You present a compelling argument with clear analysis and great (although expensive) recommendations. Thank you for your hard work and public discourse!

Nice argument and obviously useful. I’ve faced a lot of this myself and wondered about it. Sad that the n is so high. Often in medical research we need results in smaller samples, when expensive procedures or small populations can cause problems for doing large studies.