Comparing correlations: independent and dependent (overlapping or non-overlapping)

In Chapter 6 (correlation and covariance) I consider how to construct a confidence interval (CI) for the difference between two independent correlations. The standard approach uses the Fisher z transformation to deal with boundary effects (the squashing of the distribution and increasing asymmetry as r approaches -1 or 1). As zr is approximately normally distributed (which r decidedly is not), you can create a standard error for the difference by summing the sampling variances according to the variance sum law (see chapter 3).

This works well for the CI around a single correlation (assuming the main assumptions – bivariate normality and homogeneity of variance – broadly hold) or for differences between means, but can perform badly when looking at the difference between two correlations. Zou (2007) proposed a modification to the standard approach that uses the upper and lower bounds of the CIs for the individual correlations to calculate a CI for their difference. He considered three cases: independent correlations and two types of dependent correlations (overlapping and non-overlapping). He also considered differences in R2 (not relevant here).

Independent correlations

In section 6.6.2 (p. 224) I illustrate Zou’s approach for independent correlations and provide R code in sections 6.7.5 and 6.7.6 to automate the calculations. Section 6.7.5 shows how to write a simple R function and illustrates it with a function to calculate a CI for Pearson’s r using the Fisher transformation. Whilst writing the book I encountered several functions that do exactly this. The cor.test() function in the stats package (loaded by default) does this for raw data (along with computing the correlation and the usual NHST). A number of functions compute it using the usual textbook formula. My function relies on R’s primitive hyperbolic functions (the Fisher z transformation is related to the geometry of hyperbolas), which may be useful if you need to call it intensively (e.g., for simulations):

rz.ci <- function(r, N, conf.level = 0.95) {
    zr.se <- 1/(N - 3)^0.5                        # SE of the Fisher z transformed r
    moe <- qnorm(1 - (1 - conf.level)/2) * zr.se  # margin of error on the z scale
    zu <- atanh(r) + moe                          # atanh() is the Fisher z transformation
    zl <- atanh(r) - moe
    tanh(c(zl, zu))                               # back-transform the limits to the r scale
}
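As an aside, it is easy to verify that R’s atanh() is exactly the Fisher z transformation and tanh() its inverse. The check below uses a hypothetical r = .5 with N = 100 (values chosen for illustration, not from the book):

```r
# atanh() is the Fisher z transformation: z = 0.5 * log((1 + r)/(1 - r))
r <- 0.5
stopifnot(all.equal(atanh(r), 0.5 * log((1 + r) / (1 - r))))

# the same arithmetic rz.ci() performs, written out for r = .5, N = 100
moe <- qnorm(0.975) / sqrt(100 - 3)
ci <- tanh(atanh(r) + c(-moe, moe))
ci  # approximately 0.337 to 0.634
```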

The function in section 6.7.6 uses the rz.ci() function to construct a CI for the difference between two independent correlations. See section 6.6.2 of Serious stats or Zou (2007) for further details and a worked example. My function from section 6.7.6 is reproduced here:

r.ind.ci <- function(r1, r2, n1, n2=n1, conf.level = 0.95) {
    L1 <- rz.ci(r1, n1, conf.level = conf.level)[1]
    U1 <- rz.ci(r1, n1, conf.level = conf.level)[2]
    L2 <- rz.ci(r2, n2, conf.level = conf.level)[1]
    U2 <- rz.ci(r2, n2, conf.level = conf.level)[2]
    lower <- r1 - r2 - ((r1 - L1)^2 + (U2 - r2)^2)^0.5
    upper <- r1 - r2 + ((U1 - r1)^2 + (r2 - L2)^2)^0.5
    c(lower, upper)
}

To call the function, use the two correlation coefficients and sample sizes as input (the default is to assume equal n and a 95% CI).
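The calculation can also be written out step by step, which makes it easy to see what the function is doing. The values here (r1 = .5, r2 = .3, n = 100 per sample) are hypothetical and chosen purely for illustration:

```r
# Hypothetical inputs (not from the book): r1 = .5, r2 = .3, n = 100 each
r1 <- 0.5; r2 <- 0.3; n <- 100
moe <- qnorm(0.975) / sqrt(n - 3)
ci1 <- tanh(atanh(r1) + c(-moe, moe))  # what rz.ci(r1, n) returns
ci2 <- tanh(atanh(r2) + c(-moe, moe))  # what rz.ci(r2, n) returns
# Zou's (2007) interval for r1 - r2, as in r.ind.ci():
lower <- r1 - r2 - sqrt((r1 - ci1[1])^2 + (ci2[2] - r2)^2)
upper <- r1 - r2 + sqrt((ci1[2] - r1)^2 + (r2 - ci2[1])^2)
c(lower, upper)  # approximately -0.035 to 0.433
```

For these hypothetical numbers the interval includes zero, so the observed difference of .2 would not rule out equal population correlations.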

A caveat

As I point out in chapter 6, just because you can compare two correlation coefficients doesn’t mean it is a good idea. Correlations are standardized simple linear regression coefficients and even if the two regression coefficients measure the same effect, it doesn’t follow that their standardized counterparts do. This is not merely the problem that it may be meaningless to compare, say, a correlation between height and weight with a correlation between anxiety and neuroticism. Two correlations between the same variables in different samples might not be meaningfully comparable (e.g., because of differences in reliability, range restriction and so forth).

Dependent overlapping correlations

In many cases the correlations you want to compare aren’t independent. One reason for this is that the correlations share a common variable. For example, if you correlate X with Y and X with Z, you might be interested in whether the correlation rXY is larger than rXZ. As X is common to both correlations, they are not independent. Zou (2007) describes how to adjust the interval to account for this correlation. In essence the sampling variances of the correlations are tweaked using a version of the variance sum law (again see chapter 3).

The following functions (not in the book) compute the correlation between the correlations and use it to adjust the CI for the difference in correlations to account for overlap (a shared predictor). Note that both functions and rz.ci() must be loaded into R. Also included is a call to the main function that reproduces the output from example 2 of Zou (2007).

rho.rxy.rxz <- function(rxy, rxz, ryz) {
    num <- (ryz - 1/2 * rxy * rxz) * (1 - rxy^2 - rxz^2 - ryz^2) + ryz^3
    den <- (1 - rxy^2) * (1 - rxz^2)
    num/den
}

r.dol.ci <- function(r12, r13, r23, n, conf.level = 0.95) {
    L1 <- rz.ci(r12, n, conf.level = conf.level)[1]
    U1 <- rz.ci(r12, n, conf.level = conf.level)[2]
    L2 <- rz.ci(r13, n, conf.level = conf.level)[1]
    U2 <- rz.ci(r13, n, conf.level = conf.level)[2]
    rho.r12.r13 <- rho.rxy.rxz(r12, r13, r23)
    lower <- r12 - r13 - ((r12 - L1)^2 + (U2 - r13)^2 - 2 * rho.r12.r13 * (r12 - L1) * (U2 - r13))^0.5
    upper <- r12 - r13 + ((U1 - r12)^2 + (r13 - L2)^2 - 2 * rho.r12.r13 * (U1 - r12) * (r13 - L2))^0.5
    c(lower, upper)
}

# input from example 2 of Zou (2007, p.409)
r.dol.ci(.396, .179, .088, 66)

The r.dol.ci() function takes three correlations as input – the two correlations of interest (e.g., rXY and rXZ) and the correlation between the non-overlapping variables (e.g., rYZ). Also required is the sample size (a single sample provides both correlations here).
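In practice the three correlations usually come from a single data set, and cor() on the three columns gives them all at once. A minimal sketch with simulated (purely hypothetical) data:

```r
# Simulated data purely for illustration
set.seed(42)
x <- rnorm(66)
y <- 0.4 * x + rnorm(66)
z <- 0.2 * x + rnorm(66)
rmat <- cor(cbind(x, y, z))   # 3 x 3 correlation matrix
rxy <- rmat["x", "y"]
rxz <- rmat["x", "z"]
ryz <- rmat["y", "z"]
# rxy, rxz, ryz and n = 66 are then the inputs to r.dol.ci()
```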

Dependent non-overlapping correlations

Overlapping correlations are not the only cause of dependency between correlations. The samples themselves could be correlated. Zou (2007) gives the example of a correlation between two variables for a sample of mothers. The same correlation could be computed for their children. As the children and mothers have correlated scores on each variable, the correlations between the same two variables will themselves be correlated (but not overlapping in the sense used earlier). The following functions compute the CI for the difference between dependent non-overlapping correlations. Also included is a call to the main function that reproduces example 3 of Zou (2007).

rho.rab.rcd <- function(rab, rac, rad, rbc, rbd, rcd) {
    num <- 1/2 * rab * rcd * (rac^2 + rad^2 + rbc^2 + rbd^2) + rac * rbd + rad * rbc -
        (rab * rac * rad + rab * rbc * rbd + rac * rbc * rcd + rad * rbd * rcd)
    den <- (1 - rab^2) * (1 - rcd^2)
    num/den
}

r.dnol.ci <- function(r12, r13, r14, r23, r24, r34, n12, n34=n12, conf.level=0.95) {
    L1 <- rz.ci(r12, n12, conf.level = conf.level)[1]
    U1 <- rz.ci(r12, n12, conf.level = conf.level)[2]
    L2 <- rz.ci(r34, n34, conf.level = conf.level)[1]
    U2 <- rz.ci(r34, n34, conf.level = conf.level)[2]
    rho.r12.r34 <- rho.rab.rcd(r12, r13, r14, r23, r24, r34)
    lower <- r12 - r34 - ((r12 - L1)^2 + (U2 - r34)^2 - 2 * rho.r12.r34 * (r12 - L1) * (U2 - r34))^0.5
    upper <- r12 - r34 + ((U1 - r12)^2 + (r34 - L2)^2 - 2 * rho.r12.r34 * (U1 - r12) * (r34 - L2))^0.5
    c(lower, upper)
} 

# input from example 3 of Zou (2007, pp. 409-410)
r.dnol.ci(.396, .208, .143, .023, .423, .189, 66)

Although this call reproduces the final output for example 3 it produces slightly different intermediate results (0.0891 vs. 0.0917) for the correlation between the correlations. Zou (personal communication) confirms that this is either a typo or a rounding error (e.g., arising from hand calculation) in example 3 and that the function here produces accurate output. The input requires every pairwise correlation among the four variables being compared (plus the relevant sample size for the correlations being compared). The easiest way to get these is from a correlation matrix of the four variables.
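To make the mapping concrete, here is how the inputs for the example 3 call above sit in a 4 x 4 correlation matrix (variables numbered 1 to 4; the values are the example 3 correlations from the call above):

```r
# correlation matrix holding the example 3 inputs (Zou, 2007)
rmat <- matrix(c(1,    .396, .208, .143,
                 .396, 1,    .023, .423,
                 .208, .023, 1,    .189,
                 .143, .423, .189, 1), nrow = 4)
# the arguments to r.dnol.ci() in order: r12, r13, r14, r23, r24, r34
r.in <- c(r12 = rmat[1, 2], r13 = rmat[1, 3], r14 = rmat[1, 4],
          r23 = rmat[2, 3], r24 = rmat[2, 4], r34 = rmat[3, 4])
r.in
```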

Robust alternatives

Wilcox (2009) describes a robust alternative to these methods for independent correlations and modifications to Zou’s method that make the dependent correlation methods robust to violations of the bivariate normality and (in particular) homogeneity of variance assumptions. Wilcox provides R functions for these approaches on his web pages. His functions take raw data as input and are computationally intensive. For instance, the dependent correlation methods use Zou’s approach but take bootstrap CIs for the individual correlations as input (rather than the simpler Fisher z transformed versions).

The relevant functions are twopcor() for the independent case, TWOpov() for the dependent overlapping case and TWOpNOV() for the non-overlapping case.
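To give a flavour of the bootstrap ingredient, a bare-bones percentile bootstrap CI for a single Pearson r can be sketched as follows (this is a minimal illustration with simulated data, not Wilcox’s code):

```r
# Minimal percentile bootstrap CI for a single Pearson r
# (illustrative only; Wilcox's functions are more refined)
boot.r.ci <- function(x, y, conf.level = 0.95, B = 2000) {
    n <- length(x)
    rs <- replicate(B, {
        i <- sample.int(n, replace = TRUE)   # resample cases, not variables
        cor(x[i], y[i])
    })
    alpha <- 1 - conf.level
    quantile(rs, c(alpha / 2, 1 - alpha / 2))
}

set.seed(1)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)
boot.r.ci(x, y)
```

Resampling whole cases (rows) preserves the dependency between x and y, which is what makes the method robust to non-normality.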

UPDATE

Zou’s modified asymptotic method is easy enough that you can run it in Excel. I’ve added an Excel spreadsheet to the blog resources that should implement the methods (and matches the output from R fairly closely). As it uses Excel it may not cope gracefully with some calculations (e.g., with extremely small or large values of r or other extreme cases) – and I have more confidence in the R code.

References

Baguley, T. (2012, in press). Serious stats: A guide to advanced statistics for the behavioral sciences. Basingstoke: Palgrave.

Zou, G. Y. (2007). Toward using confidence intervals to compare correlations. Psychological Methods, 12, 399-413.

Wilcox, R. R. (2009). Comparing Pearson correlations: Dealing with heteroscedasticity and non-normality. Communications in Statistics – Simulation & Computation, 38, 2220-2234.

N.B. R code formatted via Pretty R at inside-R.org

4 Comments

  1. Birk Diedenhofen / December 5, 2013

     For those interested in comparing correlations: the R package cocor provides functions to compare two dependent or independent correlations, also implementing the methods by Zou (2007). There is a web interface available at http://comparingcorrelations.org

  2. I don’t know of a simple solution off-hand. I think you could do this in SEM or multilevel software that allowed you to constrain a covariance matrix. Alternatively you could simulate correlation matrices without those constraints (or bootstrap with the raw data), count the proportion of times that the constraint was met and form a p value from that.

