I just posted a brief multicollinearity tutorial on my other blog (loosely based on the material from the Serious Stats book).

You can read it here.

*Posted by Thom on November 11, 2013*

http://seriousstats.wordpress.com/2013/11/11/multicollinearity-tutoral/

*UPDATE*: Some problems arose with my previous host so I have now updated the links here and elsewhere on the blog.

The companion web site for Serious Stats has a zip file with R scripts for each chapter. This contains examples of R code and all my functions from the book (and a few extras). This is a convenient form for working through the examples. However, if you just want to access the functions it is more convenient to load them all in at once.

The functions can be downloaded as a text file from:

http://www2.ntupsychology.net/seriousstats/SeriousStatsAllfunctions.txt

More conveniently, you can load them directly into R with the following call:

source('http://www2.ntupsychology.net/seriousstats/SeriousStatsAllfunctions.txt')

In addition to the Serious Stats functions, a number of other functions are contained in the text file. These include functions published on this blog for comparing correlations and for confidence intervals for independent measures ANOVA, as well as functions from my paper on confidence intervals for repeated measures ANOVA.

*Posted by Thom on March 26, 2012*

http://seriousstats.wordpress.com/2012/03/26/r-functions-for-serious-stats/

In Chapter 6 (correlation and covariance) I consider how to construct a confidence interval (CI) for the difference between two independent correlations. The standard approach uses the Fisher *z* transformation to deal with boundary effects (the squashing of the distribution and increasing asymmetry as *r* approaches -1 or 1). As *z _{r}* is approximately normally distributed (which

This works well for the CI around a single correlation (assuming the main assumptions – bivariate normality and homogeneity of variance – broadly hold) or for differences between means, but can perform badly when looking at the difference between two correlations. Zou (2007) proposed a modification to the standard approach that uses the upper and lower bounds of the CIs for individual correlations to calculate a CI for their difference. He considered three cases: independent correlations and two types of dependent correlations (overlapping and non-overlapping). He also considered differences in *R*^{2} (not relevant here).
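For the independent case the idea can be written compactly. This is only a sketch in notation chosen here for illustration (not necessarily Zou's own symbols): (l_i, u_i) is the Fisher *z* based CI for correlation r_i, and (L, U) is the resulting CI for the difference r_1 - r_2.

```latex
\begin{aligned}
L &= r_1 - r_2 - \sqrt{(r_1 - l_1)^2 + (u_2 - r_2)^2} \\
U &= r_1 - r_2 + \sqrt{(u_1 - r_1)^2 + (r_2 - l_2)^2}
\end{aligned}
```

Because the individual limits are asymmetric around each *r*, the resulting interval need not be symmetric around the observed difference.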

*Independent correlations*

In section 6.6.2 (*p*. 224) I illustrate Zou’s approach for independent correlations and provide R code in sections 6.7.5 and 6.7.6 to automate the calculations. Section 6.7.5 shows how to write a simple R function and illustrates it with a function to calculate a CI for Pearson’s *r* using the Fisher *z* transformation. Whilst writing the book I encountered several functions that do exactly this. The cor.test() function in base R (the stats package) does this for raw data (along with computing the correlation and the usual NHST). A number of functions compute it using the usual textbook formula. My function relies on R’s primitive hyperbolic functions (as the Fisher *z* transformation is related to the geometry of hyperbolas), which may be useful if you need to use it intensively (e.g., for simulations).
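The listing for rz.ci() does not appear in this copy of the post, so here is a minimal sketch of what such a function could look like. It is a reconstruction under stated assumptions (arguments r, N and conf.level, and the usual 1/sqrt(N - 3) standard error), not necessarily identical to the version in the book.

```r
# Sketch of a Fisher z based CI for Pearson's r.
# Assumed signature: rz.ci(r, N, conf.level); a reconstruction, not the book listing.
rz.ci <- function(r, N, conf.level = 0.95) {
  zr.se <- 1 / sqrt(N - 3)                         # approximate SE on the z scale
  moe <- qnorm(1 - (1 - conf.level) / 2) * zr.se   # margin of error on the z scale
  zl <- atanh(r) - moe                             # atanh() is the Fisher z transformation
  zu <- atanh(r) + moe
  tanh(c(zl, zu))                                  # back-transform limits to the r scale
}

# Hypothetical input: r = .5 from a sample of 50
rz.ci(0.5, 50)
```

The interval is symmetric on the *z* scale but not on the *r* scale, which is the point of the transformation.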

The function in section 6.7.6 uses the rz.ci() function to construct a CI for the difference between two independent correlations. See section 6.6.2 of Serious Stats or Zou (2007) for further details and a worked example. My function from section 6.7.6 is reproduced here:

    r.ind.ci <- function(r1, r2, n1, n2=n1, conf.level = 0.95) {
        L1 <- rz.ci(r1, n1, conf.level = conf.level)[1]
        U1 <- rz.ci(r1, n1, conf.level = conf.level)[2]
        L2 <- rz.ci(r2, n2, conf.level = conf.level)[1]
        U2 <- rz.ci(r2, n2, conf.level = conf.level)[2]
        lower <- r1 - r2 - ((r1 - L1)^2 + (U2 - r2)^2)^0.5
        upper <- r1 - r2 + ((U1 - r1)^2 + (r2 - L2)^2)^0.5
        c(lower, upper)
    }

To call the function, use the two correlation coefficients and the sample sizes as input (the default is to assume equal *n* and a 95% CI).
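As an illustration, the calls below use made-up correlations and sample sizes (they are not from the book or from Zou, 2007). The two helper definitions are compact restatements so that the block runs on its own; the rz.ci() shown is an assumed reconstruction.

```r
# Compact restatements so the example is self-contained (rz.ci is an assumed sketch).
rz.ci <- function(r, N, conf.level = 0.95) {
  moe <- qnorm(1 - (1 - conf.level) / 2) / sqrt(N - 3)
  tanh(atanh(r) + c(-moe, moe))
}
r.ind.ci <- function(r1, r2, n1, n2 = n1, conf.level = 0.95) {
  ci1 <- rz.ci(r1, n1, conf.level)
  ci2 <- rz.ci(r2, n2, conf.level)
  lower <- r1 - r2 - sqrt((r1 - ci1[1])^2 + (ci2[2] - r2)^2)
  upper <- r1 - r2 + sqrt((ci1[2] - r1)^2 + (r2 - ci2[1])^2)
  c(lower, upper)
}

# Hypothetical inputs
r.ind.ci(0.5, 0.2, 50)                          # equal n (n2 defaults to n1), 95% CI
r.ind.ci(0.5, 0.2, 50, 80)                      # unequal n
r.ind.ci(0.5, 0.2, 50, 80, conf.level = 0.99)   # 99% CI
```

If the interval excludes zero, the data are at least consistent with a difference between the two population correlations at the chosen confidence level.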

*A caveat*

As I point out in chapter 6, just because you can compare two correlation coefficients doesn’t mean it is a good idea. Correlations are standardized simple linear regression coefficients and even if the two regression coefficients measure the same effect, it doesn’t follow that their standardized counterparts do. This is not merely the problem that it may be meaningless to compare, say, a correlation between height and weight with a correlation between anxiety and neuroticism. Two correlations between the same variables in different samples might not be meaningfully comparable (e.g., because of differences in reliability, range restriction and so forth).

*Dependent overlapping correlations*

In many cases the correlations you want to compare aren’t independent. One reason for this is that the correlations share a common variable. For example if you correlate *X* with *Y* and *X* with *Z* you might be interested in whether the correlation *r _{XY}* is larger than

The following functions (not in the book) compute the correlation between the correlations and use it to adjust the CI for the difference in correlations to account for overlap (a shared predictor). Note that both functions and rz.ci() must be loaded into R. Also included is a call to the main function that reproduces the output from example 2 of Zou (2007).

    rho.rxy.rxz <- function(rxy, rxz, ryz) {
        num <- (ryz - 1/2 * rxy * rxz) * (1 - rxy^2 - rxz^2 - ryz^2) + ryz^3
        den <- (1 - rxy^2) * (1 - rxz^2)
        num/den
    }

    r.dol.ci <- function(r12, r13, r23, n, conf.level = 0.95) {
        L1 <- rz.ci(r12, n, conf.level = conf.level)[1]
        U1 <- rz.ci(r12, n, conf.level = conf.level)[2]
        L2 <- rz.ci(r13, n, conf.level = conf.level)[1]
        U2 <- rz.ci(r13, n, conf.level = conf.level)[2]
        rho.r12.r13 <- rho.rxy.rxz(r12, r13, r23)
        lower <- r12 - r13 - ((r12 - L1)^2 + (U2 - r13)^2 - 2 * rho.r12.r13 * (r12 - L1) * (U2 - r13))^0.5
        upper <- r12 - r13 + ((U1 - r12)^2 + (r13 - L2)^2 - 2 * rho.r12.r13 * (U1 - r12) * (r13 - L2))^0.5
        c(lower, upper)
    }

    # input from example 2 of Zou (2007, p.409)
    r.dol.ci(.396, .179, .088, 66)

The r.dol.ci() function takes three correlations as input – the correlations of interest (e.g., *r _{XY}* and

*Dependent non-overlapping correlations*

Overlapping correlations are not the only cause of dependency between correlations. The samples themselves could be correlated. Zou (2007) gives the example of a correlation between two variables for a sample of mothers. The same correlation could be computed for their children. As the children and mothers have correlated scores on each variable, the two correlations between the same pair of variables will themselves be correlated (but not overlapping in the sense used earlier). The following functions compute the CI for the difference between dependent non-overlapping correlations. Also included is a call to the main function that reproduces Zou (2007) example 3.

    rho.rab.rcd <- function(rab, rac, rad, rbc, rbd, rcd) {
        num <- 1/2 * rab * rcd * (rac^2 + rad^2 + rbc^2 + rbd^2) + rac * rbd + rad * rbc -
            (rab * rac * rad + rab * rbc * rbd + rac * rbc * rcd + rad * rbd * rcd)
        den <- (1 - rab^2) * (1 - rcd^2)
        num/den
    }

    r.dnol.ci <- function(r12, r13, r14, r23, r24, r34, n12, n34=n12, conf.level=0.95) {
        L1 <- rz.ci(r12, n12, conf.level = conf.level)[1]
        U1 <- rz.ci(r12, n12, conf.level = conf.level)[2]
        L2 <- rz.ci(r34, n34, conf.level = conf.level)[1]
        U2 <- rz.ci(r34, n34, conf.level = conf.level)[2]
        rho.r12.r34 <- rho.rab.rcd(r12, r13, r14, r23, r24, r34)
        lower <- r12 - r34 - ((r12 - L1)^2 + (U2 - r34)^2 - 2 * rho.r12.r34 * (r12 - L1) * (U2 - r34))^0.5
        upper <- r12 - r34 + ((U1 - r12)^2 + (r34 - L2)^2 - 2 * rho.r12.r34 * (U1 - r12) * (r34 - L2))^0.5
        c(lower, upper)
    }

    # from example 3 of Zou (2007, p.409-10)
    r.dnol.ci(.396, .208, .143, .023, .423, .189, 66)

Although this call reproduces the final output for example 3, it produces a slightly different intermediate result (0.0891 vs. 0.0917) for the correlation between correlations. Zou (personal communication) confirms that this is either a typo or rounding error (e.g., arising from hand calculation) in example 3 and that the function here produces accurate output. The input requires all six pairwise correlations among the four variables (and the relevant sample size for the correlations being compared). The easiest way to get the correlations is from a correlation matrix of the four variables.
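One way to pull the six correlations out of a correlation matrix in the order the function expects is sketched below. The data frame is simulated noise used purely to show the mechanics, and the variable names are hypothetical.

```r
set.seed(1)
# Hypothetical data: 66 cases on four variables (random noise, for layout only)
dat <- data.frame(x1 = rnorm(66), x2 = rnorm(66), x3 = rnorm(66), x4 = rnorm(66))

R <- cor(dat)                  # 4 x 4 correlation matrix
idx <- t(combn(4, 2))          # index pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
rs <- R[idx]                   # matrix indexing picks out one cell per row of idx
names(rs) <- paste0("r", idx[, 1], idx[, 2])
rs                             # named r12, r13, r14, r23, r24, r34

# If r.dnol.ci() and rz.ci() have been loaded, the names match its arguments:
if (exists("r.dnol.ci") && exists("rz.ci")) {
  do.call(r.dnol.ci, c(as.list(rs), n12 = nrow(dat)))
}
```

Naming the elements to match the argument names guards against passing the correlations in the wrong order, which is an easy mistake with six positional inputs.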

*Robust alternatives*

Wilcox (2009) describes a robust alternative to these methods for independent correlations and modifications to Zou’s method that make the dependent correlation methods robust to violations of bivariate normality and (in particular) homogeneity of variance assumptions. Wilcox provides R functions for these approaches on his web pages. His functions take raw data as input and are computationally intensive. For instance, the dependent correlation methods use Zou’s approach but take bootstrap CIs for the individual correlations as input (rather than the simpler Fisher *z* transformed versions).

The relevant functions are twopcor() for the independent case, TWOpov() for the dependent overlapping case and TWOpNOV() for the non-overlapping case.
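To give a flavour of what a bootstrap CI for a single correlation involves, here is a bare-bones percentile bootstrap on simulated data. This is an illustrative sketch only: boot.r.ci() is a hypothetical helper, not one of Wilcox's functions, and his implementations may differ in important details.

```r
set.seed(123)
# Hypothetical raw data: two correlated variables, n = 60
n <- 60
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

# Plain percentile bootstrap CI for Pearson's r (hypothetical helper)
boot.r.ci <- function(x, y, nboot = 2000, conf.level = 0.95) {
  n <- length(x)
  rs <- replicate(nboot, {
    i <- sample.int(n, n, replace = TRUE)   # resample cases (x, y pairs) with replacement
    cor(x[i], y[i])
  })
  alpha <- 1 - conf.level
  unname(quantile(rs, c(alpha / 2, 1 - alpha / 2)))
}

boot.r.ci(x, y)
```

Limits obtained this way could in principle stand in for the Fisher *z* based limits in the formulas above, at the cost of needing the raw data and many resamples.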

UPDATE

Zou’s modified asymptotic method is easy enough that you can run it in Excel. I’ve added an Excel spreadsheet to the blog resources that should implement the methods (and matches the output of R fairly closely). As it uses Excel it may not cope gracefully with some calculations (e.g., with extremely small or large values of *r* or other extreme cases) – and I have more confidence in the R code.

*References*

Baguley, T. (2012, in press). Serious stats: A guide to advanced statistics for the behavioral sciences. Basingstoke: Palgrave.

Zou, G. Y. (2007). Toward using confidence intervals to compare correlations. *Psychological Methods, 12,* 399-413.

Wilcox, R. R. (2009). Comparing Pearson correlations: Dealing with heteroscedasticity and non-normality. *Communications in Statistics – Simulation & Computation, 38*, 2220-2234.

N.B. R code formatted via Pretty R at inside-R.org

*Posted by Thom on February 5, 2012*

http://seriousstats.wordpress.com/2012/02/05/comparing-correlations/