Confidence intervals with tiers: functions for between-subjects (independent measures) ANOVA

In a previous post I showed how to plot difference-adjusted CIs for between-subjects (independent measures) ANOVA designs (see here). The rationale behind this kind of graphical display is introduced in Chapter 3 of Serious stats (and summarized in my earlier blog post).

In a between-subjects – or in indeed in a within-subjects (repeated measures) – design you or your audience will not always be interested only in the differences between the means. Rarely, the main focus may even be on the individual estimates themselves. A CI for each of the individual means might be informative for several reasons.

First, it may be important to know that the interval excludes an important parameter value (e.g., zero). The example in Chapter 3 of  Serious Stats involved a task in which participants had to decide which of two diagrams matched a description they had just read. Chance performance is 50% matching accuracy, so a graphical display that showed that the 95% CI for each mean excludes 50% suggests that participants in each group were performing above chance.

Second the CI for an individual mean gives you an idea of the relative precision with which that quantity is measured. This may be particularly important in an applied domain. For example, you may want to be fairly sure that performance on a task is high in some conditions as well as being sure that there are differences between conditions.

Third, the CIs for the individual means are revealing about changes in the precision between conditions. If the sample sizes are equal (or nearly equal) they are also revealing about patterns in the variances. This is because the precision of the individual means is a function of the standard error and n. This may be obscured when difference-adjusted CIs are plotted – though mainly for within-subjects (repeated measures) designs which have to allow for the correlation between the samples.

In any case, it may be desirable to display CIs for individual means and difference-adjusted means on the same plot. This could be accomplished in several ways but I have proposed using a two-tiered CI plot (see here for a brief summary of my BRM paper on this or see Chapter 16 of Serious stats).

A common approach (for either individual means or difference-adjusted CIs) is to  adopt a pooled error term. This results in a more accurate CI if the homogeneity of variance assumption is met. For the purposes of a graphical display I would generally avoid pooled error terms (even if you use a pooled error term in your ANOVA). A graphical display of means is useful as an exploratory aid and supports informal inference. You want to be able to see any patterns in the precision (or variances) of the means. Sometimes these patterns are clear enough to be convincing without further (formal) inference or modeling. If they aren’t completely convincing it usually better to show the noisy graphic and supplement it with formal inference if necessary.

Experienced researchers understand that real data are noisy and may (indeed should!) get suspicious if data are too clean. (I’m perhaps being optimistic here – but we really ought to have more tolerance for noisy data, as this should reduce the pressure on honest researchers to ‘optimize’ their analyses – e.g., see here).

My earlier post on this blog provided functions for the single tier difference-adjusted CIs. Here is the two-tiered function (for a oneway design):

plot.bsci.tiered <- function(data.frame, group.var=1, dv.var=2,  var.equal=FALSE, conf.level = 0.95, xlab = NULL, ylab = NULL, level.labels = NULL, main = NULL, pch = 19, pch.cex = 1.3, text.cex = 1.2, ylim = c(min.y, max.y), line.width= c(1.5, 1.5), tier.width=0, grid=TRUE) {
          data <- subset(data.frame, select=c(group.var, dv.var))
	fact <- factor(data[[1]])
	dv <- data[[2]]
	J <- nlevels(fact)
	ci.outer <- bsci(data.frame=data.frame , group.var=group.var, dv.var=dv.var,  difference=FALSE, var.equal=var.equal, conf.level =conf.level)
	ci.inner <- bsci(data.frame=data.frame , group.var=group.var, dv.var=dv.var,  difference=TRUE, var.equal=var.equal, conf.level =conf.level)
	moe.y <- max(ci.outer) - min(ci.outer)
    min.y <- min(ci.outer) - moe.y/3
    max.y <- max(ci.outer) + moe.y/3
    if (missing(xlab)) 
        xlab <- "Groups"
    if (missing(ylab)) 
        ylab <- "Confidence interval for mean"
   plot(0, 0, ylim = ylim, xaxt = "n", xlim = c(0.7, J + 0.3), xlab = xlab, 
        ylab = ylab, main = main, cex.lab = text.cex)
    if (grid == TRUE) grid()
    points(ci.outer[,2], pch = pch, bg = "black", cex = pch.cex)
    index <- 1:J
    segments(index, ci.outer[, 1], index, ci.outer[, 3], lwd = line.width[1])
    axis(1, index, labels = level.labels)
    if(tier.width==0) {
    segments(index - 0.025, ci.inner[, 1], index + 0.025, ci.inner[, 1], lwd = line.width[2])
    segments(index - 0.025, ci.inner[, 3], index + 0.025, ci.inner[, 3], lwd = line.width[2])
    }
    else segments(index, ci.inner[, 1], index, ci.inner[, 3], lwd = line.width[1]*(1 + abs(tier.width)))
	}

The following example uses the diagram data from the book:

source('http://www2.ntupsychology.net/seriousstats/SeriousStatsAllfunctions.txt')

diag.dat <- read.csv('http://www2.ntupsychology.net/seriousstats/diagram.csv')

plot.bsci.tiered(diag.dat, group.var=2, dv.var=4, ylab='Mean description quality', main = 'Two-tiered CIs for the Diagram data', tier.width=1)

The result is a plot that looks something like this (though I should probably have reordered the groups and labeled them):

For these data the group sizes are equal and thus the width of the outer tier reflect differences in variances between the groups. The variances are not very unequal, but neither are they particularly homogenous. The inner tier suggests group three is different from groups 2 and 4 (but not from group 1). This is a pretty decent summary of what’s going on and could be supplemented by formal inference (see Chapter 13 for a comparison of several formal approaches also using this data set).

N.B. R code formatted via Pretty R at inside-R.org

Footnote: The aesthetics of error bar plots

A major difference between the plot shown here and that in my BRM paper or in the book is that I have changed the method of plotting the tiers. The change is mainly aesthetic, but also reflects the desire not to emphasize the extremes of the error bar. The most plausible values of the parameter (e.g., mean) are towards the center of the interval – not at the extremes. I have discussed the reasons for my change of heart in a bit more detail elsewhere.

To this end I have also updated all my plotting functions. They still use the crossbar style from the book by default but this is controlled by a tier width argument. If tier.width=0 the crossbar style is used otherwise it used the tier.width to control the additional thickness of the difference-adjusted lines. In general, tier.width=1 seems to work well (but the crossbar style may be necessary for some unusual within-subject CIs where the difference-adjusted CI is wider than the CI for the individual means).

Pasting Excel data into R on a Mac

When starting out with R, getting data in and out can be a bit of a pain. It should take long to work out a convenient method – depending on what OS you use and what other packages you work with.

In my case I prefer to work with Excel spreadsheets (which are versatile and – for the most part – convenient for sharing with collaborators or students). For this reason I mostly work with comma separated variable files created in Excel an imported using read.csv(). I even quite like the fact that this method requires me to save the .xls worksheet as a .csv file (as it makes it harder to over-write the original file when I edit it for R). In know that there are many other methods that I could use, but this works fine for me.

I do however occasionally miss some of the functionality of software such as MLwiN that allows me to paste Excel data directly into it. I’ve seen instructions about how to do this on a Windows machine (e.g., see John Cook’s notes), but a while back I stumbled on a simple solution for the Mac. I’ve forgotten where I saw it (but will add a link as soon as I find it or if someone reminds me). The solution uses read.table() but is a bit fiddly and therefore best set up as a function.

paste.data <- function(header=FALSE) {read.table(pipe("pbpaste"), header=header)}

I’ve included this in my master function list so it can be loaded with other functions from the book and blog.

To use it just copy tab-delimited data (the default for copying from an Excel file or Word table) and call the function in R. The data are then imported as a data frame in R. For an empty call it assumes there is no header and adds default variable names. Adding the argument header=TRUE or just TRUE will treat the first row as variable (column) names for the data frame. Copy some data and try the following:

source('http://www2.ntupsychology.net/seriousstats/SeriousStatsAllfunctions.txt')

paste.data()

paste.data(header = TRUE)
paste.data(TRUE)
paste.data(T)

N.B. R code formatted via Pretty R at inside-R.org

UPDATE: Ken Knoblauch pointed out an older discussion of this issue in https://stat.ethz.ch/pipermail/r-help/2005-February/066257.html and also noted the read.clipboard() function in William Revelle’s excellent psych package (which works on both PC and Mac systems).

R functions for serious stats

UPDATE: Some problems arose with my previous host so I have now updated the links here and elsewhere on the blog.

The companion web site for Serious Stats has a zip file with R scripts for each chapter. This contains examples of R code and and all my functions from the book (and a few extras). This is a convenient form for working through the examples. However, if you just want to access the functions it is more convenient to load them all in at once.

The functions can be downloaded as a text file from:

http://www2.ntupsychology.net/seriousstats/SeriousStatsAllfunctions.txt

More conveniently, you can load them directly into R with the following call:

source('http://www2.ntupsychology.net/seriousstats/SeriousStatsAllfunctions.txt')

In addition to the Serious Stats functions, a number of other functions are contained in the text file. These include functions published on this blog for comparing correlations or confidence intervals for independent measures ANOVA and functions my paper on confidence intervals for repeated measures ANOVA.

N.B. R code formatted via Pretty R at inside-R.org

Follow

Get every new post delivered to your Inbox.

Join 37 other followers