Near-instant high quality graphs in R

One of the main attractions of R (for me) is the ability to produce high quality graphics that look just the way you want them to. The basic plot functions are generally excellent for exploratory work and for getting to know your data. Most packages have additional functions for appropriate exploratory work or for summarizing and communicating inferences. Generally the default plots are at least as good as those of other software (e.g., commercial packages), with the added advantage of being fairly easy to customize once you understand the basic plotting functions and parameters.

Even so, getting a plot looking just right for a presentation or publication often takes a lot of work using basic plotting functions. One reason for this is that constructing a good graphic is an inherently difficult enterprise, one that balances aesthetic and statistical factors and that requires a good understanding of who will look at the graphic, what they know, what they want to know and how they will interpret it. It can take hours – maybe days – to get a graphic right.

In Serious Stats I focused on exploratory plots and how to use basic plotting functions to customize them. I think this was important to include, but one of my regrets was not having enough space to cover a different approach to plotting in R. This is Hadley Wickham’s ggplot2 package (inspired by Leland Wilkinson’s grammar of graphics approach).

In this blog post I’ll demonstrate a few ways that ggplot2 can be used to quickly produce amazing graphics for presentations or publication. I’ll finish by mentioning some pros and cons of the approach.

The main attraction of ggplot2 for newcomers to R is the qplot() quick plot function. Like the R plot() function it will recognize certain types and combinations of R objects and produce an appropriate plot (in most cases). Unlike the basic R plots the output tends to be both functional and pretty. Thus you may be able to generate the graph you need for your talk or paper almost instantly.

A good place to start is the vanilla scatter plot. Here is the R default:

Scatter default in R

Compare it with the ggplot2 default:

Scatter with ggplot2 default

Below is the R code for comparison. (The data here are from the hov.csv file used in Example 10.2 of Chapter 10 of Serious Stats.)

# install and then load the package
install.packages('ggplot2')
library(ggplot2)

# get the data
hov.dat <-  read.csv('http://www2.ntupsychology.net/seriousstats/hov.csv')

# using plot()
with(hov.dat, plot(x, y))

# using qplot()
qplot(x, y, data=hov.dat)

R code formatted by Pretty R at inside-R.org

Adding a line of best fit

The ggplot2 version is (in my view) rather prettier, but a big advantage is being able to add a range of different model fits very easily. The common choice of model fit is that of a straight line (usually the least squares regression line). Doing this in ggplot2 is easier than with basic plot functions (and you also get 95% confidence bands by default).

Here is the straight line fit from a linear model:

Linear model fit

qplot(x, y, data=hov.dat, geom=c('point', 'smooth'), method='lm')

The geom specifies the type of plot (one with points and a smoothed line in this case) while the method specifies the model for obtaining the smoothed line. A formula can also be added (but the formula defaults to y as a simple linear function of x).

Loess, polynomial fits or splines

Mind you, the linear model fit has some disadvantages. Even if you are working with a related statistical model (e.g., Pearson’s r or least squares simple or multiple regression) you might want a more data-driven plot. A good choice here is to use a local regression approach such as loess. This lets the data speak for themselves – effectively fitting a complex curve driven by the local properties of the data. If this is reasonably linear then your audience should be able to see the quality of the straight-line fit for themselves. The local regression also gives approximate 95% confidence bands. These may support informal inference without having to make strong assumptions about the model.

Here is the loess plot:

Loess plot

Here is the code for the loess plot:

qplot(x, y, data=hov.dat, geom=c('point', 'smooth'), method='loess')

I like the loess approach here because it’s fairly obvious that the linear fit does quite well. Showing the straight-line fit has the appearance of imposing the pattern on the data, whereas a local regression approach illustrates the pattern while allowing departures from the straight-line fit to show through.
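Incidentally, the 'loess' method in ggplot2 wraps R's own loess() function, so the same fit can be inspected directly in base R. A minimal sketch with synthetic data (standing in for the hov data, which aren't reproduced here):

```r
set.seed(1)
x <- seq(0, 10, length.out = 100)
y <- 2 + 0.5 * x + rnorm(100, sd = 1)   # roughly linear data

# local regression: each fitted value is driven by nearby points
fit <- loess(y ~ x)

# fitted values at the observed x; +/- 2 standard errors gives rough 95% bands
pred <- predict(fit, se = TRUE)
lower <- pred$fit - 2 * pred$se.fit
upper <- pred$fit + 2 * pred$se.fit
```

If the loess curve tracks a straight line and the bands are narrow, your audience can judge the quality of a linear fit informally without being shown one.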

In Serious Stats I mention loess only in passing (as an alternative to polynomial regression). Loess is generally superior as an exploratory tool, whereas polynomial regression (particularly quadratic and cubic fits) is more useful for inference. Here is an example of a cubic polynomial fit (followed by R code):

Cubic fit

qplot(x, y, data=hov.dat, geom=c('point', 'smooth'), method='lm', formula= y ~ poly(x, 3))

Also available are fits using robust linear regression or splines. Robust linear regression (see section 10.5.2 of Serious Stats for a brief introduction) changes the least squares loss function in order to reduce the impact of extreme points. Sample R code (graph not shown):

library(MASS)
qplot(x, y, data=hov.dat, geom=c('point', 'smooth'), method='rlm')

One slight problem here is that the approximate confidence bands assume normality and thus are probably too narrow.
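The effect of the robust loss function can be seen directly with rlm() from MASS. A sketch with fabricated data containing one gross outlier (not the hov data):

```r
library(MASS)
set.seed(42)
x <- 1:20
y <- x + rnorm(20, sd = 0.5)  # true slope = 1
y[20] <- 100                  # one wild point at the highest-leverage x

ls.slope  <- coef(lm(y ~ x))[2]   # least squares: dragged up by the outlier
rob.slope <- coef(rlm(y ~ x))[2]  # Huber M-estimate: outlier downweighted
```

The least squares slope is pulled well away from 1 by the single extreme point, while the robust fit stays close to the true value.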

Splines are an alternative to loess that fit sections of simpler curves together. Here is a spline with three degrees of freedom:

Spline fit (3 df)

library(splines)
qplot(x, y, data=hov.dat, geom=c('point', 'smooth'), method='lm', formula=y ~ ns(x, 3))
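The qplot call above simply hands the ns(x, 3) basis to lm(), so the underlying model can be fitted and examined directly. A sketch with synthetic data (again, not the hov data):

```r
library(splines)
set.seed(1)
x <- runif(100, 0, 10)
y <- sin(x) + rnorm(100, sd = 0.3)   # a genuinely curved relationship

# natural cubic spline with 3 degrees of freedom:
# an intercept plus three basis coefficients
fit <- lm(y ~ ns(x, 3))
length(coef(fit))   # 4 coefficients in all
```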

A few final thoughts

If you want to know more the best place to start is with Hadley Wickham’s book. Chapter 2 covers qplot() and is available free online.

The immediate pros of the ggplot2 approach are fairly obvious – quick, good-looking graphs. There is, however, much more to the package and there is almost no limit to what you can produce. The output of the ggplot2 functions is itself an R object that can be stored and edited to create new graphs. You can use qplot() to create many other graphs – notably kernel density plots, bar charts, box plots and histograms. You can get these by changing the geom (or, by default, with certain object types as input).
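For example, a qplot() result can be stored and then edited by adding components (a sketch assuming ggplot2 is installed; the data and variable names here are arbitrary stand-ins):

```r
library(ggplot2)
set.seed(1)
dat <- data.frame(x = rnorm(50), y = rnorm(50))  # stand-in data

p <- qplot(x, y, data = dat)                     # store the plot object
p2 <- p + xlab('Predictor') + ylab('Outcome')    # edit it later, without re-specifying the data
```

Nothing is drawn until the object is printed, so a plot can be built up in stages and reused.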

The cons are less obvious. First, it takes some time investment to get to grips with the grammar of graphics approach (though this is very minimal if you stick with the quick plot function). Second, you may not like the default look of the ggplot2 output (though you can tweak it fairly easily). For instance, I prefer the default kernel density and histogram plots from the R base package to the default ggplot2 ones. I like to take a bare bones plot and build it up … trying to keep visual clutter to a minimum. I also tend to want black and white images for publication (whereas I would use grey and colour images more often in presentations). This is mostly to do with personal taste.

Confidence intervals with tiers: functions for between-subjects (independent measures) ANOVA

In a previous post I showed how to plot difference-adjusted CIs for between-subjects (independent measures) ANOVA designs (see here). The rationale behind this kind of graphical display is introduced in Chapter 3 of Serious stats (and summarized in my earlier blog post).

In a between-subjects – or indeed a within-subjects (repeated measures) – design, you or your audience will not always be interested only in the differences between the means. Occasionally the main focus may even be on the individual estimates themselves. A CI for each of the individual means might be informative for several reasons.

First, it may be important to know that the interval excludes an important parameter value (e.g., zero). The example in Chapter 3 of Serious Stats involved a task in which participants had to decide which of two diagrams matched a description they had just read. Chance performance is 50% matching accuracy, so a graphical display showing that the 95% CI for each mean excludes 50% suggests that participants in each group were performing above chance.
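The chance-level check amounts to a one-sample t interval per group. A sketch with made-up accuracy scores (not the actual diagram data):

```r
set.seed(7)
# proportion-correct scores for one group, true mean around .65
acc <- pmin(pmax(rnorm(30, mean = 0.65, sd = 0.12), 0), 1)

ci <- t.test(acc, mu = 0.5)$conf.int   # 95% CI for the group mean
above.chance <- ci[1] > 0.5            # does the interval exclude 50%?
```

If the lower limit exceeds .5 for every group, the display supports the informal inference that all groups performed above chance.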

Second, the CI for an individual mean gives you an idea of the relative precision with which that quantity is estimated. This may be particularly important in an applied domain. For example, you may want to be fairly sure that performance on a task is high in some conditions as well as being sure that there are differences between conditions.

Third, the CIs for the individual means are revealing about changes in precision between conditions. If the sample sizes are equal (or nearly equal) they are also revealing about patterns in the variances. This is because the precision of each individual mean is a function of its sample standard deviation and n. This may be obscured when difference-adjusted CIs are plotted – though mainly for within-subjects (repeated measures) designs, which have to allow for the correlation between the samples.
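The per-group precision is easy to tabulate. A sketch, assuming a data frame with a grouping variable and a numeric dv (the names here are hypothetical):

```r
set.seed(2)
dat <- data.frame(group = rep(c('a', 'b', 'c'), times = c(10, 10, 40)),
                  dv = rnorm(60))

n  <- tapply(dat$dv, dat$group, length)         # group sizes
se <- tapply(dat$dv, dat$group, sd) / sqrt(n)   # standard error per group
# with similar variances, the larger group's mean is estimated more precisely
```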

In any case, it may be desirable to display CIs for individual means and difference-adjusted means on the same plot. This could be accomplished in several ways but I have proposed using a two-tiered CI plot (see here for a brief summary of my BRM paper on this or see Chapter 16 of Serious stats).

A common approach (for either individual means or difference-adjusted CIs) is to adopt a pooled error term. This results in a more accurate CI if the homogeneity of variance assumption is met. For the purposes of a graphical display I would generally avoid pooled error terms (even if you use a pooled error term in your ANOVA). A graphical display of means is useful as an exploratory aid and supports informal inference. You want to be able to see any patterns in the precision (or variances) of the means. Sometimes these patterns are clear enough to be convincing without further (formal) inference or modeling. If they aren’t completely convincing it is usually better to show the noisy graphic and supplement it with formal inference if necessary.

Experienced researchers understand that real data are noisy and may (indeed should!) get suspicious if data are too clean. (I’m perhaps being optimistic here – but we really ought to have more tolerance for noisy data, as this should reduce the pressure on honest researchers to ‘optimize’ their analyses – e.g., see here).

My earlier post on this blog provided functions for the single-tier difference-adjusted CIs. Here is the two-tiered function (for a one-way design):

plot.bsci.tiered <- function(data.frame, group.var=1, dv.var=2, var.equal=FALSE,
    conf.level=0.95, xlab=NULL, ylab=NULL, level.labels=NULL, main=NULL,
    pch=19, pch.cex=1.3, text.cex=1.2, ylim=c(min.y, max.y),
    line.width=c(1.5, 1.5), tier.width=0, grid=TRUE) {
    data <- subset(data.frame, select=c(group.var, dv.var))
    fact <- factor(data[[1]])
    dv <- data[[2]]
    J <- nlevels(fact)
    # outer tier: conventional CIs; inner tier: difference-adjusted CIs
    ci.outer <- bsci(data.frame=data.frame, group.var=group.var, dv.var=dv.var,
        difference=FALSE, var.equal=var.equal, conf.level=conf.level)
    ci.inner <- bsci(data.frame=data.frame, group.var=group.var, dv.var=dv.var,
        difference=TRUE, var.equal=var.equal, conf.level=conf.level)
    moe.y <- max(ci.outer) - min(ci.outer)
    min.y <- min(ci.outer) - moe.y/3
    max.y <- max(ci.outer) + moe.y/3
    if (missing(xlab)) xlab <- "Groups"
    if (missing(ylab)) ylab <- "Confidence interval for mean"
    plot(0, 0, ylim = ylim, xaxt = "n", xlim = c(0.7, J + 0.3), xlab = xlab,
        ylab = ylab, main = main, cex.lab = text.cex)
    if (grid == TRUE) grid()
    points(ci.outer[, 2], pch = pch, bg = "black", cex = pch.cex)
    index <- 1:J
    segments(index, ci.outer[, 1], index, ci.outer[, 3], lwd = line.width[1])
    axis(1, index, labels = level.labels)
    if (tier.width == 0) {
        # crossbar style: short horizontal bars mark the inner tier
        segments(index - 0.025, ci.inner[, 1], index + 0.025, ci.inner[, 1], lwd = line.width[2])
        segments(index - 0.025, ci.inner[, 3], index + 0.025, ci.inner[, 3], lwd = line.width[2])
    }
    else {
        # thick-line style: inner tier drawn as a heavier vertical segment
        segments(index, ci.inner[, 1], index, ci.inner[, 3],
            lwd = line.width[1] * (1 + abs(tier.width)))
    }
}

The following example uses the diagram data from the book:

source('http://www2.ntupsychology.net/seriousstats/SeriousStatsAllfunctions.txt')

diag.dat <- read.csv('http://www2.ntupsychology.net/seriousstats/diagram.csv')

plot.bsci.tiered(diag.dat, group.var=2, dv.var=4, ylab='Mean description quality', main = 'Two-tiered CIs for the Diagram data', tier.width=1)

The result is a plot that looks something like this (though I should probably have reordered the groups and labeled them):

For these data the group sizes are equal and thus the width of the outer tier reflects differences in variances between the groups. The variances are not very unequal, but neither are they particularly homogeneous. The inner tier suggests group 3 is different from groups 2 and 4 (but not from group 1). This is a pretty decent summary of what’s going on and could be supplemented by formal inference (see Chapter 13 for a comparison of several formal approaches also using this data set).


Footnote: The aesthetics of error bar plots

A major difference between the plot shown here and that in my BRM paper or in the book is that I have changed the method of plotting the tiers. The change is mainly aesthetic, but also reflects the desire not to emphasize the extremes of the error bar. The most plausible values of the parameter (e.g., mean) are towards the center of the interval – not at the extremes. I have discussed the reasons for my change of heart in a bit more detail elsewhere.

To this end I have also updated all my plotting functions. They still use the crossbar style from the book by default, but this is controlled by a tier.width argument. If tier.width=0 the crossbar style is used; otherwise tier.width controls the additional thickness of the difference-adjusted lines. In general, tier.width=1 seems to work well (but the crossbar style may be necessary for some unusual within-subject CIs where the difference-adjusted CI is wider than the CI for the individual means).

Independent measures (between-subjects) ANOVA and displaying confidence intervals for differences in means

In Chapter 3 (Confidence Intervals) of Serious stats I consider the problem of displaying confidence intervals (CIs) for a set of means (which I illustrate with the simple case of two independent means). Later, in Chapter 16 (Repeated Measures ANOVA), I consider the trickier problem of displaying two or more means from paired or repeated measures. The example in Chapter 16 uses R functions from my recent paper reviewing different methods of displaying means for repeated measures (within-subjects) ANOVA designs (Baguley, 2012b). For further details and links see a brief summary on my psychological statistics blog. The R functions included a version for independent measures (between-subjects) designs, but this was rather limited, designed for comparison purposes (and not for actual use).

The independent measures case is relatively straightforward to implement and I hadn’t originally planned to write functions for it. Since then, however, I have decided that it is worth doing. Setting up the plots can be quite fiddly and it may be useful to go over the key points for the independent case before moving on to the repeated measures case. This post therefore adapts my code for independent measures (between-subjects) designs.

The approach I propose is inspired by Goldstein and Healy (1995) – though other authors have made similar suggestions over the years (see Baguley, 2012b). Their aim was to provide a simple method for displaying a large collection of independent means (or other independent statistics). At its simplest the method reduces to plotting each statistic with error bars equal to ±1.39 standard errors of the mean. This result is a normal approximation that can be refined in various ways (e.g., by using the t distribution or by extending it to take account of correlations between conditions). Using a Goldstein-Healy plot two means are considered different with 95% confidence if their two intervals do not overlap. In other words non-overlapping CIs are (in this form of plot) approximately equivalent to a statistically significant difference between the two means with α = .05. For convenience I will refer to CIs that have this property as difference-adjusted CIs (to distinguish them from conventional CIs).
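The ±1.39 multiplier is just the two-sided 95% normal quantile shrunk by √2, so that the two half-widths together give the right margin for a difference (a quick numerical check):

```r
z <- qnorm(0.975)    # 1.96: two-sided 95% normal quantile
mult <- z / sqrt(2)  # per-mean multiplier for difference-adjusted error bars
round(mult, 2)       # 1.39
```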

It is important to realize that conventional 95% CIs constructed around each mean won’t have this property. For independent means they are usually around 40% too wide and thus will often overlap even if the usual t test of their difference is statistically significant at p < .05. This happens because the variance of a difference between independent samples is equal to the sum of the variances of the individual samples. Thus the standard error of the difference is around √2 times larger than the standard error of each mean (assuming equal variances). For a more comprehensive explanation see Chapter 3 of Serious stats or Baguley (2012b).
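The √2 factor follows directly from summing the variances (a numeric check, assuming equal standard errors for the two means):

```r
se1 <- 2; se2 <- 2               # equal standard errors of two independent means
se.diff <- sqrt(se1^2 + se2^2)   # SE of the difference: variances add
se.diff / se1                    # sqrt(2), about 1.41
```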

What to plot

If you have only two means there are at least three basic options:

1) plot the individual means with conventional 95% CIs around each mean

2) plot the difference between means and a 95% CI for the difference

3) plot some form of difference-adjusted CI

Which option is best? It depends on what you are trying to do. A good place to start is with your reasons for constructing a graphical display in the first place. Graphs are not particularly good for formal inference, and other options (e.g., significance tests, reporting point estimates and CIs in text, likelihood ratios, Bayes factors and so forth) exist for reporting the outcome of formal hypothesis tests. Graphs are appropriate for informal inference. This includes exploratory data analysis, aiding the interpretation of complex patterns, or summarizing a number of simple patterns in a single display. If the patterns are very clear, informal inference might be sufficient. In other cases it can be supplemented with formal inference.

What patterns do the three basic options above reveal? Option 1) shows the precision around individual means. This readily supports inference about the individual means (but not their difference). For example, a true population value outside the 95% CI is considered implausible (and the observed mean would be different from that hypothesized value with p < .05 using a one sample t test).

Option 2) makes for a rather dull plot because it just involves a single point estimate for the difference in means and the 95% CI for the difference. If this is the only quantity of interest you’d be better off just reporting the mean difference and 95% CI in the text. This has the advantage of being more compact and more accurate than trying to read the numbers off a graph. [This is one reason that graphs aren't optimal for formal inference; it can be hard, for instance, to tell whether a line includes or excludes zero when the difference is just statistically significant or just statistically non-significant. With informal inference you shouldn't care whether p = .049 or p = .051, but whether there are any clear patterns in the data.]

Option 3) shows you the individual means but calibrates the CIs so that you can tell if it is plausible that the sample means differ (using 95% confidence in the difference as a standard). Thus it seems like a good choice for graphical display if you are primarily interested in the differences between means. For formal inference it can be supplemented by reporting a hypothesis test in the text (or possibly a Figure caption).

It is worth noting that option 3) becomes even more attractive if you have more than two means to plot. It allows you to see patterns that emerge over the set of means (e.g., linear or non-linear trends or – if n per sample is similar – changes in variances) and to compare pairs of means to see whether it is plausible that they are different.

In contrast, option 2) is rather unattractive with more than two means. First, with J means there are J(J − 1)/2 differences and thus an unnecessarily cluttered graphical display (e.g., with J = 5 means there are 10 CIs to plot). Second, plotting only the differences can obscure important patterns in the data (e.g., an increasing or decreasing trend in the means or variances would be difficult to identify).
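The count of pairwise differences grows quadratically with J (choose(J, 2) in R):

```r
J <- 5
J * (J - 1) / 2   # 10 pairwise differences for 5 means
choose(J, 2)      # same count via the binomial coefficient
```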

Difference-adjusted CIs using the t distribution

Where only a few means are to be plotted (as is common in ANOVA) it makes sense to take a slightly more accurate approach than the approximation originally proposed by Goldstein and Healy for large collections of means. This approach uses the t distribution. A similar approach is advocated by Afshartous and Preston (2010), who also provide R code for calculating multipliers for the standard errors using the t distribution (and an extension for repeated measures). My approach is similar, but involves calculating the margin of error (the half-width of the error bars) directly rather than computing a multiplier to apply to the standard error.

The difference-adjusted CI for the mean of each sample from an independent measures (between-subjects) ANOVA design is given by Equation 3.31 of Serious stats:

\hat{\mu}_j \pm t_{n_j - 1,\, 1 - \alpha/2} \times \frac{\sqrt{2}}{2} \times \hat{\sigma}_{\hat{\mu}_j}

The \hat{\mu}_j term is the mean of the jth sample (where samples are labeled j = 1 to J) and \hat{\sigma}_{\hat{\mu}_j} is the standard error of that sample. The t_{n_j - 1,\, 1 - \alpha/2} term is the quantile of the t distribution with n_j − 1 degrees of freedom (where n_j is the size of the jth sample) that includes 100(1 − α)% of the distribution.
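In R the difference-adjusted margin of error for one group is essentially a one-liner. A sketch for a single fabricated sample:

```r
set.seed(3)
grp <- rnorm(25, mean = 10, sd = 2)   # one group's scores
n <- length(grp)

se <- sd(grp) / sqrt(n)                      # standard error of the mean
moe <- qt(0.975, n - 1) * sqrt(2)/2 * se     # difference-adjusted margin of error
ci <- mean(grp) + c(-1, 1) * moe             # lower and upper limits
```

Because of the √2/2 shrinkage, this interval is narrower than the conventional 95% CI for the same mean.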

Thus, apart from the √2/2 term, this equation is identical to that for a 95% CI around the individual means, with the proviso that the standard error here is computed separately for each sample. This differs from the usual approach to plotting CIs for an independent measures ANOVA design – where it is common to use a pooled standard error computed from a pooled standard deviation (the root mean square error of the ANOVA). While a pooled error term is sometimes appropriate, it is generally a bad idea for graphical display of the CIs because it will obscure any patterns in the variability of the samples. [Nevertheless, where n_j is very small it may make sense to use a pooled error term on the grounds that each sample provides an exceptionally poor estimate of its population standard deviation.]

However, the most important change is the √2/2 term. It creates a difference-adjusted CI by ensuring that the joint width of the margins of error around any two means is √2 times larger than for a single mean. The division by 2 arises merely as a consequence of dealing jointly with two error bars. Their total has to be √2 times larger and therefore each one needs only to be √2/2 times its conventional value (for an unadjusted CI). This is discussed in more detail by Baguley (2012a; 2012b).

This equation should perform well (e.g., providing fairly accurate coverage) as long as the variances are not very unequal and the samples are approximately normal. Even when these conditions are not met, remember that the aim is not to support formal inference. In addition, the approach is likely to be slightly more robust than ANOVA (at least to heterogeneity of variance and unequal sample sizes). So this method is likely to be a good choice whenever ANOVA is appropriate.
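The claimed behavior can be checked by simulation: with two equal-size normal samples, non-overlap of the difference-adjusted intervals should agree closely with significance of a t test of the difference. A rough sketch (the equivalence is approximate, so agreement won't be perfect):

```r
set.seed(10)
agree <- replicate(2000, {
  a <- rnorm(30); b <- rnorm(30)   # two independent samples, equal true means
  adj <- sqrt(2)/2                 # difference adjustment
  moe.a <- qt(0.975, 29) * adj * sd(a)/sqrt(30)
  moe.b <- qt(0.975, 29) * adj * sd(b)/sqrt(30)
  overlap <- (mean(a) + moe.a > mean(b) - moe.b) &
             (mean(b) + moe.b > mean(a) - moe.a)
  sig <- t.test(a, b)$p.value < 0.05
  overlap == !sig                  # do the two verdicts agree?
})
mean(agree)   # close to 1
```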

R functions for independent measures (between-subjects) ANOVA designs

Two R functions for difference-adjusted CIs in independent measures ANOVA designs are provided here. The first function bsci() calculates conventional or difference-adjusted CIs for a one-way ANOVA design.

bsci <- function(data.frame, group.var=1, dv.var=2, difference=FALSE,
    pooled.error=FALSE, conf.level=0.95) {
    data <- subset(data.frame, select=c(group.var, dv.var))
    fact <- factor(data[[1]])
    dv <- data[[2]]
    J <- nlevels(fact)
    N <- length(dv)
    ci.mat <- matrix(NA, J, 3, dimnames=list(levels(fact), c('lower', 'mean', 'upper')))
    ci.mat[, 2] <- tapply(dv, fact, mean)
    n.per.group <- tapply(dv, fact, length)
    # sqrt(2)/2 shrinks each margin of error so that non-overlap of two
    # intervals corresponds (approximately) to a significant difference
    if (difference == TRUE) diff.factor <- 2^0.5/2 else diff.factor <- 1
    if (pooled.error == TRUE) {
        for (i in 1:J) {
            moe <- summary(lm(dv ~ 0 + fact))$sigma/(n.per.group[[i]])^0.5 *
                qt(1 - (1 - conf.level)/2, N - J) * diff.factor
            ci.mat[i, 1] <- ci.mat[i, 2] - moe
            ci.mat[i, 3] <- ci.mat[i, 2] + moe
        }
    }
    if (pooled.error == FALSE) {
        for (i in 1:J) {
            group.dat <- subset(data, data[1] == levels(fact)[i])[[2]]
            moe <- sd(group.dat)/sqrt(n.per.group[[i]]) *
                qt(1 - (1 - conf.level)/2, n.per.group[[i]] - 1) * diff.factor
            ci.mat[i, 1] <- ci.mat[i, 2] - moe
            ci.mat[i, 3] <- ci.mat[i, 2] + moe
        }
    }
    ci.mat
}

plot.bsci <- function(data.frame, group.var=1, dv.var=2, difference=TRUE,
    pooled.error=FALSE, conf.level=0.95, xlab=NULL, ylab=NULL,
    level.labels=NULL, main=NULL, pch=21, ylim=c(min.y, max.y),
    line.width=c(1.5, 0), grid=TRUE) {
    data <- subset(data.frame, select=c(group.var, dv.var))
    if (is.factor(data[[1]]) == FALSE) data[[1]] <- factor(data[[1]])
    if (missing(level.labels)) level.labels <- levels(data[[1]])
    dv <- data[[2]]
    J <- nlevels(data[[1]])
    ci.mat <- bsci(data.frame=data.frame, group.var=group.var, dv.var=dv.var,
        difference=difference, pooled.error=pooled.error, conf.level=conf.level)
    moe.y <- max(ci.mat) - min(ci.mat)
    min.y <- min(ci.mat) - moe.y/3
    max.y <- max(ci.mat) + moe.y/3
    if (missing(xlab)) xlab <- "Groups"
    if (missing(ylab)) ylab <- "Confidence interval for mean"
    plot(0, 0, ylim = ylim, xaxt = "n", xlim = c(0.7, J + 0.3), xlab = xlab,
        ylab = ylab, main = main)
    if (grid == TRUE) grid()
    points(ci.mat[, 2], pch = pch, bg = "black")
    index <- 1:J
    segments(index, ci.mat[, 1], index, ci.mat[, 3], lwd = line.width[1])
    segments(index - 0.02, ci.mat[, 1], index + 0.02, ci.mat[, 1], lwd = line.width[2])
    segments(index - 0.02, ci.mat[, 3], index + 0.02, ci.mat[, 3], lwd = line.width[2])
    axis(1, index, labels = level.labels)
}

The default for bsci() is difference=FALSE (on the basis that these are the CIs most likely to be reported in text or tables). The second function plot.bsci() uses the first function to plot the means and CIs; the default here is difference=TRUE (on the basis that difference-adjusted CIs are likely to be more useful for graphical display). For both functions the default is a separate error term per group (pooled.error=FALSE) and a 95% confidence level (conf.level=0.95). Each function takes input as a data frame and assumes that the grouping variable is in the first column and the dependent variable in the second column. If the variables are in different columns, the correct columns can be specified with the arguments group.var and dv.var. The plotting function also takes some standard graphical parameters (e.g., for labels and so forth).

The following examples use the diagram data set from Serious stats. The first line loads the data set (if you have a live internet connection). The second line generates the difference-adjusted CIs. The third line plots them. Note that the grouping variable (factor) is in the second column and the DV is in the fourth column.

diag.dat <- read.csv('http://www2.ntupsychology.net/seriousstats/diagram.csv')

bsci(diag.dat, group.var=2, dv.var=4, difference=TRUE)
plot.bsci(diag.dat, group.var=2, dv.var=4, ylab='Mean description quality', main = 'Difference-adjusted 95% CIs for the Diagram data')

In this case the graph looks like this:

It should be immediately clear that while the segmented diagram condition (S) tends to have higher scores than the text (T) or picture (P) conditions, the full diagram (F) condition is somewhere in between. This matches the uncorrected pairwise comparisons where S > P = T, S = F, and F = P = T.

At some point I will also add a function to plot two-tiered error bars (combining options 1 and 3). For details of the extension to repeated measures designs see Baguley (2012b). The code and data sets are available here.

References

Afshartous D., & Preston R. A. (2010). Confidence intervals for dependent data: equating nonoverlap with statistical significance. Computational Statistics and Data Analysis. 54, 2296-2305.

Baguley, T. (2012a, in press). Serious stats: A guide to advanced statistics for the behavioral sciences. Basingstoke: Palgrave.

Baguley, T. (2012b). Calculating and graphing within-subject confidence intervals for ANOVA. Behavior Research Methods, 44, 158-175.

Goldstein, H., & Healy, M. J. R. (1995). The graphical presentation of a collection of means. Journal of the Royal Statistical Society. Series A (Statistics in Society), 158, 175-177.

Schenker, N., & Gentleman, J. F. (2001). On judging the significance of differences by examining the overlap between confidence intervals. The American Statistician, 55, 182-186.


Serious stats – a quick chapter summary

Here is a list of the contents by chapter with quick notes on chapter content …

0. Preface (About the book; notes on software, mathematics and types of boxed sections)

1. Data, Samples and Statistics (A gentle review of measures of central tendency and dispersion with a little more depth in places – flagging up the distinction between descriptive and inferential formulas and perhaps introducing a few unfamiliar statistics such as the geometric mean)

2. Probability Distributions (A background chapter giving a whirlwind tour of the main probability distributions – discrete and continuous – that crop up in later chapters. It also introduces important concepts such as probability mass functions, probability density functions and cumulative distribution functions, and characteristics of distributions such as skew, kurtosis and whether they are bounded. From a statistical point of view it is a quick overview missing out a lot of the difficult stuff.)

3. Confidence Intervals (This chapter introduces interval estimation using confidence intervals (CIs) and gives examples for discrete and continuous distributions – particularly those for means and differences between independent or paired means using the t distribution. This chapter also introduces Monte Carlo methods – with emphasis on the bootstrap.)

4. Significance Tests (This chapter introduces significance tests. These are deliberately covered after CIs – which are less popular in the behavioral sciences but generally more useful. A number of common tests are covered – notably t tests and chi-square tests. The chapter ends with some comments on the appropriate use of significance tests – a point picked up again in chapter 11.)

5. Regression (This chapter introduces regression – with an emphasis on simple linear regression. Later chapters draw heavily on this basic material including concepts such as prediction, leverage and influence. The versatility of regression approaches is shown by illustrating how an independent t test is a simple regression model and how a linear model can fit some curvilinear relationships.)

6. Correlation and Covariance (Introduces covariance and correlation with emphasis on the link between Pearson’s r and simple linear regression. The chapter also introduces standardization and problems of working with standardized quantities such as boundary effects, range restriction and small sample bias. Methods for inference with correlation coefficients and comparing correlations (e.g., using the Fisher z distribution) are considered. Some alternatives to Pearson’s r are also introduced.)

7. Effect Size (This chapter focuses on effect size, starting with an overview of the different uses of effect size metrics. The chapter gives a tour of different types of effect size metrics, distinguishing between: continuous and discrete metrics; simple (unstandardized) and standardized metrics; focused (1 df) and unfocused (multiple df) metrics; base rate sensitive and base rate insensitive metrics. I argue that standardized metrics whether based on differences or correlations (d family or r family) are not good measures of the practical, clinical or theoretical importance of an effect because they confound the magnitude of an effect with its variability – though they may be useful in some situations.)

8. Statistical Power (This chapter introduces statistical power – starting by explaining the link between the effect size and statistical power, illustrating why standardized effect size (by combining the magnitude of an effect with its variability) is often a convenient way to summarize an effect in order to estimate statistical power or the sample size required to detect an effect. Problems and pitfalls in statistical power and sample size estimation are discussed. Later sections introduce the accuracy in parameter estimation approach to power in relation to the width of a confidence interval.)
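
The link between raw effect, variability and power can be seen with base R's power.t.test(). A minimal sketch with invented numbers: a raw difference of 5 with SD 10 is a standardized d of 0.5, and the function solves for the per-group sample size:

```r
# Sample size per group to detect a raw difference of 5 (SD = 10,
# i.e. d = 0.5) with 80% power at alpha = .05, two-sided
res <- power.t.test(delta = 5, sd = 10, power = 0.8, sig.level = 0.05)
res        # n comes out at roughly 64 per group
```

Only the ratio delta/sd matters here – which is exactly the point made above about standardized effect size being convenient for power calculations.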

9. Exploring Messy Data (This chapter looks at exploratory analysis of data with emphasis on graphical methods for checking statistical assumptions.)

10. Dealing with Messy Data (This chapter surveys approaches to dealing with violations of statistical assumptions with particular emphasis on robust methods and transformations.)

11. Alternatives to Classical Statistical Inference (This chapter looks at criticism of classical, frequentist methods of inference and considers frequentist responses and three alternative approaches: likelihood, Bayesian and information-theoretic methods. I illustrate each of the alternatives both here and in later chapters.)

12. Multiple Regression and the General Linear Model (This chapter extends regression to models with multiple predictors. The problem of fitting these models when predictors are not orthogonal (i.e., when they are correlated) is introduced and a solution is illustrated using matrix algebra. The rest of the chapter introduces partial and semi-partial correlation and focuses on interpreting a multiple regression model and related issues such as collinearity and suppression.)
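
The matrix-algebra solution for correlated predictors can be sketched in a few lines. This is an illustration with invented data, not the book's worked example; lm() computes the same coefficients more stably via a QR decomposition:

```r
# Least squares by matrix algebra: b = (X'X)^-1 X'y
set.seed(2)
x1 <- rnorm(50)
x2 <- 0.6 * x1 + rnorm(50)           # deliberately correlated with x1
y  <- 1 + 2 * x1 - x2 + rnorm(50)    # invented outcome

X <- cbind(1, x1, x2)                # design matrix with intercept column
b <- solve(t(X) %*% X) %*% t(X) %*% y

cbind(b, coef(lm(y ~ x1 + x2)))      # the two solutions agree
</imports>
```

Because x1 and x2 are correlated (non-orthogonal), the matrix inversion is what apportions the shared variance – the step that makes multiple regression more than a set of separate simple regressions.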

13. ANOVA and ANCOVA with Independent Measures (This chapter introduces ANOVA and ANCOVA as special cases of multiple regression with categorical predictors (e.g., using dummy or effect coding). The chapter ends by introducing the multiple comparison problem in relation to differences between means for a factor in ANOVA or differences between adjusted means in ANCOVA. For the latter, the main focus is on modified Bonferroni procedures, though alternatives such as control of false discovery rate and information-theoretic approaches are briefly considered.)
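
The dummy-coding point is easy to see in R, where lm() builds the coding automatically from a factor. A minimal sketch with invented data (not the book's example):

```r
# ANOVA as regression with dummy (treatment) coding
set.seed(3)
g <- factor(rep(c("a", "b", "c"), each = 10))
y <- c(rnorm(10, 5), rnorm(10, 6), rnorm(10, 8))  # invented group data

head(model.matrix(~ g))   # the 0/1 dummy columns lm() constructs
anova(lm(y ~ g))          # same F test as aov(y ~ g)
```

With treatment coding the intercept is the mean of the reference group ("a") and each dummy coefficient is a difference from it; effect coding would parameterize the same model differently without changing the F test.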

14. Interactions (This chapter looks at modeling non-additive effects of predictors in multiple regression models through the inclusion of interaction terms. It starts by looking at the most general form of an interaction model in multiple regression (often termed a moderated multiple regression) before looking at polynomial terms in regression and interactions in the context of ANOVA and ANCOVA. The main emphasis is on interpreting and exploring interaction effects (e.g., through graphical methods). The chapter also looks at simple main effects and simple interaction effects.)

15. Contrasts (This chapter looks at the often neglected topic of contrasts – mainly in the context of ANOVA and ANCOVA models (where they are weighted combinations of differences in means or adjusted means). Methods for setting up contrasts to test hypotheses about patterns of means are explained for simple cases and extended for unbalanced designs, adjusted means and interaction effects.)

16. Repeated Measures ANOVA (This chapter introduces repeated measures and related (e.g., matched) designs. These increase statistical power by removing individual differences from the ANOVA error term, but at the cost of increased complexity (e.g., making stronger assumptions about the errors of the model). Again, the chapter focuses on checking and dealing with violations of assumptions and on the interpretation of the model. It also briefly considers MANOVA and repeated measures ANCOVA models and the use of gain scores).

17. Modelling Discrete Outcomes (This chapter explains how the regression approach of the general linear model can be extended to models with discrete outcomes using the generalized linear model and related approaches. The main focus is on logistic regression (including multinomial and ordered logistic regression) and Poisson regression, but negative binomial regression and models for excess zeroes (zero-inflated and hurdle models) are briefly reviewed. The chapter ends by considering the difficulty of modeling correlated observations in logistic regression.)
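
Logistic regression is a one-liner in R via glm(). The sketch below uses invented data (not a book example) to show the basic fit and the usual odds-ratio interpretation of the coefficients:

```r
# Logistic regression with glm() on invented binary data
set.seed(4)
x <- rnorm(200)
p <- plogis(-0.5 + 1.2 * x)       # true model on the logit scale
y <- rbinom(200, 1, p)            # simulated 0/1 outcome

fit <- glm(y ~ x, family = binomial)
coef(fit)        # coefficients are log odds ratios
exp(coef(fit))   # exponentiate to get odds ratios
```

Swapping family = binomial for family = poisson gives the Poisson regression also covered in the chapter; the generalized linear model framework is what makes that a one-word change.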

18. Multilevel Models (This chapter introduces multilevel models with particular emphasis on their application to the analysis of repeated measures data. The chapter considers conventional nested designs (e.g., repeated measures within participants or children within schools) and moves on to fully crossed models and a brief overview of multilevel generalized linear models.)

All chapters come with several examples within the chapter and R code (at the end). Most also have notes on SPSS syntax. I don’t include full SPSS instructions because these are often already available in popular texts. If they aren’t available it is generally because SPSS couldn’t readily implement these analyses. Also note that recent versions of SPSS can be set up to call R via syntax (though I find it easier to use R directly).

Online supplements

The book is around 800 pages long and some material cut from the final draft will be available in five online supplements. This material is either parenthetical (being more detailed than required) or self-contained sections that could stand alone and were perhaps not relevant for all readers.

OS1. Meta-analysis (This section was included in chapter 7 Effect size and introduces meta-analysis. Most meta-analytic approaches for continuous data use standardized effect size metrics. As the chapter argues that simple effect size metrics are often superior for summarizing and comparing effects, this supplement uses meta-analysis of simple (raw) mean differences to illustrate fixed effect and random effects models. There is a nice link between random effects meta-analysis and multilevel models – so it was a shame to drop it.)

OS2. Dealing with missing data (An overview of methods for dealing with missing data that was part of chapter 10. The main focus is on multiple imputation – an extremely useful and underused approach in the behavioral sciences – and a worked example is demonstrated for both R and SPSS. There are nice links between multiple imputation and meta-analysis – so it made sense to move this chapter out once I had decided to leave out meta-analysis. If you work with missing data and aren’t already familiar with multiple imputation you should take a careful look at this supplement – as most standard methods for dealing with missing data are biased and have low statistical power.)

OS3. Replication probabilities and prep (When I started writing the book there was quite an interest in replication probabilities – and prep in particular – as an alternative to p values. This interest has largely faded and my (largely critical) take on prep is now mainly a historical curiosity. The main text now covers this topic briefly in chapter 11.)

OS4. Pseudo-R2 and related measures (A reader of the final draft of chapter 17 commented that, given the problems with these measures and my own critical stance on standardized effect size metrics, my coverage of this topic was too detailed. I greatly reduced the emphasis on pseudo-R2 in the text by moving most of the material here. Of these measures my favourite is Zheng and Agresti’s predictive power measure – which I find most intuitive.)

OS5. Loglinear models (Loglinear models are models of contingency table data (closely related to Poisson regression, and under certain conditions equivalent). As Poisson models are generally more flexible, loglinear models were cut from the final draft. However, as they are quite popular in the behavioral sciences, this supplement is provided. Loglinear models are also a convenient way to parameterize a count model to make it more “chi-square-like”. Note: the term loglinear model can also be used in a more general sense to include models with log link functions or log transformations.)
