You can find the function and a few more details here.

Filed under: comparing correlations, confidence intervals, R code, serious stats Tagged: confidence intervals, R, R code, software, statistics ]]>

You can read it here.

Filed under: serious stats, stats advice Tagged: correlation and covariance, general linear model, messy data, multicollinearity, multiple regression, psychology, regression, statistics ]]>

Filed under: serious stats Tagged: Code School, R, statistics, Try R ]]>

There are two main practical problems to bear in mind when switching to the multilevel approach. First, the additional complexity of the approach can be daunting at first – though it is possible to built up gently to more complex models. Recent improvements in availability of software and support (textbooks, papers and online resources) also help. The second is that as soon as a model departs markedly from a conventional repeated measures ANOVA, correct inferences (notably significance tests and interval estimates such as confidence intervals) can be difficult to obtain. If the usual ANOVA assumptions hold in a nested, balanced design then there is a known equivalence between the multilevel model inferences using *t* or *F* tests and the familiar ANOVA tests (and this case the expected output of the tests is the same). The main culprits are boundary effects (which effect inferences about variances and hence most tests of random effects) and working out the correct degrees of freedom (*df*) to use for your test statistic. Both these problems are discussed in Chapter 18 of the book. If you have very large samples an asymptotic approach (using Wald *z* or chi-square statistics) is probably just fine. However, the further you depart from conventional repeated measures ANOVA assumptions the harder it is to know how large a sample news to be before the asymptotics kick in. In other words, the more attractive the multilevel approach the less you can rely on the Wald tests (or indeed the Wald-style *t* or *F* tests).

The solution I advocate in Serious Stats is either to use parametric bootstrapping or Markov chain Monte Carlo (MCMC) approaches. Another approach is to use some form of correction to the *df* or test statistic such as the Welch-Satterthwaite correction. For multilevel models with factorial type designs the recommended correction is generally the Kenward-Roger approximation. This is implemented in SAS, but (until recently) not available in R. Judd, Westfall and Kenny (2012) describe how to use the Kenward-Roger approximation to get more accurate significance tests from a multilevel model using R. Their examples use the newly developed pbkrtest package (Halekoh & Højsgaard, 2012) – which also has functions for parametric bootstrapping.

My purpose here is to contrast the the MCMC and Kenward-Roger correction (ignoring the parametric bootstrap for the moment). To do that I’ll go through a worked example – looking to obtain a significance test and a 95% confidence interval (CI) for a single effect.

*The pitch data example*

The example I’ll use is for the pitch data from from Chapter 18 of the book. This experiment (from a collaboration with Tim Wells and Andrew Dunn) involves looking at the at pitch of male voices making attractiveness ratings with respect to female faces. The effect of interest (for this example) is whether average pitch goes up or done for higher ratings (and if so, by how much). A conventional ANOVA is problematic because this is a design with two fully crossed random factors – each participant (*n* = 30) sees each face (*n* = 32) and any conclusions ought to generalise both to other participants and (crucially) to other faces. Furthermore, there is a time-varying covariate – the baseline pitch of the numerical rating when no face is presented. The significance tests or CIs reported by most multilevel modelling packages with also be suspect. Running the analysis in the R package lme4 gives parameter estimates and *t* statistics for the fixed effects but no *p* values or CIs. The following R code loads the pitch data, checks the first few cases, loads lme4 and runs the model of interest. (You should install lme4 using the command install.packages(‘lme4’) if you haven’t done so already).

Note the lack of df and p values. This is deliberate policy by the lme4 authors; they are not keen on giving users output that has a good chance of being very wrong.

*The Kenward-Roger approximation*

This approximation involves adjusting both the *F* statistic and its *df* so that the *p* value comes out approximately correct (see references below for further information). It won’t hurt too much to think of it as turbocharged Welch-Satterthwaite correction. To get the corrected *p* value from this approach first install the pbkrtest package and then load it. The approximation is computed using the KRmodcomp() function. This takes the model of interest (with the focal effect) and a reduced model (one without the focal effect). The code below installs and loads everything, runs the reduced model and then uses KRmodcomp() to get the corrected *p* value. Note that it may take a while to run (it took about 30 seconds on my laptop).

install.packages('pbkrtest') library(pbkrtest) pitch.red <- lmer(pitch ~ base + (1|Face) + (1|Participant), data=pitch.dat) KRmodcomp(pitch.me, pitch.red)

The corrected *p* value is .0001024. The result could reported as a Kenward-Roger corrected test with *F*(1, 118.5) = 16.17, *p* = .0001024. In this case the Wald *z* test would have given a p value of around .0000435. Here the effect is sufficiently large that the difference in approaches doesn’t matter – but that won’t always be true.

*The MCMC approach*

The MCMC approach (discussed in Chapter 18) can be run in several ways – with the lme4 functions or those in MCMCglmm being fairly easy to implement. Here I’ll stick with lme4 (but for more complex models MCMCglmm is likely to be better).

First you need to obtain a large number of Monte Carlo simulations from the model of interest. I’ll use 25,000 here (but I often start with 1,000 and work up to a bigger sample). Again this may take a while (about 30 or 40 seconds on my laptop).

pitch.mcmc <- mcmcsamp(pitch.me, n = 25000)

For MCMC approaches it is useful to check the estimates from the simulations. Here I’ll take a quick look at the trace plot (though a density plot is also sensible – see chapter 18).

xyplot(pitch.mcmc)

This produces the following plot (or something close to it):

The trace for the fixed effect of attractiveness looks pretty healthy – the thich black central portion indicating that it doesn’t jump around too much. Now we can look at the 95% confidence interval (strictly a Bayesian highest posterior density or HPD interval – but for present purposes it approximates to a 95% CI).

HPDinterval(pitch.mcmc)

This gives the interval estimate [0.2227276, 0.6578456]. This excludes zero so it is statistically significant (and MCMCglmm would have given an us MCMC-derived estimate of the *p* value).

*Comparison and reccomendation*

Although the Kenward-Roger approach is well-regarded, for the moment I would reccomend the MCMC approach. The pbkrtest package is still under development and I could not always get the approximation or the parametric bootstrap to work (but the parametric bootstrap can also be obtained in other ways – see Chapter 18).

The MCMC approach is also preferable in that it should generalize safely to models where the performance of the Kenward-Roger approximation is unknown (or poor) such as for discrete or ordinal outcomes. It also provides interval estimates rather than just *p* values. The main downside is that you need to familiarize yourself with some basic MCMC diagnostics (e.g., trace and density plots at the very least) and be willing to re-run the simulations to check that the interval estimates are stable.

*References *

Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. *Journal of Personality and Social Psychology*, *103*, 54-69.

Halekoh, U., & Højsgaard, S. (2012) A Kenward-Roger approximation and parametric bootstrap methods for tests in linear mixed models – the R package pbkrtest. Submitted to Journal of Statistical Software.

Update

Ben Bolker pointed out that future versions of lme4 may well drop the MCMC functions (which are limited, at present, to fairly basic models). In the book I mainly used MCMCglmm – which is rather good at fitting fully crossed factorial models. Here is the R code for the pitch data. Using 50,000 simulations seems to give decent estimates of the attractiveness effect. Plotting the model object gives both MCMC trace plots and kernel density plots of the MCMC estimates (hit return in the console to see all the plots).

Last but not least, any one interested in the topic should keep an eye on the draft r-sig-mixed-modelling FAQ for a summary of the challenges and latest available solutions for multilevel inference in R (and other packages).

R code formatted using Pretty R at inside-R.org

Filed under: R code, serious stats, stats advice Tagged: ANOVA, Bayesian inference, fully crossed random factors, kenward-roger approximation, linear mixed models, MCMC methods, multilevel models, psychology, R, repeated measures ANOVA, significance tests, statistical inference, statistics ]]>

Even so, getting a plot looking just right for a presentation or publication often takes a lot of work using basic plotting functions. One reason for this is that constructing a good graphic is an inherently difficult enterprise, one that balances aesthetic factors and statistical factors and that requires a good understanding of who will look at the graphic, what they know, what they want to know and how they will interpret it. It can takes hours – maybe days – to get a graphic right.

In Serious Stats I focused on exploratory plots and how to use basic plotting functions to customize them. I think this was important to include, but one of my regrets was not having enough space to cover a different approach to plotting in R. This is Hadley Wickham’s ggplot2 package (inspired by Leland Wilkinson’s grammar of graphics approach).

In this blog post I’ll quickly demonstrate a few ways that ggplot2 can be used to quickly produce amazing graphics for presentations or publication. I’ll finish by mentioning some pros and cons of the approach.

The main attraction of ggplot2 for newcomers to R is the qplot() quick plot function. Like the R plot() function it will recognize certain types and combinations of R objects and produce an appropriate plot (in most cases). Unlike the basic R plots the output tends to be both functional and pretty. Thus you may be able to generate the graph you need for your talk or paper almost instantly.

A good place to start is the vanilla scatter plot. Here is the R default:

Compare it with the ggplot2 default:

Below is the R code for comparison. (The data here are from hov.csv file used in Chapter 10 Example 10.2 of Serious Stats).

R code formatted by Pretty R at inside-R.org

The ggplot2 version is (in my view) rather prettier, but a big advantage is being able to add a range of different model fits very easily. The common choice of model fit is that of a straight line (usually the least squares regression line). Doing this in ggplot2 is easier than with basic plot functions (and you also get 95% confidence bands by default).

Here is the straight line fit from a linear model:

POST

qplot(x, y, data=hov.dat, geom=c(‘point’, ‘smooth’), method=’lm’)

The geom specifies the type of plot (one with points and a smoothed line in this case) while the method specifies the model for obtaining the smoothed line. A formula can also be added (but the formula defaults to *y* as a simple linear function of *x*).

Mind you, the linear model fit has some disadvantages. Even if you are working with a related statistical model (e.g., a Pearson’s *r* or least squares simple or multiple regression) you might want to have a more data driven plot. A good choice here is to use a local regression approach such as *loess*. This lets the data speak for themselves – effectively fitting a complex curve driven by the local properties of the data. If this is reasonably linear then your audience should be able to see the quality of the straight-line fit themselves. The local regression also gives approximate 95% confidence bands. These may support informal inference without having to make strong assumptions about the model.

Here is the *loess* plot:

Here is the code for the *loess* plot:

qplot(x, y, data=hov.dat, geom=c(‘point’, ‘smooth’), method=’loess’)

I like the *loess* approach here because its fairly obvious that the linear fit does quite well. showing the straight line fit has the appearance of imposing the pattern on the data, whereas a local regression approach illustrates the pattern while allowing departures from the straight line fit to show through.

In Serious Stats I mention *loess* only in passing (as an alternative to polynomial regression). Loess is generally superior as an exploratory tool – whereas polynomial regression (particularly quadratic and cubic fits) are more useful for inference. Here is an example of a cubic polynomial fit (followed by R code):

qplot(x, y, data=hov.dat, geom=c(‘point’, ‘smooth’), method=’lm’, formula= y ~ poly(x, 2))

Also available are fits using robust linear regression or splines. Robust linear regression (see section 10.5.2 of Serious Stats for a brief introduction) changes the loss function least squares in order to reduce impact of extreme points. Sample R code (graph not shown):

library(MASS)

qplot(x, y, data=hov.dat, geom=c(‘point’, ‘smooth’), method=’rlm’)

One slight problem here is that the approximate confidence bands assume normality and thus are probably too narrow.

Splines are an alternative to *loess* that fits sections of simpler curves together. Here is a spline with three degrees of freedom:

library(splines)

qplot(x, y, data=hov.dat, geom=c(‘point’, ‘smooth’), method=’lm’, formula=y ~ ns(x, 3))

If you want to know more the best place to start is with Hadley Wickham’s book. Chapter 2 covers qplot() and is available free online.

The immediate pros of the ggplot2 approach are fairly obvious – quick, good-looking graphs. There is, however, much more to the package and there is almost no limit to what you can produce. The output of the ggplot2 functions is itself an R object that can be stored and edited to create new graphs. You can use qplot() to create many other graphs – notably kernel density plots, bar charts, box plots and histograms. You can get these by changing the geom (or by default with certain object types an input).

The cons are less obvious. First, it takes some time investment to get to grips with the grammar of graphics approach (though this is very minimal if you stick with the quick plot function). Second, you may not like the default look of the ggplot2 output (though you can tweak it fairly easily). For instance, I prefer the default kernel density and histogram plots from the R base package to the default ggplot2 ones. I like to take a bare bones plot and build it up … trying to keep visual clutter to a minimum. I also tend to want black and white images for publication (whereas I would use grey and colour images more often in presentations). This is mostly to do with personal taste.

Filed under: graphics, R code, serious stats Tagged: confidence intervals, exploratory data analysis, loess, multiple regression, polynomial regression, R, robust statistics, software, splines, statistics ]]>

https://twitter.com/ProfAndyField/status/219757478018166784/photo/1

Filed under: news, serious stats Tagged: text books ]]>

In a between-subjects – or in indeed in a within-subjects (repeated measures) – design you or your audience will not always be interested only in the differences between the means. Rarely, the main focus may even be on the individual estimates themselves. A CI for each of the individual means might be informative for several reasons.

First, it may be important to know that the interval excludes an important parameter value (e.g., zero). The example in Chapter 3 of Serious Stats involved a task in which participants had to decide which of two diagrams matched a description they had just read. Chance performance is 50% matching accuracy, so a graphical display that showed that the 95% CI for each mean excludes 50% suggests that participants in each group were performing above chance.

Second the CI for an individual mean gives you an idea of the relative precision with which that quantity is measured. This may be particularly important in an applied domain. For example, you may want to be fairly sure that performance on a task is high in some conditions as well as being sure that there are differences between conditions.

Third, the CIs for the individual means are revealing about changes in the precision between conditions. If the sample sizes are equal (or nearly equal) they are also revealing about patterns in the variances. This is because the precision of the individual means is a function of the standard error and *n*. This may be obscured when difference-adjusted CIs are plotted – though mainly for within-subjects (repeated measures) designs which have to allow for the correlation between the samples.

In any case, it may be desirable to display CIs for individual means and difference-adjusted means on the same plot. This could be accomplished in several ways but I have proposed using a two-tiered CI plot (see here for a brief summary of my BRM paper on this or see Chapter 16 of Serious stats).

A common approach (for either individual means or difference-adjusted CIs) is to adopt a pooled error term. This results in a more accurate CI if the homogeneity of variance assumption is met. For the purposes of a graphical display I would generally avoid pooled error terms (even if you use a pooled error term in your ANOVA). A graphical display of means is useful as an exploratory aid and supports informal inference. You want to be able to see any patterns in the precision (or variances) of the means. Sometimes these patterns are clear enough to be convincing without further (formal) inference or modeling. If they aren’t completely convincing it usually better to show the noisy graphic and supplement it with formal inference if necessary.

Experienced researchers understand that real data are noisy and may (indeed should!) get suspicious if data are too clean. (I’m perhaps being optimistic here – but we really ought to have more tolerance for noisy data, as this should reduce the pressure on honest researchers to ‘optimize’ their analyses – e.g., see here).

My earlier post on this blog provided functions for the single tier difference-adjusted CIs. Here is the two-tiered function (for a oneway design):

plot.bsci.tiered <- function(data.frame, group.var=1, dv.var=2, var.equal=FALSE, conf.level = 0.95, xlab = NULL, ylab = NULL, level.labels = NULL, main = NULL, pch = 19, pch.cex = 1.3, text.cex = 1.2, ylim = c(min.y, max.y), line.width= c(1.5, 1.5), tier.width=0, grid=TRUE) { data <- subset(data.frame, select=c(group.var, dv.var)) fact <- factor(data[[1]]) dv <- data[[2]] J <- nlevels(fact) ci.outer <- bsci(data.frame=data.frame , group.var=group.var, dv.var=dv.var, difference=FALSE, var.equal=var.equal, conf.level =conf.level) ci.inner <- bsci(data.frame=data.frame , group.var=group.var, dv.var=dv.var, difference=TRUE, var.equal=var.equal, conf.level =conf.level) moe.y <- max(ci.outer) - min(ci.outer) min.y <- min(ci.outer) - moe.y/3 max.y <- max(ci.outer) + moe.y/3 if (missing(xlab)) xlab <- "Groups" if (missing(ylab)) ylab <- "Confidence interval for mean" plot(0, 0, ylim = ylim, xaxt = "n", xlim = c(0.7, J + 0.3), xlab = xlab, ylab = ylab, main = main, cex.lab = text.cex) if (grid == TRUE) grid() points(ci.outer[,2], pch = pch, bg = "black", cex = pch.cex) index <- 1:J segments(index, ci.outer[, 1], index, ci.outer[, 3], lwd = line.width[1]) axis(1, index, labels = level.labels) if(tier.width==0) { segments(index - 0.025, ci.inner[, 1], index + 0.025, ci.inner[, 1], lwd = line.width[2]) segments(index - 0.025, ci.inner[, 3], index + 0.025, ci.inner[, 3], lwd = line.width[2]) } else segments(index, ci.inner[, 1], index, ci.inner[, 3], lwd = line.width[1]*(1 + abs(tier.width))) }

The following example uses the diagram data from the book:

source('http://www2.ntupsychology.net/seriousstats/SeriousStatsAllfunctions.txt') diag.dat <- read.csv('http://www2.ntupsychology.net/seriousstats/diagram.csv') plot.bsci.tiered(diag.dat, group.var=2, dv.var=4, ylab='Mean description quality', main = 'Two-tiered CIs for the Diagram data', tier.width=1)

The result is a plot that looks something like this (though I should probably have reordered the groups and labeled them):

For these data the group sizes are equal and thus the width of the outer tier reflect differences in variances between the groups. The variances are not very unequal, but neither are they particularly homogenous. The inner tier suggests group three is different from groups 2 and 4 (but not from group 1). This is a pretty decent summary of what’s going on and could be supplemented by formal inference (see Chapter 13 for a comparison of several formal approaches also using this data set).

N.B. R code formatted via Pretty R at inside-R.org

*Footnote: The aesthetics of error bar plots*

A major difference between the plot shown here and that in my BRM paper or in the book is that I have changed the method of plotting the tiers. The change is mainly aesthetic, but also reflects the desire not to emphasize the extremes of the error bar. The most plausible values of the parameter (e.g., mean) are towards the center of the interval – not at the extremes. I have discussed the reasons for my change of heart in a bit more detail elsewhere.

To this end I have also updated all my plotting functions. They still use the crossbar style from the book by default but this is controlled by a tier width argument. If tier.width=0 the crossbar style is used otherwise it used the tier.width to control the additional thickness of the difference-adjusted lines. In general, tier.width=1 seems to work well (but the crossbar style may be necessary for some unusual within-subject CIs where the difference-adjusted CI is wider than the CI for the individual means).

Filed under: R code, serious stats Tagged: ANOVA, confidence intervals, exploratory data analysis, R, R functions, repeated measures ANOVA, statistics ]]>

In my case I prefer to work with Excel spreadsheets (which are versatile and – for the most part – convenient for sharing with collaborators or students). For this reason I mostly work with comma separated variable files created in Excel an imported using read.csv(). I even quite like the fact that this method requires me to save the .xls worksheet as a .csv file (as it makes it harder to over-write the original file when I edit it for R). In know that there are many other methods that I could use, but this works fine for me.

I do however occasionally miss some of the functionality of software such as MLwiN that allows me to paste Excel data directly into it. I’ve seen instructions about how to do this on a Windows machine (e.g., see John Cook’s notes), but a while back I stumbled on a simple solution for the Mac. I’ve forgotten where I saw it (but will add a link as soon as I find it or if someone reminds me). The solution uses read.table() but is a bit fiddly and therefore best set up as a function.

paste.data <- function(header=FALSE) {read.table(pipe("pbpaste"), header=header)}

I’ve included this in my master function list so it can be loaded with other functions from the book and blog.

To use it just copy tab-delimited data (the default for copying from an Excel file or Word table) and call the function in R. The data are then imported as a data frame in R. For an empty call it assumes there is no header and adds default variable names. Adding the argument header=TRUE or just TRUE will treat the first row as variable (column) names for the data frame. Copy some data and try the following:

source('http://www2.ntupsychology.net/seriousstats/SeriousStatsAllfunctions.txt') paste.data() paste.data(header = TRUE) paste.data(TRUE) paste.data(T)

N.B. R code formatted via Pretty R at inside-R.org

UPDATE: Ken Knoblauch pointed out an older discussion of this issue in https://stat.ethz.ch/pipermail/r-help/2005-February/066257.html and also noted the read.clipboard() function in William Revelle’s excellent psych package (which works on both PC and Mac systems).

Filed under: R code Tagged: Mac, R, R functions, software ]]>

Books > Health, Family & Lifestyle > Psychology & Psychiatry > Methodology > **Statistics**

As of writing the rank has dropped to #3 (but I’m still quite excited – even though I know this may not imply large numbers of pre-orders).

They have also increased the discount on pre-orders to 36%. The book should also be available for pre-order in other countries (though I’ve only checked the US store), but for some reason the discount is not as generous there.

If you can’t get hold of the item in your country, bookdepository.com does free worldwide delivery (to most countries as far as I can tell). I’ve used them to ship gifts to friends and family overseas and they seem pretty reliable (and also offer pretty good discounts).

Filed under: news, serious stats Tagged: amazon, book depository, sales rank, text books ]]>

When it came to proof read R 2.13 was already out and therefore most of the examples were also checked with version, but I stuck with R 2.12 on my home and work machines until last week.

In general I don’t see the point of updating to a new version number if everything is working fine. One advantage of this approach is that the version I install will usually have bugs from the initial release already ironed out. That said, new versions of R have (in my experience) been very stable.

I tend to download the version only when I fall several versions behind or if it is a requirement for a new package or package version. On this occasion it turned out that the latest version of the ordinal package (for fitting ordered logistic regression and multilevel ordered logistic regression models). There are two main drawbacks with updating. The first is reinstalling all your favourite package libraries (and generally getting it set up how you like it). The second is dealing with changes in the way R behaves.

For re-installing all my packages I use a very crude system. For any given platform (Mac OS, Windows or Linux) there are cleverer solutions (that you can find via google). My solution works across cross-platform and is fairly robust, if inelegant. I simply keep an R script with a number of install.packages() commands such as:

install.packages(‘lme4’, ‘exactci’, ‘pwr’, ‘arm’)

I run these in batches after installing the new R version. I find this useful because I’m forever installing R on different machines (so far Mac OS or Windows) at work (e.g., for teaching or if working away from the office or on a borrowed machine). I can also comment the file (e.g., to note if there are issues with any of the packages under a particular version of R). This usually suffices for me as I usually run a ‘vanilla’ set-up without customization. It would be more efficient for me to customize my set-up, but for teaching purposes I find it helps not to do that. Likewise, I tend to work with a clean workspace (and use a script file to save R code that creates my workspaces). I should stress that this isn’t advice – and I would work differently myself if I didn’t use R so much for teaching.

One of the first things that happened after installing R 2.15 was that some of my own functions started producing warnings. R warnings can be pretty scary for new users but are generally benign. Some of them are there to detect behaviour associated with common R errors or common statistical errors (and thus give you a chance to check your work). Others alert you to non-standard behaviour from a function in R (e.g., changing the procedure it uses when sample sizes are small). Yet others offer tips on writing better R code. Only very rarely are they an indication that something has gone badly wrong.

Thus most R warnings are slightly annoying but potentially useful. In my case R 2.15 disliked a number of my functions of the form:

mean(data.frame)

The precise warning was:

Warning message:

mean() is deprecated.

Use colMeans() or sapply(*, mean) instead.

All the functions worked just fine, but (after my initial irritation had receded) I realize that colMeans() is a much better function. It is more efficient but, even better, it is obvious that it calculates the means of the columns of a data frame or matrix. With the more general mean() function it is not immediately obvious what will happen when called with a data frame as an argument. It is also trivial to infer that rowMeans() calculates the row means.

I have now re-written a number of functions to deal with this problem and to make a few other minor changes. The latest version of my functions can be loaded with the call:

source('http://www2.ntupsychology.net/seriousstats/SeriousStatsAllfunctions.txt')

I will try and keep this file up-to-date with recent versions of R and correct any bugs as they are detected.

The functions can be downloaded as a text file from:

http://sites.google.com/site/seriousstats2012/home/

Filed under: news, R code, serious stats Tagged: R, software ]]>