Mixed Effects Models and Extensions in Ecology with R
Zuur, A.F., Ieno, E.N., Walker, N., Saveliev, A.A., Smith, G.M.
Springer, 2009
Somewhere along the line you probably realized that your undergraduate statistics classes didn’t quite cover the breadth of topics you’d end up needing to deal with your data. The time constraints of a typical quarter- or semester-long biostats class often leave you only scratching the surface of the issues you need to consider when analyzing a typical ecological data set. That’s where a book like Zuur et al.’s Mixed Effects Models and Extensions in Ecology with R can be supremely useful.
If you’re anything like the prototypical ecologist, your experiments often involve things like blocked designs to try to account for spatial heterogeneity in your field sites, or you’re using a split-plot design to economize on the available resources. Maybe you collect data at several nested spatial scales (Region>Site>Plot>Subplot), or you collect repeated measurements from individuals or plots over time, so that your samples may not be strictly independent. Perhaps you’ve categorized some of your data by a set of factors that are fixed and informative (you are interested in the response at those specific treatment levels) and some of the data can be treated as random and (perhaps) less informative. And maybe the response variable you measured isn’t a nice continuous value, but rather is a count of individuals, or a percentage value, or a series of 0’s and 1’s (dead or alive), so that the normality assumptions of the basic ANOVA or linear regression aren’t upheld.
If you’re one of those ecologists, the authors have written this book just for you! Mixed Effects Models and Extensions in Ecology with R (what a mouthful) does a great job of hand-holding the reader as they build up from the basics of a fixed-effect linear model (linear regression, ANOVA, ANCOVA) framework through the numerous model variations available in the R statistical language. The book is full of R code so that you can follow along on your computer, replicating the analyses as you work through the book. You’ll begin with basic graphical data exploration, and look at the various assumptions of a simple linear model. Following that, the bulk of the book is focused on mixed effects modeling (i.e., using fixed and random effects in the same linear model), both for normally distributed data and for models with different underlying distributions (a.k.a. “generalized” linear and mixed models).
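To make the jump from a fixed-effect model to a mixed model concrete, here is a minimal sketch of the kind of code the book walks through. It assumes the nlme package and uses the Oats split-plot data set bundled with nlme, which is my choice for illustration and not one of the book's own data sets.

```r
library(nlme)  # recommended package, shipped with standard R installations

# Oats: a split-plot experiment bundled with nlme (an illustrative stand-in,
# not one of the data sets from the book)
data(Oats)

# Fixed effects only: ignores the fact that plots are grouped within blocks
fit_lm <- lm(yield ~ nitro + Variety, data = Oats)

# Mixed model: the same fixed effects, plus random intercepts for blocks
# and for whole plots (Variety) nested within blocks
fit_lme <- lme(yield ~ nitro + Variety,
               random = ~ 1 | Block/Variety,
               data = Oats)

summary(fit_lme)  # fixed-effect estimates and variance components
```

The only change between the two fits is the `random` argument, which tells R which grouping factors to treat as random effects.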
There are sections on dealing with heterogeneity in the data, primarily using the many variance structures available in the nlme package of R. You’ll learn how to deal with temporal correlation of samples. You’ll see how to fit models using Poisson, Binomial, Bernoulli, Negative Binomial, and Gamma distributions, and how to deal with zero-inflated and zero-truncated count data. The book also deals with generalized additive models and generalized estimating equations, and finishes with a chapter on Markov chain Monte Carlo methods (though there are other books that delve into that particular topic in much more detail).
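As a rough illustration of three of the tools mentioned above (a variance structure, a temporal correlation structure, and a count distribution), the following sketch uses data sets that ship with R and nlme. The data sets and model formulas are my own assumptions for the example, not analyses taken from the book.

```r
library(nlme)

# 1. Heterogeneity: allow a different residual variance for each sex
#    (Orthodont is bundled with nlme)
fit_var <- lme(distance ~ age + Sex,
               random = ~ 1 | Subject,
               weights = varIdent(form = ~ 1 | Sex),
               data = Orthodont)

# 2. Temporal correlation: AR(1) errors within each mare
#    (Ovary is bundled with nlme)
fit_ar1 <- gls(follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time),
               correlation = corAR1(form = ~ 1 | Mare),
               data = Ovary)

# 3. Count data: Poisson GLM on insect counts (InsectSprays is in base R)
fit_pois <- glm(count ~ spray, family = poisson, data = InsectSprays)

# Rough overdispersion check: values well above 1 suggest the Poisson
# variance assumption is too restrictive
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) /
  df.residual(fit_pois)
dispersion
```

When the dispersion statistic is well above 1, a negative binomial model is one common alternative.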
Throughout the book you are exposed to the R code used to analyze the numerous example datasets that are provided by the authors. The real strength of this book is the amount of explanation the authors provide on why they approach a problem in a specific manner, and what the output from R is actually telling you. This is coupled with nice vignettes on the data sets being used in each chapter, and a brief discussion of how to draw conclusions from the R output and formulate a results/discussion section for the data in a paper. The wide range of datasets provides plenty of examples of the common “problematic” data types you’ll run into (all those issues listed above) and how to attack them.
If you’re not comfortable with math and advanced statistical concepts, Mixed Effects Models and Extensions in Ecology with R will be a welcome alternative to the book that is considered the “standard text” in this field, Pinheiro and Bates’s Mixed-Effects Models in S and S-PLUS. This is not meant to denigrate Pinheiro and Bates in the least, but Zuur et al.’s book is much more approachable for the typical biologist who has had one or two biostats courses and perhaps isn’t quite up to speed on matrix manipulation. Zuur et al. have not completely masked the underlying complexity of the math involved in these models, but they make an admirable effort to ease the reader into it and defer to more advanced texts (like Pinheiro and Bates) if the reader wishes to know more.
If there is one aspect of the book that may frustrate readers, it is that the authors often (rightly) avoid framing the data output in terms of standard ANOVA tables and p-values. For readers focused on just getting some p-values to plug into an ANOVA table for a manuscript they’re writing, the discussion of AIC values, model simplification, and coefficient values (as a bridge to effect sizes) might be frustrating, but that’s all part of an ongoing discussion in the broader community about just what sorts of results are actually useful to report in a paper (hint: the answer probably isn’t p-values).
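For readers who haven’t seen that style of reporting, here is a small sketch of what comparing models by AIC and inspecting coefficients can look like in R. It again assumes the nlme package and its bundled Orthodont data, and the two candidate models are hypothetical choices made only for illustration.

```r
library(nlme)

# Fit with maximum likelihood (method = "ML") so that models with
# different fixed effects can be compared by AIC
fit_full <- lme(distance ~ age * Sex,
                random = ~ 1 | Subject,
                data = Orthodont, method = "ML")

# Simplified model: drop the age-by-sex interaction
fit_reduced <- lme(distance ~ age + Sex,
                   random = ~ 1 | Subject,
                   data = Orthodont, method = "ML")

# Lower AIC indicates the better trade-off between fit and complexity
aic_table <- AIC(fit_full, fit_reduced)
aic_table

# Coefficient estimates with approximate 95% intervals, a step toward
# reporting effect sizes instead of only p-values
intervals(fit_full, which = "fixed")
```

The coefficient table with its intervals is usually what ends up being discussed in the results section, with the AIC comparison justifying which model was kept.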
Overall, I highly recommend this book if you find yourself faced with these various types of data, and if an advisor or internet search has pointed you towards using R packages like nlme, lme4, or mgcv to deal with your data.
As a final note, although Mixed Effects Models and Extensions in Ecology with R is not meant as a primary introduction to using R, it does a good job of walking you through the basics of manipulating data, plotting, and fitting models with R. This is no doubt a result of some of the authors’ previous work on the excellent A Beginner’s Guide to R, discussed here. As an added bonus, Mixed Effects Models and Extensions in Ecology with R has a far more attractive cover design.