Book Review: The R Book, Second Edition (2013)

The first edition of The R Book by Michael J. Crawley was an ambitious work, but managed to be slightly rubbish due to the atrocious typographical layout of the original book. The good news is that the new 2nd edition, released in 2013, has a substantially improved layout that makes the book far more useful as a general reference. This is important, since the book is meant as an accessible reference for non-statisticians to many of the powerful data manipulation and statistical techniques available in R, particularly for biologists and researchers in similar fields. With this new edition you’re no longer forced to flounder through the pages one by one searching for the snippet of content that will help you carry out some test. Up to now, the only way to make the 1st edition marginally useful was to seek out one of the pdf versions floating around the nether regions of the web so that you could use the search function to find content. I really wanted the original book to be more usable, because it’s chock full of all sorts of helpful information about reading data, reorganizing data, running statistical tests, interpreting R’s output, and making (basic) figures. Now that Wiley and Sons has presumably fired the original editor and layout people responsible for the 1st edition due to their massive incompetence, the 2nd edition allows The R Book to actually achieve that goal of being a handy reference. It might finally be one of the first books you pluck off your shelf when you need to figure out how to do something with R.

A major challenge with the first edition of The R Book was finding the section you needed. The 1st edition had a table of contents that was 1 page long (seriously?), while the 2nd edition now has 15 additional pages of Detailed Contents with all of the subsections laid out so that you have some chance of quickly finding the section covering the operations you want to carry out. The index at the end has been expanded by another 10 pages, and has been worked over to include what seem to me like more useful descriptors and references to commonly used functions. Including little things like a reference to abline(0,1) for adding a 45 degree line to a plot illustrate the improved level of thought put into this revision (plus those quotes around the h and v in the 1st edition weren’t really necessary).

1st edition index
1st edition index
2nd edition index.
2nd edition index.

In the main body of the book, the most important and visually pleasing change is the use of fixed-width fonts for the snippets of code the user is meant to type in at the R command prompt. The original version used proportional width fonts and had no spaces between operators for the user-typed code, making the code hard to read. This was always a puzzling choice, since the printed output from R displayed in the book did use a fixed-width font. The 2nd edition now has those spaces inserted so that the user-typed code is far more readable, particularly for neophytes. In addition, the 2nd edition now uses colors to differentiate the user-typed commands and the R-generated output. User-typed commands are shown in red, and R’s responses are shown in blue.

1st edition user-typed code and R-generated output.
1st edition user-typed code and R-generated output.
The 2nd edition's improved typesetting and coloring.
The 2nd edition’s improved typography and coloring. As a biologist, I’m strangely comforted by examples that use things like crop yields, clay, and sand, instead of the “income” or “widgets” found in most statistics texts.

The images above do illustrate one continuing gripe I have about The R Book. All of the example data sets in the book are meant to be loaded from your local hard drive by providing a file path, after having downloaded them from the author’s website. Many other R books wrap their data sets up into a R package that can be downloaded from CRAN and easily loaded into a R session with the library( ) function. The R Book asks you to first download all of the files, and then manually type out their location when you want to load them, i.e. read.table(“c:\\temp\\yields.txt”,header=T). I suppose this has some utility as a teaching tool since it forces the reader to get used to the steps necessary to load their own local data files, but it still drives me batty when I just want to crank through an example from the book and I’m forced to go find where I buried my directory of example data. In what should be taken as damning with faint praise, I will give Crawley credit for at least including a link to the location of the example data files in the 2nd edition of the book. In the original edition, no mention was made of where to find these files, and you’ll find web searches peppered with people asking where to find the data files. The files were always available for the 1st edition, the book just failed to mention that fact. There are several other remaining nits to pick left over from the 1st edition, things like calling “packages” libraries and so forth, but I don’t think any of them make the book unusable. The 2nd edition also adds color to many of the figures. In most cases this probably isn’t a huge advantage over the old black and white figures, but it does illustrate the usage of color options during the plotting phase.

The vast majority of the text in the 2nd edition has remained unchanged from the 1st edition. Some of the material has been reorganized and expanded somewhat, particularly in the early chapter on the essentials of the R language. Two entirely new chapters on meta-analysis and Bayesian methods have been added as well. As with the 1st edition of The R Book, none of the chapters are meant to be exhaustive accounts of all of the background, analyses, and outputs available for a particular statistical method. The strength of this book is that a colleague can tell you “try analyzing those data using survival analysis”, and you, knowing nothing about survival analysis, can open this book up to get a quick primer of the basics of survival analysis. If you’ve forgotten how to do a Chi-squared test, or you’ve never learned how to do it in R, this book can help you. Ultimately you should seek out a real statistics textbook or a specialized R book if you’re getting hot and heavy with some particular analysis technique, but The R Book is a nice resource for getting you started off in the right direction.

Should you buy The R Book? If you already own the 1st edition of The R Book and have gotten used to dealing with its faults, I’m not sure there’s a compelling reason to spend the money on the 2nd edition. Prior to the release of the 2nd edition, I would only grudgingly recommend The R Book, and then only if the user had a decent grasp on R already. But now, if you’re in the market for a general-purpose and fairly comprehensive reference for how to do all sorts of things with R, and even if you don’t have a great grasp on how to use R, I think The R Book is a decent choice. It holds your hand through the basic operations and the beginnings of more advanced statistical techniques, and gives you enough knowledge to then jump into a more technical treatment available in other more advanced books.  As with anything R-related, the internet will help fill in the gaps and provide alternate examples (http://www.rseek.org is your friend), but The R Book makes a good starting point.