Book Review: Practical Computing For Biologists


Practical Computing for Biologists
Steven H. D. Haddock and Casey W. Dunn
Sinauer, 2010

Practical Computing for Biologists is an ambitious book that aims primarily to demystify the many problems that most biologists hit at some point in their research and that can be solved with a bit of simple programming. The book does an admirable job of achieving this goal by exposing the reader to a number of freely available programs and techniques for dealing with text files and other data types. Much of the book (Parts 1-4) is spent introducing the reader to manipulating text files, command line scripts, and Python programming, and it assumes little to no prior familiarity with these tools. The text provides plenty of hand-holding to get you through the initial steps of setting up a text editor, navigating the command line, and dipping into Python.

The techniques described in the book are designed to be platform-agnostic, though the examples are focused on the Mac OS X platform. Some of the commands and syntax will vary on other platforms, particularly Windows, but the included appendices and margin notes provide adequate detail on the differences where they crop up. Perhaps the most important aspect of Parts 1-4 is the time spent dissecting the structure of text files, which crop up everywhere in biology, whether you are downloading data from a website, parsing sequence data sent to you by a colleague, or recording data from an instrument in the lab. Though the book doesn’t attempt to be a cookbook for analyzing the infinite types of data files in daily use in the lab, it gives the reader enough information, and outlines the free tools used, to start poking through their own data files. In addition, Practical Computing for Biologists introduces relational databases such as MySQL, so that you can consider transitioning to a more flexible and efficient data storage system.

Additionally, if your collaborators (or heaven forbid, you) consider PowerPoint to be a viable medium for designing publication-quality graphics, then Part 5 of the book, dealing with graphics, should be required reading. The three chapters in this section could form the basis for a half-day workshop appropriate for anyone attempting to wade into the realm of making figures, and should convince you that PowerPoint is only appropriate for what it was originally designed for (making presentations and posters). Even current-day graduate students, who ostensibly grew up on the internet, are often surprisingly clueless about the differences between vector and raster graphics, resolution, color spaces, and how these various terms relate to their mission of producing usable figures for a publication. The typical graduate biology curriculum doesn’t include any sort of practical training in this area, and Practical Computing for Biologists serves as a useful reference for which issues to consider when generating and refining graphics.

Part 6 of the book briefly covers a few additional technical considerations, such as setting up remote access to your lab computer (or, alternatively, accessing remote computers), managing software, and interfacing your computer with the physical world through parallel or serial communications and microcontrollers such as the Arduino platform. The physical computing section by itself won’t make you an instant expert on designing your own apparatus, but you will be exposed to some of the current methods for talking to sensors and equipment in the lab and in the field.

So who is this book for? There are several sorts of biologists who might find this book useful. If you’ve been bumping up against tedious tasks that require repetitive point-and-click actions to process your data, whether in Excel or another program, this book might show you how to speed up those actions with a script. If you’ve been trying to take advantage of the many data repositories out there on the web (weather and ocean data, for example) and are finding yourself copying and pasting from webpages, this book has many tips for accessing web resources and parsing the resulting text (or XML or .csv, etc.) files to get at the useful data. If you find yourself having to apply the same transformation to multiple data files from an instrument, you can probably use tricks in this book to write a script to automatically process your files. If you come into possession of massive genome files that you need to pull info from, this book might give you the initial tools to springboard you into a better understanding of programming techniques to process your files. Note that these examples don’t include specific fields like “oceanographer” or “ornithologist”, since these types of tasks pop up in every field of biology. Ultimately, you simply have to find yourself in a position to recognize that you’ve been wasting time on some repetitive, menial task that a computer could do automatically, and you have to be willing to overcome any initial intimidation you might feel about straying from the comforting womb of the icons on the dock of your desktop. Practical Computing for Biologists will hold your hand long enough to make you aware of the many tools built into your computer, and give you an introduction to the relevant terms and concepts for a program so that you can delve further into the resources available on the web. If you are adventurous enough to want to try your hand at some programming, this book might be right for you.

It won’t be the only programming/computer book you ever buy, but it makes a great introduction for any neophyte. There might be several topics in this book that will make you say “Oh, I didn’t know you could do that so easily!” and consequently save you hours or days of work far into the future. The associated website (http://practicalcomputing.org/) aims to become a community resource for readers of the book and users of the software.

For those who are wondering, the book focuses primarily on the bash shell, Python, TextWrangler, and MySQL. There is some discussion of ImageMagick, Inkscape, and the commercial products Adobe Illustrator and Adobe Photoshop in the graphics section. You’ll also find an introduction to ssh, sftp, makefiles, and package managers. What you won’t find is any in-depth discussion of statistics packages (R, SAS, SPSS, etc.), common commercial programming/analysis programs (Matlab, Mathematica), typesetting systems (LaTeX), spreadsheets (Excel), presentation software (PowerPoint), word processors (Word), citation managers (EndNote), or your favorite phylogenetics program. Some of these programs merit brief mentions in the text, but Practical Computing for Biologists doesn’t make any attempt to teach you those programs, as that would be overly ambitious. At over 500 pages in length, the authors have already shoved a ton of information into this book.