NOAA OISST v2 High Resolution daily sea surface temperatures with R
Update, 2015-11-30: It appears that NOAA has upgraded all of the OISST files to the newer version of the NetCDF file format. As a result, the functions outlined in this post no longer work. Instead, see the updated functions in my newer post, http://lukemiller.org/index.php/2014/11/extracting-noaa-sea-surface-temperatures-with-ncdf4/. The concepts are the same as described here, but the newer functions use the ncdf4 package to access the newer NetCDF file format.
The National Oceanic and Atmospheric Administration (NOAA) generates freely available worldwide estimates of mean daily sea surface temperature (SST), and has been doing so back to 1981. The data are on a 0.25 x 0.25 degree grid, and provide an interpolated estimate of the sea surface temperature for each day of the year, based on a mix of satellite data and in situ measurements from ships and buoys. In a post a while back I outlined how to open and extract data from the lower-resolution weekly SST files that are provided on a 1×1 degree grid. This post outlines functions to open and extract data from the higher-resolution daily files using R and the ncdf package written by David Pierce. The accompanying plotting function uses a function from the fields package, written by Douglas Nychka and colleagues.
The NOAA OISST v2 high resolution data are packaged into NetCDF files, and they come in two main versions: a yearly file that is around 740MB (the current year’s file is smaller), and 1-day files that are around 1MB each. The NetCDF files for these two versions are structured somewhat differently, so I’ve written two different functions to deal with the files. The functions can be loaded into your R session by downloading the script NOAA_OISST_function.R from my GitHub repository, and calling
source("NOAA_OISST_function.R")
to load the functions into memory.
You’ll also need the ncdf and fields packages, which can be installed as follows:
install.packages("ncdf")
install.packages("fields")
The homepage for the yearly data files is http://www.esrl.noaa.gov/psd/data/gridded/data.noaa.oisst.v2.highres.html. There you will find a description of the available data, and links to the FTP site ftp://ftp.cdc.noaa.gov/Datasets/noaa.oisst.v2.highres/. The files for mean daily SST are named in the format sst.day.mean.YEAR.v2.nc. These files are uncompressed and ready to use.
The homepage for the 1-day data files is http://www.ncdc.noaa.gov/sst/griddata.php, which leads you to the FTP site at ftp://eclipse.ncdc.noaa.gov/pub/OI-daily-v2/NetCDF/ where you’ll find sub-directories labeled AVHRR and AVHRR-AMSR. The AVHRR-AMSR files are no longer being produced, but you will find up-to-date files in the AVHRR subdirectories (these acronyms refer to the satellite instruments used in the data gathering). It’s important to note that the 1-day files come in a compressed .gz format, and you must first uncompress them with a separate software tool (like 7zip for Windows) before you try to use the function.
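If you would rather not leave R for the decompression step, base R can read gzip-compressed files through a gzfile() connection. The following is only a minimal sketch under that assumption: gunzip_file is a hypothetical helper name that is not part of NOAA_OISST_function.R, and the file name in the usage comment is a placeholder rather than a real OISST file name.

```r
# Decompress a .gz file using only base R connections.
# gunzip_file is a hypothetical helper, not part of NOAA_OISST_function.R.
gunzip_file <- function(gz_path, out_path = sub("\\.gz$", "", gz_path)) {
  in_con <- gzfile(gz_path, open = "rb")   # reads decompressed bytes
  on.exit(close(in_con), add = TRUE)
  out_con <- file(out_path, open = "wb")   # plain binary output file
  on.exit(close(out_con), add = TRUE)
  repeat {
    # Copy the data across in chunks of up to 1 million bytes
    chunk <- readBin(in_con, what = "raw", n = 1e6)
    if (length(chunk) == 0) break
    writeBin(chunk, out_con)
  }
  invisible(out_path)
}

# Usage (placeholder file name):
# nc_file <- gunzip_file("some-1-day-file.nc.gz")
```

The helper returns the path of the uncompressed file, which could then be passed as the fname argument.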
For both functions to work, you also need one last file, the land-sea mask file. It can be directly downloaded from here: ftp://ftp.cdc.noaa.gov/Datasets/noaa.oisst.v2.highres/lsmask.oisst.v2.nc.
The two main functions I’ve written are extractOISST and extractOISST1day. The former function only works with the yearly NetCDF files, and the latter function only works with the 1-day NetCDF files.
Both functions take several arguments:
test = extractOISST(fname, lsmask, lon1, lon2, lat1, lat2, date1, date2)
test2 = extractOISST1day(fname, lsmask, lon1, lon2, lat1, lat2)
fname is the full path to the .nc data file.
lsmask is the full path to the land-sea mask file.
lon1, lon2 are the two longitudes that bound the area you want to extract data for. They are given in degrees EAST of the prime meridian, so they range from 0.125 to 359.875. When you enter the arguments, lon1 MUST be smaller (further west) than lon2. If you only want data for one longitude, you only need to enter a value for lon1.
lat1, lat2 are the two latitudes that bound the area you want to extract data for. They are given in degrees NORTH, so that latitudes in the southern hemisphere all have negative values. The value for lat1 must be less (further south) than the value of lat2. If you only want data for one latitude, you only need to enter a value for lat1.
So for example, to get data near Lima, Peru, lon1 would be 282.7, and lat1 would be -12.1.
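Coordinates are often quoted in degrees west or on a -180 to 180 scale, so a conversion to degrees east is usually the first step. The sketch below shows one way to do that, and to see which grid-cell center a point falls nearest to. The helper names to_east_lon and nearest_center are hypothetical, and the latitude range of -89.875 to 89.875 is an assumption about the 0.25 degree grid rather than something read from the files. It mimics the "closest match" idea and is not the code used inside the extraction functions.

```r
# Convert a longitude on the -180..180 scale to degrees east (0..360).
# Hypothetical helper, for illustration only.
to_east_lon <- function(lon) lon %% 360

# Return the grid-cell center closest to a requested coordinate.
nearest_center <- function(x, centers) centers[which.min(abs(centers - x))]

# Cell centers of a 0.25 degree grid (latitude range is an assumption)
lon_centers <- seq(0.125, 359.875, by = 0.25)
lat_centers <- seq(-89.875, 89.875, by = 0.25)

# Lima, Peru is at roughly 77.3 degrees W, 12.1 degrees S
lima_lon <- to_east_lon(-77.3)                         # 282.7 degrees east
lima_cell_lon <- nearest_center(lima_lon, lon_centers) # 282.625
lima_cell_lat <- nearest_center(-12.1, lat_centers)    # -12.125
```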
When using the yearly data file function, date1, date2 should be provided as R Date objects or as character strings in the format "2013-12-29". If only date1 is provided, the output will contain only 1 day’s data; otherwise it will contain data for each day between the first and last dates provided (if they’re in the file).
An example use:
fname = "W:/NOAA_OISST/daily/sst.day.mean.2013.v2.nc"
lsmask = "W:/NOAA_OISST/daily/lsmask.oisst.v2.nc"
sstout = extractOISST(fname, lsmask, lon1 = 230, lon2 = 250, lat1 = 20, lat2 = 40, date1 = "2013-01-01", date2 = "2013-01-30")
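As a quick sanity check on a call like the one above, you can work out how many days the requested range spans before looking at the output. This is a small sketch using only base R date handling. The day-of-year values assume that a yearly file starts on January 1 and holds one layer per day, which is an assumption rather than something checked against the file.

```r
# Dates requested in the example call
date1 <- as.Date("2013-01-01")
date2 <- as.Date("2013-01-30")

# Every day in the requested range, endpoints included
expected_dates <- seq(date1, date2, by = "day")
n_days <- length(expected_dates)   # 30 days

# Day-of-year for each date (assumed layer position within a yearly file)
day_index <- as.integer(format(expected_dates, "%j"))
```

If every requested date is present in the file, the 3rd dimension of the output would be expected to have n_days entries.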
The output of the yearly file function is a 3-dimensional array, with the temperature data (degrees C) arranged so that latitudes are in rows, longitudes are in columns, and data from each grid cell on different days comprise the 3rd dimension. Missing data, including areas over land, are entered as NA. The 1st entry, sstout[1,1,1], is the northernmost latitude and the westernmost longitude on the 1st date. Moving to the 2nd row of the array puts you 0.25 degrees further south, and moving to the 2nd column puts you 0.25 degrees further east. To see what latitude, longitude, and date correspond to each location in the grid, you can extract the info with the dimnames() function. For example:
lats = as.numeric(dimnames(sstout)$Lat)
longs = as.numeric(dimnames(sstout)$Long)
dates = as.Date(dimnames(sstout)$Date)
The values that you get for latitude and longitude are the closest matches to your input values, and represent the center of each grid cell.
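To show how those dimnames can be put to work, here is a small sketch that pulls out the time series for the grid cell nearest a point of interest and computes a daily mean over the region. It runs on a tiny made-up array (sst_demo) that is built to resemble the layout described above, with Lat, Long and Date dimnames. The values are invented and do not come from any OISST file.

```r
# Build a small stand-in array: 5 latitudes (north to south),
# 4 longitudes (west to east), 3 dates. Values are made up.
lat_names  <- as.character(seq(39.875, 38.875, by = -0.25))
long_names <- as.character(seq(230.125, 230.875, by = 0.25))
date_names <- as.character(seq(as.Date("2013-01-01"), by = "day", length.out = 3))
sst_demo <- array(seq_len(5 * 4 * 3) / 10, dim = c(5, 4, 3),
                  dimnames = list(Lat = lat_names, Long = long_names,
                                  Date = date_names))
sst_demo[1, 1, ] <- NA   # pretend this cell is over land

# Recover the coordinates from the dimnames
grid_lats  <- as.numeric(dimnames(sst_demo)$Lat)
grid_longs <- as.numeric(dimnames(sst_demo)$Long)
grid_dates <- as.Date(dimnames(sst_demo)$Date)

# Row and column of the cell nearest 39.4 N, 230.6 E
row_i <- which.min(abs(grid_lats - 39.4))
col_j <- which.min(abs(grid_longs - 230.6))

# Temperature time series at that cell, named by date
series <- sst_demo[row_i, col_j, ]

# Mean over the whole region for each day, ignoring NA (land) cells
daily_mean <- apply(sst_demo, 3, mean, na.rm = TRUE)
```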
I’ve also included a little function to plot the data, although there are certainly better plotting functions in packages like fields. I just provide this to check that you’re getting data from the region of the world you were hoping for. In the example here, I asked for data covering the west coast of North America including part of Baja California. The upper part of the Gulf of California should appear when the data are plotted.
The 1-day function works in the same manner, but does not require any date arguments, because there is only 1 date in each file. The output from the 1-day function is a 2-dimensional matrix, with latitudes in rows and longitudes in columns (as above). The first entry, [1,1], is the northernmost, westernmost grid cell in the range of lat/long values you asked for. Moving down 1 row is 0.25 degrees further south, and moving to the right 1 column is 0.25 degrees further east.
One issue I’ve wrestled with is how best to format the output arrays. When the data are first extracted inside the function, longitudes increase as you move down rows, and latitudes increase as you move right through columns, which might be counterintuitive to you. As a result, in the functions I flip the arrays around so that latitudes and longitudes are arrayed as you might expect, with northern values at the top, western values at the left. But this might not play nice with many plotting functions, so I’m open to suggestions about the “best” way to orient the output arrays.
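For plotting functions that follow the convention of base R's image(), where x increases along the rows of the matrix and y increases along the columns, the map-style output can be turned back around with a row flip and a transpose. The sketch below shows that on a tiny made-up matrix. The name to_image_orientation is hypothetical and the values are invented.

```r
# Convert a matrix with latitudes in rows (north to south) and longitudes
# in columns (west to east) into the layout image() expects:
# longitudes in rows and latitudes in columns, both increasing.
# Hypothetical helper, for illustration only.
to_image_orientation <- function(m) {
  flipped <- m[nrow(m):1, , drop = FALSE]  # latitudes now south to north
  t(flipped)                               # longitudes move into the rows
}

# Tiny stand-in for one day's output (values are made up)
sst_day <- matrix(c(10, 11, 12,
                    13, 14, 15),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Lat = c("39.875", "39.625"),
                                  Long = c("230.125", "230.375", "230.625")))

z <- to_image_orientation(sst_day)
x <- as.numeric(rownames(z))   # longitudes, increasing
y <- as.numeric(colnames(z))   # latitudes, increasing

# image(x, y, z, xlab = "Longitude (degrees E)", ylab = "Latitude (degrees N)")
```

Keeping the output in map orientation and converting only at plotting time is one possible compromise, since the array stays easy to read when printed.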
To cite your use of the data, NOAA suggests the following:
“NOAA High Resolution SST data provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their Web site at http://www.esrl.noaa.gov/psd/”
or citing it as:
Reynolds, Richard W., Thomas M. Smith, Chunying Liu, Dudley B. Chelton, Kenneth S. Casey, Michael G. Schlax, 2007: Daily High-Resolution-Blended Analyses for Sea Surface Temperature. J. Climate, 20, 5473-5496.