{"id":1957,"date":"2015-04-22T16:28:36","date_gmt":"2015-04-22T23:28:36","guid":{"rendered":"http:\/\/lukemiller.org\/?p=1957"},"modified":"2015-04-22T17:36:44","modified_gmt":"2015-04-23T00:36:44","slug":"a-plot-of-co-authorships-in-my-little-corner-of-science","status":"publish","type":"post","link":"https:\/\/lukemiller.org\/index.php\/2015\/04\/a-plot-of-co-authorships-in-my-little-corner-of-science\/","title":{"rendered":"A plot of co-authorships in my little corner of science"},"content":{"rendered":"<p><img decoding=\"async\" src=\"http:\/\/www.lukemiller.org\/general_images\/author-year-count.svg\" alt=\"author year count image\" width=\"500\" \/>\n<\/p>\n<p>&nbsp;<br \/>\nHere&#8217;s a mostly useless visualization of the collection of journal articles that sits in my reference database in Endnote. I deal mostly in marine biology, physiology, biomechanics, and climate change papers, with a few molecular\/genetics papers thrown in here and there. The database has 3325 entries, 2 of which have ambiguous publication years and aren&#8217;t represented above. This is by no means an exhaustive survey of the literature in my field, it&#8217;s just an exhaustive survey of the literature on my computer.<\/p>\n<p>To make this figure, I first had Endnote export the database to a text file using an output style that simply listed the publication year and the authors, each separated by commas:<\/p>\n<pre>2009, L. W. Aarssen, C. J. Lortie, A. E. Budden, J. Koricheva, R. Leimu, T. Tregenza\r\n1956, D. P. Abbott\r\n1968, D. P. Abbott, D. Epel, J. H. Phillips, I. A. Abbott\r\n1983, A. H. Abdel-Rehim\r\n1983, A. H. Abdel-Rehim\r\n1988, A. H. Abdel-Rehim\r\n2003, M. A. Abdelrhman\r\n2007, M. A. Abdelrhman\r\n1997, A. Abelson, M. 
Denny\r\n.\r\n.\r\n.\r\n<\/pre>\n<figure id=\"attachment_1965\" aria-describedby=\"caption-attachment-1965\" style=\"width: 150px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2015\/04\/endnote_style1.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2015\/04\/endnote_style1-150x150.png\" alt=\"In Endnote, the bibliography output style is set to show only the Year and Author, using a single Generic style that will get applied to all reference types. \" width=\"150\" height=\"150\" class=\"size-thumbnail wp-image-1965\" \/><\/a><figcaption id=\"caption-attachment-1965\" class=\"wp-caption-text\">In Endnote, the bibliography output style is set to show only the Year and Author, using a single Generic style that will get applied to all reference types.<\/figcaption><\/figure>\n<p>The R script below then parses that text file to pull out the year of each publication and the number of authors for each publication. The number of authors ranged between 1 and 121 (that 121 author paper is purposely cut off in the figure). I then tallied the number of papers that fell into each combination of publication year and number of co-authors. That count is then translated into one of the color values represented in the color scalebar. The colorbar is a half-hearted attempt to map the range of count values (1 to 69 in this case) onto a range of colors that is perceived as fairly linear by the human eye, based on the recommendations at <a href=\"http:\/\/earthobservatory.nasa.gov\/blogs\/elegantfigures\/2013\/08\/05\/subtleties-of-color-part-1-of-6\/\" target=\"_blank\">http:\/\/earthobservatory.nasa.gov\/blogs\/elegantfigures\/2013\/08\/05\/subtleties-of-color-part-1-of-6\/<\/a> and <a href=\"http:\/\/colorbrewer2.org\/\" target=\"_blank\">ColorBrewer2.org<\/a>. 
The function for plotting the colorbar was derived <a href=\"http:\/\/stackoverflow.com\/questions\/9314658\/colorbar-from-custom-colorramppalette\" target=\"_blank\">from this posting.<\/a><\/p>\n<p>Unsurprisingly, to me at least, 1-, 2-, and 3-author papers are the most numerous, and I have clearly pulled more papers from the late 1990s and 2000s than I have from the earlier literature. Papers with more than 9 or 10 co-authors are fairly rare in my collection, with most of those levels being represented by just one or a few papers. <\/p>\n<p>The R script <code>author_year_plot.R<\/code> and the associated text data file <code>authors_list_20150422.txt<\/code> are in my <a href=\"https:\/\/github.com\/millerlp\/Misc_R_scripts\" target=\"_blank\">GitHub repository<\/a> if you&#8217;re bored enough to want to try your hand at recreating the figure. <\/p>\n<p>&nbsp;<\/p>\n<pre lang=\"R\" colla=\"+\">\r\n# author_year_plot.R\r\n# \r\n# Author: Luke Miller 2015-04-22\r\n###############################################################################\r\n\r\n\r\n###############################################################################\r\n# Export a text file from Endnote that only lists Year and Authors, all \r\n# separated by commas. To do this, create an Output Style\r\n# that lists the year followed by a comma and then each author separated by\r\n# a comma. Select all references, then go to File>Export. In the window that\r\n# opens, you'll see a menu for output style; choose your author-only version \r\n# there and save the output as a text file. \r\nf1 = 'authors_list_20150422.txt'\r\n#\r\n## Scan input file, divide each line into a separate entry in a character vector\r\nauthors = scan(file = f1, what = character(), sep = '\\n') \r\n#\r\nyr = character()\r\n# Extract year from each record. 
\r\nfor (i in seq_along(authors)){\r\n\tyr[i] = substr(authors[i],regexpr('[1-2]',authors[i])[[1]],\r\n\t\t\tregexpr(',',authors[i])[[1]] - 1)\r\n}\r\nyr = as.numeric(yr) # Convert to numbers\r\n# Entries with missing or ambiguous years (anything with multiple years listed\r\n# like 1997-2013) will end up as NA's in the yr vector, and will generate a \r\n# warning.\r\n\r\ncnt = numeric(length(yr)) # Create empty vector\r\n# To count the number of authors on a paper, simply count the number of \r\n# commas in each line of the authors vector. There is always one comma after \r\n# the year, denoting at least one author, and every additional comma means there\r\n# is another author. \r\nfor (i in seq_along(authors)){\r\n\tcnt[i] = length(gregexpr(',',authors[i])[[1]])\r\n}\r\n# Flag rows that have a useful year value\r\ngood.entries = !is.na(yr)\r\n# Keep only those rows in the yr and cnt vectors. Logical indexing is used\r\n# because negative indexing with an empty which() result would drop every row.\r\nyr = yr[good.entries]\r\ncnt = cnt[good.entries]\r\n# Make a data frame out of the yr and cnt vectors\r\ndf = data.frame(Year = yr, Count = cnt)\r\n\r\n# Make a new dataframe that holds each combination of Year and Count\r\nnewdf = expand.grid(Years = unique(yr), Count = unique(cnt))\r\n# Make a new column to hold a tally of the number of papers for each Year and\r\n# author Count combination. 
\r\nnewdf$TotalPapers = NA\r\n\r\n# Go through the combinations of years and counts to tally the number of papers\r\n# that match that combo in the 'df' dataframe\r\nfor (i in seq_len(nrow(newdf))){\r\n\t# Put the tally of number of papers matching each Year & Count combo in the\r\n\t# TotalPapers column\r\n\tnewdf$TotalPapers[i] = nrow(df[df$Year == newdf$Years[i] & \r\n\t\t\t\t\t\t\tdf$Count == newdf$Count[i],])\r\n}\r\n\r\n# Keep only the combinations where TotalPapers is not 0\r\nnewdf = newdf[newdf$TotalPapers != 0,]\r\n\r\n#########################################################\r\n#########################################################\r\n# Create a function to plot a color scale bar on the existing plot using the\r\n# vector of colors that will be generated later by the colorRampPalette function\r\ncolor.bar <- function(lut, min, max=-min, nticks=11, \r\n\t\tx1 = 1, x2 = 2, y1 = 1, y2 = 2, \r\n\t\tticks=seq(min,max, length.out=nticks), round = TRUE, title = '',\r\n\t\tcex.title = 1, text.col = 'black', horiz = FALSE){\r\n\t# lut = a vector of color values, in hex format\r\n\t# min = minimum value represented by the first color\r\n\t# max = maximum value represented by the last color\r\n\t# nticks = number of tick marks on the colorbar\r\n\t# x1 = location of left edge of colorbar, in plot's x-units\r\n\t# x2 = location of right edge of colorbar, in plot's x-units\r\n\t# y1 = location of bottom edge of color bar, in plot's y-units\r\n\t# y2 = location of top edge of color bar, in plot's y-units\r\n\t# ticks = a sequence of tick mark values to be added to colorbar\r\n\t# round = TRUE or FALSE, round off tick values to 0 decimal places.\r\n\t# title = Title for colorbar\r\n\t# cex.title = size for title\r\n\t# text.col = color of tick marks, title, and border of colorbar\r\n\t# horiz = TRUE to lay out the color bar horizontally, FALSE for vertically\r\n\t\r\n\t# Calculate a scaling factor based on the number of entries in the \r\n\t# look-up-table and the absolute 
distance between y2 and y1 on the plot\r\n\tif (horiz == FALSE){\r\n\t\tscale = (length(lut)-1)\/(y2-y1)\t\r\n\t} else if (horiz == TRUE){\r\n\t\t# For horizontal bars, use the distance between x2 and x1 instead\r\n\t\tscale = (length(lut)-1)\/(x2-x1)\r\n\t}\r\n\t# Round off the tick marks if desired\r\n\tif (round) { ticks = round(ticks,0) }\r\n\t# Draw little thin rectangles for each color in the look up table. The\r\n\t# rectangles will span the distance between x1 and x2 on the plot's \r\n\t# coordinates, and have a y-axis height scaled to fit all of the colors\r\n\t# between y1 and y2 on the plot's coordinates. Each color will only be a\r\n\t# small fraction of that overall height, using the scale factor. For a \r\n\t# horizontal-oriented bar the thin rectangles will run between y1 and y2,\r\n\t# scaled to fit all of the colors between x1 and x2. \r\n\tfor (i in 1:(length(lut)-1)) {\r\n\t\tif (horiz == FALSE) {\r\n\t\t\t# Calculate myy, the lower y-location of a rectangle\r\n\t\t\tmyy = (i-1)\/scale + y1\r\n\t\t\t# Calculate the upper y value as y+(1\/scale), and draw the rectangle\r\n\t\t\trect(x1,myy,x2,myy+(1\/scale), col=lut[i], border=NA)\r\n\t\t} else if (horiz == TRUE) {\r\n\t\t\t# Calculate x, the left x-location of a rectangle\r\n\t\t\tmyx = (i-1)\/scale + x1\r\n\t\t\t# Calculate the right x value as x+(1\/scale), and draw the rectangle\r\n\t\t\trect(myx,y1,myx+(1\/scale),y2, col=lut[i], border=NA)\r\n\t\t}\r\n\t}\r\n\t# Draw a border around the color bar\r\n\trect(x1,y1,x2,y2, col = NULL, border = text.col)\r\n\t# Draw tick marks and tick labels\r\n\tfor (i in 1:length(ticks)){\r\n\t\tif (horiz == FALSE) {\r\n\t\t\tmyy = (ticks[i]-1)\/scale + y1\r\n\t\t\t# This is an attempt to set the tick mark and labels just off to the \r\n\t\t\t# right side of the color bar without having them take up too much \r\n\t\t\t# of the plot area. 
The x locations are calculated as x2 plus a \r\n\t\t\t# fraction of the width of the rectangle.\r\n\t\t\tmyx2 = x2 + ((x2-x1)*0.1)\r\n\t\t\tmyx3 = x2 + ((x2-x1)*0.13)\r\n\t\t\t# Draw little tick marks\r\n\t\t\tlines(x = c(x2,myx2), y = c(myy,myy), col = text.col)\r\n\t\t\t# Draw tick labels\r\n\t\t\ttext(x = myx3, y = myy, labels = ticks[i], adj = c(0,0.3), \r\n\t\t\t\t\tcol = text.col)\r\n\t\t} else if (horiz == TRUE) {\r\n\t\t\t# For a horizontal scale bar\r\n\t\t\tmyx = (ticks[i]-1)\/scale + x1\r\n\t\t\t\r\n\t\t\t# This is an attempt to set the tick mark and labels just below the \r\n\t\t\t# bottom of the color bar without having them take up too much of\r\n\t\t\t# the plot area. The y locations are calculated as y1 minus a \r\n\t\t\t# fraction of the height of the rectangle\r\n\t\t\tmyy2 = y1 - ((y2-y1)*0.1)\r\n\t\t\tmyy3 = y1 - ((y2-y1)*0.13)\r\n\t\t\t# Draw little tick marks\r\n\t\t\tlines(x = c(myx,myx), y = c(y1,myy2), col = text.col)\r\n\t\t\t# Draw tick labels\r\n\t\t\ttext(x = myx, y = myy3, labels = ticks[i], adj = c(0.5,1), \r\n\t\t\t\t\tcol = text.col)\r\n\t\t}\r\n\t}\r\n\t# Draw a title for the color bar\r\n\ttext(x = ((x1+x2)\/2), y = y2, labels = title, adj = c(0.5,-0.35),\r\n\t\t\tcex = cex.title, col = text.col)\r\n}\r\n####################################################\r\n####################################################\r\n# Define a color ramp function from dark blue to white\r\n\r\n# From ColorBrewer 9-class Blues (single-hue). ColorBrewer recommends the \r\n# following set of 9 color values, expressed in hex format. I reverse them so\r\n# that the highest value will be the lightest color. 
\r\ncolfun = colorRampPalette(rev(c(\"#f7fbff\",\"#deebf7\",\"#c6dbef\",\"#9ecae1\",\r\n\t\t\t\t\t\t\"#6baed6\",\"#4292c6\",\"#2171b5\",\"#08519c\",\"#08306b\")),\r\n\t\tspace = 'Lab')\r\n\r\n# Define a set of colors from blue to white using that function, covering the\r\n# entire range of possible values for newdf$TotalPapers\r\ncols = colfun(max(newdf$TotalPapers))\r\n# Assign a color to each entry in the newdf data frame based on its TotalPapers\r\n# value. \r\nnewdf$col = \"\"\r\nfor (i in 1:nrow(newdf)){\r\n\tnewdf$col[i] = cols[newdf$TotalPapers[i]]\r\n}\r\n\r\n##############################\r\n# Create an output file in svg format\r\nsvg(filename = \"author-year-count.svg\", width = 9, height = 4.8)\r\npar(mar =c(5,6,1,2)) # Change the figure margins slightly\r\nplot(Count~Years, data = newdf, type = 'n', \r\n\t\tylim = c(0,45), las = 1,\r\n\t\tcex.lab = 1.6,\r\n\t\tcex.axis = 1.3,\r\n\t\tylab = 'Number of coauthors',\r\n\t\txlab = 'Publication Year',\r\n\t\tyaxt = 'n')\r\n# Color the background of the plot using a rectangle, and determine its \r\n# dimensions on the fly by calling the par()$usr function to get the coordinates\r\n# of the plot edges.\r\nrect(par()$usr[1],par()$usr[3],par()$usr[2],par()$usr[4], col = \"#BBBBBB\")\r\n# Draw some grid lines at useful locations\r\nabline(h = c(1,2,3,4,5,10,15,20,25,30,35,40), col = \"#CCCCCC\")\r\nabline(v = seq(1875,2015, by = 5), col = \"#CCCCCC\")\r\n# Redraw the plot's bounding box to cover where the horizontal lines overwrite\r\n# it. 
\r\nbox()\r\n# Redraw the point data over the newly drawn background and horizontal lines\r\npoints(Count~Years, data = newdf, col = newdf$col, pch = 20, cex = 0.9)\r\n# Call the color.bar function created earlier to create a color scale.\r\ncolor.bar(lut = cols, nticks = 8, horiz = TRUE,\r\n\t\tmin = 1, max = max(newdf$TotalPapers),\r\n\t\tx1 = 1880, x2 = 1920, y1 = 42, y2 = 44, \r\n\t\ttitle = 'Number of papers', cex.title = 1.1, text.col = 'black')\r\n# Draw the y-axis labels at the appropriate spots\r\naxis(2, at = c(1,2,3,4,5,10,15,20,25,30,35,40), \r\n\t\tlabels = c('1','','3','','5','10','15','20','25','30','35','40'), \r\n\t\tlas = 1, cex.axis = 1.1)\r\ndev.off()\r\n\r\n\r\n\r\n\r\n\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Here&#8217;s a mostly useless visualization of the collection of journal articles that sits in my reference database in Endnote. I deal mostly in marine biology, physiology, biomechanics, and climate change papers, with a few molecular\/genetics papers thrown in here and there. 
The database has 3325 entries, 2 of which have ambiguous publication years and [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,218],"tags":[210,211,17,58],"class_list":["post-1957","post","type-post","status-publish","format-standard","hentry","category-journal","category-r-project","tag-color-palette","tag-color-ramp","tag-plotting","tag-r-project"],"_links":{"self":[{"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/posts\/1957","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/comments?post=1957"}],"version-history":[{"count":16,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/posts\/1957\/revisions"}],"predecessor-version":[{"id":1974,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/posts\/1957\/revisions\/1974"}],"wp:attachment":[{"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/media?parent=1957"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/categories?post=1957"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/tags?post=1957"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}