{"id":918,"date":"2011-06-23T15:08:17","date_gmt":"2011-06-23T19:08:17","guid":{"rendered":"http:\/\/lukemiller.org\/?p=918"},"modified":"2011-10-03T11:03:38","modified_gmt":"2011-10-03T15:03:38","slug":"digitizing-data-from-old-plots-using-digitize","status":"publish","type":"post","link":"https:\/\/lukemiller.org\/index.php\/2011\/06\/digitizing-data-from-old-plots-using-digitize\/","title":{"rendered":"Digitizing data from old plots using &#8216;digitize&#8217;"},"content":{"rendered":"<p>The <a href=\"http:\/\/journal.r-project.org\/archive\/2011-1\/2011-1_index.html\" target=\"_blank\">June 2011 issue<\/a> of <a href=\"http:\/\/journal.r-project.org\/index.html\" target=\"_blank\">The R Journal <\/a>contains an article on the R package <strong>digitize<\/strong> (<a href=\"http:\/\/journal.r-project.org\/archive\/2011-1\/RJournal_2011-1_Poisot.pdf\" target=\"_blank\">link to pdf<\/a>) by Timoth\u00e9e Poisot. This might prove to be a handy tool if you occasionally find yourself needing to retrieve data points from figures in old articles for which you don&#8217;t have the raw data. There are a number of other stand-alone software tools to accomplish this same task, such as <a href=\"http:\/\/plotdigitizer.sourceforge.net\/\" target=\"_blank\">PlotDigitizer<\/a>, <a href=\"http:\/\/datathief.org\/\" target=\"_blank\">DataThief<\/a>, or TechDig (good luck finding that last one). These other programs all work, but I use them so rarely that I typically forget the name of the program and spend more time looking for a new program than it takes to actually digitize the graph. As a result, having a package in R to do the same task might save some exasperation in the future. 
For an alternative method using <a href=\"http:\/\/rsb.info.nih.gov\/ij\/\" target=\"_blank\">ImageJ<\/a>, see <a href=\"https:\/\/lukemiller.org\/?p=1050\" target=\"_blank\">this post.<\/a><\/p>\n<p>To get ahold of <strong>digitize<\/strong>, simply open up an R session and type the following at the command prompt:<br \/>\n<code>install.packages('digitize') #download the digitize package<br \/>\nlibrary(digitize) #load the digitize package into the workspace<\/code><\/p>\n<p>The normal usage routine involves three steps:<\/p>\n<p><code>cal = ReadAndCal('<em>imagefilename<\/em>.jpg')<\/code><\/p>\n<figure id=\"attachment_924\" aria-describedby=\"caption-attachment-924\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/Rintro-snail1.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-924\" title=\"Rintro-snail1\" src=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/Rintro-snail1-300x300.jpg\" alt=\"\" width=\"300\" height=\"300\" srcset=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/Rintro-snail1-300x300.jpg 300w, https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/Rintro-snail1-150x150.jpg 150w, https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/Rintro-snail1-1024x1024.jpg 1024w, https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/Rintro-snail1.jpg 1200w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-924\" class=\"wp-caption-text\">The original figure to be digitized.<\/figcaption><\/figure>\n<p>This opens the jpeg in a plotting window and lets you define points on the x and y axes. You must start by clicking on the left-most x-axis point, then the right-most axis point, followed by the lower y-axis point and finally the upper y-axis point. You don&#8217;t need to choose the end points of the axis, only two points on the axis that you know the x or y value for. 
As you click on each of the 4 points, the coordinates are saved in the object <code>cal<\/code>.<\/p>\n<figure id=\"attachment_927\" aria-describedby=\"caption-attachment-927\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/control_points.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-927\" title=\"control_points\" src=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/control_points-300x284.png\" alt=\"\" width=\"300\" height=\"284\" srcset=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/control_points-300x284.png 300w, https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/control_points.png 556w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-927\" class=\"wp-caption-text\">The four axis control points have been placed on the x and y axes, visible as blue crosses.<\/figcaption><\/figure>\n<p>The next step is:<br \/>\n<code>data.points = DigitData(col = 'red')<\/code><\/p>\n<p>You return to the figure window, and now you can click on each of the data points you&#8217;re interested in retrieving values for. The function will place a dot (colored red in this case) over each point you click on, and the raw x,y coordinates of that point will be saved to the <code>data.points<\/code> list. 
When you&#8217;re finished clicking points, hit stop or right-click to end the data point collection.<\/p>\n<figure id=\"attachment_928\" aria-describedby=\"caption-attachment-928\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/data_points.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-928\" title=\"data_points\" src=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/data_points-300x272.png\" alt=\"\" width=\"300\" height=\"272\" srcset=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/data_points-300x272.png 300w, https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/data_points.png 527w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-928\" class=\"wp-caption-text\">Click on each of the data points you want. The function will place a marker over each clicked position.<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>Finally, you need to convert those raw x,y coordinates into the same scale as the original graph. You do this by calling the <code>Calibrate<\/code> function and feeding it your <code>data.points<\/code> list, the <code>cal<\/code> list that contains your 4 control points from the first step, and then 4 numeric values giving the known axis values of those 4 control points, in the same order you clicked them (left x, right x, lower y, upper y). These values should be in the original scale of the figure (i.e. read the values off the graph&#8217;s tick marks).<\/p>\n<p><code>df = Calibrate(data.points, cal, 0.1, 0.4, 0.0, 0.6)<br \/>\nhead(df)<br \/>\nx y<br \/>\n1 0.04327273 5.551115e-17<br \/>\n2 0.04763636 3.344262e-02<br \/>\n3 0.05418182 4.327869e-02<br \/>\n4 0.09018182 7.868852e-02<br \/>\n5 0.11200000 1.042623e-01<br \/>\n6 0.15454545 1.967213e-01<\/code><\/p>\n<p>The resulting data frame <code>df<\/code> has 2 columns, x and y, holding the scaled values for each point you clicked on the original graph. 
Note that a point lying at zero will rarely read exactly zero; instead it comes back as some very small number in scientific notation (e.g. 5.551115e-17).<\/p>\n<p>There is one major caveat when using <strong>digitize<\/strong>: your jpeg figure must be rotated correctly so that the x-axis is perfectly horizontal and the y-axis is perfectly vertical. The <strong>digitize<\/strong> package doesn&#8217;t make any attempt to correct for rotated axes, and if you input a poorly scanned image that isn&#8217;t straight, you&#8217;ll get bogus numbers in return. For example, I tried digitizing a rotated version of the above figure (shown below).<\/p>\n<figure id=\"attachment_930\" aria-describedby=\"caption-attachment-930\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/Rintro-snail1_rot.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-930\" title=\"Rintro-snail1_rot\" src=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/Rintro-snail1_rot-300x300.jpg\" alt=\"\" width=\"300\" height=\"300\" srcset=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/Rintro-snail1_rot-300x300.jpg 300w, https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/Rintro-snail1_rot-150x150.jpg 150w, https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/Rintro-snail1_rot-1024x1024.jpg 1024w, https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/Rintro-snail1_rot.jpg 1302w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-930\" class=\"wp-caption-text\">An example image with off-kilter axes.<\/figcaption><\/figure>\n<p>The results I got from <strong>digitize<\/strong> on this rotated image do not match the results from the original straight image because my control points on the rotated image are different distances from each other due to the rotation. 
The plot of the two resulting data sets (straight image in black, rotated image in blue) shows the very different results obtained from an off-kilter image.<\/p>\n<figure id=\"attachment_931\" aria-describedby=\"caption-attachment-931\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/rotation_comparison.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-931\" title=\"rotation_comparison\" src=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/rotation_comparison-300x300.png\" alt=\"\" width=\"300\" height=\"300\" srcset=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/rotation_comparison-300x300.png 300w, https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/rotation_comparison-150x150.png 150w, https:\/\/lukemiller.org\/wp-content\/uploads\/2011\/06\/rotation_comparison.png 672w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-931\" class=\"wp-caption-text\">The digitized data from the straight figure are shown in black, while the digitized data from the rotated figure are shown in blue. The calculated values for each point differ as a result of the rotation.<\/figcaption><\/figure>\n<p>Just be careful when creating your input image. 
If you scan the image slightly off-kilter, open it in an image manipulation program and rotate the figure until the axes are perfectly horizontal\/vertical, and then run it through the <strong>digitize<\/strong> workflow.<\/p>\n<p>One other minor limitation of <strong>digitize<\/strong> is that it only supports jpeg images, so you&#8217;ll need to convert tiff, png, or pdf images to jpegs before use.<\/p>\n<p>Edit, October 2011: You might also find this web-based plot digitizer developed by Ankit Rohatgi to be\u00a0useful: <a title=\"http:\/\/arohatgi.info\/WebPlotDigitizer\/\" href=\"http:\/\/arohatgi.info\/WebPlotDigitizer\/\" target=\"_blank\">http:\/\/arohatgi.info\/WebPlotDigitizer\/<\/a> This program runs inside of your Firefox or Chrome web-browser. Simply drag your plot image onto the application webpage and begin digitizing points. There are helpful tutorials provided as well.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The June 2011 issue of The R Journal contains an article on the R package digitize (link to pdf) by Timoth\u00e9e Poisot. This might prove to be a handy tool if you occasionally find yourself needing to retrieve data points from figures in old articles for which you don&#8217;t have the raw data. 
There are [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[218],"tags":[89,87,88,58,90],"class_list":["post-918","post","type-post","status-publish","format-standard","hentry","category-r-project","tag-data-retrieval","tag-digitize","tag-figure","tag-r-project","tag-scanned-image"],"_links":{"self":[{"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/posts\/918","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/comments?post=918"}],"version-history":[{"count":27,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/posts\/918\/revisions"}],"predecessor-version":[{"id":1070,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/posts\/918\/revisions\/1070"}],"wp:attachment":[{"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/media?parent=918"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/categories?post=918"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/tags?post=918"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}