{"id":124,"date":"2010-02-27T19:08:50","date_gmt":"2010-02-28T00:08:50","guid":{"rendered":"http:\/\/lukemiller.org\/?p=124"},"modified":"2020-11-24T15:52:30","modified_gmt":"2020-11-24T23:52:30","slug":"calculating-lt50-median-lethal-temperature-aka-ld50-quickly-in-r","status":"publish","type":"post","link":"https:\/\/lukemiller.org\/index.php\/2010\/02\/calculating-lt50-median-lethal-temperature-aka-ld50-quickly-in-r\/","title":{"rendered":"Calculating LT50 (median lethal temperature, aka LD50) quickly in R"},"content":{"rendered":"<p>Say you&#8217;ve got a bunch of survival\/mortality data from an experiment. Maybe you exposed batches of snails to various high temperatures for a few hours, and recorded the number alive and dead in each batch at the end. Now you&#8217;d like to report the median lethal temperature (or perhaps a lethal dosage if you were injecting stuff into critters). We can do this fairly quickly using R statistical software to perform a logistic regression and back-calculate the LT50.<\/p>\n<p>This assumes that you already have R installed on your computer and you know how to fire it up.<\/p>\n<p>To start with, we need some data.<\/p>\n<figure id=\"attachment_125\" aria-describedby=\"caption-attachment-125\" style=\"width: 222px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2010\/02\/data_in_excel.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-125\" title=\"The input data\" src=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2010\/02\/data_in_excel.gif\" alt=\"\" width=\"222\" height=\"292\" \/><\/a><figcaption id=\"caption-attachment-125\" class=\"wp-caption-text\">The input data<\/figcaption><\/figure>\n<p>In this case, I&#8217;ve entered my data into Excel in an arrangement suitable for importing into R. This file, which contains only these data on it, should be saved as a comma-separated-value (.csv) file. Other text file styles would work, but we&#8217;ll stick with .csv. You&#8217;ll need to know the path to the directory where this file is stored so that you can find it with R.<\/p>\n<p>With R open, we&#8217;ll load the .csv file into the R workspace by entering the following command at the prompt:<\/p>\n<pre>&gt; data = read.csv(\"D:\/directory\/data.csv\")<\/pre>\n<p>where <code>D:\/directory\/<\/code> is the path to our file, and <code>data.csv<\/code> is the name of the file we created in Excel. The data are now entered into a data frame called <code>'data'<\/code>.<\/p>\n<p>We can examine the contents of\u00a0 <code>'data'<\/code> by simply typing the name of the data frame at the prompt:<\/p>\n<pre>&gt; data<\/pre>\n<p><code><\/code><code><\/code>which would return something like this:<\/p>\n<pre><code>   temperature alive dead\n1           38    20    0\n2           40    20    0\n3           40    20    0\n4           40    20    0\n5           42    19    1\n6           42    14    6\n7           42    15    5\n8           44     5   15\n9           44     3   17\n10          44     2   18\n11          47     0   20\n12          47     0   20\n13          47     0   20<\/code><\/pre>\n<p>That looks about right. There are data from the 13 different trials I ran, with the temperature, number alive, and number dead given for each trial. Having the numbers as alive + dead is necessary here, so if you recorded your data as % survival or % mortality, convert the values back into counts.<\/p>\n<p>Now if you&#8217;re having trouble importing your data, you could type it directly into a data frame in R as follows:<\/p>\n<p><code><\/code><\/p>\n<pre>&gt; data = data.frame(temperature = <br \/>   c(38,40,40,40,42,42,42,44,44,44,47,47,47),<br \/>   alive = c(20,20,20,20,19,14,15,5,3,2,0,0,0),<br \/>   dead = c(0,0,0,0,1,6,5,15,17,18,20,20,20))<\/pre>\n<p>That recreates the data table from above.<\/p>\n<p>Now that your data are in the workspace, let&#8217;s attach them so that we can refer to each column by its header directly:<\/p>\n<pre>&gt; attach(data)<\/pre>\n<p><code><\/code>If you then typed <code>&gt; temperature<\/code>, R would respond with all the contents of the temperature column.<\/p>\n<p>Next it&#8217;s necessary to create a separate array of our response variables (alive and dead):<\/p>\n<pre>&gt; y = cbind(alive, dead)<\/pre>\n<p>At this point we can run the generalized linear model using a binomial distribution to fit a curve to temperature and alive\/dead data:<\/p>\n<pre>&gt; model.results = glm(y ~ temperature, binomial)<\/pre>\n<p>In that line, we created an object <code>model.results<\/code> to hold our output, and called the <code>glm<\/code> function. We supplied <code>glm<\/code> with our response variable and the treatment levels <code>y ~ temperature,<\/code> followed by the distribution we want to use <code> binomial <\/code>. Since our response variable only has two states (alive or dead), the binomial distribution is the weapon of choice.<\/p>\n<p>You can check the results of the models by typing the name of the model object you just created:<\/p>\n<pre>&gt; model.results<\/pre>\n<p>which outputs the following for our data above:<\/p>\n<pre>Call: glm(formula = y ~ temperature, family = binomial)<br \/><br \/>Coefficients:<br \/>(Intercept) temperature <br \/>     69.113      -1.609<br \/><br \/>Degrees of Freedom: 12 Total (i.e. Null); 11 Residual<br \/>Null Deviance:      252.2 <br \/>Residual Deviance: 8.309     AIC: 29.29<\/pre>\n<p>It&#8217;s worth noting that the Residual Deviance here (8.309) is less than our Residual Degrees of Freedom (11), meaning that our data are not <a href=\"http:\/\/en.wikipedia.org\/wiki\/Overdispersion\">overdispersed.<\/a> If the Residual Deviance was something large like 20 compared to our 11 Residual Degrees of freedom, it might be worth trying a log-transformation on the treatment values. This might be the case if you were administering dosages of some drug over a very wide range of concentrations. To carry out a logistic regression on our log-transformed data, we&#8217;d simply modify the glm call slightly:<\/p>\n<pre>&gt; model.results = glm(y ~ log(temperature), binomial)<\/pre>\n<p>which calculates the natural log of each treatment level prior to carrying out the regression.<\/p>\n<p>You can also print out the model results in summary format, which will also show you standard error estimates, z-values, and P-values for the estimates of the intercept and slope coefficients of the logistic regression:<\/p>\n<pre>&gt; summary(model.results)<\/pre>\n<p>The summary results look like this:<\/p>\n<pre>Call:<br \/>glm(formula = y ~ temperature, family = binomial)<br \/><br \/>Deviance Residuals: <br \/>     Min       1Q   Median      3Q     Max <br \/>-1.30305 -0.24134 -0.05082 0.59144 1.74125<br \/><br \/>Coefficients:<br \/>           Estimate Std. Error z value Pr(&gt;|z|) <br \/>(Intercept) 69.1126     9.0186   7.663 1.81e-14 ***<br \/>temperature -1.6094     0.2103  -7.653 1.96e-14 ***<br \/>---<br \/>Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1<br \/><br \/>(Dispersion parameter for binomial family taken to be 1)<br \/><br \/>Null deviance: 252.1776 on 12 degrees of freedom<br \/>Residual deviance: 8.3088 on 11 degrees of freedom<br \/>AIC: 29.294<br \/>Number of Fisher Scoring iterations: 5<\/pre>\n<p>To calculate what the median lethal temperature in the experiment was, we&#8217;ll use a function in the MASS library that comes with the base installation of R. Load the MASS library:<\/p>\n<pre>&gt; library(MASS)<\/pre>\n<p>\u00a0The MASS library has a handy function called <code>dose.p<\/code> that will calculate any fractional dosage value you want (i.e. LT50, LT90, LT95 etc). Call <code>dose.p<\/code> as follows:<\/p>\n<p><code><\/code><\/p>\n<pre>&gt; dose.p(model.results, p = 0.5)<br \/><code><\/code><\/pre>\n<p><code><\/code>Note that we input our model.results object as the input, and then input our desired level (p=0.5, which is 50% survival).<br \/>The results from <code>dose.p<\/code> will look like this:<\/p>\n<pre>             Dose        SE\np = 0.5: 42.94194 0.1497318<\/pre>\n<p>In this case, the value given under Dose is the median lethal dose (i.e. temperature), 42.94194 (we&#8217;ll drop a few significant digits and call it 43\u00b0C when it&#8217;s time to write things up). Note here that if you had to log transform your data during the <code>glm<\/code> call above, the value returned in Dose will also be log-transformed (3.7598 if we did this to the data above). You would convert it back to natural units using the exponential command: <code>&gt; exp(3.7598)<\/code> which would return ~42.9\u00b0C if we used this data set.<\/p>\n<p>If we also wanted to know what dosage gave 90% survival, we&#8217;d call <code>dose.p<\/code> as such:<br \/><code><\/code><\/p>\n<pre>&gt; dose.p(model.results, p = c(0.5, 0.9))<\/pre>\n<p>It might also be useful to have a graph of % Survival and a logistic curve fit to the data for making a figure. Let&#8217;s do that.<\/p>\n<p>To begin, we can plot our original alive\/dead data as % survival by just dividing the number alive by the total number of animals per trial when we call the plot command:<\/p>\n<pre>&gt; plot(temperature, (alive \/ (alive + dead)), ylab = \"Proportional Survival\")<\/pre>\n<p>Note the <code> (alive \/ (alive + dead)) <\/code> section in there. That tells R that we want our y-values to be the number alive divided by the total number of animals per trial (a proportion between 0 and 1).<br \/>The resulting plot looks like this:<\/p>\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2020\/11\/Logistic_plot1.png\" alt=\"Initial plot of the raw survival data points\" class=\"wp-image-2593\" width=\"504\" height=\"504\" srcset=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2020\/11\/Logistic_plot1.png 1008w, https:\/\/lukemiller.org\/wp-content\/uploads\/2020\/11\/Logistic_plot1-300x300.png 300w, https:\/\/lukemiller.org\/wp-content\/uploads\/2020\/11\/Logistic_plot1-150x150.png 150w, https:\/\/lukemiller.org\/wp-content\/uploads\/2020\/11\/Logistic_plot1-768x768.png 768w\" sizes=\"auto, (max-width: 504px) 100vw, 504px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><br>Next we need to come up with a way to plot a logistic curve over our raw data values. To do this, we&#8217;ll have to define a function that will calculate y-values along a logistic curve. We do this as follows:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">&gt; logisticline = function(z) {\n\teta = 69.1126 + -1.6094 * z;\n\t1 \/ (1 + exp(-eta))\n\t}<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">In the 2nd line of that equation, we write a little formula for &#8216;eta&#8217; using the values 69.1126 and -1.6094, which were the intercept and slope that were reported in our <code>summary(model.results)<\/code> given above. Substitute your own slope and intercept in there, but leave the &#8216;z&#8217; alone. The third line is the standard <a href=\"http:\/\/en.wikipedia.org\/wiki\/Logistic_equation\">logistic function<\/a> used to calculate the value &#8216;logisticline&#8217; for each input value z. Leave the third line alone.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Now we need to create a vector of temperature values that will be used as input to our function &#8216;logisticline&#8217; to create the y-values to plot our curve.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">&gt; x = seq(38, 47, 0.01)<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">In that line, we define an object &#8216;x&#8217; and fill it with a sequence (i.e. &#8216;seq&#8217;) of numbers from 38 to 47 (my temperature limits) using a step size of 0.01. The small step size ensures that the curve will look smooth. If you don&#8217;t specify a step size, the &#8216;seq&#8217; command will step by 1, making for a non-curvy curve in this case.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Finally, we can add this line to our existing plot. Note that we&#8217;ll call the &#8216;logisticline&#8217; function we just defined, and supply it with the vector &#8216;x&#8217;, so that the y-values plotted on the graph are calculated by plugging each value of x into the logistic equation and plotting the resulting proportional survival:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">&gt; lines(x, logisticline(x), new = TRUE)<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\"><br>The resulting output should look something like this:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized is-style-default\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2020\/11\/Logistic_plot2.png\" alt=\"With the logistic regression line added to the plot.\" class=\"wp-image-2594\" width=\"504\" height=\"504\" srcset=\"https:\/\/lukemiller.org\/wp-content\/uploads\/2020\/11\/Logistic_plot2.png 1008w, https:\/\/lukemiller.org\/wp-content\/uploads\/2020\/11\/Logistic_plot2-300x300.png 300w, https:\/\/lukemiller.org\/wp-content\/uploads\/2020\/11\/Logistic_plot2-150x150.png 150w, https:\/\/lukemiller.org\/wp-content\/uploads\/2020\/11\/Logistic_plot2-768x768.png 768w\" sizes=\"auto, (max-width: 504px) 100vw, 504px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Should you want to make your own plots look more presentable, there are some tips for modifying the output of the plot <a href=\"https:\/\/lukemiller.org\/index.php\/2010\/05\/modifying-basic-plots-in-r\/\">in this post.<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Say you&#8217;ve got a bunch of survival\/mortality data from an experiment. Maybe you exposed batches of snails to various high temperatures for a few hours, and recorded the number alive and dead in each batch at the end. Now you&#8217;d like to report the median lethal temperature (or perhaps a lethal dosage if you were [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[218],"tags":[60,61,59,7,58,8],"class_list":["post-124","post","type-post","status-publish","format-standard","hentry","category-r-project","tag-ld50","tag-logistic-regression","tag-lt50","tag-r","tag-r-project","tag-statistics"],"_links":{"self":[{"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/posts\/124","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/comments?post=124"}],"version-history":[{"count":25,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/posts\/124\/revisions"}],"predecessor-version":[{"id":2598,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/posts\/124\/revisions\/2598"}],"wp:attachment":[{"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/media?parent=124"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/categories?post=124"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lukemiller.org\/index.php\/wp-json\/wp\/v2\/tags?post=124"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}