#======= Okay, actual code here ==============

# Load the MCMC samples and the raw data
samples <- readRDS(file = "samples.rds")
data.full <- readRDS(file = "data.rds")

# Drop the columns used only to recompute PDR later (see the ROPE section)
data <- data.full
data$total.avg.speed <- NULL
data$meda <- NULL

Introduction

The forthcoming publication—Burchill et al. 202X—examines the ability of ant teams to collectively transport large food loads back to their colony’s nest. Unlike previous research into cooperative transport, however, this publication focuses on ants transporting loads up inclined and vertical surfaces. To quantify the efficiency of ant teams, Burchill et al. (202X) use a common metric in the cooperative transport literature called the “prey delivery rate,” defined as \(\frac{\text{load weight} \times \text{load speed}}{\text{number of ants}}\). It approximates the rate at which prey mass is delivered to the colony on a per-ant basis (Buffin and Pratt, 2016).

However, according to frequentist linear regression models, the experimental treatments do not seem to affect the prey delivery rate (PDR), perhaps indicating that the ants dynamically adjust team effort to keep the PDR constant. Such a strategy is also predicted by optimality models. Frequentist statistics and traditional null hypothesis approaches do not allow the researchers to “prove” the null model of a constant PDR across predictors, hence the use of Bayesian statistics.

Data collection methods

Two Oecophylla smaragdina (green weaver ant) colonies were recorded in Kirwan, Australia in the summer of 2018. Colonies occupied nests that spanned multiple trees, and all food items were collected and transported up the tree trunks to their leaf-nests. Cardboard platforms were erected at the base of three trees per colony, and the ants’ foraging trails were re-routed over these experimental platforms. Each day of the experiment, one platform was inclined at 45° from the horizontal, one was set at a 90° vertical angle, and one was set to a flat, horizontal 0° incline. The order was randomized, but over the course of three days, each tree was given each of the three angles.

Additionally, load weight was experimentally manipulated, with three weight classes presented to colonies. Loads were unburdened dead crickets—the ants’ preferred prey to transport—weighing ~0.25g, crickets with one lead weight attached (~1.25g), or crickets with two lead weights attached (~2.25g). These loads were weighed prior to presentation. Loads of each weight class were presented in a randomized order to each experimental platform, and each load was run twice before moving on to the next weight class. In short, each colony had three trees that were tested, each tree was given each of the three platform angles, and each platform was given each of the three weight classes.

After the loads were placed on the platforms, they were recorded with digital cameras until the load was transported across the platform, ~50cm. The video cameras were positioned on tripods such that their orientation was orthogonal to the platform; regardless of the platform’s inclination, the recorded videos provided similar “bird’s-eye views” (see the figure below). Once the ants transported the loads to the far end of the platform, the load and the attached ants were removed. The ants were brushed off, away from the platform, and the load was returned for a second run before moving on to the next weight class.

Experimental setup with a 0° incline. Notice that the camera is positioned orthogonal to, and approximately a meter from, the platform, presenting a “bird’s-eye view.”

The distance between the starting point of the load and its final position, as well as the time it took to arrive there, were calculated from the videos. For each video, ten still frames, evenly distributed across its length, were examined, and the number of ants engaged with the load in each frame was recorded; the median ant count was then calculated. Thus, the PDR for each run was calculated as the weight of the load, multiplied by its average speed, divided by the median number of ants involved.
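In terms of the data columns used later in this document (actual.weight, total.avg.speed, and meda, the median ant count), the calculation presumably amounts to the following; this is a reconstruction, since the original preprocessing code isn’t shown.

# Reconstructed PDR calculation: load weight times average speed, divided
# by the median number of engaged ants (should match data.full$PDR)
pdr_check <- with(data.full, actual.weight * total.avg.speed / meda)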

Analytical methods

A Bayesian approach (model shown below) was used to model the effects (or lack thereof) of the experimental treatments on ant teams’ prey delivery rate (PDR). The R package rjags was used to fit the parameters in the model and generate posterior distributions with MCMC methods.

The “Kruschke-style” diagram of the model, including all specific parameter values set in the priors. Details about the priors, etc. follow below.

set.seed(123)

Likelihood function

The distribution of prey delivery rate (PDR) values across runs is right-skewed, bounded by zero, and clearly non-normal. Thus, I felt that a gamma distribution would be the most appropriate likelihood function. Shown below, such a distribution seems to fit the data well.

# Histogram of the raw PDR values
h <- data$PDR %>% hist(main="PDR values across the runs\n(grams*cm / ants*seconds)",
                       xlab="prey delivery rate")
breaks1 <- h$breaks  # reuse these bins for the overlay below

# Overlay random draws from a gamma distribution whose shape and rate are
# moment-matched to the sample mean and standard deviation of the PDR data
rgamma(106, 5.865493, 447.4947) %>% 
  hist(add=T, breaks = breaks1,
       col=rgb(0, 0, .9, 0.2), border=rgb(0, 0, .9, 0.5))

I chose to reparametrize the gamma distribution’s shape and rate parameters in terms of mean and standard deviation. This way, the model can be specified similarly to standard regression models.
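Concretely, a gamma distribution with mean \(m\) and standard deviation \(s\) has shape \(m^2/s^2\) and rate \(m/s^2\). As a quick check, these formulas recover the moment-matched values used in the histogram above:

# mean = shape/rate and sd = sqrt(shape)/rate, so solving for the two
# parameters in terms of the mean (m) and standard deviation (s):
gamma_shape <- function(m, s) m^2 / s^2
gamma_rate  <- function(m, s) m / s^2

gamma_shape(0.0131074, 0.0054121)  # ~5.865, the shape used above
gamma_rate(0.0131074, 0.0054121)   # ~447.5, the rate used above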

Because all the PDR values are inherently positive, I used a log-link function to keep the model values above zero. However, link functions make formulating some of the priors a bit more difficult.

Picking priors

Intercept

Effectively, I decided to formulate the intercept’s prior as if there were no link function, then use the link function to transform that original distribution into the new log-linked space.

Given that the base/intercept PDR is not an easily conceptualized variable and there is little previous knowledge for this situation, I chose to loosely base the intercept’s prior on the raw PDR data. The mean of the raw data is 0.0131074, so if my model didn’t have a log-link function, I would have set the prior on the intercept to be a normal distribution with a mean of 0.013. Because I don’t want the priors to be very informed, I would have used a large standard deviation, say 0.1 (much larger than even the maximum value).

However, I need to transform this prior into a log-link prior. To get an estimate I logged the parameters of the original prior (using the absolute value of the new standard deviation to keep it positive) to generate the actual prior for my model. For comparison purposes, I then transformed this distribution BACK into “regular” space (through the inverse of the link function) so that I could plot it alongside the original prior (see below).

# The prior I would have used if there were no link function
mean_prior <- rnorm(500, 0.013, 0.1)

# The actual prior, with its parameters transformed into log-link space
# (the absolute value keeps the new standard deviation positive)
mean_newprior <- rnorm(500, log(0.013), abs(log(.1))) 

mean_prior %>% 
  density() %>%
  plot(main="Original and transformed prior of the intercept",
       xlab="mean prey delivery rate at intercept",
       col="aquamarine3", lwd=2,
       xlim=c(-.4,1), ylim=c(0,6))

# Exponentiate the log-link prior back into "regular" space for comparison
mean_newprior %>% exp() %>% density(bw=.1) %>% lines(lwd=2)

The original prior I made for the untransformed intercept (in teal) vs the log-link’d prior (in black) exponentiated back into ‘regular’ space. Notice the really long right tail of the log-link’d prior

Coefficients/slopes of predictors

Slopes of zero still have the same intuitive meaning in log-link space: no effect, neither positive nor negative. Since I do not have a priori information on which direction the predictors will affect the PDR, the coefficients associated with my predictors are equally likely to be positive or negative. Therefore, the priors for these slopes should be symmetric around zero with a large standard deviation (I have little idea how strong their effects will be).

Thus, I chose a normal distribution with a mean of zero and a standard deviation of one. Plotted below against one of the predictors, you can see that my choice of priors for the intercept and slope have good coverage over the likely regression options.

beta_prior <- rnorm(10000, 0, 1) 

plot(PDR ~ incline, bty="n", pch=16, col="aquamarine3", data=data,
     main="Priors plotted on incline vs. PDR")

# Draw 500 (intercept, slope) pairs from the priors and plot the implied
# regression curves; incline is centered and scaled as (x - 45)/45
vals <- sample(c(1:length(mean_newprior)), 500)
x <- seq(0, 90, 1)
for (v in vals) {
  y <- exp(mean_newprior[v] + beta_prior[v]*(x-45)/45)
  lines(y~x, col=rgb(0, 0, .9, 0.09))
}

Possible regression lines generated from the slope and intercept priors. Here the angle of inclination is used as an example predictor.

Random effect of colony

Additionally, studies on social insects often include a random effect of colony identity in their statistical models, and I wanted to do the same. However, given that I only have good data from two colonies, I chose to have the standard intercept represent the first colony, with the second colony getting a random deviation from this value. That way, the model isn’t trying to estimate three parameters that sum to only two final values, which would create high degrees of cross-correlation.

The random effect of colony ID, like the slopes of the predictors, could be either positive or negative. Similar to the slopes, I set the prior of the random effect to be a normal distribution with a mean of zero and a large standard deviation. However, because this is now a hierarchical model, I will treat the standard deviation of this normal distribution as a hyper-prior and allow it to be fit with the data.

Standard deviation of the random effect

Because standard deviations need to be positive, and because I think the variation between colonies is likely small but want to allow for the possibility of a larger difference, I used a gamma distribution for this hyper-prior. Using a shape of two and a rate of fifty, I get a nice distribution where the standard deviation would mostly be less than 0.1 (already huge) but could get much larger.

hist(rgamma(10000,2,50),
     main="Standard deviation (hyper-prior) of\n the random effect of colony ID",
     xlab="standard deviation values for normal prior")

Below we can plot the difference between the two colonies and see that my priors cover the realistic options.

# Bootstrap the between-colony difference in PDR: repeatedly resample runs
# from each colony and take the difference
density(
  sample(data$PDR[which(data$colony.id==1)],200,rep=T) -
    sample(data$PDR[which(data$colony.id==2)],200,rep=T)
  ) %>% 
  plot(xlim=c(-.05,.05), 
       main="Bootstrapped difference in PDR values\nbetween colonies")

# Overlay 500 prior distributions for the random effect, each with a
# standard deviation drawn from the gamma(2, 50) hyper-prior
replicate(500, lines(density(rnorm(200,0,rgamma(1,2,50))),col=rgb(0, 0, .9, 0.05)))

The random effect of colony ID, with only two colonies, can be conceived of as essentially the difference between them. The ‘actual’ distribution of bootstrapped differences between the two colonies is shown in black. The blue lines are 500 prior distributions of this random effect, with a mean of zero and a standard deviation drawn from our hyper-prior.

Standard deviation of PDR in general

Like the mean, the standard deviation of the PDR is reparametrized into the shape and rate parameters of the likelihood function. Also like the intercept, this prior was loosely based on the data. The actual standard deviation of the PDR values is 0.0054121. Like the standard deviation of the random effect, I chose to use a gamma distribution with a shape of two and a rate of fifty. Given that ~30% of these values are 10x greater than the standard deviation of the raw data, I feel that it’s safe and not very informative.

hist(rgamma(10000,2,50),
     main="Prior distribution of the\nstandard deviation of the PDR",
     xlab="standard deviation values")

# Nodes to monitor when sampling from the fitted model
nodes <- c("base","random.effect[2]","sd","sd_rando","beta[1]","beta[2]","beta[3]")
#samples <- coda.samples(model, nodes, 2000, 1)

MCMC diagnostics

The MCMC chains seemed to run well. When we look at Gelman and Rubin’s convergence diagnostic for every parameter, the largest upper confidence interval is 1.0041893, which is below the 1.01 value that we’re looking for.
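That diagnostic presumably came from coda’s gelman.diag(); a minimal version of the call:

# Gelman-Rubin potential scale reduction factors; the "Upper C.I." column
# holds the values compared against the 1.01 rule of thumb
gelman.diag(samples, multivariate = FALSE)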

When we look at the cross-correlation between the parameters, most pairs are appropriately low. There is somewhat strong cross-correlation between the intercept and the random effect, but that’s to be expected, given that the intercept plus the random effect sums to the second colony’s intercept.

crosscorr.plot(samples)

Additionally, the effective sample size for each parameter is high. The ‘sd_rando’ parameter has the lowest effective sample size, 8938.6623823, which is still quite high. As long as the effective sample size is large, the (in)efficiency of the MCMC chains is not a practical concern.
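Again, presumably via coda:

# Effective sample size per parameter, accounting for autocorrelation
# within the chains
effectiveSize(samples)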

Lastly, here is a summary of our parameters:

summary(samples)
## 
## Iterations = 1001:21000
## Thinning interval = 1 
## Number of chains = 3 
## Sample size per chain = 20000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                       Mean       SD  Naive SE Time-series SE
## base             -4.352283 0.046196 1.886e-04      3.881e-04
## beta[1]           0.015511 0.041380 1.689e-04      2.231e-04
## beta[2]          -0.098615 0.046482 1.898e-04      2.567e-04
## beta[3]           0.024946 0.053661 2.191e-04      2.895e-04
## random.effect[2]  0.017145 0.036939 1.508e-04      3.391e-04
## sd                0.005515 0.000442 1.804e-06      3.175e-06
## sd_rando          0.038791 0.027032 1.104e-04      2.872e-04
## 
## 2. Quantiles for each variable:
## 
##                       2.5%       25%       50%       75%     97.5%
## base             -4.443202 -4.383023 -4.351967 -4.321532 -4.262402
## beta[1]          -0.065471 -0.012266  0.015376  0.043223  0.097255
## beta[2]          -0.189809 -0.129738 -0.098317 -0.067539 -0.007254
## beta[3]          -0.080887 -0.010668  0.024881  0.060789  0.130396
## random.effect[2] -0.045101 -0.004392  0.010123  0.034642  0.107394
## sd                0.004736  0.005206  0.005483  0.005793  0.006464
## sd_rando          0.004787  0.018903  0.032944  0.052308  0.105211

“Goodness of fit”

How well does the model fit the data? As far as I can tell, quite well. If we just look at the individual predictors and their slopes, the predictions seem like a good match, and the intercepts look good too. Below are 500 regression lines drawn from the posterior distributions.
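The helper extract_bayes() used below is never defined in the document; presumably it pools the three chains and pulls out one node’s posterior draws, something like this sketch:

# Hypothetical reconstruction of extract_bayes(): flatten the mcmc.list
# into one matrix of pooled draws and return the named node's column
extract_bayes <- function(node) {
  as.matrix(samples)[, node]
}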

base  <- extract_bayes("base")
beta1 <- extract_bayes("beta[1]")   # load weight
beta2 <- extract_bayes("beta[2]")   # incline
beta3 <- extract_bayes("beta[3]")   # weight-incline interaction
rand1 <- 0  # random.effect[1] is fixed at zero: colony 1 is the reference
rand2 <- extract_bayes("random.effect[2]")
sdz   <- extract_bayes("sd")

plot(PDR ~ incline, bty="n", pch=16, col="aquamarine3", data=data,
     main="Incline's effect on prey delivery rate", xlab="angle in degrees")
vals <- sample(c(1:length(beta2)), 500)
x <- seq(0,90, 1)
for (v in vals) {
  y <- exp(base[v] + beta2[v]*(x-45)/45 + rand2[v])
  lines(y~x, col=rgb(0, .2, .8, 0.02))
}

plot(PDR ~ actual.weight, bty="n", pch=16, col="aquamarine3", data=data,
     main="Load weight's effect on prey delivery rate", xlab="load weight in grams")
vals <- sample(c(1:length(beta1)), 500)
x <- seq(0, 2.5,.1)
for (v in vals) {
  y <- exp(base[v] + beta1[v]*(x-1.25))
  lines(y~x, col=rgb(0, .2, 0.8, 0.02))
}

Results

The whole shebang!

Let’s see what our full model looks like when plotted against the predictors.

plot(PDR ~ incline, bty="n", pch=16, col="aquamarine3", data=data,
     main="Incline's effect on prey delivery rate \n(including interaction with load weight)", xlab="angle in degrees")
vals <- sample(c(1:length(beta1)), 300)
x <- seq(0,90, 1)

# Centered weight values for the three load classes (0.25g, 1.25g, 2.25g)
weight.vals <- c(.25, 1.25, 2.25) - 1.25
for (w in weight.vals) {
  for (v in vals) {
    y <- exp(base[v] +
               beta2[v]*(x-45)/45 +
               beta1[v]*w +
               beta3[v]*(x-45)/45*w)
    
    # Color by weight class: purple = 0.25g, teal = 1.25g, olive = 2.25g
    colz = if (w==-1) rgb(.5,0,.5, 0.05) else if (w==0) rgb(0,.5,.5, 0.05) else if (w==1)  rgb(.5,.5,0, 0.05) else NA
    lines(y~x, col=colz)
  }
}

There seems to be a slight negative slope in this graph. Note how the three weight classes, shown in differently colored lines, all overlap. We can’t distinguish them, corroborating the idea that weight doesn’t really have an effect on PDR.

plot(PDR ~ actual.weight, bty="n", pch=16, col="aquamarine3", data=data,
     main="Load weight's effect on prey delivery rate \n(including interaction with platform angle)", xlab="load weight in grams")
vals <- sample(c(1:length(beta1)), 500)
x <- seq(0, 2.5,.1)

angle.vals <- (c(0,45,90)-45)/45
for (a in angle.vals) {
  for (v in vals) {
    y <- exp(base[v] +
               beta1[v]*(x-1.25) +
               beta2[v]*a +
               beta3[v]*(x-1.25)*a)
    
    # Color by incline: purple = 0°, teal = 45°, olive = 90°
    colz = if (a==-1) rgb(.5,0,.5, 0.05) else if (a==0) rgb(0,.5,.5, 0.05) else if (a==1)  rgb(.5,.5,0, 0.05) else NA
    lines(y~x, col=colz)
  }
}

In this graph, we can see the three angles of inclination, shown in three different colors, mostly separate. Although the slopes of these lines are all essentially zero here (because the x-axis predictor is weight), there seem to be clear differences between the different angles.

Given these graphs, it appears that the load weight and its interaction with incline are probably no different than zero. However, it seems that incline might have a real effect on the prey delivery rate.

Parameter estimates

So, what did we find? Let’s look at the posterior distributions for our intercept and predictors.

par(mar=c(4,0,6,0))
plotPost(base %>% exp(),xlab = expression(beta[0]),main = "intercept, posterior")
plotPost(beta1,xlab = expression(beta[1]),main = "coefficient from load weight, posterior")
plotPost(beta2,xlab = expression(beta[2]),main = "coefficient from incline, posterior")
plotPost(beta3,xlab = expression(beta[3]),main = "interaction term, posterior")

As you can see, incline is the only one of the predictors whose coefficient’s 95% HDI doesn’t straddle zero.

# Half-width of the "lazy ant" ROPE used in the next section: the largest
# drop in PDR (among the heaviest loads) caused by adding one ant that
# carries nothing, i.e., dividing by (meda + 1) instead of meda
PDR.change <-
  data.full %>% as_tibble() %>% 
  mutate(new.PDR = actual.weight*total.avg.speed/(meda+1),
         diff = PDR - new.PDR) %>%
  filter(actual.weight > 2) %>%
  .$diff %>% max()

ROPEs

This is the real issue however: do any of my predictors TRULY affect the prey delivery rate? Kruschke recommends using a “region of practical equivalence” to accept or reject null hypotheses. In this case, I need to define what a region of practical equivalence would be for the PDR.

Although I will actually try out several ROPEs, here is the most “common sense” ROPE, one that I personally conceived: While transporting the largest loads (~2.25g), there are many ants (~15+) engaged with the load at once. Ants frequently disengage from or reengage with the load, without much trouble, as it is carried nestward. Now imagine I add a lazy ant atop the load, and it just sits there, not helping carry any weight. (This does occasionally happen in nature.) This negligible, additional ant decreases the per capita PDR (weight × speed / ant #). Therefore, I will use the maximum decrease in PDR (only among the heaviest loads) that would result from the addition of a single, theoretical lazy ant as the bounds of my ROPE. Thus, my first ROPE is ±0.0016069.

# Predictor values at the extremes, on the centered/scaled model scale
big.a   <- (90-45)/45    # steepest incline (90°)
big.w   <- (2.25-1.25)   # heaviest load (2.25g)
small.a <- (0-45)/45     # flat platform (0°)
small.w <- (0.25-1.25)   # lightest load (0.25g)

plotPost((exp(beta1*big.w+base)-exp(beta1*0+base)), 
         ROPE=c(-PDR.change, PDR.change),
         main="Decision: Accept the Null",
         xlab="PDR difference between 2.25g & 1.25g loads")

plotPost((exp(beta2*big.a+base)-exp(beta2*0+base)), 
         ROPE=c(-PDR.change, PDR.change),
         main="Decision: Undecided",
         xlab="PDR difference between 90° & 45° inclines")

Additionally, we can test if the interaction is real. We look at the model’s estimate of large loads WITH large inclines, versus the intercept.

plotPost(
  exp(beta1*big.w+base+beta2*big.a+beta3*big.a*big.w) - 
    exp(base),
  ROPE=c(-PDR.change, PDR.change),
  main="Decision: Undecided",
  xlab="PDR: intercept vs. large inclines+large loads")

Given these results, it appears that load weight does NOT have an effect, and the effects of incline and its interaction with weight require more data before a sound judgement can be made. However, both of these are close to being fully within the ROPE. I don’t have a ton of data anyway, so things might change with more of it.

However, that was a pretty hand-wavy ROPE. Without much background information or theory, researchers can rely on standard effect size measures. Kruschke suggests using Cohen’s d, and notes that Cohen himself considered a value of 0.2 to be a “small” effect size, at least in the psychological literature. Kruschke recommends halving this and using it as a ROPE, essentially ±0.1 SD.

plotPost((exp(beta1*big.w+base)-exp(beta1*0+base))/sdz,
         ROPE=c(-.1,.1),
         main="Decision: Undecided",
         xlab="Load weight's effect size")

plotPost((exp(beta2*big.a+base)-exp(beta2*0+base))/sdz,
         ROPE=c(-.1,.1),
         main="Decision: Undecided",
         xlab="Incline's effect size")


plotPost(
  (exp(beta1*big.w+base+beta2*big.a+beta3*big.a*big.w) - 
    exp(base))/sdz,
  ROPE=c(-.1,.1),
  main="Decision: Undecided",
  xlab="Interaction term effect size")

Using this ROPE, the null hypothesis for every predictor can be neither accepted nor rejected, though incline seems to be moving toward being accepted. However, it seems unlikely that standardized effect size “rules of thumb” from the social sciences have the same meaning in myrmecology. But as long as we’re talking nonsense, Kruschke writes that “recent FDA guidance for bioequivalence studies recommends ROPE limits of 0.8 and 1.25 for the ratio of means in the two groups.” If we use THIS as our ROPE, we essentially accept the null (i.e., accept “bioequivalence”?) for load weight and incline, while the interaction remains undecided.

plotPost((exp(beta1*big.w+base)/exp(beta1*0+base)), 
         ROPE=c(0.8,1.25),
         main="Decision: Accept the Null",
         xlab="PDR ratio between 2.25g & 1.25g loads")

plotPost((exp(beta2*big.a+base)/exp(beta2*0+base)),
         ROPE=c(0.8,1.25),
         main="Decision: Accept the Null",
         xlab="PDR ratio between 90° & 45° inclines")

plotPost(
  (exp(beta1*big.w+base+beta2*big.a+beta3*big.a*big.w)/ 
     exp(base)),
  ROPE=c(0.8,1.25),
  main="Decision: Undecided",
  xlab="PDR ratio: intercept / large inclines+large loads")

Clearly the choice and specification of the ROPE is crucial here. However, I would argue that my original ROPE is most rooted in the actual process and most reflective of nature.

Discussion

In general, it seems that the weight of the load being carried has no effect on an ant team’s prey delivery rate. The effect of the platform’s incline is more nebulous. Although its estimated coefficient is below zero, ~75% of the coefficient’s HDI falls within my initial ROPE. The choice of ROPE can certainly “move” the estimate closer or further from non-zero status, but I think it’s most likely that more data needs to be collected. I only have ~33 data points per angle of inclination, which is not a particularly huge dataset.

But regardless of incline’s effect on PDR, the fact that weight does not seem to matter is interesting. Essentially, the colony’s “rate of return” per ant is constant across VERY different prey weights. If the colony were behaving “optimally,” though, we might even expect this behavior. To explain, I’ll paraphrase the ideas of one of my more math-y collaborators, Dr. Ted Pavlic:

Every allocation of ants is associated with some sort of utility, which we think is represented by the total PDR into the colony. In other words, there’s some utility function that takes the number of ants allocated to each load as inputs, as in (for \(n\) loads):

       \(U( x_1, x_2, x_3, ..., x_n )\)

If we were to minimize the number of ants working while making sure the colony’s utility is at least above some threshold (think eating enough food to be “full”):

       minimize \(x_1 + x_2 + ... + x_n\) such that \(U( x_1, x_2, x_3, ..., x_n ) \geq U^\ast\)

We would get the result:

       \(\frac{\partial U}{\partial x_1} = \frac{\partial U}{\partial x_2} = \frac{\partial U}{\partial x_3} = \dots = \frac{\partial U}{\partial x_n}\)
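This equality follows from the first-order conditions of the constrained problem; a sketch of the Lagrangian argument (a step I’ve added here, not part of Dr. Pavlic’s original phrasing):

       \(\mathcal{L} = \sum_{i} x_i + \lambda\big(U^\ast - U(x_1, \dots, x_n)\big), \qquad \frac{\partial \mathcal{L}}{\partial x_i} = 1 - \lambda\frac{\partial U}{\partial x_i} = 0 \;\Rightarrow\; \frac{\partial U}{\partial x_i} = \frac{1}{\lambda} \text{ for all } i\)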

In other words, you “fill up” each load until the marginal return for the NEXT ant is equal to the marginal return you would get anywhere else. This is similar to an ideal free distribution. However, why would you get the same PDR/ant across different colonies? If the ants are working to fulfill a macronutrient constraint \(U^\ast\), and that macronutrient constraint is similar across ant colonies, then it’s possible that the PDR/ant should be similar across colonies.

If a colony is trying to consume above a certain amount of food and wants to minimize the number of workers engaged in the dangerous job of foraging, you’d expect to see a constant PDR across loads. And if the amount of food one colony needs is similar to another’s, the PDR would be constant across both colonies’ loads. Both colonies that I used occupied a similar number of small trees in a recently developed suburb; they were likely of a similar age and size, probably needing similar amounts of food, hence the similar PDRs. But how could a decentralized colony make such intelligent allocations distributed across a large space? That’s the next question I want to answer.

Anyway, the model itself seems well suited to the data; its estimates and predictions look realistic when plotted. The only weak point might be the random effect of colony. Because a colony functions as a super-organism, the true n of a social insect research project is often somewhat confusing. Social insect scientists seem to feel compelled to include a random intercept per colony and then act like they can ignore the non-independence of behaviors sampled from few colonies. Yet it seems to me that colony-specific traits would likely affect more than just the intercept. Again, this is a data problem: to get a better idea of these differences, I would need data from many more than two colonies. Such data could also discredit or support the hypothesis that the constant PDR results from an optimality strategy: if there were strong colony-level effects across colonies of similar sizes, the above theory might be disproven.

 

A work by Andrew T Burchill

aburchil@asu.edu