Lmvsanova

1 minute read

Published: January 18, 2015

Power of different linear models.

If you want to test for a change over a variable we can consider some different ways to do it.

We can collect the data so that it is spread out along the x axis or clumped at either end. We can analyse the x axis as a continuous variable or a discrete variable (binning the x if it is spread out.) These four options are displayed below with boxplots implying the data is analysed as discrete x values. Note that only two sets of data are simulated, one clumped and one spread out.

So to examine the power of these approaches here’s a function that simulates some data (from a linear model with normal error) and then calculates and extracts p-values (sorry) for the four cases shown above.

calcLM <- function(){

x1 <- runif(30)
y1 <- x1 * 2 + rnorm(30)
x2 <- rep(c(0,1), 15)
y2 <- x2 * 2 + rnorm(30)

coef <- c( summary(lm(y1 ~ x1))$coef[8],
           summary(lm(y2 ~ x2))$coef[8],
           summary(lm(y1 ~ x1 > 0.5))$coef[8],
           summary(lm(y2 ~ as.factor(x2)))$coef[8]
          )
}

Then let’s run the simulation 1000 times.

p <- t(replicate(1000, calcLM())) %>% data.frame

colnames(p) <- c('continuousSpread', 'continuousClumped', 'discreteSpread', 'discreteClumped')

pLong <- melt(p, variable.name = 'model', value.name = 'p')

And plot the results.

ggplot(pLong, aes(x = model, y = p)) +
  scale_y_log10() +
  geom_violin() +
  ggtitle('p values from different models.')

plot of chunk simPlot

So, using data from the edges of our range of x values gives us more power (lower p-values). Also, it’s interesting to note that doing a discrete ANOVA with the x values as a factor is identical to treating this as a continuous linear model. This will not be true if you have more than two groups though. Furthermore, if you actually want to use this as a linear model, you will have to do an extra step to scale the coefficients if you do an ANOVA rather than a linear model.

So… that was kinda fun. And another chance to get to know ggplot2 better. Some code is suppressed here but you can see the full knitr document here.

Twitter Facebook LinkedIn

Tim CD Lucas

Lmvsanova

What_size_predictive_intervals_should_we_use

How_people_trick_themselves_into_thinking_they_can_predict_the_stock_market

Deep_learning_wont_improve_malaria_mapping

Disaggregation_regression_workshop