block by mimno 11482432

We just sampled two groups of points (red and blue), and fit a regression line to each one.
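As a rough illustration of what "fit a regression line" means here, the sketch below does an ordinary least-squares fit in plain Python. The function name `fit_line` and the representation of points as `(x, y)` tuples are assumptions made for this example; the page's own fitting code may differ.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept.

    `points` is a list of (x, y) tuples (a representation assumed
    for this sketch). Returns (slope, intercept).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of squared deviations in x, and of cross deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Each class (red or blue) would be passed to `fit_line` separately, giving one line per color.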

How much confidence do we have in these fitted linear models? This page demonstrates three randomization tests, each of which shows what regression lines fitted to “similar” datasets would look like. Each button at the top samples a random, but similar, dataset in one of three ways, fits regression lines to the randomized dataset, and then returns to the original data.

Comparing the “real” line to these replicated lines can tell us whether the original line tells us something interesting about the dataset, or if it’s just fitting random noise. The numbers at the top count how many of the replicated models so far have had a greater slope than the original model of the same color.
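The count described above can be sketched in a couple of lines. The name `count_greater` is hypothetical, and the slopes are assumed to have been collected in a plain list, one per replicated dataset.

```python
def count_greater(original_slope, replicated_slopes):
    """Count replicated slopes strictly greater than the original slope."""
    return sum(1 for s in replicated_slopes if s > original_slope)
```

Dividing this count by the number of replications gives the fraction of replicated lines steeper than the original. A fraction near 0 or 1 means the original slope sits at the edge of the replicated range.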

  1. Bootstrap. How sure are we of the slope of the lines? This test samples with replacement from the original data. Some data points may appear several times, others not at all. If the slope of the line depends a lot on a small number of outliers, we should see lots of variation in the replicated lines.

  2. Permuting y. Is the slope of the lines significant? This test randomly shuffles the y values of the data points, leaving the x values unchanged. After shuffling, there’s no connection between x and y, so we should expect the regression line to be flat. But for a small sample, we might get a steeper slope by chance. Does the slope of our original model lie within the range of slopes that we get by random chance?

  3. Permuting class. Are the blue points really different from the red points? This test keeps the points in the same x,y positions, but randomly swaps their class, so some blue points become red, and vice versa. If the two classes are really different, the replicated lines should look unlike the originals: close to each other and somewhere in the middle of the original regression lines. If the classes are not really different, the replicated lines should look much like the originals.
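The three resampling schemes above can be sketched as follows. This is a minimal illustration, not the page's actual code: the function names, the `(x, y)` tuple representation, and the explicit `rng` argument are all assumptions. In particular, `permute_class` keeps the two class sizes fixed by shuffling the pooled points, which is one common way to permute labels; the page may swap classes differently.

```python
import random

def bootstrap(points, rng):
    """Sample len(points) points with replacement from the original data."""
    return [rng.choice(points) for _ in points]

def permute_y(points, rng):
    """Shuffle the y values among the points, leaving x values in place."""
    ys = [y for _, y in points]
    rng.shuffle(ys)
    return [(x, y) for (x, _), y in zip(points, ys)]

def permute_class(red, blue, rng):
    """Keep every point where it is, but reassign class labels at random.

    Assumption: class sizes are preserved, so the first len(red) points
    of the shuffled pool become the new red class.
    """
    pooled = red + blue
    rng.shuffle(pooled)
    return pooled[:len(red)], pooled[len(red):]
```

A replication would call one of these with, say, `rng = random.Random()`, refit a line to each resulting class, and compare the new slopes with the original ones.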

Copy this gist and try modifying the parameters used to generate the two classes of points. What happens to each test if the two classes have different sample sizes? Try varying the difference in slope and standard deviation (which I’m calling error).
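For experimenting outside the browser, a generator for one class of points might look like the sketch below. The parameter names (`n`, `slope`, `intercept`, `error`), the uniform range for x, and the Gaussian noise are assumptions for illustration; the gist's own generator is not reproduced here and may use different choices.

```python
import random

def generate_class(n, slope, intercept, error, rng):
    """Generate n points scattered around the line y = intercept + slope * x.

    Assumptions for this sketch: x is uniform on [0, 1], and the noise
    is Gaussian with standard deviation `error`.
    """
    points = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        y = intercept + slope * x + rng.gauss(0.0, error)
        points.append((x, y))
    return points
```

Calling it twice with different `n`, `slope`, or `error` values gives a red and a blue class to feed into each of the three tests.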

index.html