When Is A Small Sample Really A Small Sample?

Had an interesting argument on HN the other day. People were giving anecdotal evidence about Macbook failure rates, and other people were saying the samples were too small to be significant. I shared mine about having friends who had 4 Macbook Pros total (one actually bought a backup because his first turned into such a brick, an experience I’m begging him to blog about) and of course the discussion devolved from there into how it was an irrelevant sample size.

Now let’s say 3 of these had to visit the Apple Store for repairs. A sample of 4, at first, seems too small to conclude anything from. However, I remembered just enough from my personal studies of probability to suspect that 3 out of 4 failures was actually quite meaningful. So I did a little digging.

I talked to my friend Matt Matros, who is my go-to guy when I have math problems since he has a degree in it from Yale, and he pointed me at Bayes’ Theorem, which is the correct way to solve it. It turns out the odds of observing 3 or more failures in a sample of 4 laptops, if you assumed the laptops failed only 10% of the time, would be on the order of 0.37%.

What that means, in plain English, is that if you see 3 out of 4 Macbooks fail, then they almost certainly have a much higher failure rate than 10%. In fact, if you had a 50% failure rate, you would still expect to see 3 or more fail only 31.25% of the time.
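For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes each laptop fails independently with the same probability, which makes "3 or more failures out of 4" a binomial tail probability. The function name `prob_at_least` is my own invention for illustration. Under those assumptions, a 10% failure rate gives roughly 0.37% and a 50% failure rate gives 31.25%.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of k or more failures among n independent units,
    each failing with probability p (binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumed failure rates taken from the discussion above: 10% and 50%.
for rate in (0.10, 0.50):
    print(f"failure rate {rate:.0%}: P(3+ of 4 fail) = {prob_at_least(3, 4, rate):.4%}")
```

The independence assumption is the weak point: if all four machines came from one bad production run, the failures are correlated and this calculation overstates how surprising the result is.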

This isn’t the whole story, though. Those four laptops were purchased in either 3 or 4 different states (I’m not certain), so the failures can’t be attributed to shipping errors. But if they were all the same model, it could have been just one bad Apple (har har) or even just one bad production run.

Also, not all problems are equal, which has long been a weakness of the JD Power IQS, which is often used as a metric for automobile dependability. All of the aforementioned problems required at least 1 trip to the Apple Store though. And I think they may have all required at least 2, but that speaks only to the poor quality of the Genius Bar.

Just out of curiosity (and my own mathematical ineptitude) I wrote a quick PHP Monte Carlo simulator to goof around with the numbers, and it pretty much just confirmed the result from Bayes’ Theorem. You can snag the source for it here.
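The PHP source isn't reproduced here, so what follows is not that script but a minimal Python sketch of the same idea: simulate many batches of 4 laptops, each failing independently at an assumed rate, and count how often 3 or more fail. The function name `simulate`, its parameters, and the fixed seed are all assumptions made for illustration.

```python
import random

def simulate(trials, n, k, p, seed=0):
    """Estimate P(at least k of n units fail) by repeated random sampling.
    Each unit is assumed to fail independently with probability p."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    hits = 0
    for _ in range(trials):
        failures = sum(rng.random() < p for _ in range(n))
        if failures >= k:
            hits += 1
    return hits / trials

# Assumed 10% failure rate, batches of 4 laptops, counting batches with 3+ failures.
estimate = simulate(trials=200_000, n=4, k=3, p=0.10)
print(f"simulated P(3+ of 4 fail at 10%) = {estimate:.4%}")
```

With a few hundred thousand trials the estimate should land close to the exact binomial value of about 0.37%, though it will wobble a little from seed to seed because the event is rare.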

And yes, I’m aware I’m an awful programmer.

10 Responses to “When Is A Small Sample Really A Small Sample?”

  1. Gareth Allen Says:

    While your analysis is technically correct, you are making a fundamental mistake. Your decision to post and to run the analysis was based on the fact that you already had evidence that Macbooks suck. It's essentially the same as an individual claiming that a poker site is rigged because he lost with aces ten times in a row. While it may be true, and the chances are abysmally small, it fails to look at the true 'space' of outcomes that would result in a post. For instance, losing with kings ten times in a row, losing with sets ten times in a row, etc (now multiply each space by the number of individuals who obsessively look for such anomalies). In your case, you would probably post somewhere if a high proportion of any product your friends used failed. So if the failure rate for ALL products is 10%, and 20 products fit the category (owned by at least 4 friends) the chance you will post here against at least one product is something like 1-.963^20 = 53%.

    This is a very common fallacy in statistical analysis, and one reason it's tough to trust most social sciences' studies. They find a large dataset, throw a few hundred variables together, and see what ends up as significant to a 5% alpha level. By definition, 5% of the variables will be significant even if there is absolutely no relation, so it is easy to find relationships that are significant, simplify the model, and then claim to have found some truth.

    Macbooks probably do suck, but if you want a more random sample, call someone up out of the blue and ask them about their friends' Macbooks. If 3/4 of them have had problems, you can use Bayes' Theorem then.

  2. mattmaroon Says:

    The probability of Aces getting cracked 10 out of 10 times is so close to 0 that if it happened to me at a site, I'd probably stop playing there. That's not the same as 10 in a row over a large number of pocket aces though, since a sample of X pocket aces has X-9 consecutive groups of 10 contained in it. 100 pocket aces has 91 groups of 10 (1-10, 2-11…91-100). That's what's happening when people cry foul on poker sites (and even then, their standards for calling shenanigans are usually far lower than 10 aces in a row getting cracked). I probably saw something like 2000 pocket aces a year. I'm too lazy to figure out what the odds are that 10 in a row got cracked somewhere in there, but they're probably substantial. Especially when, as you say, I include kings, sets on the flop, etc., as those people do.

    My Apple sample is akin to the 10/10 though. I wasn't looking for product failures, we were talking about Macbooks being bricked, and I realized that I only knew the status of 4 of them definitively, and 3 of them had needed repairs shortly after purchase. If I put the purchases of hundreds or thousands of them in chronological order, then found 4 in a row that had failed, it wouldn't mean much (like the Aces example). If I have a sample of only 4, and 3 failed, that means quite a bit more.

  3. Gareth Allen Says:

    If you were talking about Macbooks and THEN realized that 3/4 of the ones you knew of were broken, I think that makes it alright (which sounds like the case for you). But if you only discussed it because you had this evidence, then it's not a random sample, especially on an online forum, since 1. perhaps 50 people were reading, and only the few that had data like yours chime in, 2. you read discussions about many products and might not chime in unless you had data like this.

    However, if you posted here every time you had data like this about a product, we couldn't infer much from it as readers (and certainly not to 3.7%), since you have knowledge of a lot of products. It's sort of a survivorship bias.

  4. mattmaroon Says:

    What about if I google for “broken Macbook” and the top 8 links all involve one? :)

  5. Shalmanese Says:

    Do you really only have 4 friends with macbooks? Or are those simply the most memorable 4 precisely because something notable happened to them?

  6. seconded.

    the fact that you did this analysis *because* you saw a high failure rate invalidates the quality of your sample.

    to look at it another way: think of how many people there are out there who have 4 friends who own macbooks: lots. now, what is the probability that not one of them knows 4 macbook owners, of which 3 are faulty: very very low. it had to happen to someone…

    to look at it one more way: what is the probability that a lottery winner is rich? does that mean that everyone is rich?

    tom saffell

  7. mattmaroon Says:

    They're the only 4 who I know what happened to. I have lots of friends with macbooks, and I've heard lots of complaints, but I have no idea whose went to the shop and whose did not.

  8. mattmaroon Says:

    Hmm, the probability that 3 out of 4 Macbooks would fail for anyone, with a normal laptop failure rate of, let's say, 5%, is probably still pretty small. I don't think they sell that much.

  9. Shalmanese Says:

    People whose macbooks have died are more likely to tell you the status of their macbook than those who haven't.

  10. Matt, good work writing that code.
