Thursday, October 6, 2011

Selection Bias

A story from WWII has been making the rounds on the Internet...

During WWII, statistician Abraham Wald was asked to help the U.S. military decide where to add armor to its bombers to make them more robust. He examined the records of planes that had returned to base and noted where the bullet holes were. He recommended adding more armor to the places where there was no damage.

What? Wouldn't you want more armor where the bullets hit? No.

Wald reasoned that enemy gunners were not picking their spots like snipers, so bullets were roughly equally likely to hit anywhere on a plane. So what could explain why some locations on the planes showed damage and other areas showed none at all? It's an example of selection bias. Since only planes that were able to fly back after being hit were examined, the damaged areas marked locations that could take a hit and still keep flying. Hits to the areas critical enough to bring down the plane were never recorded, because those planes were lost.
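Wald's reasoning can be checked with a toy simulation. The sketch below is only an illustration: the section names, the loss probabilities, and the number of hits per plane are made-up assumptions, not Wald's data. Hits land uniformly at random, but only the surviving planes get their hits recorded.

```python
import random

# Hypothetical plane sections and the chance that one hit there downs the plane.
# These numbers are invented for illustration; they are not Wald's data.
LOSS_PROBABILITY = {"engine": 0.6, "cockpit": 0.4, "fuselage": 0.05, "wings": 0.05}

def simulate(num_planes=10000, hits_per_plane=3, seed=0):
    """Return (hits on all planes, hits seen on planes that made it back)."""
    rng = random.Random(seed)
    sections = list(LOSS_PROBABILITY)
    all_hits = {s: 0 for s in sections}
    survivor_hits = {s: 0 for s in sections}
    for _ in range(num_planes):
        # Every section is equally likely to be hit.
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        # The plane returns only if it survives every one of its hits.
        survived = all(rng.random() >= LOSS_PROBABILITY[s] for s in hits)
        for s in hits:
            all_hits[s] += 1
            if survived:
                survivor_hits[s] += 1
    return all_hits, survivor_hits

all_hits, survivor_hits = simulate()
print("hits on every plane:   ", all_hits)
print("hits seen on survivors:", survivor_hits)
```

Under these assumptions the first line should show roughly equal counts for every section, while the second should show far fewer engine and cockpit hits. The holes are missing from exactly the places where a hit is most dangerous.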

Selection bias is a sneaky error that can creep into any study. Suppose you test a diet on 1000 people. 700 drop out and never finish. But you get great results with the 300 who stick with it, and you publish your diet as very successful when followed faithfully. But what about the drop-outs? Why did they drop out? Perhaps the diet made them ill, or failed to work, or made them gain weight. If the drop-outs are counted as failures, the result changes from wildly successful to a 70% failure rate.
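The arithmetic behind the diet example is short enough to write out. This sketch assumes, for the sake of the example, that all 300 finishers succeed and that every drop-out counts as a failure; the function name is just a label chosen here.

```python
def success_rates(enrolled, completed, succeeded):
    """Compare the success rate among finishers with the rate among everyone enrolled."""
    finishers_only = succeeded / completed
    everyone = succeeded / enrolled  # drop-outs are counted as failures
    return finishers_only, everyone

finishers_only, everyone = success_rates(enrolled=1000, completed=300, succeeded=300)
print(f"{finishers_only:.0%}", f"{everyone:.0%}")  # prints "100% 30%"
```

The same 300 successes look like a 100% success rate or a 30% success rate depending on which group ends up in the denominator.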

Suppose you call random homes between 9 and 5 on a weekday for a survey. Is this a random sample? No. You won't reach families with two working parents. Instead the sample will be biased toward the unemployed, stay-at-home moms/caretakers, the self-employed, and others who might be at home instead of an office during working hours. To overcome the bias of the home/hours selected, the surveyors need to call back at later hours the homes not reached, so that every home is accounted for in the survey with few to no drop-outs.
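To see how much the calling hours can skew a sample, here is a rough simulation. Everything in it is assumed for illustration: the two household types, their population shares, their chances of answering a daytime call, and the simplification that an evening call-back always reaches the home.

```python
import random

# Hypothetical household types: (share of population, chance of answering on a weekday 9-5).
HOUSEHOLDS = {
    "two working parents": (0.40, 0.05),
    "someone home by day": (0.60, 0.80),
}

def survey(num_calls=20000, call_back_evenings=False, seed=1):
    """Return each household type's share of the homes actually reached."""
    rng = random.Random(seed)
    kinds = list(HOUSEHOLDS)
    weights = [HOUSEHOLDS[k][0] for k in kinds]
    reached = {k: 0 for k in kinds}
    for _ in range(num_calls):
        kind = rng.choices(kinds, weights=weights)[0]
        answered = rng.random() < HOUSEHOLDS[kind][1]
        # Simplifying assumption: an evening call-back always reaches the home.
        if answered or call_back_evenings:
            reached[kind] += 1
    total = sum(reached.values())
    return {k: reached[k] / total for k in kinds}

print("daytime calls only:", survey())
print("with call-backs:   ", survey(call_back_evenings=True))
```

With these made-up numbers, two-working-parent homes are 40% of the population but only a few percent of the daytime sample. Calling back in the evening brings their share back close to 40%.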

Or consider that many psychology studies use college students who get paid for participating as their subjects. This is a case of self-selection bias. Those who select themselves are likely to be 18-25, college students, willing to do odd things for money, etc. Again, this is not a random sample, and it is a difficult problem to fix. Even if you advertise the study in the paper and on the radio to reach a broader population, you won't see successful CEOs and lawyers sign up for minimum wage. And the study is still haunted by self-selection: the extremely shy or introverted are not likely to apply.

Bottom Line

Always be extra cautious when a study or survey says it used a "random sample". How was the randomness generated? Was there hidden bias in the selection method?
