30 January 2014

Puzzle time: False positives

This week’s trip is over. Time to leave Florida and start skidding on icy Atlanta roads. Also time for a puzzle.

Many of you are aware that my current job involves catching fraud in online transactions. We of course focus on building systems that catch as many fraudulent transactions as possible. However, what you may not know is that an equal challenge in building these systems is making sure that we do not flag good transactions as fraudulent (and irritate the good customers). This is always a tough balance. It is also called the “false positive” problem: the test showed “positive”, but that result is false.

Here is a false positive puzzle. A village has only one lab that can perform a particular test for a particular disease. The test, however, is only 98% accurate. So, a patient who does not have the disease will get a “you do not have the disease” report 98% of the time. 2% of the time, it will (erroneously) say “you have the disease”. Similarly, a patient who is indeed suffering from the disease will get a “you have a problem” report 98% of the time. The remaining 2% of the time he or she will erroneously get a clean chit.

You also know that 0.5% of the village population is indeed afflicted by the disease.

Your friend from the village just received a report that he has the disease.

How concerned should he be? What is the real probability that he has the disease?

Posted January 30, 2014 by Rajib Roy in category "Puzzles"

1. By Rajib Roy on

Forgot to mention that answers are to be sent privately only. But Somnath, that answer should be correct.

2. By Bob Hart on

“There are lies, damn lies and statistics .. in that order” .. I am recalling why I don’t get along with probability :)

3. By Narayan Venkatasubramanyan on

a doctor colleague told me that they’d tried this very problem on doctors … and they were almost all wildly wrong! considering the uncertainties in their practice, you’d think they’d know a little rudimentary probability.

i think it is probably even more interesting to ask people who have the training to work this out to estimate the answer before doing the math. i suspect our intuition fails us more than we realize when it comes to dealing with probabilities.

4. By Rajib Roy on

Answer to the puzzle: The answer to the question is 19.76%. Even though the test came back positive, and intuition probably says the probability of the friend having the disease is 98%, the real answer is less than 20%!! As Narayan pointed out, our brains often have difficulty grasping probability.

I personally still have a lot of trouble with it. In junior year in school, I was introduced to the probability of an event as the fraction of relevant outcomes among all possible outcomes, assuming each outcome is equally likely. And that is where it became a circular definition for me. The assumption that each outcome is equally likely presumes the very notion of probability that it is trying to define in the first place.

In any case, going back to the problem. Assume the village has 10000 people. Then 50 (0.5%) of them are afflicted. So the question is: if a test has come back positive, what is the chance the person is in that population of 50? (There are people outside those 50 who will have a positive test too, since the test is faulty.)

How many people can get a positive test? 2% of the 9950 and 98% of the 50 = 199 + 49 = 248. We know for sure that the friend is one of the 248 (his test came positive). In this population only 49 really have the disease. So the probability of him having the disease is 49/248 = 19.76%
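The head count above is Bayes' rule in disguise, and it can be checked with a few lines of Python. This is only a sketch: the function name `posterior_given_positive` and its parameter names are made up for illustration, and it assumes, as the puzzle does, that the test is 98% accurate for both sick and healthy patients.

```python
from fractions import Fraction


def posterior_given_positive(prevalence, sensitivity, specificity):
    """Probability of having the disease given a positive test.

    prevalence:  fraction of the population that has the disease
    sensitivity: P(test positive | has disease)
    specificity: P(test negative | does not have disease)
    """
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Numbers from the puzzle: 0.5% prevalence, 98% accuracy both ways.
p = posterior_given_positive(
    Fraction(5, 1000), Fraction(98, 100), Fraction(98, 100)
)
print(p)                           # exact fraction
print(f"{float(p) * 100:.2f}%")    # as a percentage
```

Using `Fraction` keeps the arithmetic exact, so the result is the same 49/248 as in the head count rather than a rounded decimal.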

5. By Bob Hart on

yeah .. I was right .. ‘in that order’ :) I suppose part of the issue here is I’ve never been able to get my head around what this probability means. If I say a coin has a probability of 50% of being tails, I know that means if I flip the coin again, it’ll come up tails, on average, half the time. But this guy having a 20% chance of having the disease .. what does that mean?

6. By Narayan Venkatasubramanyan on

Bob, consider this: assume i came along with a new test that is 99.9% accurate.

as per rajib’s original problem, a patient who does not have the disease will get a “you do not have the disease” report 99.9% of the time. 0.1% of the time, it will (erroneously) say “you have the disease”. Similarly, a patient who is indeed suffering from the disease will get a “you have a problem” report 99.9% of the time. The remaining 0.1% of the time he or she will erroneously get a clean chit.

repeating rajib’s explanation with these new numbers, i’d get this: Assume the village has 10000 people. Then 50 (.5%) of them are afflicted. So the question is if a test has come positive, what is the chance the person is in that population of 50? (because there are people outside those 50 who will have positive test too since the test is faulty).

How many people can get a positive test? 0.1% of the 9950 and 99.9% of the 50 = 9.95 + 49.95, or roughly 10 + 50 = 60. We know for sure that the friend is one of the 60 (his test came positive). In this population only 50 really have the disease. So the probability of him having the disease is about 50/60 = 83.33%.

how much more would you be willing to pay for my more accurate test? assume the cost of the medication were $x. try again with the possibility that the medicine is highly toxic and damaging to the health of those who don’t have the disease.
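To see how much the extra accuracy buys, the two tests can be compared side by side. A rough sketch, assuming the same 0.5% prevalence and a single accuracy figure that applies to both sick and healthy patients; the helper name `posterior_given_positive` is just for illustration.

```python
def posterior_given_positive(prevalence, accuracy):
    """P(disease | positive test) when the test is equally accurate
    for patients with and without the disease."""
    true_positives = prevalence * accuracy
    false_positives = (1 - prevalence) * (1 - accuracy)
    return true_positives / (true_positives + false_positives)


PREVALENCE = 0.005  # 0.5% of the village, as in the puzzle

results = {}
for accuracy in (0.98, 0.999):
    results[accuracy] = posterior_given_positive(PREVALENCE, accuracy)
    print(f"accuracy {accuracy:.1%}: "
          f"P(disease | positive) = {results[accuracy]:.2%}")
```

Without rounding the head counts to 10 and 50, the 99.9% test comes out at roughly 83.4% rather than 83.33%. Either way, a positive result from the more accurate test is about four times as trustworthy as one from the 98% test.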

7. By Al Blake on

More interesting as a puzzle than having your credit card declined in the Apple Store … I was thinking about you while talking to the fraud prevention guy :)
