wu :: forums (http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi)
(Message started by: harpanet on Jan 21st, 2005, 11:18am)

Title: Am I sick?
Post by harpanet on Jan 21st, 2005, 11:18am
A particular medical condition occurs in 1% of the population.
A test for the condition is known to be 99% accurate (i.e. 99% of the afflicted are identified and 99% of the healthy are cleared).

I take the test and get a positive result (i.e. the test says I have the condition).

What is the probability that I actually have the condition?

Title: Re: Am I sick?
Post by TenaliRaman on Jan 21st, 2005, 1:14pm
hmm i am getting ::[hide]50%[/hide]::  :o
what are the chances i am sleep-typing? :-/

-- AI

Title: Re: Am I sick?
Post by towr on Jan 21st, 2005, 1:31pm
::[hide]

Let's take 10000 (representative) people.
1% = 100 of these are afflicted with the disease.
Of those 100, 99% = 99 test positive.
Of the 9900 that aren't afflicted, 1% = 99 test positive.

So among those who test positive (198), 50% are afflicted, and 50% aren't.

[/hide]::
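
A minimal Python sketch of the counting argument above. The function and parameter names (`positive_counts`, `sensitivity`, `specificity`) are illustrative and not from the thread; exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

def positive_counts(population, prevalence, sensitivity, specificity):
    """Return (true positives, false positives) as exact head counts."""
    afflicted = population * prevalence
    healthy = population - afflicted
    true_pos = afflicted * sensitivity          # afflicted who test positive
    false_pos = healthy * (1 - specificity)     # healthy who test positive
    return true_pos, false_pos

tp, fp = positive_counts(10000, Fraction(1, 100), Fraction(99, 100), Fraction(99, 100))
share_afflicted = tp / (tp + fp)
print(tp, fp, share_afflicted)  # 99 99 1/2
```

Changing the prevalence argument shows how strongly the answer depends on how rare the condition is.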

Title: Re: Am I sick?
Post by Sir Col on Jan 24th, 2005, 1:47pm
Nice approach, towr.

Cucumbers and potatoes come to mind. ;)

Using the laws of probability...
::[hide]
P(infected AND [test positive|infected]) =  0.01*0.99 = 0.0099
P(not infected AND [test positive|not infected]) = 0.99*0.01 = 0.0099

Therefore, P(test positive) = 0.0198.

Hence P(infected|test positive) = 0.0099/0.0198 = 0.5
[/hide]::
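
For reference, the same calculation written as a single application of Bayes' theorem. The symbols are shorthand introduced here, not notation from the thread: I for "infected", + for "test positive".

```latex
\[
P(I \mid +)
  = \frac{P(+ \mid I)\,P(I)}{P(+ \mid I)\,P(I) + P(+ \mid \neg I)\,P(\neg I)}
  = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99}
  = \frac{0.0099}{0.0198}
  = 0.5
\]
```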

As if the original problem doesn't challenge our intuition enough, here is a surprising follow-up...

If 1% of the population are infected and the test only identifies infected people 1% of the time, how accurate is the test?

Title: Re: Am I sick?
Post by harpanet on Jan 24th, 2005, 2:42pm
I came across this one in a book called "information - the new language of science", during its discussion of Thomas Bayes. I find it astounding how counter-intuitive the answer is. If anything tells us to be wary of statistics, this is it.

I think the aphorism should be amended to "Statistics, damned statistics, and lies".


Title: Re: Am I sick?
Post by rmsgrey on Jan 24th, 2005, 3:00pm

on 01/24/05 at 13:47:16, Sir Col wrote:
If 1% of the population are infected and the test only identifies infected people 1% of the time, how accurate is the test?

When you say it "identifies infected people 1% of the time", do you mean that: "1% of people who take the test are correctly identified as infected", "1% of people who take the test are identified as infected regardless of actual status", "1% of infected people are identified as infected" or something else?

Title: Re: Am I sick?
Post by Noke Lieu on Jan 24th, 2005, 3:10pm
okay- what trap have I fallen into? Maybe have misinterpreted the question. You say that it goes against intuition...

taking towr's approach, and muddying it, probably...

10000 population, 100 ill. of those 100, test identifies 1% as ill. so it is only 1% of 1%? Rubbish- that's what i'd expect.

so must look for another catch...
10000 population. 100 are ill, 9900 are well. test gives 1 true positive and 99 false positives. so that's 100 positives, 100 ill people- does it matter that they aren't the same people?
if not, that means that it's 100% accurate, no?

Title: Re: Am I sick?
Post by Sir Col on Jan 24th, 2005, 4:23pm
I believe I have stated it correctly.  ???

Although the wording is subtle, what I've done is to turn the problem on its head. To put it another way: I've told you that P(infected|test positive) = 0.01; this was the value you were asked to find in the original problem. This time you need to find P(test positive|infected); the original problem gave you this as 99%.

Title: Re: Am I sick?
Post by rmsgrey on Jan 25th, 2005, 6:38am

on 01/24/05 at 16:23:58, Sir Col wrote:
I believe I have stated it correctly.  ???

Although the wording is subtle, what I've done is to turn the problem on its head. To put it another way: I've told you that P(infected|test positive) = 0.01; this was the value you were asked to find in the original problem. This time you need to find P(test positive|infected); the original problem gave you this as 99%.

In that case:
::[hide]
Using Towr's approach with a population of 10000:
100 people are infected.
If n of those 100 test positive, and 99*n of the 9900 uninfected also test positive, then 1% of those who test positive are infected, and the required probability is n%. In other words, you can find any probability you want, with the bonus conclusion that the probability of testing positive is independent of your being infected or not (that is, the test tells you nothing about the sickness).
[/hide]::
This doesn't challenge my intuition at all.
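
A small Python sketch of this argument, assuming the same population of 10000 (100 infected, 9900 uninfected). The helper name `follow_up` is made up for illustration. For any n from 1 to 100, the share of positives who are infected stays at 1%, while P(test positive|infected) takes whatever value n% you choose.

```python
from fractions import Fraction

def follow_up(n, infected=100, uninfected=9900):
    """For n infected positives and 99*n uninfected positives, return
    (P(infected|positive), P(positive|infected), P(positive|uninfected))."""
    true_pos = n
    false_pos = 99 * n
    posterior = Fraction(true_pos, true_pos + false_pos)
    return posterior, Fraction(true_pos, infected), Fraction(false_pos, uninfected)

# n = 0 is left out: with no positives at all, the posterior is undefined.
results = {n: follow_up(n) for n in (1, 25, 50, 100)}
for n, (posterior, p_pos_inf, p_pos_uninf) in results.items():
    print(n, posterior, p_pos_inf, p_pos_uninf)
```

In every row the last two columns agree, which is the independence noted above.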

Title: Re: Am I sick?
Post by Sir Col on Jan 25th, 2005, 10:57am
But you don't know how accurate the test is: that is what you're trying to find. So you cannot assume that 99% (the value in the original problem) of unaffected will test positive.

Title: Re: Am I sick?
Post by rmsgrey on Jan 26th, 2005, 8:40am

on 01/25/05 at 10:57:04, Sir Col wrote:
But you don't know how accurate the test is: that is what you're trying to find. So you cannot assume that 99% (the value in the original problem) of unaffected will test positive.

I'm not assuming that. I'm assuming that, given that 1% of those who test positive are infected, 99 times as many people who test positive are uninfected. I believe I've shown that the required value (P(test positive|infected)) can take any value from 0 to 100%.

The follow-up problem statement, as I understand it, is: "Given P(infected)=0.01 and P(infected|test positive)=0.01, find P(test positive|infected)." As it stands, there is insufficient information to uniquely specify the value of the desired quantity.

Title: Re: Am I sick?
Post by Sir Col on Jan 26th, 2005, 11:43am
I'll post my intended solution then; I've obviously made a mistake in my reasoning...

P(infected) = 0.01
We are trying to find P(test positive|infected), so let this be equal to p.

P(infected AND [test positive|infected]) =  0.01p
P(not infected AND [test positive|not infected]) = 0.99(1-p) = 0.99-0.99p

Therefore, P(test positive) = 0.99-0.98p

Hence P(infected|test positive) = 0.01p/(0.99-0.98p) = x

In the original problem, we were given that p=0.99, and this evaluated to give x=0.5. However, in my variation I am telling you that x=0.01.

Rearranging, we get p = 99x/(98x+1) = 0.5; that is, if the test is 50% accurate it only identifies infected people 1% of the time.
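
The algebra in this post can be checked numerically. The sketch below assumes, as the derivation does, that P(test positive|not infected) = 1-p and that 1% of the population is infected; the function names are illustrative only.

```python
from fractions import Fraction

def posterior(p, prevalence=Fraction(1, 100)):
    """P(infected|test positive) when P(+|infected) = P(-|not infected) = p."""
    true_pos = prevalence * p
    false_pos = (1 - prevalence) * (1 - p)
    return true_pos / (true_pos + false_pos)

def accuracy_from_posterior(x):
    """The rearrangement p = 99x/(98x+1), valid for 1% prevalence."""
    return 99 * x / (98 * x + 1)

print(posterior(Fraction(99, 100)))               # 1/2
print(accuracy_from_posterior(Fraction(1, 100)))  # 1/2
print(posterior(Fraction(1, 2)))                  # 1/100
```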

Title: Re: Am I sick?
Post by towr on Jan 26th, 2005, 2:11pm

on 01/26/05 at 11:43:34, Sir Col wrote:
P(infected AND [test positive|infected]) =  0.01p
P(not infected AND [test positive|not infected]) = 0.99(1-p) = 0.99-0.99p

P(test positive|infected)  != 1 - P(test positive|not infected)

P(A&B)/P(B) != 1 - P(A&~B)/P(~B)
Just suppose A is independent of B and always the case, then you'd get 1=1-1

P(A&B)/P(B) = (P(A) - P(A&~B))/(1-P(~B))
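
A quick numeric check of the two expressions above, using a made-up joint distribution over A and B (the four probabilities are hypothetical; any non-degenerate values summing to 1 would serve).

```python
from fractions import Fraction

# Hypothetical joint distribution, keyed by (A, B).
joint = {
    (True, True): Fraction(3, 10),
    (True, False): Fraction(2, 10),
    (False, True): Fraction(1, 10),
    (False, False): Fraction(4, 10),
}

p_b = joint[(True, True)] + joint[(False, True)]
p_not_b = 1 - p_b
p_a = joint[(True, True)] + joint[(True, False)]

lhs = joint[(True, True)] / p_b                        # P(A&B)/P(B)
wrong = 1 - joint[(True, False)] / p_not_b             # 1 - P(A&~B)/P(~B)
right = (p_a - joint[(True, False)]) / (1 - p_not_b)   # (P(A) - P(A&~B))/(1-P(~B))
print(lhs, wrong, right)  # 3/4 2/3 3/4
```

For these numbers the last expression matches P(A&B)/P(B), while the "1 minus" form does not.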

Title: Re: Am I sick?
Post by Sir Col on Jan 26th, 2005, 3:22pm
Hmm?  ???

I'm using the assumption from the original problem that 100p=P% accuracy of the test implies that P% of infected will test positive and P% of uninfected will not test positive.

Hence P(test positive|infected) = P(test negative|not infected) = p, so P(test positive|not infected)=1-p.

To obtain P(test positive), I've simply added P(infected AND [test positive|infected]) and P(not infected AND [test positive|not infected]) = 0.01p+0.99-0.99p = 0.99-0.98p.

As this probability of obtaining a positive test is made up of two probabilities: 0.01p (infected) and 0.99-0.99p (not infected), it follows that,
P(infected|test positive) = 0.01p/(0.99-0.98p).

Title: Re: Am I sick?
Post by towr on Jan 27th, 2005, 1:49am
P(infected) = 0.01
P(not infected) = 0.99

P(test positive|infected) = p
P(test negative|infected) = 1-p

P(test negative|not infected) = p
P(test positive|not infected) = 1-p

P(infected|test positive) = 0.01
P(not infected|test positive) = 0.99

P(A|B)=P(B|A)P(A)/P(B)

P(test positive|infected) = P(infected|test positive) * P(test positive)/P(infected)
p = 0.01 * P(test positive)/0.01
p = P(test positive)

Anyway,

P(infected) = 0.01 and P(infected|test positive) = 0.01  
implies that being infected and testing positive are independent events.

Which is what rmsgrey already said before.

Title: Re: Am I sick?
Post by Sir Col on Jan 27th, 2005, 7:01am
As P(A|B) = P(A∩B)/P(B), I would write P(infected|test positive) = P(infected ∩ test positive)/P(test positive) = 0.01p/(0.99-0.98p).

I agree with your approach entirely, but you're missing one important fact: we can express P(test positive) in terms of p. So...
p = P(test positive) = 0.99-0.98p

Therefore, 1.98p = 0.99, and still get the result I suggested: p = 0.5.

Title: Re: Am I sick?
Post by rmsgrey on Jan 27th, 2005, 8:17am

on 01/26/05 at 15:22:54, Sir Col wrote:
I'm using the assumption from the original problem that 100p=P% accuracy of the test implies that P% of infected will test positive and P% of uninfected will not test positive.

Which would be the missing piece of information.

Given the intuitively appealing result that, since the result of the test being positive gives you no new information about your chance of being infected (it's still 1%), the probability of testing positive must be the same whether you're infected or not, you don't need much calculation to reach an answer when you're told that, in addition, the probability of testing negative when uninfected is the same as the probability of testing positive when infected...

P(+|I)=P(+|U)   (from my earlier working)
P(+|I)=P(-|U)    (given)
Therefore: P(+|U)=P(-|U)
But: P(+|U)+P(-|U)=1    (the two results are mutually exclusive, and cover all possibilities)

So: P(+|I)=P(+|U)=P(-|I)=P(-|U)=0.5

Title: Re: Am I sick?
Post by towr on Jan 27th, 2005, 12:41pm

on 01/27/05 at 07:01:30, Sir Col wrote:
As P(A|B) = P(A∩B)/P(B), I would write P(infected|test positive) = P(infected ∩ test positive)/P(test positive) = 0.01p/(0.99-0.98p).

I agree with your approach entirely, but you're missing one important fact: we can express P(test positive) in terms of p. So...
p = P(test positive) = 0.99-0.98p

Therefore, 1.98p = 0.99, and still get the result I suggested: p = 0.5.

You can get there much faster by using the independence of the test and the infection.

P(test positive|infected) = P(test positive) = p
≡
P(test positive|not infected) = P(test positive) = 1-p
so P(test positive) = p = 1-p = 0.5

Title: Re: Am I sick?
Post by Sir Col on Jan 27th, 2005, 1:13pm
Apologies for the ambiguity.

However, like the original problem I personally found it surprising to my intuition: a test which is 99% accurate only identifies a genuinely infected person 50% of the time (the original problem), or its dual, a test that is 50% accurate will positively identify someone who is not infected 99% of the time (my version).

Title: Re: Am I sick?
Post by towr on Jan 27th, 2005, 1:59pm
I don't see why that should be so surprising, most people aren't infected. If you always say a person isn't infected you also get 99% accuracy.
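
A short Python sketch of this point, taking "accuracy" to mean the fraction of all people whose test result matches their true status. That definition and the function name `overall_accuracy` are assumptions made for illustration.

```python
from fractions import Fraction

def overall_accuracy(prevalence, sensitivity, specificity):
    """Fraction of all people for whom the result matches their true status."""
    return prevalence * sensitivity + (1 - prevalence) * specificity

prevalence = Fraction(1, 100)
# A "test" that always says "not infected": catches no one, clears everyone.
always_negative = overall_accuracy(prevalence, Fraction(0), Fraction(1))
# The test from the original problem.
original_test = overall_accuracy(prevalence, Fraction(99, 100), Fraction(99, 100))
print(always_negative, original_test)  # 99/100 99/100
```

Both come out at 99% overall, even though the always-negative rule never identifies a single infected person.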


