Author |
Topic: infectious disease (Read 1843 times) |
|
tomas
Newbie
Posts: 8
|
|
infectious disease
« on: Dec 5th, 2008, 8:26am » |
Quote Modify
|
In the town with the population of n people, there is an infectious disease.So far there has been 1% of the residents being infected. Provided that there is a testing centre which can test up to 10^4 samples per day. the centre can test either a combination of blood samples or separate ones however , if there is more than 1 infectious sample in a m-sized sample (n>m>1), the result will be wrong.else, if there is 0 or 1 infectious sample in a m-sized sample, the result will be correct. How many days does it take to find out all the infected people
|
« Last Edit: Dec 5th, 2008, 8:32pm by tomas » |
IP Logged |
|
|
|
teekyman
Full Member
Gender:
Posts: 199
|
|
Re: infectious disease
« Reply #1 on: Dec 5th, 2008, 8:42am » |
Quote Modify
|
Did you mean at least one infected sample makes the test detect the infection, or did you really mean more than 1? Also, can the center process 10^4 people, or samples? (I'm also assuming that n >> 10^4 because we're probably not looking for a naive solution.)
|
|
IP Logged |
|
|
|
tomas
Newbie
Posts: 8
|
|
Re: infectious disease
« Reply #2 on: Dec 5th, 2008, 8:52am » |
Quote Modify
|
if there is only 1 infectious sample in the m-sized sample , then the result will be correct. else , there are > 1 infectious samples , it will be wrong the centre can process a m-sized sample (combination of m people's blood sample, here 1<=m<=10^4) people's blood sample= people (the same ) of course , n >10^4 let's say ,n --> infinity but the strategy is unchanged for a random value of n
|
« Last Edit: Dec 5th, 2008, 8:53am by tomas » |
IP Logged |
|
|
|
teekyman
Full Member
Gender:
Posts: 199
|
|
Re: infectious disease
« Reply #3 on: Dec 5th, 2008, 10:08am » |
Quote Modify
|
Well if having two or more infected people donate samples causes the test to give a false negative infection, then that will complicate things indeed. In this situation since we know that a disease-positive test tells us that there is exactly one person in the group with the disease, but a disease negative test tells that either 0 or more than 2 people have the disease, we really only get knowledge out of disease positive tests. Say we test with groups of size k. (assume for now that k = 100 so that some calculations become much easier) Then if we get a positive result, which happens with probability .01k*e^-.01k = e ^ -1 we can use binary search to isolate the single person with the disease in that group of size k which will take a total of log(100) (base 2) + 1 (about steps including the initial test. So doing this once through the entire population (dividing them into batches of size k), will take 8 * n / 100 steps, or 2/25 * n. We will successfully determine whether e^-1 * n are infected or not on average, and the rest we won't know anything about. So after doing this, we are left with (on average) n * (1 - e^-1) people who tested negative for the disease, but still may be infected. Among the people that we were able to verify, 1/100 people are diseased, so the same ratio of infected people are left among the general population. So we do the exact same thing, choosing 100 random people per sample again. This time, we can reduce to n*(1-e^-1)^2 people left after an additional 8 * (1-e^-1) * n / 100 steps. We quickly see that the total number of samples (if we choose k people at each step) needed is an infinite geometric series, which converges to n * 2e / 25, or about .217n. Now I don't think that choosing k = 100 is the best sample size per group. However, choosing different sample sizes will alter the proportion of the remaining population with the disease, and I am too lazy to do all that calculation right now, especially since the sample size k will probably need to change as the percentage of the population that is diseased changes. I'm not sure im understanding the operating practices of the hospital though; I am right in saying that the hospital takes the same effort to process one sample of any size (even if it's just one person)?
|
« Last Edit: Dec 5th, 2008, 10:09am by teekyman » |
IP Logged |
|
|
|
tomas
Newbie
Posts: 8
|
|
Re: infectious disease
« Reply #4 on: Dec 5th, 2008, 5:01pm » |
Quote Modify
|
sorry for my ambiguous expression. but the result is positive for either 1 infected sample or 0 infected sample in a k-sized sample ( in the first reply i assumed that u will understand that 0 infected sample always make the result correct). Quote:I am right in saying that the hospital takes the same effort to process one sample of any size (even if it's just one person)? |
| ---> yes, u are right one hint :even n --> infinity or a random large value of n gives us the same strategy and result , i mean the solution is independent of n . anyways, nice try
|
« Last Edit: Dec 5th, 2008, 5:06pm by tomas » |
IP Logged |
|
|
|
tomas
Newbie
Posts: 8
|
|
Re: infectious disease
« Reply #5 on: Dec 6th, 2008, 7:14am » |
Quote Modify
|
noone help me ?
|
|
IP Logged |
|
|
|
rmsgrey
Uberpuzzler
Gender:
Posts: 2873
|
|
Re: infectious disease
« Reply #6 on: Dec 6th, 2008, 10:13am » |
Quote Modify
|
For clarification, what is the result of the test if: a) there are 0 infected people b) there is exactly 1 infected person c) there are 2 or more infected people in the sample?
|
|
IP Logged |
|
|
|
tomas
Newbie
Posts: 8
|
|
Re: infectious disease
« Reply #7 on: Dec 6th, 2008, 5:58pm » |
Quote Modify
|
a)the sample is positive b)the sample is negative here, if it is a sample of size 1 , u will know the infected person if it is a sample of size m (m>=1), it will not inform the name of the infected person, only say "negative" c) the sample is negative
|
|
IP Logged |
|
|
|
rmsgrey
Uberpuzzler
Gender:
Posts: 2873
|
|
Re: infectious disease
« Reply #8 on: Dec 7th, 2008, 9:09am » |
Quote Modify
|
In other words, the test tells you whether there is at least one infected person in the group tested rather than telling you whether there is exactly one (which is how people were reading the earlier statements)
|
|
IP Logged |
|
|
|
tomas
Newbie
Posts: 8
|
|
Re: infectious disease
« Reply #9 on: Dec 7th, 2008, 7:29pm » |
Quote Modify
|
sorry for my ambiguous expression in early posts. in short , what u can get from the result is : a/ positive : no infected person in the sample b/ negative : at least 1 infected person in the sample
|
|
IP Logged |
|
|
|
hawkxor
Newbie
Gender:
Posts: 5
|
|
Re: infectious disease
« Reply #10 on: Dec 7th, 2008, 9:48pm » |
Quote Modify
|
tomas, you are still ambiguous about the abilities of the testing center. is it correct that the center can process 10^4 samples each day, even if each of those samples consists of the blood of 1000 unique people?
|
|
IP Logged |
|
|
|
teekyman
Full Member
Gender:
Posts: 199
|
|
Re: infectious disease
« Reply #11 on: Dec 8th, 2008, 5:05am » |
Quote Modify
|
Just dealing with the total number of samples needed, (assuming nothing too funny happens with the processing of samples in the hospital once that gets cleared up), then we only gain information about the infected status of an individual person when either: that person's sampled in a batch of people that are tested positive OR that person is tested individually negative. To avoid testing everyone individually, our only hope to be faster is to perform batch tests of people and hope to clear large groups of people as free of disease at once before we resort to individual testing among a more concentrated pool. If we test a batch of k people when the concentration of infected people is p, then the probability that they are cleared is (1-p)^k. the expected number cleared is k * (1-p) ^ k if k > 1. With a little calculus and algebra, we see this is maximized when k = -1/ln(1-p) ~ 1/p for small p. k as a function of p is a decreasing, concave up function until p = 1 - e^-2 (about .864), which tells us that greedily trying to maximize k until this point is probably is the right thing to do. After p = .3 or so, that expected value drops to below 1, and we want to start testing people individually anyways. So the idea is to repeatedly: 1) Calculate the proportion p of remaining people in the population that are infected. 2) if this is below p = 1 - e^(-e^-1) (about .3) then test a population of size about k = -1/ln(1-p). If its above that value, then test single individuals. 3) If the group has at least one infected person, throw them back in the pool. If they are clear, set them aside. 3) repeat until everyone is tested. I'm sure optimizations can be made with respect to rounding, and perhaps even with what to do with the people who test positive. As for the total number of samples needed for this method: a lot. This will be a little annoying to calculate at best, and I will do it later, provided I am able to.
|
« Last Edit: Dec 8th, 2008, 5:07am by teekyman » |
IP Logged |
|
|
|
rmsgrey
Uberpuzzler
Gender:
Posts: 2873
|
|
Re: infectious disease
« Reply #12 on: Dec 8th, 2008, 12:16pm » |
Quote Modify
|
A naive solution (which I'm not even going to bother working out expected runtimes for): For any infected sample, divide into two equal samples and test each separately. This will find k infected people in an initial sample of n in less than 2k log2n tests
|
|
IP Logged |
|
|
|
River Phoenix
Junior Member
Gender:
Posts: 125
|
|
Re: infectious disease
« Reply #13 on: Dec 8th, 2008, 3:08pm » |
Quote Modify
|
on Dec 8th, 2008, 12:16pm, rmsgrey wrote:A naive solution (which I'm not even going to bother working out expected runtimes for): For any infected sample, divide into two equal samples and test each separately. This will find k infected people in an initial sample of n in less than 2k log2n tests |
| Suppose that we know that at least 1 from a sample of n people is infected: Split in half, test each group. What is the probability that one of the groups is clean? I believe it is simply 2*0.99^(n/2). Then, the problem reduces to 1 instance of the the n/2 case. Otherwise, it reduces to 2 instances. Let f(n) be the number of trials required for a sample of size n. Then f(2n) = (1 - 0.99^n) * 2 f(n) Clearly f(2) = 2. It follows that: f(2^n) = 2^n [1-0.99^2][1-0.99^4]...[1-0.99^(2^(n-1))] = 2^n (1-0.99)^(n-1) [1+0.99][1+0.99^2]...[1+0.99^(2^(n-2))] I believe that (1+a)(1+a^2)... converges, hence: f(2^n) < C (0.02)^n f(n) < C (0.02)^log2n So the probabilistic order of growth seems like it is actually very good.
|
« Last Edit: Dec 8th, 2008, 4:49pm by River Phoenix » |
IP Logged |
|
|
|
tomas
Newbie
Posts: 8
|
|
Re: infectious disease
« Reply #14 on: Dec 8th, 2008, 10:19pm » |
Quote Modify
|
rmsgrey and Phoenix are on the right track .However, so far no answer is correct. You all make this one seem more complicated than it must be... Let solve this extracted version then you can find the solution for the original one: there are 100 people in the town and k infected people among them (1<=k<=100) prove that : the number of tests required <=7k (sorry if i made a mistake again, cause this is the first time i created a puzzle >_<)
|
|
IP Logged |
|
|
|
River Phoenix
Junior Member
Gender:
Posts: 125
|
|
Re: infectious disease
« Reply #15 on: Dec 8th, 2008, 11:52pm » |
Quote Modify
|
My calculation must be wrong by the way, the end equation is decreasingb
|
|
IP Logged |
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: infectious disease
« Reply #16 on: Dec 9th, 2008, 1:34am » |
Quote Modify
|
on Dec 8th, 2008, 10:19pm, tomas wrote:Let solve this extracted version then you can find the solution for the original one: there are 100 people in the town and k infected people among them (1<=k<=100) prove that : the number of tests required <=7k |
| Do you know k to start with? log2 C(100,k) 7k
|
« Last Edit: Dec 9th, 2008, 1:34am by towr » |
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
tomas
Newbie
Posts: 8
|
|
Re: infectious disease
« Reply #17 on: Dec 9th, 2008, 2:04am » |
Quote Modify
|
Towr, of course ur inequation is correct but let's try that with k=1 [log2C(1,100)]=6 pls tell me a strategy that work with 6 times to find out that 1 infected person ? hidden: | ( i think the optimal solution is 7 here ) | denote the times required =f 7k is just a general, loose sup value ex : k=1, f=7 k=2, f=13(not 14)
|
|
IP Logged |
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: infectious disease
« Reply #18 on: Dec 9th, 2008, 2:12am » |
Quote Modify
|
on Dec 9th, 2008, 2:04am, tomas wrote:Towr, of course ur inequation is correct but let's try that with k=1 [log2C(1,100)]=6 |
| I get 7. x means you round x up. ~6.6 rounds up to 7. Quote:hidden: | ( i think the optimal solution is 7 here ) | |
| indeed it is. However, the key question is, do we know k beforehand, or do we need to find it, also?
|
« Last Edit: Dec 9th, 2008, 2:13am by towr » |
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
River Phoenix
Junior Member
Gender:
Posts: 125
|
|
Re: infectious disease
« Reply #19 on: Dec 9th, 2008, 3:59am » |
Quote Modify
|
on Dec 8th, 2008, 3:08pm, River Phoenix wrote: Suppose that we know that at least 1 from a sample of n people is infected: Split in half, test each group. What is the probability that one of the groups is clean? I believe it is simply 2*0.99^(n/2). Then, the problem reduces to 1 instance of the the n/2 case. Otherwise, it reduces to 2 instances. Let f(n) be the number of trials required for a sample of size n. Then f(2n) = (1 - 0.99^n) * 2 f(n) Clearly f(2) = 2. It follows that: |
| Actually (1-0.99^n) goes to 1, so if you just take it to be the case that f(2n) < 2 f(n), then f(n) < n. Which is obvious.. so I think this whole strategy is a deadend. I think the reason is that in the steady state as n-> infinity and you start subdividing, both halves always have at least one infected person, so it's no better order of growth wise than checking all people separately. As towr notes, you can do it in log2C(n,k) given that you know k (via binary search over all possible sets of k). There is an obvious duality between this and the strategy I analyzed which ends up taking n steps: the sum of log2C(n,k) over all k equals n. Since they both use binary search, they are essentially the same strategy. The cleverness is to separate out the instances by k, but in sum we still haven't really beaten the naive strategy of checking each person separately. This is not that insightful but I think interesting.
|
« Last Edit: Dec 9th, 2008, 4:01am by River Phoenix » |
IP Logged |
|
|
|
|