wu :: forums (http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi)

riddles >> medium >> Probability of probability
(Message started by: jollytall on Apr 26th, 2008, 1:06am)

Title: Probability of probability
Post by jollytall on Apr 26th, 2008, 1:06am

Thinking about some statistical confidence level questions, I came across a seemingly easy question, but I got stuck on it. It can probably be found in a statistics book, but I was too lazy to look it up.

The starting point is that we have a hat with only red and blue balls in it. The total number of balls is known (n), but not the distribution of the colours. We draw balls and place them back immediately.

The questions:
Q1: What is the probability (p1) that the first ball is red? (This is easy: since we have zero information, p1=0.5.)
Q2: The first ball was red. What is the probability (p2) that the second ball will also be red? (If n=1, then p2=1, and as n grows, p2 goes down to I guess 0.5, but how?).
Q3: We have already drawn j balls, and k of them were red. What is the probability (p3) that the next one will be red? p3(n, j, k)?
Q4: The same as Q3, but the question is: What is the probability (p4) that the probability of the next red is more than q4? I think Q3 is a special case of Q4, where p4=p3 means q4=0.5, but I am not sure.

Two games to explain Q3 and Q4 (especially Q4, since even the question seems complicated).

Game 1 for Q3.
1. A machine fills up the hat with n balls.
2. The players draw j balls and get k red ones.
The two players create a pot of money (D dollars) with the following rule:
3. Player A decides how much from the D should be paid by the “red” player and the rest by the “blue” player.
4. Player B decides whether he wants to be “red” or “blue”.
5. They draw the next ball.
The question is: What pot allocation should Player A choose so that Player B has no winning strategy (so that it is irrelevant whether he chooses red or blue)? In other words, what are the fair odds?
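One way to read the "fair odds" in Game 1 is sketched below. It assumes, which the post does not state explicitly, that the winner takes the whole pot D. Under that assumption each colour should pay in proportion to its chance of winning. The names `fair_stakes` and `expected_gain` are made up for this illustration, and the value 5/6 is only a sample probability.

```python
from fractions import Fraction

def fair_stakes(p_red, pot):
    """Split `pot` so that neither colour has an edge.

    Assumption (not stated in the thread): the winner takes the whole pot.
    """
    red_stake = p_red * pot          # "red" wins the pot with probability p_red
    blue_stake = (1 - p_red) * pot   # "blue" wins the pot with probability 1 - p_red
    return red_stake, blue_stake

def expected_gain(p_win, stake, pot):
    """Expected net gain of a player who pays `stake` and wins `pot` with probability p_win."""
    return p_win * pot - stake

p_red = Fraction(5, 6)               # illustrative value only
pot = Fraction(60)
red, blue = fair_stakes(p_red, pot)
print(red, blue)                     # 50 10
print(expected_gain(p_red, red, pot), expected_gain(1 - p_red, blue, pot))  # 0 0
```

With this split Player B's expected gain is zero whichever colour he picks, so the remaining work is computing p3 itself.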

Game 2 for Q4.
1.-2. Same as above.
3. Player A chooses a red ball ratio (q4) at his own discretion (not intended to reflect the probable allocation of red and blue balls).
They again create a pot of money.
4. Player B decides how much of the D should be paid by the player who wins if the ratio of red balls is equal to or more than q4.
5. Player A decides whether he wants to be the “equal or more” player or the “less than” player.
6. They count the balls in the hat.
The first question is: For a given q4, what is the fair pot allocation, i.e. the one for which Player A’s choice does not matter?
The second question is: If Player A is very good at math, but Player B is less so, then what q4 should Player A choose to maximise his profit in case Player B deviates a bit from the optimal pot allocation? In other words: at which q4 is dp4/dq4 the highest?

And a final question to both games. How much does it matter (a) if we know that the machine generates first a random number between 0 and n and fills the hat accordingly (even distribution) or (b) if we know that it fills the hat with n randomly chosen balls?

Title: Re: Probability of probability
Post by FiBsTeR on Apr 26th, 2008, 8:13am

A few guesses, though I'm no expert:

on 04/26/08 at 01:06:45, jollytall wrote:

Q2: The first ball was red. What is the probability (p2) that the second ball will also be red? (If n=1, then p2=1, and as n grows, p2 goes down to I guess 0.5, but how?).

If every distribution is equally likely, then we now know that 1 ball is red, and each of the other n-1 balls is equally likely to be red or blue. So the chance of drawing a red ball is now:

(1+0.5(n-1))/n = 0.5(n+1)/n
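A quick brute-force check of this formula, as a sketch: it assumes the hat is filled by colouring each ball red or blue independently with probability 1/2 (what the first post calls method "b"), so that every one of the 2^n colourings is equally likely. The function name `p2_method_b` is made up for this illustration.

```python
from fractions import Fraction
from itertools import product

def p2_method_b(n):
    """P(second draw red | first draw red), draws with replacement,
    assuming every red/blue colouring of the n balls is equally likely."""
    first = Fraction(0)   # P(first draw red)
    joint = Fraction(0)   # P(first draw red and second draw red)
    weight = Fraction(1, 2 ** n)
    for colouring in product((0, 1), repeat=n):   # 1 = red, 0 = blue
        frac_red = Fraction(sum(colouring), n)
        first += weight * frac_red
        joint += weight * frac_red ** 2           # two independent draws
    return joint / first

for n in range(1, 7):
    # Agrees with 0.5(n+1)/n for every n tried here.
    assert p2_method_b(n) == Fraction(n + 1, 2 * n)
    print(n, p2_method_b(n))
```

The enumeration grows as 2^n, so it is only meant as a sanity check for small n.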

on 04/26/08 at 01:06:45, jollytall wrote:

Q3: We have already drawn j balls, and k of them were red. What is the probability (p3) that the next one will be red? p3(n, j, k)?

Similar to above, we know that k balls are red and j-k balls are blue, and the other n-j-k balls are equally likely to be red or blue. So the chance of picking a red ball is now:

(k+0.5(n-j-k))/n = 0.5(n+k-j)/n

Title: Re: Probability of probability
Post by jollytall on Apr 26th, 2008, 10:27am

For Q2 I agree if the hat is filled by a random generator choosing red or blue for each ball (method "b"). I got the same results for smaller values of n in a different way. By the way, in this case not all distributions are equally likely: distributions with about n/2 reds are more likely than others.
If method "a" is used, then I get a different result.

For Q3 I don't think it is right.
It might be a misunderstanding, but as I said, every ball is replaced immediately, so having drawn k reds does not mean that there are k reds in the hat (actually, k>n is also allowed).
There is probably also a formula error: taking j=1, k=1 should give the Q2 formula back, but it does not.

Title: Re: Probability of probability
Post by towr on Apr 26th, 2008, 2:29pm

For 3, consider the different cases.

For example, for j=2 where both selections were the same colour, you have:
1/n chance that you got the same ball twice
(n-1)/n chance that you got two different balls with the same colour

For each case separately you can find the probability that the next one will be red, and then sum them together, weighted by the probability of the different cases.

I think in the end you'll need to do something along the lines of http://en.wikipedia.org/wiki/Bayesian_inference But this problem is a bit more complicated than the examples there, and it's late here..

Title: Re: Probability of probability
Post by jollytall on Apr 27th, 2008, 2:27am

Thanks for the link. Very useful.

Title: Re: Probability of probability
Post by towr on Apr 27th, 2008, 7:54am

on 04/26/08 at 01:06:45, jollytall wrote:

And a final question to both games. How much does it matter (a) if we know that the machine generates first a random number between 0 and n and fills the hat accordingly (even distribution) or (b) if we know that it fills the hat with n randomly chosen balls?

Well, in case b) you get a binomial distribution: the numbers of red and blue balls are very likely to be roughly the same.
For example if you have 3 balls
a) p(3r)=1/4 , p(2r1b)=1/4, p(1r2b)=1/4, p(3b)=1/4
b) p(3r)=1/8 , p(2r1b)=3/8, p(1r2b)=3/8, p(3b)=1/8

Because these a priori chances vary much more, you'll need much stronger evidence to make 3r or 3b more likely than the other two options.
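To make the effect of the prior concrete, here is a small sketch that builds both priors for a hat of n balls and updates them after a single red draw (with replacement). The function names are made up for this illustration; list index r stands for the number of red balls in the hat.

```python
from fractions import Fraction
from math import comb

def prior_uniform(n):
    """Method (a): every red count 0..n is equally likely."""
    return [Fraction(1, n + 1) for _ in range(n + 1)]

def prior_binomial(n):
    """Method (b): every ball is red or blue with probability 1/2."""
    return [Fraction(comb(n, r), 2 ** n) for r in range(n + 1)]

def posterior_after_one_red(prior):
    """Bayes update of P(#reds = r) after observing one red draw."""
    n = len(prior) - 1
    unnormalised = [p * Fraction(r, n) for r, p in enumerate(prior)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

print(posterior_after_one_red(prior_uniform(3)))
print(posterior_after_one_red(prior_binomial(3)))
```

For n=3 the uniform prior ends up with 3 reds as the most likely hat (probability 1/2), while under the binomial prior 2 reds and 1 blue is still the most likely (also 1/2), which matches the remark that the binomial case needs stronger evidence.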

Title: Re: Probability of probability
Post by towr on Apr 27th, 2008, 9:21am

For Q3)

P(red | k_in_j_red) = Σ_{#reds=0..n} P(red | #reds) P(#reds | k_in_j_red)
P(red | #reds) = #reds/n
P(#reds | k_in_j_red) = P(k_in_j_red | #reds) P(#reds) / P(k_in_j_red)
P(k_in_j_red) = Σ_{#reds=0..n} P(k_in_j_red | #reds) P(#reds)
P(k_in_j_red | #reds) = C(j, k) (#reds/n)^k (1 - #reds/n)^(j-k)
P(#reds) = 1/(n+1) {assuming a uniform distribution over the n+1 possible values 0..n; if it's binomial change it to C(n, #reds)/2ⁿ}

P(k_in_j_red) = 1/(n+1) Σ_{#reds=0..n} C(j, k) (#reds/n)^k (1 - #reds/n)^(j-k)

P(#reds | k_in_j_red) = C(j, k) (#reds/n)^k (1 - #reds/n)^(j-k) / Σ_{m=0..n} C(j, k) (m/n)^k (1 - m/n)^(j-k)

P(red | k_in_j_red) = Σ_{#reds=0..n} #reds/n * [C(j, k) (#reds/n)^k (1 - #reds/n)^(j-k) / Σ_{m=0..n} C(j, k) (m/n)^k (1 - m/n)^(j-k)] =
[Σ_{#reds=0..n} #reds * C(j, k) (#reds/n)^k (1 - #reds/n)^(j-k)] / [n Σ_{#reds=0..n} C(j, k) (#reds/n)^k (1 - #reds/n)^(j-k)]

[edit]A few example values:

n=1, j=k=1
P(red | 1_in_1_red) = [Σ_{#reds=0..1} #reds * (#reds)] / [Σ_{#reds=0..1} (#reds)] = 1/1 = 1

n=2, j=k=1
P(red | 1_in_1_red) = [Σ_{#reds=0..2} #reds * (#reds/2)] / [2 Σ_{#reds=0..2} (#reds/2)] = 5/6

n=2, j=1, k=0
P(red | 0_in_1_red) = [Σ_{#reds=0..2} #reds * (1 - #reds/2)] / [2 Σ_{#reds=0..2} (1 - #reds/2)] = 1/6

n=2, j=k=2
P(red | 2_in_2_red) = [Σ_{#reds=0..2} #reds * (#reds/2)²] / [2 Σ_{#reds=0..2} (#reds/2)²] = 9/10

n=2, j=2, k=1
P(red | 1_in_2_red) = [Σ_{#reds=0..2} #reds * 2 (#reds/2)(1 - #reds/2)] / [2 Σ_{#reds=0..2} 2(#reds/2)(1 - #reds/2)] = 1/2

Seems ok, so far..
[/edit]
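The derivation above translates fairly directly into code. The following is a minimal sketch using exact fractions; the function name `p_next_red` and the `prior` argument are made up for this illustration. The constant prior factor and C(j, k) cancel in the ratio, but they are kept so the code mirrors the formulas.

```python
from fractions import Fraction
from math import comb

def p_next_red(n, j, k, prior="uniform"):
    """P(next draw red | k reds in j draws with replacement from n balls)."""
    if prior == "uniform":
        weights = [Fraction(1, n + 1)] * (n + 1)               # P(#reds = r)
    elif prior == "binomial":
        weights = [Fraction(comb(n, r), 2 ** n) for r in range(n + 1)]
    else:
        raise ValueError("prior must be 'uniform' or 'binomial'")
    numerator = Fraction(0)
    denominator = Fraction(0)
    for r, w in enumerate(weights):
        q = Fraction(r, n)                                     # P(red | #reds = r)
        likelihood = comb(j, k) * q ** k * (1 - q) ** (j - k)  # P(k_in_j_red | #reds = r)
        numerator += q * likelihood * w
        denominator += likelihood * w
    # A zero denominator would mean the observation is impossible (e.g. n=1, j=2, k=1).
    return numerator / denominator

# The example values from the post:
assert p_next_red(1, 1, 1) == 1
assert p_next_red(2, 1, 1) == Fraction(5, 6)
assert p_next_red(2, 1, 0) == Fraction(1, 6)
assert p_next_red(2, 2, 2) == Fraction(9, 10)
assert p_next_red(2, 2, 1) == Fraction(1, 2)
```

With `prior="binomial"` and n=2, j=k=1 the same function gives 3/4, which agrees with the 0.5(n+1)/n formula posted earlier for Q2.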

Title: Re: Probability of probability
Post by jollytall on Apr 30th, 2008, 12:53am

Thanks towr, very clear description. I guess Q4 can be answered along the same lines. It might even be simpler, as your calculation already uses, halfway through, the probability of the various red/blue distributions. We simply have to add them up:
Sum P(#reds | k_in_j_red) over #reds > q4*n, and then over all values of #reds. The ratio should give p4.
So the original questions are solved (I think).
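A sketch of that summation, assuming the uniform prior over the red count. The names `posterior_reds` and `p4` are made up for this illustration. Note one choice made here: the code uses "at least q4", following the wording of Game 2 ("equal to or more than q4"), rather than the strict ">" written above.

```python
from fractions import Fraction

def posterior_reds(n, j, k):
    """P(#reds = r | k reds in j draws) for r = 0..n, uniform prior.

    The prior 1/(n+1) and the factor C(j, k) cancel in the normalisation.
    """
    likelihood = [Fraction(r, n) ** k * Fraction(n - r, n) ** (j - k)
                  for r in range(n + 1)]
    total = sum(likelihood)
    return [value / total for value in likelihood]

def p4(n, j, k, q4):
    """Probability that the true red ratio #reds/n is at least q4."""
    posterior = posterior_reds(n, j, k)
    return sum(p for r, p in enumerate(posterior) if Fraction(r, n) >= q4)

# n=2 after one red draw: the posterior over 0, 1, 2 reds is 0, 1/3, 2/3.
print(posterior_reds(2, 1, 1))
print(p4(2, 1, 1, Fraction(1, 2)))   # 1
print(p4(2, 1, 1, Fraction(3, 4)))   # 2/3
```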

The only problem with it is that we have to know n. In real life, when observing nature, n can be virtually infinite.
If I assume correctly (I did not do the calculation), the following can be said about p3 in that case:
If we know nothing about the distribution, we can only assume a linear (uniform) one, and so p3=k/j.
If we know that the distribution is binomial, then p3=1/2 regardless of j and k.
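These two guesses can be checked numerically by evaluating the Q3 formula for a large n. The sketch below uses floating point, with `lgamma` to get the binomial weights without overflow; the function name `p3_large_n` and the chosen values of n are assumptions made for this illustration. For the uniform prior the result seems to approach (k+1)/(j+2), Laplace's rule of succession, which is close to k/j only when j is large. For the binomial prior it does approach 1/2.

```python
from math import exp, lgamma, log

def p3_large_n(n, j, k, prior="uniform"):
    """Floating-point version of P(next red | k reds in j draws) for large n."""
    numerator = denominator = 0.0
    for r in range(n + 1):
        q = r / n
        if prior == "binomial":
            # log of C(n, r) / 2^n, via lgamma to avoid huge intermediate numbers
            log_w = lgamma(n + 1) - lgamma(r + 1) - lgamma(n - r + 1) - n * log(2)
            w = exp(log_w)
        else:
            w = 1.0   # uniform prior: the constant 1/(n+1) cancels in the ratio
        likelihood = q ** k * (1 - q) ** (j - k)
        numerator += q * likelihood * w
        denominator += likelihood * w
    return numerator / denominator

j, k = 10, 8
print(p3_large_n(100_000, j, k), (k + 1) / (j + 2), k / j)
print(p3_large_n(10_000, j, k, prior="binomial"))
```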

For an assumed linear distribution, Q4 is trickier, though. If we only know j and k, I am not sure how it would be possible to say anything about p4.
My first instinct says that as j increases, we can be more and more sure that the real ratio of reds is k/j, so for q4 less than k/j, p4 gets close to 1. But my second instinct says that even a "large" j is very small compared to n, so we cannot say anything about how fast p4 converges to 1 (how peaked the probability curve of the distribution is).
So I think the best is to disregard the results j, k and simply use p4=1-q4, although this goes against my p3=k/j logic. Also, if there is a pile of balls (millions of them) and we randomly choose 1000 and 800 of them are red, would you say p4=30% for q4=0.7? I would say p4 is more than 50%. This is where our brain works differently from math.
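The 800-out-of-1000 example can be tried numerically. This is a sketch under stated assumptions: a uniform prior over the red count, draws with replacement, and n=10000 standing in for "millions" to keep the loop short. The function name `p4_large` is made up for this illustration. The weights are computed in log space because 0.8^800 * 0.2^200 underflows ordinary floats.

```python
from math import exp, log

def p4_large(n, j, k, q4):
    """P(#reds/n >= q4 | k reds in j draws), uniform prior, for 0 < k < j."""
    log_like = {}
    for r in range(1, n):   # r=0 and r=n have zero likelihood when 0 < k < j
        q = r / n
        log_like[r] = k * log(q) + (j - k) * log(1 - q)
    peak = max(log_like.values())
    # Rescale by the largest term so that exp() stays within float range.
    weights = {r: exp(v - peak) for r, v in log_like.items()}
    total = sum(weights.values())
    return sum(w for r, w in weights.items() if r >= q4 * n) / total

print(p4_large(10_000, 1000, 800, 0.7))   # very close to 1
print(p4_large(10_000, 1000, 800, 0.8))   # near one half
```

Under these assumptions the result for q4=0.7 is far above 50%, which agrees with the intuition in the post rather than with p4=1-q4.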

But back on mathematical ground, I am looking for a more elaborate solution. If we know the history of the j draws, then I can make a more detailed analysis. I split the j draws chronologically into a number of sub-samples of j' draws each. I check all the k'/j' ratios and their deviation. If the deviation is low, it means that the original sample and even the sub-samples were big enough (they represent the original well), so I can assume that the result is good. If the deviation is large, then either the original sample was not big enough or I split it into sub-samples that are too small (think of j'=1).
So I would use some sort of repeated halving method to increase the confidence level.
I cannot turn it into formulas, and again my instinct says that for n>>j even this will not work.
On the other hand, this is what all experimental scientists do.
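One possible reading of the halving idea, as a rough sketch only: the draw history is a list of 1 (red) and 0 (blue), it is cut chronologically into 1, 2, 4, ... equal chunks, and the spread of the k'/j' ratios is reported for each split. Using the population standard deviation as the "deviation" is an assumption, and the function names are made up for this illustration.

```python
from statistics import pstdev

def subsample_ratios(draws, parts):
    """Cut `draws` chronologically into `parts` equal chunks and return k'/j' for each.

    Any leftover draws at the end (when len(draws) is not divisible) are ignored.
    """
    size = len(draws) // parts
    return [sum(draws[i * size:(i + 1) * size]) / size for i in range(parts)]

def spread_by_halving(draws, max_parts):
    """Map number of chunks (1, 2, 4, ...) to the standard deviation of the chunk ratios."""
    result = {}
    parts = 1
    while parts <= max_parts and len(draws) // parts >= 1:
        result[parts] = pstdev(subsample_ratios(draws, parts))
        parts *= 2
    return result

history = [1, 1, 0, 1] * 4   # made-up history: 16 draws, 12 of them red
for parts, spread in spread_by_halving(history, 16).items():
    print(parts, spread)
```

In this made-up history the spread stays at zero down to chunks of 4 draws and only grows for chunks of 2 and 1, which is the "sub-samples too small" case described above.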

Any thoughts?