Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Mathematics
General TopicsResearchOperations ResearchStatisticsMathematical LogicNumerical AnalysisUndergraduate MathAlgebra HelpRecreational Math
Math Software
MapleMathematicaMATLABScilabSASSPSS

Math Forum / Mathematics / Statistics / October 2008



Tip: Looking for answers? Try searching our database.

Variance for missing digits

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Stig Holmquist - 05 Oct 2008 20:14 GMT
With the formula M=10x0.9^n I calculate that after 15 random draws(d)
one at a time with replacement from an urn with the ten digits 0 to 9
there is likely to remain 2.06 digits not drawn or missing (M) on
average.

Next I wish to calculate the variance for this number andI assume I
can use the binomial variance(V) formula V=np(1-p). But what value
must I use for p? Should it be 2.06/10 or 2.06/15? In short, is p=M/10
or M/d? Am I using the correct formula or is there another?

Bertil
se16@btinternet.com - 06 Oct 2008 01:44 GMT
> With the formula M=10x0.9^n I calculate that after 15 random draws(d)
> one at a time with replacement from an urn with the ten digits 0 to 9
> there is likely to remain 2.06 digits not drawn or missing (M) on
> average.

Yes - the chance of any one not being drawn is (1-1/10)^15 which is
about 0.206, so the expected number not drawn is about 2.06

> Next I wish to calculate the variance for this number andI assume I
> can use the binomial variance(V) formula V=np(1-p). But what value
> must I use for p? Should it be 2.06/10 or 2.06/15? In short, is p=M/10
> or M/d? Am I using the correct formula or is there another?
>
> Bertil

I do not understand that.  As far as I can see by finding the
probabilities of each number of missing digits, the variance is about
0.986, which is not either of your numbers.  Note that whether a
particular digit is drawn is not independent of the others being drawn
(e.g. at least one must be drawn).
Stig Holmquist - 06 Oct 2008 13:48 GMT
>> With the formula M=10x0.9^n I calculate that after 15 random draws(d)
>> one at a time with replacement from an urn with the ten digits 0 to 9
[quoted text clipped - 16 lines]
>particular digit is drawn is not independent of the others being drawn
>(e.g. at least one must be drawn).

I tested 18 sets of 5 pick3 simulations generated with a good RNG and
found the mean of missing digits was 2.1666 with a std.dev. of only
0.857. This test had one set of triples, which yielded four missing
digits, without which the mean and the std.dev. would be lower.
Thus I suspect there must be another formula than the binomial for
predicting the variance. What formula did you use to get 0.986?

Bertil
se16@btinternet.com - 06 Oct 2008 15:53 GMT
> >> With the formula M=10x0.9^n I calculate that after 15 random draws(d)
> >> one at a time with replacement from an urn with the ten digits 0 to 9
[quoted text clipped - 27 lines]
>
> - Show quoted text -

I make the probabilities for the number of unused digits after 15
draws with replacement from 10 to be

10    0
9    0.00000000000001
8    0.00000000147447
7    0.00000171007272
6    0.00021347398800
5    0.00637359166080
4    0.06360882287760
3    0.24720675819840
2    0.39304955289600
1    0.24359586451200
0    0.04595022432000

If P(n,k,m) is the probability of n remaining unused digits out of an
original k after m draws then
P(n,k,m) = P(n+1,k,m-1) * (n+1)/k + P(n,k,m-1) * (k-n)/k
starting with P(k,k,0) = 1

I make the expected value of this to be 2.05891132094649 and variance
about 0.986389 (sd about 0.993171).
Scott Hemphill - 06 Oct 2008 17:42 GMT
>>> With the formula M=10x0.9^n I calculate that after 15 random draws(d)
>>> one at a time with replacement from an urn with the ten digits 0 to 9
[quoted text clipped - 23 lines]
> Thus I suspect there must be another formula than the binomial for
> predicting the variance. What formula did you use to get 0.986?

se16's figures are correct.  The exact figures are:

mean:      2.05891132094649
variance:  0.9863889814197496485705566799

You can verify these by computing the exact probabilities for each
number of missing digits.

Let d(n) be a 10-tuple which contains the number of ways there can be
1..10 digits present with n picks.  The probability of each number of
digits present is then just d(n)/10^n.

d(1)={10, 0, 0, 0, 0, 0, 0, 0, 0, 0}
p(1)={1, 0, 0, 0, 0, 0, 0, 0, 0, 0}

In this case, the probability is 1 that there is exactly one digit
present after one pick, and zero that there are any more than one
digit.

You can now calculate d(n) from d(n-1), continuing until you get to
d(15).

To get d(2), note that for each of the 10 ways to get one digit in
d(1), there is one way to get the same digit, and nine ways to get a
different digit.  Therefore:

d(2)={10, 90, 0, 0, 0, 0, 0, 0, 0, 0}
p(2)={0.1, 0.9, 0, 0, 0, 0, 0, 0, 0, 0}

In general, you can get k digits in n picks by either having (k-1)
digits in (n-1) picks and picking a *new* digit, or by having k digits
in (n-1) picks and picking an *old* digit.  So for example, the second
element of d(3) is arrived at by:

 10 ways of having 1 digit in 2 picks times 9 ways of picking a
 different digit

   plus

 90 ways of having 2 digits in 2 picks times 2 ways of picking one of
 those 2 digits

 or (10*9) + (90*2) = 90 + 180 = 270

The complete 10-tuple for n = 3 is:

d(3)={10, 270, 720, 0, 0, 0, 0, 0, 0, 0}
p(3)={0.01, 0.27, 0.72, 0, 0, 0, 0, 0, 0, 0}

I'll give one more example.

d(4)={10*1, 270*2, 720*3,   0*4, 0*5, 0*6, 0*7, 0*8, 0*9, 0*10}
   +{0*10,  10*9, 270*8, 720*7, 0*6, 0*5, 0*4, 0*3, 0*2, 0*1}

   ={10, 630, 4320, 5040, 0, 0, 0, 0, 0, 0}
p{4)={0.001, 0.063, 0.432, 0.504, 0, 0, 0, 0, 0, 0}

So you just continue until you get d(15), then use the exact probabilities
to calculate the mean and variance of the number of missing digits.
For example, the mean for n=4 is:

mean(4)=0.001*9 + 0.063*8 + 0.432*7 + 0.504*6
      =6.561

var(4)=0.001*(9-6.561)^2 + 0.063*(8-6.561)^2 + 0.432*(7-6.561)^2
     +0.504*(6-6.561)^2

     =0.378279

There is an exact formula for the mean and variance, but you can't use
the binomial variance formula.  As was noted by se16:

mean(n) = 10*0.9^n

I derived a formula for the variance:

var(n) = 90*0.8^n - 100*0.81^n + 10*0.9^n

There's probably a simple way to derive or prove this, but I used a
method that is outside the scope of the rest of this message.
Basically, I noted that p(n) is a linear transformation of p(n-1),
wrote out the transition matrix (which is bidiagonal), calculated the
eigenvalues and eigenvectors of the matrix, wrote an expression for
the n'th power of that matrix, multiplied p(1) by the n'th power,
getting formulas for any n for the probabilities of 1..10 digits.  I
then used these formulas to get the mean and variance of the number of
missing digits for any n.

Scott
Signature

Scott Hemphill    hemphill@alumni.caltech.edu
"This isn't flying.  This is falling, with style."  -- Buzz Lightyear

Stig Holmquist - 06 Oct 2008 23:01 GMT
>>>> With the formula M=10x0.9^n I calculate that after 15 random draws(d)
>>>> one at a time with replacement from an urn with the ten digits 0 to 9
[quoted text clipped - 101 lines]
>
>var(n) = 90*0.8^n - 100*0.81^n + 10*0.9^n

Thank you for this last formula. It's what the doctor odered.

Bertil

>There's probably a simple way to derive or prove this, but I used a
>method that is outside the scope of the rest of this message.
[quoted text clipped - 7 lines]
>
>Scott
Scott Hemphill - 06 Oct 2008 18:50 GMT
>>> With the formula M=10x0.9^n I calculate that after 15 random draws(d)
>>> one at a time with replacement from an urn with the ten digits 0 to 9
[quoted text clipped - 23 lines]
> Thus I suspect there must be another formula than the binomial for
> predicting the variance. What formula did you use to get 0.986?

Generally, you need on the order of 10^(2*n) simulations to get n
significant figures in your statistics, so you would need 100 to get
one significant figure, 10000 to get two significant figures, etc.

I ran 10^10 (that's 10 U.S. billion) experiments using GNU's
drand48() RNG.  The results were:

mean: 2.05891
variance: 0.986385

Scott
Signature

Scott Hemphill    hemphill@alumni.caltech.edu
"This isn't flying.  This is falling, with style."  -- Buzz Lightyear

 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.