MORE INFORMATION
When
cumulative = TRUE, the
BINOMDIST(
x,
n,
p,
cumulative) function
returns the probability of
x or fewer successes in
n independent Bernoulli trials. Each of the trials
has an associated probability
p of success (and
probability 1-
p of failure). When
cumulative = FALSE, BINOMDIST returns the
probability of exactly
x successes.
Syntax
BINOMDIST(x, n, p, cumulative)
Parameters
- x is a non-negative
integer
- n is a positive
integer
- 0 < p < 1
- Cumulative is a logical variable
that takes on the values TRUE or FALSE
Example usage
Make the following assumptions:
- In baseball, a ".300 hitter" hits (succeeds) with
probability 0.300 each time he comes to bat (each trial).
- Successive times at bat are independent Bernoulli
trials.
You can use the following table to find the probability that
such a batter gets exactly 0, 1, 2, ..., or 10 hits in 10 trials and the
probability that the batter gets 0, 1 or fewer, 2 or fewer, ..., 9 or fewer, or
10 or fewer hits in 10 trials.
If the batter gets 50 hits in his first
200 trials (a .250 average), he must get 100 hits in his next 300 trials to
have 150 hits and a .300 average over 500 trials. You can use the following
table to analyze the chance that the batter gets sufficient hits to maintain
his average. Baseball commentators frequently allude to the "law of averages"
when they say that fans do not have to worry about the performance of this
batter with only 50 hits in his first 200 trials because "by the end of the
season his average will be .300." If the trials really were independent, and
the batter really had a 0.3 chance of success on any one trial, this reasoning
is fallacious because the outcomes of the first 200 trials do not affect the
success or the failure over the last 300 trials.
To illustrate the use
of BINOMDIST, create a blank Excel worksheet, copy the following table, select
cell
A1 in your blank Excel worksheet, and then click
Paste on the
Edit menu so that the entries in
the following table fill cells A1:C22 in your worksheet.
number of trials | 10 | |
success probability | 0.3 | |
successes, x | P(exactly x successes) | P(x or
fewer successes) |
0 | =BINOMDIST(A4,$B$1,$B$2,FALSE) | =BINOMDIST(A4,$B$1,$B$2,TRUE) |
1 | =BINOMDIST(A5,$B$1,$B$2,FALSE) | =BINOMDIST(A5,$B$1,$B$2,TRUE) |
2 | =BINOMDIST(A6,$B$1,$B$2,FALSE) | =BINOMDIST(A6,$B$1,$B$2,TRUE) |
3 | =BINOMDIST(A7,$B$1,$B$2,FALSE) | =BINOMDIST(A7,$B$1,$B$2,TRUE) |
4 | =BINOMDIST(A8,$B$1,$B$2,FALSE) | =BINOMDIST(A8,$B$1,$B$2,TRUE) |
5 | =BINOMDIST(A9,$B$1,$B$2,FALSE) | =BINOMDIST(A9,$B$1,$B$2,TRUE) |
6 | =BINOMDIST(A10,$B$1,$B$2,FALSE) | =BINOMDIST(A10,$B$1,$B$2,TRUE) |
7 | =BINOMDIST(A11,$B$1,$B$2,FALSE) | =BINOMDIST(A11,$B$1,$B$2,TRUE) |
8 | =BINOMDIST(A12,$B$1,$B$2,FALSE) | =BINOMDIST(A12,$B$1,$B$2,TRUE) |
9 | =BINOMDIST(A13,$B$1,$B$2,FALSE) | =BINOMDIST(A13,$B$1,$B$2,TRUE) |
10 | =BINOMDIST(A14,$B$1,$B$2,FALSE) | =BINOMDIST(A14,$B$1,$B$2,TRUE) |
| | |
300 trials, success probability 0.3: | | |
successes, x | P(exactly x successes) | P(x or
fewer successes) |
89 | =BINOMDIST(A18,300,0.3,FALSE) | =BINOMDIST(A18,300,0.3,TRUE) |
90 | =BINOMDIST(A19,300,0.3,FALSE) | =BINOMDIST(A19,300,0.3,TRUE) |
99 | =BINOMDIST(A20,300,0.3,FALSE) | =BINOMDIST(A20,300,0.3,TRUE) |
100 | =BINOMDIST(A21,300,0.3,FALSE) | =BINOMDIST(A21,300,0.3,TRUE) |
101 | =BINOMDIST(A22,300,0.3,FALSE) | =BINOMDIST(A22,300,0.3,TRUE) |
Note After you paste this table into your new Excel worksheet, click
the
Paste Options button, and then click
Match
Destination Formatting. With the pasted range still selected, point to
Column on the
Format menu, and then click
AutoFit Selection. You may want to format cells B4:C22 for
consistent readability (for example, format numbers to five decimal
places).
Cells B4:B14 show the probabilities of exactly
x successes in 10 trials. The most likely number of
successes is 3. The chances of 0, 6, 7, 8, 9, or 10 successes are each less
than 0.05 and add to about 0.076. So the chances of 1, 2, 3, 4, or 5 successes
is about 1 - 0.076 = 0.924. Cells C4:C14 show the probabilities of
x or fewer successes in 10 trials. You can verify
that the entries in column C in any row are each equal to the sum of all the
entries in column B down to and including that row.
B18:B20 show that
the most likely number of successes in 300 trials is 90. The probability of
exactly
x successes increases as
x increases to 90, and then decreases as
x continues to increase higher than 90. The chance
of 90 or fewer successes is just over 50%, as C20 shows. The chance of 99 or
fewer successes is about 0.884. Therefore, there is only an 11.6% chance (0.116
= 1 - 0.884) of 100 or more successes.
Results in earlier versions of Excel
Knusel (see note 1) documented instances where BINOMDIST does not
return a numeric answer and yields #NUM! instead because of a numeric overflow.
When BINOMDIST returns numeric answers, they are correct. BINOMDIST returns
#NUM! only when the number of trials is greater than or equal to 1030. There
are no computational problems if
n < 1030. In
practice, such high values of
n are unlikely. With
such a high number of independent trials, a user may want to approximate the
Binomial distribution by a normal distribution (if
n*
p and
n*(1-
p) are sufficiently
high, for example, each is greater than 30) or by a Poisson distribution
otherwise.
Note 1 Knusel, L. "On the Accuracy of Statistical Distributions in
Microsoft Excel 97",
Computational Statistics and Data Analysis (1998), 26: 375-377.
For the non-cumulative case,
BINOMDIST(
x,
n,
p, false) uses the following formula
COMBIN(n,x)*(p^x)*((1-p)^(n-x))
COMBIN is an Excel function that gives the number of combinations
of
x items in a population of
n items.
COMBIN(
n,
x) is sometimes
written
nC
x, and named a
"combinatorial coefficient" or just, "
n choose
x". If you experiment with COMBIN by typing
=COMBIN(1029,515) in one cell and
=COMBIN(1030,515) in a different cell, the first cell
returns an astronomical number, 1.4298E+308, and the second cell returns #NUM!
because it is even larger. The overflow of COMBIN causes an overflow of
BINOMDIST in earlier versions of Excel.
COMBIN has not been modified
for Excel 2003.
Results in Excel 2003
Because Microsoft has diagnosed when an overflow causes BINOMDIST
to return #NUM! and knows that BINOMDIST is well-behaved when overflow does not
occur, Microsoft has implemented a conditional algorithm in Excel 2003. The
algorithm uses BINOMDIST code from earlier versions of Excel (the computational
formula mentioned earlier in this article) when
n
< 1030. When
n >= 1030, Excel 2003 uses the
alternative algorithm that is described later in this article. Typically,
COMBIN overflows because it is astronomical, but
p^
x and
(1-
p)^(
n-
x)
are each infinitesimal. If it were possible to multiply them together, the
product would be a realistic probability between 0 and 1. But because existing
finite arithmetic cannot multiply them, an alternative algorithm avoids the
evaluation of COMBIN. Microsoft's approach calculates an unscaled sum of all
the probabilities of exactly
x successes that are
used later for scaling purposes. It also calculates an unscaled value of the
probability that you want BINOMDIST to return. Finally, it uses the scaling
factor to return a correct BINOMDIST value. The algorithm takes advantage of
the fact that the ratio of successive terms of the form
COMBIN(
n,
k)*(
p^
k)*((1-
p)^(
n-
k))
has a simple form. The algorithm proceeds as described in the pseudocode in the
following steps.
Step 0: (Initialization). Initialize the
TotalUnscaledProbability and the
UnscaledResult properties to 0. Initialize the constant
EssentiallyZero to a very small number, for example, 10^(-12).
Step 1:
Find
n*
p and round down
to the nearest whole number,
m. The most likely
number of successes in
n trials is either
m or
m+1.
COMBIN(
n,
k)*(
p^
k)*((1-
p)^(
n-
k))
decreases as
k decreases from
m to
m-1 to
m-2, and so on. Also,
COMBIN(
n,
k)*(
p^
k)*((1-
p)^(
n-
k))
decreases as
k increases from
m+1 to
m+2 to
m+3, and so on.
TotalUnscaledProbability = TotalUnscaledProbability + 1;
If (m == x) then UnscaledResult = UnscaledResult + 1;
If (cumulative && m < x) then UnscaledResult = UnscaledResult + 1;
Step 2: Calculate the unscaled probabilities for
k >
m:
PreviousValue = 1;
Done = FALSE;
k = m + 1;
While (not Done && k <= n)
{
CurrentValue = PreviousValue * (n - k + 1) * p / (k * (1 - p));
TotalUnscaledProbability = TotalUnscaledProbability + CurrentValue;
If (k == x) then UnscaledResult = UnscaledResult + CurrentValue;
If (cumulative && k < x) then UnscaledResult = UnscaledResult +
CurrentValue;
If (CurrentValue <= EssentiallyZero) then Done = TRUE;
PreviousValue = CurrentValue;
k = k+1;
}
end While;
Step 3: Calculate the unscaled probabilities for
k <
m:
PreviousValue = 1;
Done = FALSE;
k = m - 1;
While (not Done && k >= 0)
{
CurrentValue = PreviousValue * k+1 * (1-p) / ((n - k) * p);
TotalUnscaledProbability = TotalUnscaledProbability + CurrentValue;
If (k == x) then UnscaledResult = UnscaledResult + CurrentValue;
If (cumulative && k < x) then UnscaledResult = UnscaledResult +
CurrentValue;
If (CurrentValue <= EssentiallyZero) then Done = TRUE;
PreviousValue = CurrentValue;
k = k-1;
}
end While;
Step 4: Combine the unscaled results:
Return UnscaledResult/TotalUnscaledProbability;
Although this method is used only for
n
>= 1030, you can use the following additions to the Excel worksheet to help
you hand-execute this algorithm to calculate BINOMDIST(3, 10, 0.3, TRUE) (in
the baseball example, the chance of 3 or fewer hits in 10 trials for a .300
batter).
To illustrate this, copy the following table, select cell
D4 in the Excel worksheet that you created earlier, and then click
Paste on the
Edit menu. When you do this, the entries in the following table fill cells D1:E15 in your worksheet.
| |
| |
| |
=D5*(1-$B$2)*(A4+1)/($B$2*($B$1-A4)) | =D4/$D$15 |
=D6*(1-$B$2)*(A5+1)/($B$2*($B$1-A5)) | =D5/$D$15 |
1 | =D6/$D$15 |
=D6*$B$2*($B$1-A7+1)/((1-$B$2)*A7) | =D7/$D$15 |
=D7*$B$2*($B$1-A8+1)/((1-$B$2)*A8) | =D8/$D$15 |
=D8*$B$2*($B$1-A9+1)/((1-$B$2)*A9) | =D9/$D$15 |
=D9*$B$2*($B$1-A10+1)/((1-$B$2)*A10) | =D10/$D$15 |
=D10*$B$2*($B$1-A11+1)/((1-$B$2)*A11) | =D11/$D$15 |
=D11*$B$2*($B$1-A12+1)/((1-$B$2)*A12) | =D12/$D$15 |
=D12*$B$2*($B$1-A13+1)/((1-$B$2)*A13) | =D13/$D$15 |
=D13*$B$2*($B$1-A14+1)/((1-$B$2)*A14) | =D14/$D$15 |
=SUM(D4:D14) | |
Column D contains the unscaled probabilities. The 1 in cell D6
is the result of Step 1 of the algorithm. Excel 2003 calculates the entries in
cells D7, D8, ..., D14 in that order in Step 2. It calculates the entries in
cells D5 and D4 (in that order) in Step 3. The sum of all unscaled
probabilities appears in D15.
To calculate the probability of 3 or
fewer successes, type the following formula in any blank cell:
In the previous example,
EssentiallyZero does not stop Steps 2 or 3. However, if you want to evaluate
BINOMDIST(550, 2000, 0.3, TRUE),
EssentiallyZero may stop Step 2 or Step 3. A binomial random variable with
n = 2000 and
p = 0.3 has
a distribution that is approximated by the normal with mean 600 and standard
deviation SQRT(2000*0.3*(1 - 0.3)) = SQRT(420) = 20.5. Then 805 is 10 standard
deviations higher than the mean and 395 is 10 standard deviations lower than
the mean. Depending on your setting of
EssentiallyZero,
EssentiallyZero may stop Step 2 before you reach 805 and may stop Step 3 before
you reach 395.
Conclusions
Inaccuracies in versions of Excel earlier than Excel 2003 occur
only when the number of trials is greater than or equal to 1030. In such cases,
BINOMDIST returns #NUM! in earlier versions of Excel because one term in a
sequence of terms that are multiplied together overflows. To correct this
behavior, Excel 2003 uses the alternative procedure that is mentioned earlier
in this article when such an overflow would otherwise occur.
The
CRITBINOM, HYPGEOMDIST, NEGBINOMDIST, and POISSON function exhibit similar
behavior in earlier versions of Excel. These functions also either return
correct numeric results or #NUM! or #DIV/0!. Again, problems occur because of
overflow (or underflow). It is easy to determine when and how these problems
occur. Excel 2003 uses an alternative algorithm that is similar to the one for
BINOMDIST to return correct answers in cases where earlier versions return
#NUM!.
For more
information about BINOMDIST, click
Microsoft Excel Help on the
Help menu, type
binomdist in the
Search for box in the Assistance pane, and then click
Start searching to view the topic.