Excel statistical functions: BINOMDIST (827459)

MORE INFORMATION

When cumulative = TRUE, the BINOMDIST(x, n, p, cumulative) function returns the probability of x or fewer successes in n independent Bernoulli trials. Each of the trials has an associated probability p of success (and probability 1-p of failure). When cumulative = FALSE, BINOMDIST returns the probability of exactly x successes.

Syntax

BINOMDIST(x, n, p, cumulative)

Parameters

x is a non-negative integer
n is a positive integer
0 < p < 1
Cumulative is a logical variable that takes on the values TRUE or FALSE

Example usage

Make the following assumptions:

In baseball, a ".300 hitter" hits (succeeds) with probability 0.300 each time he comes to bat (each trial).
Successive times at bat are independent Bernoulli trials.

You can use the following table to find the probability that such a batter gets exactly 0, 1, 2, ..., or 10 hits in 10 trials and the probability that the batter gets 0, 1 or fewer, 2 or fewer, ..., 9 or fewer, or 10 or fewer hits in 10 trials.

If the batter gets 50 hits in his first 200 trials (a .250 average), he must get 100 hits in his next 300 trials to have 150 hits and a .300 average over 500 trials. You can use the following table to analyze the chance that the batter gets sufficient hits to maintain his average. Baseball commentators frequently allude to the "law of averages" when they say that fans do not have to worry about the performance of this batter with only 50 hits in his first 200 trials because "by the end of the season his average will be .300." If the trials really were independent, and the batter really had a 0.3 chance of success on any one trial, this reasoning is fallacious because the outcomes of the first 200 trials do not affect the success or the failure over the last 300 trials.

To illustrate the use of BINOMDIST, create a blank Excel worksheet, copy the following table, select cell A1 in your blank Excel worksheet, and then click Paste on the Edit menu so that the entries in the following table fill cells A1:C22 in your worksheet.

number of trials	10
success probability	0.3
successes, x	P(exactly x successes)	P(x or fewer successes)
0	=BINOMDIST(A4,$B$1,$B$2,FALSE)	=BINOMDIST(A4,$B$1,$B$2,TRUE)
1	=BINOMDIST(A5,$B$1,$B$2,FALSE)	=BINOMDIST(A5,$B$1,$B$2,TRUE)
2	=BINOMDIST(A6,$B$1,$B$2,FALSE)	=BINOMDIST(A6,$B$1,$B$2,TRUE)
3	=BINOMDIST(A7,$B$1,$B$2,FALSE)	=BINOMDIST(A7,$B$1,$B$2,TRUE)
4	=BINOMDIST(A8,$B$1,$B$2,FALSE)	=BINOMDIST(A8,$B$1,$B$2,TRUE)
5	=BINOMDIST(A9,$B$1,$B$2,FALSE)	=BINOMDIST(A9,$B$1,$B$2,TRUE)
6	=BINOMDIST(A10,$B$1,$B$2,FALSE)	=BINOMDIST(A10,$B$1,$B$2,TRUE)
7	=BINOMDIST(A11,$B$1,$B$2,FALSE)	=BINOMDIST(A11,$B$1,$B$2,TRUE)
8	=BINOMDIST(A12,$B$1,$B$2,FALSE)	=BINOMDIST(A12,$B$1,$B$2,TRUE)
9	=BINOMDIST(A13,$B$1,$B$2,FALSE)	=BINOMDIST(A13,$B$1,$B$2,TRUE)
10	=BINOMDIST(A14,$B$1,$B$2,FALSE)	=BINOMDIST(A14,$B$1,$B$2,TRUE)

300 trials, success probability 0.3:
successes, x	P(exactly x successes)	P(x or fewer successes)
89	=BINOMDIST(A18,300,0.3,FALSE)	=BINOMDIST(A18,300,0.3,TRUE)
90	=BINOMDIST(A19,300,0.3,FALSE)	=BINOMDIST(A19,300,0.3,TRUE)
99	=BINOMDIST(A20,300,0.3,FALSE)	=BINOMDIST(A20,300,0.3,TRUE)
100	=BINOMDIST(A21,300,0.3,FALSE)	=BINOMDIST(A21,300,0.3,TRUE)
101	=BINOMDIST(A22,300,0.3,FALSE)	=BINOMDIST(A22,300,0.3,TRUE)

Note After you paste this table into your new Excel worksheet, click the Paste Options button, and then click Match Destination Formatting. With the pasted range still selected, point to Column on the Format menu, and then click AutoFit Selection. You may want to format cells B4:C22 for consistent readability (for example, format numbers to five decimal places).

Cells B4:B14 show the probabilities of exactly x successes in 10 trials. The most likely number of successes is 3. The chances of 0, 6, 7, 8, 9, or 10 successes are each less than 0.05 and add to about 0.076. So the chances of 1, 2, 3, 4, or 5 successes is about 1 - 0.076 = 0.924. Cells C4:C14 show the probabilities of x or fewer successes in 10 trials. You can verify that the entries in column C in any row are each equal to the sum of all the entries in column B down to and including that row.

B18:B20 show that the most likely number of successes in 300 trials is 90. The probability of exactly x successes increases as x increases to 90, and then decreases as x continues to increase higher than 90. The chance of 90 or fewer successes is just over 50%, as C20 shows. The chance of 99 or fewer successes is about 0.884. Therefore, there is only an 11.6% chance (0.116 = 1 - 0.884) of 100 or more successes.

Results in earlier versions of Excel

Knusel (see note 1) documented instances where BINOMDIST does not return a numeric answer and yields #NUM! instead because of a numeric overflow. When BINOMDIST returns numeric answers, they are correct. BINOMDIST returns #NUM! only when the number of trials is greater than or equal to 1030. There are no computational problems if n < 1030. In practice, such high values of n are unlikely. With such a high number of independent trials, a user may want to approximate the Binomial distribution by a normal distribution (if n*p and n*(1-p) are sufficiently high, for example, each is greater than 30) or by a Poisson distribution otherwise.

Note 1 Knusel, L. "On the Accuracy of Statistical Distributions in Microsoft Excel 97", Computational Statistics and Data Analysis (1998), 26: 375-377.

For the non-cumulative case, BINOMDIST(x, n, p, false) uses the following formula

COMBIN(n,x)*(p^x)*((1-p)^(n-x))

COMBIN is an Excel function that gives the number of combinations of x items in a population of n items. COMBIN(n,x) is sometimes written nCx, and named a "combinatorial coefficient" or just, "n choose x". If you experiment with COMBIN by typing =COMBIN(1029,515) in one cell and =COMBIN(1030,515) in a different cell, the first cell returns an astronomical number, 1.4298E+308, and the second cell returns #NUM! because it is even larger. The overflow of COMBIN causes an overflow of BINOMDIST in earlier versions of Excel.

COMBIN has not been modified for Excel 2003.

Results in Excel 2003

Because Microsoft has diagnosed when an overflow causes BINOMDIST to return #NUM! and knows that BINOMDIST is well-behaved when overflow does not occur, Microsoft has implemented a conditional algorithm in Excel 2003. The algorithm uses BINOMDIST code from earlier versions of Excel (the computational formula mentioned earlier in this article) when n < 1030. When n >= 1030, Excel 2003 uses the alternative algorithm that is described later in this article. Typically, COMBIN overflows because it is astronomical, but p^x and (1-p)^(n-x) are each infinitesimal. If it were possible to multiply them together, the product would be a realistic probability between 0 and 1. But because existing finite arithmetic cannot multiply them, an alternative algorithm avoids the evaluation of COMBIN. Microsoft's approach calculates an unscaled sum of all the probabilities of exactly x successes that are used later for scaling purposes. It also calculates an unscaled value of the probability that you want BINOMDIST to return. Finally, it uses the scaling factor to return a correct BINOMDIST value. The algorithm takes advantage of the fact that the ratio of successive terms of the form COMBIN(n,k)*(p^k)*((1-p)^(n-k)) has a simple form. The algorithm proceeds as described in the pseudocode in the following steps.

Step 0: (Initialization). Initialize the TotalUnscaledProbability and the UnscaledResult properties to 0. Initialize the constant EssentiallyZero to a very small number, for example, 10^(-12).

Step 1: Find n*p and round down to the nearest whole number, m. The most likely number of successes in n trials is either m or m+1. COMBIN(n,k)*(p^k)*((1-p)^(n-k)) decreases as k decreases from m to m-1 to m-2, and so on. Also, COMBIN(n,k)*(p^k)*((1-p)^(n-k)) decreases as k increases from m+1 to m+2 to m+3, and so on.

TotalUnscaledProbability = TotalUnscaledProbability + 1;
If (m == x) then UnscaledResult = UnscaledResult + 1;
If (cumulative && m < x) then UnscaledResult = UnscaledResult + 1;

Step 2: Calculate the unscaled probabilities for k > m:

PreviousValue = 1;
Done = FALSE;
k = m + 1;
While (not Done && k <= n)
  {
	CurrentValue = PreviousValue * (n - k + 1) * p / (k * (1 - p));
	TotalUnscaledProbability = TotalUnscaledProbability + CurrentValue;
	If (k == x) then UnscaledResult = UnscaledResult + CurrentValue;
	If (cumulative && k < x) then UnscaledResult = UnscaledResult + 
		CurrentValue;
	If (CurrentValue <= EssentiallyZero) then Done = TRUE;
	PreviousValue = CurrentValue;
	k = k+1;
  }
end While;

Step 3: Calculate the unscaled probabilities for k < m:

PreviousValue = 1;
Done = FALSE;
k = m - 1;
While (not Done && k >= 0)
  {
	CurrentValue = PreviousValue * k+1 * (1-p) / ((n - k) * p);
	TotalUnscaledProbability = TotalUnscaledProbability + CurrentValue;
	If (k == x) then UnscaledResult = UnscaledResult + CurrentValue;
	If (cumulative && k < x) then UnscaledResult = UnscaledResult + 
		CurrentValue;
	If (CurrentValue <= EssentiallyZero) then Done = TRUE;
	PreviousValue = CurrentValue;
	k = k-1;
  }
end While;

Step 4: Combine the unscaled results:

Return UnscaledResult/TotalUnscaledProbability;

Although this method is used only for n >= 1030, you can use the following additions to the Excel worksheet to help you hand-execute this algorithm to calculate BINOMDIST(3, 10, 0.3, TRUE) (in the baseball example, the chance of 3 or fewer hits in 10 trials for a .300 batter).

To illustrate this, copy the following table, select cell D4 in the Excel worksheet that you created earlier, and then click Paste on the Edit menu. When you do this, the entries in the following table fill cells D1:E15 in your worksheet.




=D5(1-$B$2)(A4+1)/($B$2*($B$1-A4))	=D4/$D$15
=D6(1-$B$2)(A5+1)/($B$2*($B$1-A5))	=D5/$D$15
1	=D6/$D$15
=D6$B$2($B$1-A7+1)/((1-$B$2)*A7)	=D7/$D$15
=D7$B$2($B$1-A8+1)/((1-$B$2)*A8)	=D8/$D$15
=D8$B$2($B$1-A9+1)/((1-$B$2)*A9)	=D9/$D$15
=D9$B$2($B$1-A10+1)/((1-$B$2)*A10)	=D10/$D$15
=D10$B$2($B$1-A11+1)/((1-$B$2)*A11)	=D11/$D$15
=D11$B$2($B$1-A12+1)/((1-$B$2)*A12)	=D12/$D$15
=D12$B$2($B$1-A13+1)/((1-$B$2)*A13)	=D13/$D$15
=D13$B$2($B$1-A14+1)/((1-$B$2)*A14)	=D14/$D$15
=SUM(D4:D14)

Column D contains the unscaled probabilities. The 1 in cell D6 is the result of Step 1 of the algorithm. Excel 2003 calculates the entries in cells D7, D8, ..., D14 in that order in Step 2. It calculates the entries in cells D5 and D4 (in that order) in Step 3. The sum of all unscaled probabilities appears in D15.

To calculate the probability of 3 or fewer successes, type the following formula in any blank cell:

= SUM(D4:D7)/D15

In the previous example, EssentiallyZero does not stop Steps 2 or 3. However, if you want to evaluate BINOMDIST(550, 2000, 0.3, TRUE), EssentiallyZero may stop Step 2 or Step 3. A binomial random variable with n = 2000 and p = 0.3 has a distribution that is approximated by the normal with mean 600 and standard deviation SQRT(2000*0.3*(1 - 0.3)) = SQRT(420) = 20.5. Then 805 is 10 standard deviations higher than the mean and 395 is 10 standard deviations lower than the mean. Depending on your setting of EssentiallyZero, EssentiallyZero may stop Step 2 before you reach 805 and may stop Step 3 before you reach 395.

Conclusions

Inaccuracies in versions of Excel earlier than Excel 2003 occur only when the number of trials is greater than or equal to 1030. In such cases, BINOMDIST returns #NUM! in earlier versions of Excel because one term in a sequence of terms that are multiplied together overflows. To correct this behavior, Excel 2003 uses the alternative procedure that is mentioned earlier in this article when such an overflow would otherwise occur.

The CRITBINOM, HYPGEOMDIST, NEGBINOMDIST, and POISSON function exhibit similar behavior in earlier versions of Excel. These functions also either return correct numeric results or #NUM! or #DIV/0!. Again, problems occur because of overflow (or underflow). It is easy to determine when and how these problems occur. Excel 2003 uses an alternative algorithm that is similar to the one for BINOMDIST to return correct answers in cases where earlier versions return #NUM!.

For more information about BINOMDIST, click Microsoft Excel Help on the Help menu, type binomdist in the Search for box in the Assistance pane, and then click Start searching to view the topic.