Excel Statistical Functions: VARP (826393)



The information in this article applies to:

  • Microsoft Office Excel 2003

SUMMARY

The purpose of this article is to describe the VARP function in Excel 2003, to illustrate how the function is used, and to compare results of the function for Excel 2003 with the results of VARP when used in earlier versions of Excel.

MORE INFORMATION

VARP returns the population variance for a population whose values are contained in an Excel worksheet and are specified by the argument or arguments to VARP.

Syntax

VARP(value1, value2, value3, ...)

where value1, value2, ..., up to 30 value arguments.

The most common usage of VARP includes only 1 value argument that specifies a range of cells that contains the population, for example, VARP(A1:B100).

Example Usage

Create a blank Excel worksheet, copy the table below, select cell A1 in your blank Excel worksheet, and then click Paste on the Edit menu so that the entries in the table below fill cells A1:D17 in your worksheet.

Data
6sample mean:=AVERAGE(A3:A8)
4sample size:=COUNT(A3:A8)
2VARP:=VARP(A3:A8)
1pre-Excel 2003 VARP v1:=(D4*SUMSQ(A3:A8) - (SUM(A3:A8)^2))/(D4^2)
3pre-Excel 2003 VARP v2:=(SUMSQ(A3:A8) - (SUM(A3:A8)^2)/D4)/D4
5Excel 2003 VARP:=DEVSQ(A3:A8)/D4
Modified DataPower of 10 to add to data:1
=A3 + 10^$D$10sample mean:=AVERAGE(A12:A17)
=A4 + 10^$D$10sample size:=COUNT(A12:A17)
=A5 + 10^$D$10VARP:=VARP(A12:A17)
=A6 + 10^$D$10pre-Excel 2003 VARP v1:=(D13*SUMSQ(A12:A17) - (SUM(A12:A17)^2))/(D13^2)
=A7 + 10^$D$10pre-Excel 2003 VARP v2:=(SUMSQ(A12:A17) - (SUM(A12:A17)^2)/D13)/D13
=A8 + 10^$D$10Excel 2003 VARP:=DEVSQ(A12:A17)/D13


Note After you paste this table into your new Excel worksheet, click the Paste Options button, and then click Match Destination Formatting. With the pasted range still selected, point to Column on the Format menu, and then click AutoFit Selection.

Cell D5 contains the value of VARP for your current version of Excel. If you use Excel 2003, this value should agree with the value in cell D8. Cell D8 shows the Excel 2003 value of VARP (regardless of the version of Excel that you are using). Cells D6 and D7 show two approximations to the value of VARP that was computed by earlier versions of Excel. The formula in cell D6 is the formula that is shown in the Help file for Excel 2002 and earlier.

In this example, all versions return the value 2.9166667. There are no computational problems here that cause differences in VARP between versions of Excel.

You can use rows 10 to 17 to experiment with modified data by adding a constant (in this case a power of 10 is added) to each data point. It is well-known that adding a constant to each data point will have no affect on the value of population variance.

If you change the value in cell D10 (for example: to 1, 2, 3, 4, 5, 6, or 7), you can see the revised data values in cells A12:A17, and you can also see that all versions of VARP are well-behaved in these 7 cases.

But, if you continue the experiment to try the values 8, 9, and 10 in cell D10, you will notice that the Excel 2003 value remains at 2.9166667 (as it should), while the Excel 2002 and earlier Excel version values change (even though they should remain constant at 2.9166667). This would not occur if computations could be done with infinite precision.

Earlier versions of Excel exhibit wrong answers in these cases because the effects of round-off errors are more profound with the computational formula used by these versions. Still, the cases used in this experiment could be viewed as rather extreme.

Results in Earlier Versions of Excel

In extreme cases where there are many significant digits in the data but, at the same time, a small variance, the old computational formula leads to inaccurate results. Earlier versions of Excel used a single pass through the data to compute the sum of squares of the data values, the sum of the data values, and the count of the data values (sample size). These quantities were then combined into the computational formula specified in the Help file in earlier versions of Excel.

Results in Excel 2003

The procedure that is used in Excel 2003 uses a two-pass process through the data. First, the sum and count of the data values are computed and from these the sample mean (average) can be computed. Then, on the second pass, the squared difference between each data point and the sample mean is found and these squared differences are summed. In the numeric examples, even with a high power of 10 in cell D10, these squared differences are not affected and the results of the second pass are independent of the entry in cell D10. Therefore, the results in Excel 2003 are more stable numerically.

Conclusions

Replacing a one-pass approach by a two-pass approach guarantees better numeric performance of VARP in Excel 2003. Excel 2003 results will never be less accurate than results in earlier versions.

In most practical examples, however, you are not likely to see a difference between the Excel 2003 results and the results in earlier versions of Excel. This issue occurs because typical data is unlikely to exhibit the kind of unusual behavior that this experiment illustrates. Numeric instability is most likely to appear in earlier versions of Excel when data contains a high number of significant digits combined with relatively little variation between data values.

If you use an earlier version of Excel and want to see if switching to Excel 2003 will make a difference, compare the results of

VARP(values)

with the results of

DEVSQ(values)/COUNT(values)

If the results are consistent to the accuracy that you want, then switching to Excel 2003 will not affect the value of VAR.

If you use Excel 2003 and want to see if the computed value of VARP(values) has changed from the value that would have been found when you use an earlier version of Excel, compare

VARP(values)

with

(SUMSQ(values) - (SUM(values)^2)/COUNT(values))/COUNT(values)

This comparison gives at least a good approximation to the value of VARP as found by earlier versions of Excel.

The procedure of finding the sum of squared deviations about a mean (average) by
  • Finding the mean,
  • Computing each squared deviation,
  • Summing the squared deviations
is more accurate than the alternative procedure (frequently named the "calculator formula" because it was suitable for the use of a calculator on a small number of data points):
  • Find the sum of squares of all observations, the number of observations, and the sum of all observations.
  • Compute the sum of squares of all observations minus ((sum of all observations)^2)/number of observations).
There are many other functions that have been improved for Excel 2003 by replacing this latter one-pass procedure by the two-pass procedure that finds the mean on the first pass and computes the sum of squared deviations about it on the second pass.

A short list of such functions includes VAR, VARP, STDEV, STDEVP, DVAR, DVARP, DSTDEV, DSTDEVP, FORECAST, SLOPE, INTERCEPT, PEARSON, RSQ, and STEYX. Similar improvements were made in each of the three Analysis of Variance tools in the Analysis ToolPak.

For more information about VARP, click Microsoft Office Excel Help on the Help menu, type varp in the Search for box in the Assistance pane, and then click Start searching to view the topic.

Modification Type:MinorLast Reviewed:1/11/2006
Keywords:kbinfo KB826393 kbAudEndUser