One of the most basic functions in any spreadsheet is to return an answer based upon some condition. This becomes especially useful when counting or summing based upon that condition. One condition is useful, but multiple conditions extend the functionality and flexibility, so that you can count, say, the number of items sold by part number AND by month. There are a number of ways that this can be achieved within Excel, but this paper focuses on one particular function, the SUMPRODUCT function, which by creative use has evolved a flexibility undreamt of by its originators in Microsoft.
SUMPRODUCT is one of the most versatile functions provided in Excel. In its most basic form, SUMPRODUCT multiplies corresponding members in given arrays, and returns the sum of those products. This page discusses the classic use of SUMPRODUCT, how creativity and inbuilt flexibility have enabled it to evolve into a far more useful function, and explains some of the techniques being deployed. Finally, some examples of SUMPRODUCT show its versatility.
Standard Use of SUMPRODUCT
Evolving Use of SUMPRODUCT
Advantages of SUMPRODUCT
Format of SUMPRODUCT
Standard Use of SUMPRODUCT
In its classic form, SUMPRODUCT multiplies each value in one array by the corresponding value in another array, and returns the summed result. As an example, if cells A9:A11 contain the values 1,2,3 and B9:B11 contain 10,20,30, then
=SUMPRODUCT(A9:A11,B9:B11)
returns 140, or (1*10)+(2*20)+(3*30)=10+40+90=140.
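For readers who find it easier to follow the arithmetic outside a worksheet, the classic behaviour can be mimicked in a few lines of Python. This is only a sketch of the idea, not Excel's implementation; the function name `sumproduct` and the lists `a` and `b` are stand-ins chosen for this illustration.

```python
import math

def sumproduct(*arrays):
    """Multiply corresponding members of the arrays and sum the products."""
    return sum(math.prod(members) for members in zip(*arrays))

a = [1, 2, 3]      # stands in for A9:A11
b = [10, 20, 30]   # stands in for B9:B11

# (1*10) + (2*20) + (3*30)
print(sumproduct(a, b))  # 140
```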
Evolving Use of SUMPRODUCT
Within Excel, there are two very useful functions that support conditional counting and summing, namely COUNTIF and SUMIF. Both are limited, however, in that they can only evaluate a single test range and a single test condition. Multiple conditions are needed to test ranges (say, between two dates) and to apply double tests (one array = A and another = B). Whilst this can be managed using array formulae, that approach is somewhat unwieldy. There is a better way, using SUMPRODUCT.
Note that in this section, all formulae given are using the '*' (multiply) operator format, but this in itself is one of the biggest discussion points around the SUMPRODUCT function, one which is discussed below.
To understand how SUMPRODUCT can be used, first consider the following data.
We can easily count the number of Fords with COUNTIF, which returns 4.
Similarly, it is straightforward to get the value of Fords sold using SUMIF, which gives 33,873.
How do we get a count of how many Fords are sold in June, or the value of them? The number can be calculated with an array formula, which is committed with Ctrl-Shift-Enter, not just Enter. Similarly, the value is obtained with a second array formula.
But as this page is about SUMPRODUCT, you would expect that we could use that function in this case, and we can. The solution for the number of Fords sold in June using this function is
The value is obtained with
In my view, this formula more readily shows what the objective is.
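The logic behind a two-condition count and a two-condition sum can be sketched in Python. The rows below are hypothetical and are not the article's data; the names `sales`, `is_ford` and `is_june` are assumptions made for this illustration. Each test produces a list of TRUE/FALSE values, and multiplying them row by row leaves a 1 only where both tests pass.

```python
# Hypothetical rows of (make, month, price); not the data from the article.
sales = [
    ("Ford", "June", 7500),
    ("Ford", "May", 8300),
    ("Renault", "June", 6200),
    ("Ford", "June", 9100),
    ("BMW", "April", 15000),
]

makes = [row[0] for row in sales]
months = [row[1] for row in sales]
prices = [row[2] for row in sales]

# Each comparison yields one TRUE/FALSE flag per row.
is_ford = [make == "Ford" for make in makes]
is_june = [month == "June" for month in months]

# Multiplying the flags gives 1 only when both conditions hold.
ford_june_count = sum(f * j for f, j in zip(is_ford, is_june))

# Multiplying by the price as well keeps only the qualifying prices.
ford_june_value = sum(f * j * p for f, j, p in zip(is_ford, is_june, prices))

print(ford_june_count)  # 2
print(ford_june_value)  # 16600
```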
As a further extension of its use, we can use the '+' (plus) operator to count OR conditions, such as how many cars sold were either Fords, or were sold in June. This formula shows how
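A small Python sketch of the '+' idea, again with hypothetical flags rather than the article's data. One point worth noting: adding two TRUE/FALSE arrays yields 0, 1 or 2 per row, so a row that meets both conditions contributes 2 to a plain sum. If each row should count once only, the per-row total can be capped at 1, which is what the second calculation does.

```python
# Hypothetical flags for five rows; not the data from the article.
is_ford = [True, True, False, True, False]
is_june = [True, False, True, True, False]

# Row-by-row addition: 0 = neither, 1 = one condition, 2 = both.
added = [f + j for f, j in zip(is_ford, is_june)]

# A plain sum counts rows that satisfy both conditions twice.
plain_sum = sum(added)

# Capping each row at 1 counts every qualifying row exactly once.
rows_matching_either = sum(1 for value in added if value > 0)

print(added)                 # [2, 1, 1, 2, 0]
print(plain_sum)             # 6
print(rows_matching_either)  # 4
```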
Although array formulae are mentioned here, they are not explained. For a detailed discussion, see Chip Pearson's Array Formulas web page.
So far, so good, in that we have a versatile function that can do any number of conditional tests, and has an inbuilt flexibility that provides extensibility. Its power is augmented when combined with other functions, such as can be found in the examples below.
Advantages of SUMPRODUCT
Multiple conditional tests are a major advantage of the SUMPRODUCT function as described above, but it has two other considerable advantages. The first is that it can function with closed workbooks, and the second is that the handling of text values can be tailored to the requirement.
In the case of another workbook, the SUMIF function can be used to calculate a value, such as in
This is fine in itself, and the value remains if the other workbook is closed, but as soon as the sheet is re-calculated, the formula returns #VALUE!. Similarly, if the formula is entered with the other workbook already closed, #VALUE! is immediately returned.
SUMPRODUCT, however, overcomes this problem. The formula
=SUMPRODUCT(--('[Nowfal Rates.xls]RATES'!$K$11:$K$13>1),--('[Nowfal Rates.xls]RATES'!$K$11:$K$13))
returns the same result, but it will still work when the other workbook is closed and the sheet is re-calculated, and can be initially entered referencing the closed workbook, without a #VALUE! error.
The second major advantage is being able to handle text in numeric columns differently. Consider the following dataset, as shown in Table 2.
If we are looking at rows 1:4, we can see that we have a text value in B1. In this case it is simply a heading row, but the principle applies to a text value in any row.
Using SUMPRODUCT, we can either return an error, or ignore the text. This can be useful if we want to ignore errors, or if we want to trap the error (and presumably correct it later).
Errors will be returned if we use this version
To ignore errors, use this amended version which uses the double unary operator (see SUMPRODUCT Explained below for details)
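The difference between the two behaviours can be modelled in Python. This is a simplified sketch under stated assumptions: it assumes the error-returning version multiplies the condition by the value column with '*', and the error-ignoring version passes the value column as a separate, comma-separated argument. The helper names `times` and `sumproduct` and the sample columns are hypothetical, and the model ignores Excel's conversion of numeric-looking text.

```python
def times(x, y):
    """Model of Excel's '*': a text operand produces an error (#VALUE! in Excel)."""
    if isinstance(x, str) or isinstance(y, str):
        raise ValueError("#VALUE!")
    return x * y

def as_number(value):
    """Model of how SUMPRODUCT reads an array member: non-numeric entries count as 0."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return 0
    return value

def sumproduct(*arrays):
    """Model of comma-separated SUMPRODUCT arguments."""
    total = 0
    for members in zip(*arrays):
        product = 1
        for value in members:
            product *= as_number(value)
        total += product
    return total

# Hypothetical columns with a heading in the first row.
col_a = ["Type", "x", "y", "x"]
col_b = ["Amount", 10, 20, 30]
is_x = [cell == "x" for cell in col_a]

# Version 1: condition '*' values. The heading text triggers the error.
try:
    strict_total = sum(times(flag, amount) for flag, amount in zip(is_x, col_b))
except ValueError as err:
    strict_total = str(err)

# Version 2: double unary on the condition, values as a separate argument.
lenient_total = sumproduct([-(-flag) for flag in is_x], col_b)

print(strict_total)   # #VALUE!
print(lenient_total)  # 40
```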
But how does it work?
Understanding how SUMPRODUCT works will help you to determine where to use it, how you can construct your formula, and thus how you can extend it.
Table 3. below shows an example data set that we will use. In this example, the problem is to find how many Fords with a category of "A" were sold. A9:A20 holds the make, B9:B20 has the category, and C9:C20 has the number sold. The formula to get this result is
=SUMPRODUCT((A9:A20="Ford")*(B9:B20="A")*(C9:C20))
The first part of the formula (A9:A20="Ford") checks the array of makes for a value of Ford. This returns an array of TRUE/FALSE, in this case it is
Similarly, the categories are checked for the value A with (B9:B20="A"). Again, this returns an array of TRUE/FALSE, or
And finally, the numbers are not checked but taken as is, that is (C9:C20), which returns an array of numbers
So now we have three arrays, two of TRUE/FALSE values, one of numbers. This is shown in Table 4.
And this is where it gets interesting. SUMPRODUCT usually works on arrays of numbers, but we have arrays of TRUE/FALSE here as well as an array of numbers. By using the '*' (multiply) operator, we can get numeric values that can be summed, because '*' has the effect of coercing these two arrays into a single array of 1/0 values. Multiplying TRUE by TRUE returns 1 (try it, enter =TRUE*TRUE in a cell and see the result), and any other combination returns 0. Therefore, when both conditions are satisfied, we get a 1, whereas if either or both conditions are not satisfied, we get a 0. Multiplying the first array of TRUE/FALSE values by the second array of TRUE/FALSE values returns a composite array of 1/0 values, or
This new array of 1/0 values is then multiplied by the array of numbers sold to give another array holding the numbers sold only for rows that satisfy the two test conditions. SUMPRODUCT then sums the members of this array to give us the total.
Table 4. below shows the values that the conditional tests break down to before being acted upon by the '*' operator. Table 5. shows a virtual representation of those TRUE/FALSE values as their numerical equivalents of 1/0 and the individual multiplication results. From this, you should be able to see how SUMPRODUCT arrives at its result, namely 35.
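The same steps can be traced in Python, printing each intermediate array along the way. The six rows used here are hypothetical, not the data in Table 3, so the total differs from the 35 quoted above; the variable names are assumptions made for this sketch.

```python
# Hypothetical rows; not the data from Table 3.
makes = ["Ford", "Vauxhall", "Ford", "Ford", "Renault", "Ford"]
categories = ["A", "A", "B", "A", "C", "A"]
sold = [3, 4, 2, 6, 1, 5]

# Step 1: each test gives an array of TRUE/FALSE values.
is_ford = [make == "Ford" for make in makes]
is_cat_a = [category == "A" for category in categories]

# Step 2: multiplying the two arrays coerces them into one array of 1/0 values.
both = [f * a for f, a in zip(is_ford, is_cat_a)]

# Step 3: multiplying by the numbers sold keeps only the qualifying rows.
products = [flag * number for flag, number in zip(both, sold)]

# Step 4: summing gives the total sold; summing the 1/0 array gives the row count.
total_sold = sum(products)
row_count = sum(both)

print(both)        # [1, 0, 0, 1, 0, 1]
print(products)    # [3, 0, 0, 6, 0, 5]
print(total_sold)  # 14
print(row_count)   # 3
```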
Table 6. below shows you the same virtual representation of 1/0 numerical values without the numbers sold column, that is using SUMPRODUCT to count the number of rows satisfying the two conditions, or
=SUMPRODUCT((A9:A20="Ford")*(B9:B20="A"))
which does use the product aspect (see more on this in the next section)
When using the SUMPRODUCT function, all arrays must be the same size, as corresponding members of each array are multiplied by each other.
In Excel 2003 and earlier, no array passed to SUMPRODUCT can be a whole column (A:A); the array must be a range within a column (although the best part of a column could be defined with A1:A65535 if so desired). Whole rows (1:1) are acceptable. Excel 2007 and later also accept whole-column references.
In a SUMPRODUCT function, the arrays being evaluated cannot be a mix of column and row ranges; they must all be columns, or all rows. However, the row data can be transposed to present it to SUMPRODUCT as columnar - see the Using TRANSPOSE to test against values in a column not row example.
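The same-size rule can be illustrated with a short Python model. It is a sketch only: the function name `sumproduct` is a stand-in, and raising `ValueError` is this model's way of representing the #VALUE! error that Excel shows when the arrays differ in size.

```python
import math

def sumproduct(*arrays):
    """Sum the row-by-row products, insisting that every array has the same length."""
    lengths = {len(array) for array in arrays}
    if len(lengths) != 1:
        # Excel reports #VALUE! when the arrays are not the same size.
        raise ValueError("#VALUE!")
    return sum(math.prod(members) for members in zip(*arrays))

print(sumproduct([1, 2, 3], [4, 5, 6]))  # 32

try:
    sumproduct([1, 2, 3], [4, 5])
except ValueError as err:
    print(err)  # #VALUE!
```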
Format of SUMPRODUCT
In the examples presented so far, the format has been
As mentioned above, we could also use
which works as the '*' operator is only required to coerce the conditional arrays that resolve to TRUE/FALSE into numeric values.
As it is the use of an arithmetic operator that coerces the TRUE/FALSE values to 1/0, we could use many different operators and achieve the same result. Thus, it is also possible to coerce each of the conditional arrays individually by multiplying them by 1,
or by raising to the power of 1,
or by adding 0,
or even by using the N function,
These methods differ from the '*' operator in that they are applied to individual arrays, whereas '*' operates on two arrays.
All of these methods work when there is more than one conditional array, so it is really a matter of preference as to which to use. If there is a single conditional array, then the '*' operator cannot be used (there are not two to multiply), so one of the other above methods has to be used.
Yet another method is to use the double unary operator, --, in this way
The double unary operator also coerces the individual array(s), which then acts more akin to classic SUMPRODUCT.
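Python's booleans happen to behave like Excel's under arithmetic, so the coercion methods described in this section can be compared side by side. This is an analogy rather than a statement about Excel's internals: the `flags` list is hypothetical, and `int()` is used as a stand-in for Excel's N function.

```python
# Hypothetical result of a single test such as (range = "Ford").
flags = [True, False, True, True]

times_one = [flag * 1 for flag in flags]        # multiplying by 1
power_one = [flag ** 1 for flag in flags]       # raising to the power of 1
plus_zero = [flag + 0 for flag in flags]        # adding 0
n_function = [int(flag) for flag in flags]      # stand-in for N()
double_unary = [-(-flag) for flag in flags]     # the double unary, --

for coerced in (times_one, power_one, plus_zero, n_function, double_unary):
    print(coerced)  # each line prints [1, 0, 1, 1]
```

One difference from Excel is worth keeping in mind: Python will sum a list of booleans directly, whereas SUMPRODUCT treats uncoerced TRUE/FALSE array members as zero, which is why the coercion step matters in a worksheet.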
There has been much discussion that one way is faster than another, or is more of a 'standard' than another, but in reality there will be few instances where one method will gain a noticeable performance advantage over another, and as for standards, this is all new territory, and will mainly be used by people who have never been involved in using these standards, and who care even less.
For me, I believe it is a matter of preference. Personally, I am being swayed to the double unary -- notation, because it avoids a function call, it works in all situations (the '*' operator won't work on a single array), and I don't like the '1*', '*1', '^1', or '+0' variations. So my preference is for
which also has more similarity to classic SUMPRODUCT.
There is one other variation which has been promoted recently, which is the single unary operator, '-', such as
but I would not encourage this as it has no real merit that I can see, and has to be paired off, otherwise it will return a negative result.
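The pairing requirement is easy to see with a little arithmetic, sketched here in Python with hypothetical flags. A single negation turns each TRUE into -1, so one negated array gives a negative total; two negated arrays multiply back to a positive result.

```python
# Hypothetical flags for two conditions over three rows.
cond_a = [True, False, True]
cond_b = [True, True, False]

# One array with a single unary minus: every TRUE becomes -1.
single_unpaired = sum(-a for a in cond_a)

# Two arrays, each with a single unary minus: (-1) * (-1) = 1.
single_paired = sum((-a) * (-b) for a, b in zip(cond_a, cond_b))

# The double unary gives the same positive result for each array on its own.
double_unary = sum(-(-a) for a in cond_a)

print(single_unpaired)  # -2
print(single_paired)    # 1
print(double_unary)     # 2
```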
So, to sum up ...
Tests such as A=10 normally resolve to TRUE or FALSE, and an operator is only needed if you want to coerce an array of TRUE/FALSE values to 1/0 integers, such as
SUMPRODUCT arrays are normally separated by the comma. So, to preserve this format, if you have multiple conditions, you can use the -- on both conditions like so
But, if you simply multiply two arrays of TRUE/FALSE, that implicitly resolves to 1/0 values that are then summed, so you don't need the comma, and you could then use
Any further, final, array of values can use the same operator, or could revert to comma. So your formula can be written as
If the result is the product of two conditions being multiplied, it is fine to multiply them together as this will coerce the True/False values to 1/0 values to allow the summing
However, if there is only one condition, you can coerce to 1/0 with the double unary --
You could achieve this equally well with
and equally the first could be represented as
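To see that these notations arrive at the same answer, the Python model below treats comma-separated arguments the way SUMPRODUCT does, counting TRUE/FALSE and text members as zero unless they have been coerced. The function `sumproduct`, the helper `as_number` and the sample lists are hypothetical names for this sketch.

```python
import math

def as_number(value):
    """Non-numeric array members (booleans, text) count as 0, as in SUMPRODUCT."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return 0
    return value

def sumproduct(*arrays):
    """Model of SUMPRODUCT with comma-separated array arguments."""
    return sum(math.prod(as_number(v) for v in members) for members in zip(*arrays))

# Hypothetical condition results and values.
cond_1 = [True, False, True, True]
cond_2 = [True, True, False, True]
values = [3, 4, 2, 6]

# Form 1: everything joined with '*' in a single argument.
all_star = sumproduct([a * b * v for a, b, v in zip(cond_1, cond_2, values)])

# Form 2: conditions joined with '*', values after a comma.
star_then_comma = sumproduct([a * b for a, b in zip(cond_1, cond_2)], values)

# Form 3: double unary on each condition, all arguments comma-separated.
double_unary = sumproduct([-(-a) for a in cond_1], [-(-b) for b in cond_2], values)

# Without any coercion the TRUE/FALSE members count as 0.
uncoerced = sumproduct(cond_1, cond_2, values)

print(all_star, star_then_comma, double_unary)  # 9 9 9
print(uncoerced)                                # 0
```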
There is no situation that I know of whereby a solution using -- could not be achieved somehow with a '*'. Conversely, if using the TRANSPOSE function within SUMPRODUCT, then the '*' has to be used.
So, as you can see there are a number of possibilities, and you make your own choice. I leave the final word to Harlan Grove, who once wrote this paragraph on why he prefers the double unary operator ...
....as I've written before, it's not the speed of double unary minuses I like, it's the fact that due to Excel's operator precedence it's harder to screw up double unary minuses with typos than it is to screw up the alternatives ^1, *1, +0. Also, since I read left to right, I prefer my number type coercions on the left rather than the right of my Boolean expressions, and -- looks nicer than 1* or 0+. Wrapping Boolean expressions inside N() is another alternative, possibly clearer, but it eats a nested function call level, so I don't use it.
Matching against values in another range
Dates for any international setting
Using TRANSPOSE to test against values in a column not row
Testing against multiple non-contiguous ranges
Find instances of a string, ignoring leading or trailing spaces
Count the number of unique values in a range
Avoid double-counting in multiple conditions
Count items matching a list
Count partial matching in a range
Count between two dates, excluding holidays
Sum visible cells