Wednesday 24 September 2014

Excel Average() is often wrong and over emphasises minorities and why 0 is your truthful friend.


Take this simple spreadsheet that records over 10 weeks the number of veterinary consults delivered in a small animal practice. In the columns we see the number of each type of animal seen in each week.  In all weeks we see cats and dogs but only in some weeks do we see the specialist animals.

In 10 weeks we have 139 Total Consults made up of 61 cats, 68 dogs and (5+2+3)=10 other animals.  


Looking at how the consults are recorded for Geckos and Stick Insects, we only have numbers in the weeks where consults were done. For the Cobras in weeks where 0 Cobras are seen 0 is entered. At first thought how the numbers are recorded should not make a difference but as we can see the recording of 0 consults in a week has profound impact on the results we get.

Using the Excel Average() formula down the columns does NOT, in all cases, give us the expected answer of "Average number of animals seen per week." Because the Average and Averageif formulas only include non-blank cells, we actually get the far less useful "On the weeks, were we see a Gecko, we see 1.67 Geckos."  The Gecko numbers are averaged for only the 3 weeks where the lizard consults occur.

The same Average() formula gives the correct answer for Cobras of 0.3 consults per week. Correctly averaged over the 10 weeks as we would expect.

Using the correct definition of Average bring the Sum of numbers / Number of Numbers it could be argued that Excel is giving the right answer.  This leaves the statistics trap door wide open for the readout to be "We took numbers over 10 weeks and on average we saw 1 Stick Insect per week. (Blue number)"  The flaw here is the per week.  Per week implies that every week we probably saw a Stick insect where as in 10 weeks we actually only saw 2 Stick Insects 0.2 per week.

The use of the formula Sum() / Size of range gives the correct answers here for all the speciallist animals. The difference is quite marked and builds a cumulative error when column are taken together. In fact we only had 10 specialist animal consults in 10 weeks ( 1 per week ) but if the incorrect averages are used (1.67 (red number) +1(blue number) +0.3(correct)) = 2.966 the answer appears to be just less than 3 consults per week.

The language used to describe the result lines is accurate but has subtly and will often be misheard and interpreted to over emphasis the minority animals.

The Office Excel page where it describes the Average and Averageif formulas correctly discusses how blank cells are not included in Average excluding cells with 0s but has no mention of how comparing stay the average of 3 numbers with the average of 10 numbers is deeply broken. The Average formulas, when used over blank cells, will give an indication of average capacity needed but not the average usage of a resource.

Zero is your friend when Average() is used. As with the Cobra column if there was nothing then 0 should be recorded and used in the average calculations.

Being controversial here ...... Watch out for this sneaky and systemic error especially when deciding the correct usage based level of provision of services and facilities for minority and special interests. Probe for "capacity needed" and "expected average usage". These are different numbers and often confused.





No comments: