Simple Statistics
Written by Joanne Birchall from Rainbow Research.
This article covers why we need statistics and also examines the following in detail:
- Average (Mean)
- Median and Mode
- Measures of Dispersion
  - The range
  - The mean deviation
  - The standard deviation
Why we need Statistics
The number of people we speak to during quantitative Market Research projects can often run into thousands. Inevitably, this can result in very large sets of data. The human brain is limited in its capacity to deal with rapidly incoming information, and when faced with large groups of numbers, most people cannot hold them all in mind at once. It is difficult to draw any conclusions by simply looking at the data in its raw state; it is therefore useful to glean some kind of overall picture or summary of what is going on. The main purpose of statistics is to summarise the data accurately in a smaller number of easily interpretable figures. Some of the simplest statistics are described below.
Average (or mean)
Once Market Research data has been collected, a good starting point is to tabulate the data to find the number of respondents giving each answer to each question, for example the number of cans of each type of dog food purchased from a particular supermarket each week. However, with large data sets, this could result in numerous sheets of data that are difficult to take in and mentally summarise. The next step would be to get an indication of the ‘normal’ or ‘usual’ number of cans of dog food purchased from the supermarket each week. These figures are called averages, and the research might say: on average, about 600 cans of Pedigree Chum are bought each week from the supermarket. Notice that the word ‘about’ is used: you would not expect exactly 600 cans of Pedigree Chum to be bought every week, but rather some variation around the figure of 600. For example, if the research spanned a four-week period, there may have been 535 cans sold in the first week, 692 in the second week, 550 in the third week and 623 in the fourth week.
The average figure is calculated as:
$$\text{average} = \frac{\text{sum of all of the numbers}}{\text{number of numbers (or weeks)}} = \frac{535 + 692 + 550 + 623}{4} = \frac{2400}{4} = 600$$
The average is one kind of descriptive statistic, which indicates a ‘typical’ or ‘central’ figure for a group of numbers. It is officially called a ‘measure of central tendency’.
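The same calculation can be sketched in a few lines of Python. This is only an illustration using the four weekly figures quoted above; the names `weekly_sales` and `mean` are chosen here for the example and are not part of the original research.

```python
# Weekly sales of Pedigree Chum over the four-week period quoted in the text.
weekly_sales = [535, 692, 550, 623]


def mean(values):
    """Return the average: the sum of the numbers divided by how many there are."""
    if not values:
        raise ValueError("cannot take the average of an empty list")
    return sum(values) / len(values)


average = mean(weekly_sales)
print(average)  # 600.0
```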
Median and Mode
On the whole, when numbers in a particular group cluster closely around the central value, the mean is a good way of indicating the ‘typical’ score, i.e. it is truly representative of the numbers. If, however, the numbers are very widely spread, are very unevenly distributed, or contain extreme values (e.g. 9, 10, 13, 17, 23, 30, 45, or a hundred values of 10 and one value of 50), then the mean can be misleading, and other measures of central tendency such as the median or the mode should be used instead.
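To see how the measures can disagree, the two example data sets mentioned above can be run through Python's standard `statistics` module. This is a rough sketch rather than a recommended workflow; the variable names `spread` and `skewed` are made up for the illustration.

```python
import statistics

# The two example data sets from the text.
spread = [9, 10, 13, 17, 23, 30, 45]   # widely spread values
skewed = [10] * 100 + [50]             # a hundred 10s and a single 50

# Widely spread values: the mean is pulled up towards the larger numbers.
spread_mean = statistics.mean(spread)      # 21
spread_median = statistics.median(spread)  # 17

# One extreme value: the mean is nudged above 10, although every value
# except one is exactly 10. The median and the mode both stay at 10.
skewed_mean = statistics.mean(skewed)
skewed_median = statistics.median(skewed)  # 10
skewed_mode = statistics.mode(skewed)      # 10

print(spread_mean, spread_median)
print(round(skewed_mean, 2), skewed_median, skewed_mode)
```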
Median: If you have a set of values and wish to obtain a figure which represents the central point, a sensible way of doing this is to arrange the numbers in order of size and pick the number which falls in the middle as the typical value. For example, suppose we had seven apples weighing 120g, 100g, 200g, 80g, 130g, 160g and 140g. Arranged in order of size, they weigh 80g, 100g, 120g, 130g, 140g, 160g and 200g. The value in the middle, i.e. the fourth from either end, is 130g, which is our median value.
If, however, there had been eight apples, we would take the weights of the fourth and fifth apples as our two central numbers and find the halfway point between them. For example, if the two central numbers were 140g and 180g, the median would be 160g. You find this midpoint by adding the two numbers and dividing by two, that is, by finding their average (or mean)!
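The procedure for odd and even numbers of values can be sketched as a short Python function. The seven apple weights are those given above; the eight-apple list is hypothetical, invented here only so that its two central values are 140g and 180g as in the example.

```python
def median(values):
    """Return the middle value of the numbers once they are placed in order.

    With an even number of values, return the halfway point (the mean)
    of the two central values.
    """
    if not values:
        raise ValueError("cannot take the median of an empty list")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Apple weights in grams, as listed in the text.
seven_apples = [120, 100, 200, 80, 130, 160, 140]

# Hypothetical weights: only the central pair (140, 180) comes from the text.
eight_apples = [80, 100, 120, 140, 180, 200, 210, 220]

print(median(seven_apples))  # 130
print(median(eight_apples))  # 160.0
```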
Advantages of the median are:
· If one of the extreme values changes (and often in experiments it is the extreme values which are least reliable), then the median remains unaltered, whereas the mean could be affected hugely.
· If a set of numbers has a lop-sided pattern (for example, most of the scores are small, several are medium sized, and only one or two are high), then the median may again be more appropriate than the mean, as its value will be close to the majority of the numbers.
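The first advantage can be illustrated with the seven apple weights used earlier. In this sketch the heaviest apple is assumed to have been mis-recorded as 2000g instead of 200g; that error is hypothetical and is introduced only to show the effect on each measure.

```python
import statistics

# Apple weights in grams, as listed earlier in the text.
apples = [120, 100, 200, 80, 130, 160, 140]

# Hypothetical recording error: the heaviest apple is entered as 2000g.
apples_with_error = [2000 if weight == 200 else weight for weight in apples]

median_before = statistics.median(apples)
median_after = statistics.median(apples_with_error)
mean_before = statistics.mean(apples)
mean_after = statistics.mean(apples_with_error)

print(median_before, median_after)                  # 130 130
print(round(mean_before, 1), round(mean_after, 1))  # 132.9 390
```

The median is still the weight of the fourth apple, while the mean has roughly tripled.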
The disadvantages of the median, however, are:
· If you have a large set of numbers, it would be time consuming to place each in order of size.
· If one of the numbers near the middle of the distribution moves even slightly, then the median may alter, unlike the mean, which is relatively unaffected by a change in one of the central numbers.
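The second disadvantage can be checked with the same seven apple weights. In this sketch the middle apple is assumed to weigh 135g rather than 130g; the 5g shift is a made-up figure chosen for illustration.

```python
import statistics

# Apple weights in grams, as listed earlier in the text.
apples = [120, 100, 200, 80, 130, 160, 140]

# Hypothetical small change: the central apple weighs 135g instead of 130g.
apples_shifted = [135 if weight == 130 else weight for weight in apples]

median_change = statistics.median(apples_shifted) - statistics.median(apples)
mean_change = statistics.mean(apples_shifted) - statistics.mean(apples)

print(median_change)          # 5
print(round(mean_change, 2))  # 0.71
```

The median moves by the full 5g, whereas the mean moves by only one seventh of that, because the change is shared across all seven values.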