Calculating a Sample Size
A frequently asked question is “How many people should I sample?” It is an extremely good question, although unfortunately there is no single answer! In general, the larger the sample size, the more closely your sample data will match the data for the whole population. In practice, however, you need to work out how many responses will give you sufficient precision at an affordable cost.
Calculation of an appropriate sample size depends upon a number of factors unique to each survey and it is down to you to make the decision regarding these factors. The three most important are:
- How accurate you wish to be
- How confident you wish to be in the results
- What budget you have available
The temptation is to say all should be as high as possible. The problem is that an increase in either accuracy or confidence (or both) will always require a larger sample and a higher budget. Therefore a compromise must be reached, and you must work out the level of error and the level of confidence you are prepared to accept.
There are two types of figures that you may wish to estimate in your market research project: means (such as mean income or mean height) and proportions (such as the percentage of people who intend to vote for party X). There are slightly different sample size calculations for each:
For a mean
The required formula is: $s = (z / e)^2$
s = the sample size
z = a number relating to the degree of confidence you wish to have in the result. 95% confidence* is most frequently used and accepted. The value of ‘z’ should be 2.58 for 99% confidence, 1.96 for 95% confidence, 1.64 for 90% confidence and 1.28 for 80% confidence.
e = the error you are prepared to accept, measured as a proportion of the standard deviation (accuracy)
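The z values listed above can be reproduced from the standard normal distribution. The following is a minimal sketch, assuming a two-sided confidence interval; the function name `z_for_confidence` is illustrative and does not come from the original text.

```python
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Return the two-sided z value for a confidence level given as a fraction.

    Hypothetical helper: assumes a two-sided interval on a standard normal
    distribution, which is what the z values quoted in the text correspond to.
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Split the leftover probability equally between the two tails.
    upper_tail_point = 1 - (1 - confidence) / 2
    return NormalDist().inv_cdf(upper_tail_point)


# Print z to two decimal places for the confidence levels used in the text.
for level in (0.99, 0.95, 0.90, 0.80):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.2f}")
```

Rounded to two decimal places, the results agree with the values of 2.58, 1.96, 1.64 and 1.28 given above.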
For example, imagine we are estimating mean income and wish to know what sample size to aim for in order to be 95% confident in the result. Suppose we are prepared to accept an error of 10% of the population standard deviation: previous research might have shown the standard deviation of income to be 8,000, and we might be prepared to accept an error of 800. With z = 1.96 and e = 0.1, we would do the following calculation:
$s = (1.96 / 0.1)^2$
Therefore s = 384.16
In other words, 385 people would need to be sampled to meet our criterion.
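The calculation above can be sketched in a few lines of Python. This is an illustration only: the function name `sample_size_for_mean` is an assumption, and the result is rounded up because a fraction of a respondent cannot be sampled.

```python
import math


def sample_size_for_mean(z: float, e: float) -> int:
    """Sample size s = (z / e)^2, rounded up to a whole number of people.

    z: value for the chosen confidence level (e.g. 1.96 for 95%).
    e: acceptable error as a proportion of the standard deviation.
    Hypothetical helper name, not taken from the original text.
    """
    if e <= 0:
        raise ValueError("e must be positive")
    raw = (z / e) ** 2
    # Round away floating-point noise before rounding up to a whole person.
    return math.ceil(round(raw, 6))


# Income example: standard deviation 8000, acceptable error 800, 95% confidence.
print(sample_size_for_mean(z=1.96, e=800 / 8000))
```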
*Because we interview a sample and not the whole population (if we interviewed everyone, we could be 100% confident in our results), we have to be prepared to be less confident. Because we based our sample size calculation on the 95% confidence level, there is a 95% chance that our estimate of the mean lies within our acceptable error limit of the true population mean. There is of course a 5% chance that it lies outside this limit. If we wanted to be more confident, we would base our sample size calculation on the 99% confidence level; if we were prepared to accept a lower level of confidence, we would base it on the 90% confidence level.
For a proportion
Although we are doing the same thing here, the formula is different:
$s = z^2 \, p(1 - p) / e^2$
s = the sample size
z = the number relating to the degree of confidence you wish to have in the result
p = an estimate of the proportion of people falling into the group in which you are interested in the population
e = the proportion of error we are prepared to accept
As an example, imagine we are attempting to assess the percentage of voters who will vote for candidate X. Assume that we wish to be 99% confident of the result (i.e. z = 2.58) and that we will allow for errors in the region of +/-3% (i.e. e = 0.03). We also need an estimate of the proportion of the population who would vote for the candidate (p). If a previous survey had been carried out, we could use the percentage from that survey as an estimate. However, if this were the first survey, we would assume that 50% (i.e. p = 0.5) of people would vote for candidate X and 50% would not. Choosing 50% provides the most conservative estimate of sample size, because p(1-p) is at its largest when p = 0.5. If the true percentage were 10%, we would still have an accurate estimate; we would simply have sampled more people than was absolutely necessary. The reverse situation, not having enough data to make reliable estimates, is much less desirable.
In the example:
$s = 2.58^2 \, (0.5 \times 0.5) / 0.03^2$
Therefore s = 1,849
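The same calculation can be sketched in Python. This is an illustration under stated assumptions: the function name `sample_size_for_proportion` is hypothetical, the formula used is the sample size for a proportion with z squared times p(1-p) divided by e squared, and the result is rounded up to a whole person.

```python
import math


def sample_size_for_proportion(z: float, p: float, e: float) -> int:
    """Sample size s = z^2 * p * (1 - p) / e^2, rounded up.

    z: value for the chosen confidence level (e.g. 2.58 for 99%).
    p: estimated proportion in the group of interest (0.5 if unknown).
    e: acceptable error as a proportion (e.g. 0.03 for +/-3%).
    Hypothetical helper name, not taken from the original text.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if e <= 0:
        raise ValueError("e must be positive")
    raw = z ** 2 * p * (1 - p) / e ** 2
    # Round away floating-point noise before rounding up to a whole person.
    return math.ceil(round(raw, 6))


# Voting example: 99% confidence, +/-3% error, no prior estimate of p.
print(sample_size_for_proportion(z=2.58, p=0.5, e=0.03))

# If a previous survey suggested p = 0.1, fewer people would be needed.
print(sample_size_for_proportion(z=2.58, p=0.1, e=0.03))
```

Calling the function with a value of p further from 0.5 returns a smaller sample size, which is why 0.5 is the cautious choice when no earlier estimate exists.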
This rather large sample was necessary because we wanted to be 99% sure of the result and desired a very narrow (+/-3%) margin of error. It does, however, reveal why many political polls tend to interview between 1,000 and 2,000 people.
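To see how the required sample changes with the confidence level and the margin of error, the short sketch below tabulates the proportion formula for p = 0.5. The z values are the rounded ones quoted earlier in this section; the names `Z_VALUES` and `poll_sample_size` and the particular margins of error shown are assumptions chosen for illustration.

```python
import math

# Rounded z values as quoted in the text, keyed by confidence level.
Z_VALUES = {0.90: 1.64, 0.95: 1.96, 0.99: 2.58}


def poll_sample_size(confidence: float, e: float, p: float = 0.5) -> int:
    """Sample size for a proportion, using the z values from the text.

    Hypothetical helper: p defaults to 0.5, the most conservative choice.
    """
    z = Z_VALUES[confidence]
    raw = z ** 2 * p * (1 - p) / e ** 2
    # Round away floating-point noise before rounding up to a whole person.
    return math.ceil(round(raw, 6))


# One row per confidence level, one column per margin of error.
margins = (0.05, 0.03, 0.02)
print("confidence  " + "  ".join(f"+/-{m:.0%}" for m in margins))
for confidence in sorted(Z_VALUES):
    sizes = "  ".join(f"{poll_sample_size(confidence, m):>6}" for m in margins)
    print(f"{confidence:>10.0%}  {sizes}")
```

Tightening the margin of error increases the sample size much faster than raising the confidence level, because e is squared in the denominator.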