Case | Coded values of variables 1 to 8
1 | 1 | 26 | 3 | 2 | 2 | 1.82 | 2 | 0
2 | 2 | 35 | 2 | 3 | 8 | 1.63 | 3 | 2
3 | 2 | 56 | 2 | 4 | 3 | 1.55 | 3 | 3
4 | 1 | 42 | 4 | 1 | 1 | 1.74 | 1 | 1
5 | 2 | 83 | 5 | 5 | 7 | 9 | . | 4
Note that the blank cell has been replaced with a full stop. This is the default (system-missing) value in SPSS and will be explained later. The matrix is now ready to be entered into a spreadsheet such as Excel, or into a statistical package such as SPSS, for editing and analysis. These days the data will often already be in computer-readable form, either supplied by a research agency or entered by the researcher directly from the questionnaires.
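SPSS excludes system-missing values from calculations automatically, but outside SPSS the full stop has to be handled explicitly. The following is a minimal Python sketch, not SPSS itself, assuming the coded matrix is stored as whitespace-separated text with the case number first. The function names (`parse_value`, `read_matrix`, `valid_values`) are hypothetical, chosen only for illustration. It reads the matrix, turns "." into a missing value and counts valid and missing cases for the seventh variable.

```python
# The coded data matrix above: one row per case, case number first,
# then the coded values of the eight variables. "." marks the blank cell.
RAW_MATRIX = """\
1 1 26 3 2 2 1.82 2 0
2 2 35 2 3 8 1.63 3 2
3 2 56 2 4 3 1.55 3 3
4 1 42 4 1 1 1.74 1 1
5 2 83 5 5 7 9 . 4"""

MISSING = "."


def parse_value(token):
    """Return None for the missing-value marker, otherwise the number."""
    return None if token == MISSING else float(token)


def read_matrix(text):
    """Map each case number to the list of its variable values."""
    matrix = {}
    for line in text.splitlines():
        case, *values = line.split()
        matrix[int(case)] = [parse_value(v) for v in values]
    return matrix


def valid_values(matrix, var_index):
    """Values of one variable (0-based column index), missing cells excluded."""
    return [row[var_index] for row in matrix.values() if row[var_index] is not None]


matrix = read_matrix(RAW_MATRIX)
valid = valid_values(matrix, 6)               # the seventh variable
print(len(valid), len(matrix) - len(valid))   # 4 1 (valid cases, missing cases)
```

Any statistic for that variable would then be based on the four valid cases only, which mirrors what SPSS does with a system-missing value.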
3: Levels of measurement
For ease of computer processing, the values of variables are usually, but not always, coded as numbers, most often as integers (numbers with no decimals), but it is important to remember that these numeric codes frequently do not have all the properties of integers. This can affect the kind of statistical presentation and manipulation which is appropriate or permissible.
One important quality is the level of measurement.
The basic category is nominal (or categorical). All that is necessary is that the categories are properly defined (precise, mutually exclusive, exhaustive of all cases). Religious affiliation is such a variable: so are marital status and parliamentary constituency. Surveys usually ensure that categories are exhaustive by including a residual category 'Other'. Numeric codes are arbitrarily assigned to categories.
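Because nominal codes are arbitrary, any statistic that depends on the actual numbers is meaningless for such a variable. A minimal Python sketch, in which the responses and both coding frames are hypothetical: whichever numbering is used, the modal (most frequent) category comes out the same, but the "average code" changes with the numbering.

```python
from collections import Counter
from statistics import mean

# Hypothetical answers to a marital-status question (a nominal variable).
responses = ["Married", "Single", "Married", "Widowed", "Divorced", "Married"]

# Two equally valid, hypothetical coding frames: the numbers are arbitrary.
frame_a = {"Single": 1, "Married": 2, "Widowed": 3, "Divorced": 4}
frame_b = {"Married": 1, "Divorced": 2, "Single": 3, "Widowed": 4}


def modal_category(codes, frame):
    """Most frequent code, translated back to its category label."""
    labels = {code: label for label, code in frame.items()}
    most_common_code, _ = Counter(codes).most_common(1)[0]
    return labels[most_common_code]


codes_a = [frame_a[r] for r in responses]
codes_b = [frame_b[r] for r in responses]

# The mode is the same category whichever frame is used...
print(modal_category(codes_a, frame_a), modal_category(codes_b, frame_b))
# ...but the "mean code" depends entirely on the arbitrary numbering.
print(mean(codes_a), mean(codes_b))
```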
Ordinal implies that, in addition, the categories can be ranked, i.e. placed in order from highest to lowest on some defined criterion (e.g. Very satisfied, Quite satisfied, Neither satisfied nor dissatisfied, Quite dissatisfied, Very dissatisfied). Numeric codes cannot be arbitrarily assigned to categories, but they can be reversed.
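The difference between reversing ordinal codes and assigning them arbitrarily can be shown with the median, which relies only on order. In this minimal Python sketch the codes and the seven responses are hypothetical. Reversing the codes (so that 5 means "Very satisfied") leaves the median category unchanged, whereas an arbitrary assignment that ignores the order produces a different answer.

```python
from statistics import median

# Hypothetical codes for the satisfaction scale described above.
LABELS = [
    "Very satisfied",
    "Quite satisfied",
    "Neither satisfied nor dissatisfied",
    "Quite dissatisfied",
    "Very dissatisfied",
]
forward = {label: i + 1 for i, label in enumerate(LABELS)}  # 1 = Very satisfied
reverse = {label: 5 - i for i, label in enumerate(LABELS)}  # 5 = Very satisfied
# An arbitrary assignment that ignores the order of the categories.
shuffled = dict(zip(LABELS, [3, 5, 1, 2, 4]))

# Hypothetical responses from seven cases (an odd number, so the median
# is always one of the codes actually used).
responses = [
    "Quite satisfied", "Very satisfied", "Quite dissatisfied", "Quite satisfied",
    "Neither satisfied nor dissatisfied", "Very dissatisfied", "Quite satisfied",
]


def median_category(frame):
    """Median of the coded responses, translated back to its label."""
    labels = {code: label for label, code in frame.items()}
    return labels[median(frame[r] for r in responses)]


print(median_category(forward))   # Quite satisfied
print(median_category(reverse))   # Quite satisfied: reversal keeps the order
print(median_category(shuffled))  # Very dissatisfied: the order was lost
```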
Interval has all the characteristics of nominal and ordinal, plus a defined unit of measurement. Thus, for instance, the distance from 2 to 4 is the same as that from 4 to 6. Examples include age, height, income in £ and number of children. Numeric codes are neither arbitrary nor reversible. If the scale also has a true zero point it is a ratio scale (e.g. 4 is twice 2 for number of children, but not for temperature in degrees Celsius, whose zero point is arbitrary).
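The Celsius example can be checked numerically. Converting to Fahrenheit changes both the unit and the zero point; in this minimal Python sketch, equal intervals stay equal after conversion, but ratios do not survive, whereas counts such as number of children, which have a true zero, keep their ratios.

```python
def celsius_to_fahrenheit(c):
    """Linear change of unit and zero point: equal intervals stay equal."""
    return c * 9 / 5 + 32


temps_c = [2, 4, 6]
temps_f = [celsius_to_fahrenheit(t) for t in temps_c]   # about 35.6, 39.2, 42.8

# Interval property: 2 to 4 and 4 to 6 are still equal steps in Fahrenheit.
diffs_f = [b - a for a, b in zip(temps_f, temps_f[1:])]
print(diffs_f)

# Ratios are not preserved: 4 C is "twice" 2 C, but 39.2 F is not twice 35.6 F,
# because zero degrees Celsius is not an absence of temperature.
print(temps_c[1] / temps_c[0], temps_f[1] / temps_f[0])

# Number of children has a true zero, so 4 children really is twice 2.
children = [2, 4]
print(children[1] / children[0])
```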
Note that in sociological discussion things like age or years of schooling are frequently used as indicators of something less precise such as "experience" or "level of education". It is somewhat dubious whether they really ought to be treated as interval variables in such a context.
When all the cases are grouped into only two categories according to whether they do or do not have a particular characteristic (e.g. Male/Female, Yes/No), they are known as dichotomies. These can always be treated as interval measurements, since with only two categories there is only one interval between them.
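One practical consequence is that a dichotomy coded 0 and 1 can be averaged, and the mean has a direct interpretation: it is the proportion of cases coded 1. A minimal Python sketch with hypothetical Yes/No answers:

```python
# Hypothetical answers to a Yes/No question, coded 1 = Yes, 0 = No.
answers = ["Yes", "No", "Yes", "Yes", "No"]
coded = [1 if a == "Yes" else 0 for a in answers]

# The mean of the 0/1 codes is simply the proportion answering Yes.
proportion_yes = sum(coded) / len(coded)
print(proportion_yes)   # 0.6
```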
The values of some interval variables are continuous (height, age) and have to be rounded. It is important to remember how the rounding has been done: e.g. when height is measured in inches, 68 inches means from 67.5 inches up to, but not including, 68.5 inches, whereas 68 years old means from 68 up to, but not including, 69 years old. Thus the average age calculated from the age last birthday of each case in a sample will need to be adjusted by adding six months to the result.
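The two rounding conventions and the six-month adjustment can be sketched in Python as follows. The function names and the sample ages are hypothetical, and the adjustment assumes that birthdays are spread evenly through the year, so that on average each person is about half a year older than their age last birthday.

```python
def height_interval(recorded_inches):
    """Height rounded to the nearest inch: true value lies in [h - 0.5, h + 0.5)."""
    return (recorded_inches - 0.5, recorded_inches + 0.5)


def age_interval(age_last_birthday):
    """Age last birthday is truncated: true age lies in [a, a + 1)."""
    return (age_last_birthday, age_last_birthday + 1)


print(height_interval(68))   # (67.5, 68.5)
print(age_interval(68))      # (68, 69)

# Hypothetical sample of ages last birthday.
ages = [26, 35, 56, 42, 83]
mean_recorded = sum(ages) / len(ages)
# Assuming birthdays are spread evenly over the year, add six months
# (0.5 years) to estimate the mean exact age.
mean_adjusted = mean_recorded + 0.5
print(mean_recorded, mean_adjusted)
```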
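The values of other interval variables are discrete (e.g. number of children) and can only increase in increments of 1. This sometimes matters for statistical calculation: for example, an average of 2.4 children is a perfectly reasonable summary statistic even though no family can actually have 2.4 children.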
4: Author’s note:
These notes are based on the very first session of the part-time evening post-graduate course Survey Analysis Workshop which I taught at PNL from 1976 until 1992.
We used to introduce ourselves briefly with name, institutional affiliation (if any), previous qualifications and experience, and reasons for coming on the course. An all-too-frequent lament was “I’ve got a degree in Sociology and I want a job!”, but even those with jobs had often received little or no training in statistical or technical skills on their undergraduate courses (much of which was inadequate). Some of these wanted a better job and/or had been sent by their employers (mainly central and local government or the voluntary sector). Some were MPhil or PhD students.
Although I explained that there was no need for them to take notes, as everything was covered in the course booklets, this did not prevent some of them from scribbling furiously away.
I started by listing a few things typically measured by questionnaires (some solicited from the class) across the top of the (double-width) board. I then divided the board into columns defined by the items listed and filled in the responses of imaginary respondents (with a running commentary on the kind of thing they might mutter before giving a response to a questionnaire-clutching stranger knocking at the door just as they’ve settled down to watch their favourite TV programme) to yield something like the small data matrix in section 1 above. At this point I would write VARIABLES across the top, CASES down the side, note that the entries inside the matrix were VALUES and that we had just generated a DATA MATRIX.
This constituted the first introduction to formal terminology and to some keywords in the SPSS language. I then explained that, whilst it is possible for computers to work with text responses, it is normal and a lot quicker to work with numbers and so the non-numeric responses needed to be coded using numeric codes to represent the original responses.
Leaving the original responses on the board, using a different colour chalk (yes, chalk!) and referring to an imaginary coding frame (perhaps pausing to explain what a coding frame was) I would write in a numeric code alongside each response. After cleaning off the original text responses, leaving only the numeric codes, students were asked if they had any comments on, or noticed anything about, the numbers in the matrix, but this was usually met by blank expressions all round. (Remember, these were mostly social science graduates!) I then asked whether there was any difference in the way numbers were used between two variables such as vote and number of children, or between height in centimetres and number of children (using a joke about average families having 2.4 children as a hint), or whether there was anything that, say, two sets of numbers had but another didn’t. It took different amounts of time with different groups, but eventually someone would get the idea, and by this “Socratic” process the class would arrive at the notion of levels of measurement without the phrase having once been mentioned, thus proving that they weren’t as innumerate as they thought.
This was followed by a session in the computer lab in which students familiarised themselves with the terminals and the line-printers, learned to log on to the Vax, copied a short pre-prepared SPSS job into their area, ran it with a specially written front-end program[1] that made SPSS easier to use on the Vax, printed out the results and returned to class with the printout for a brief explanation. No student ever left this or later sessions empty-handed (though the spare copies always came in handy for one or two of them!), which greatly assisted motivation and subsequent learning.
Here endeth the first lesson. Bulgarian Cabernet Sauvignon all round! Let’s have a fun course.
Last updated 6 March 2006. Feedback and enquiries welcome.
[1] The program was written by Jim Ring while he was Senior Research Officer in the Survey Research Unit at PNL. It limited SPSS output files to two editions (to stop users running out of disk space) and had excellent error trapping. If errors occurred, it returned users to the point in their syntax file where they had left off, although SPSS didn’t always identify the type of error precisely. Through a series of prompts it greatly assisted students and researchers in editing SPSS syntax files, correcting and running SPSS jobs and printing results locally, and it enabled a great deal of work to be completed in a very short time.