The analysis of survey data is a massive topic, and most of this exotic landscape is beyond the purview of this article. The purpose here is to offer some suggestions for the novice researcher, but even those with experience might find one or two of the tips useful. It doesn’t matter how the data is collected (in-person interviews, phone interviews, online surveys, smart-phone surveys, mail surveys, etc.), so long as the data comes from some type of survey.
So let’s pretend that your boss hands you a set of cross-tabulations and a study file and asks you to analyze the data and make some sense out of it. Where do you start? What do you look at? What’s important and what’s trivial? How can you make sense out of thousands of different numbers? What’s the story hidden in the data?
Mush and Slop
Surveys are largely attempts to transform human feelings, memories, perceptions and attitudes into numbers. That is, the goal of a survey is to convert the “mush and slop” of human emotions and memories into words, and then the words into numbers. It’s magic! It’s alchemy. One starts with emotional goo and ends up with statistical significance at the 99.7-percent level of confidence.
Remember, however, that survey data are not the perfect measurements that cross-tabs and precise numbers tend to imply. So the first rule of survey analytics is humility. Skepticism and doubt are your analytical friends: Don’t get seduced by small differences in the cross-tabs, whether statistically significant or not. Don’t become a member of the “faith in numbers” cult. Significance testing rests atop a foundation of quicksand. Remember, you are searching for the big picture, the big story. You are searching for elephants, not mosquitoes.
Analysis Abhors a Vacuum
If you don’t know why a survey was conducted, don’t know the study objectives, and don’t understand the business issues that led to the study, your chances of doing great analytical work are meager. So the first task is to talk to your boss or whoever commissioned the study to find out as much as possible about the “background.” What business issues prompted the study? What decisions will be based on the research? What are the expectations and beliefs of senior executives related to the topics of the survey?
For example, if senior executives believe that men are the primary consumers of the brand, the fact that women are the primary consumers becomes the “big story.” But if you had not talked to senior executives before the analysis, you might easily have missed this important point. You cannot analyze survey data in a vacuum. You must understand the market, the issues, the context, the motives, and the executive expectations to help guide your analytical thinking as you study the cross-tabulations.
God’s Random Sample
Before you dive into the cross-tabs and immerse yourself in the data, however, you must first understand the sample. When training employees, I always say, “God created the random sample; all other sampling schemes are the work of the devil.” The random sample is usually representative of the total market. As you move away from a random sample, many dastardly sins lurk in the sampling and respondent-selection process. Exactly who was surveyed, and exactly how were those respondents chosen? What were the sampling and screening criteria? Who was included, and who was excluded, from the final sample?
If you are looking at survey data on the U.S. automotive market, for example, it would be essential to know that respondents who favored foreign cars were excluded from the sample (we actually observed this recently). Often, marketing and advertising executives only want to survey the primary target audience (say 18- to 24-year-old men), when this group constitutes only a small part of the total market. In this case, you can only analyze what is happening in the 18- to 24-year-old male segment, and you cannot generalize these results to any other groups. Review the sampling plan and know it inside and out.
Inequality of Questions
The next thing you must master is the questionnaire itself. Exactly what questions were asked in what order? Are the questions precise and easy to understand, or nebulous and confusing? You will want to give extra weight to the questions that are clear and precise, and ignore or discount those of dubious value. Are questions worded in a fair and balanced way, or do some of the questions “lead the witness” or bias respondents toward one answer over another?
For example, if respondents are shown a sequence of items (products, concepts, ads), those things shown first score higher (all other factors being equal). A questionnaire is made up of predictive questions (e.g., intent to buy), diagnostic questions (why did you prefer Coke over Pepsi?), descriptive questions (brand image ratings), and classification questions (demographics). By studying the questionnaire, you can identify the questions most likely to yield the very best information relative to your analytic objectives, and your analysis should be slanted toward or based on these “best” questions. An analysis is not, I repeat, not a summary of the answers to all of the questions. An analysis is an examination of the answers important and relevant to the purposes of your survey. Your final report may only include the results from a fraction of the questions in the survey if that is all you need to tell the big, important story.
Data Accuracy Supreme
Before you start searching for the big story in the cross-tabulations, you must personally verify that the data is accurate. So spend some time going through the cross-tabs to make sure all of the numbers add up and make sense. If you have any doubts or suspicions, then request a thorough re-examination of the data to make certain it is accurate. If the study design, questionnaire, or sampling plan was complicated, be even more suspicious of the data.
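One simple version of this check can be automated. The sketch below, in Python, flags any column of a single-answer question whose percentages do not add to roughly 100. The function name, the one-point rounding tolerance, and the table itself are all made up for illustration; multiple-response questions legitimately add to more than 100 and would need a different check.

```python
def columns_that_do_not_add_up(crosstab, tolerance=1.0):
    """Return the columns whose percentages miss 100 by more than `tolerance` points."""
    suspicious = []
    for column, percentages in crosstab.items():
        total = sum(percentages.values())
        if abs(total - 100.0) > tolerance:
            suspicious.append(column)
    return suspicious


# Hypothetical single-answer question, shown for two made-up banner columns.
crosstab = {
    "Men": {"Yes": 42.0, "No": 51.0, "Not sure": 7.0},    # adds to 100
    "Women": {"Yes": 48.0, "No": 44.0, "Not sure": 3.0},  # adds to 95
}

print(columns_that_do_not_add_up(crosstab))
```

A flagged column is not proof of an error, only a reason to go back and ask questions.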
Heavy Users and Other Oddities
Marketing research surveys are very democratic. Each respondent gets one vote. The purely democratic model, despite its political strengths, can lead you astray in data analysis. Some groups of consumers have more income than others. Some groups of consumers buy your brand more frequently. In fact, in almost every product category, there is a relatively small group (10 percent to 20 percent of category users) that consumes the bulk of the category (accounting for 70 percent to 80 percent of total category sales volume).
Thus, always keep an eye on how these “heavy users” are answering the survey questions. Their answers are far more important than answers of the light users. Also, keep in mind that heavy users are typically buying the product more frequently, and typically are much more knowledgeable about the product category.
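To see how far the one-person, one-vote count can drift from the volume picture, here is a minimal Python sketch. Everything in it is hypothetical: the ten respondents, the field names `units_per_month` and `likes_new_flavor`, and the idea of weighting each answer by units purchased are assumptions chosen only to illustrate the point.

```python
def percent_agreeing(respondents, weight_key=None):
    """Percent answering yes, counting each respondent once or by a weight field."""
    total = 0.0
    agreeing = 0.0
    for person in respondents:
        weight = person[weight_key] if weight_key else 1.0
        total += weight
        if person["likes_new_flavor"]:
            agreeing += weight
    return 100.0 * agreeing / total


# Hypothetical sample: 2 heavy users (about 79 percent of volume) and 8 light users.
respondents = (
    [{"units_per_month": 16, "likes_new_flavor": False},
     {"units_per_month": 14, "likes_new_flavor": False}]
    + [{"units_per_month": 1, "likes_new_flavor": True} for _ in range(6)]
    + [{"units_per_month": 1, "likes_new_flavor": False} for _ in range(2)]
)

one_vote_each = percent_agreeing(respondents)
by_volume = percent_agreeing(respondents, weight_key="units_per_month")
print(f"One vote each: {one_vote_each:.1f} percent like the new flavor")
print(f"Weighted by volume: {by_volume:.1f} percent")
```

In this made-up sample a comfortable majority of respondents like the new flavor, yet the people who buy most of the volume do not.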
Percentage Points of Change
Researchers and writers of all sorts get confused and tangled up when trying to analyze changes in percentages. Let’s suppose that you conducted a phone survey and measured Total Advertising Awareness for a brand at 20 percent in January. Six months later you repeated the survey and Total Advertising Awareness increased to 30 percent. Some would report this as a 10-percent increase (30 percent minus 20 percent equals 10 percent), while others would report it as a 50-percent increase (the 10-point gain divided by the 20-percent base equals 50 percent). Which is correct, and what’s the best way to report this change in Total Advertising Awareness?
First, you should report exactly what happened: Total Advertising Awareness increased from 20 percent to 30 percent. You should describe this increase as a “10 percentage points” increase (not a 10-percent increase). Always use the term “percentage points” when reporting an increase or decrease in the level of a percentage, or comparing two percentages. If you also want to report the ratio of change (i.e., the 50-percent increase in our example), always show your math so the reader understands exactly what you are doing.
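The arithmetic behind the two ways of describing the change can be sketched in a few lines of Python. The function name is invented for illustration; the 20 percent and 30 percent figures are the ones from the awareness example above.

```python
def describe_change(before_pct, after_pct):
    """Return (change in percentage points, relative change in percent)."""
    point_change = after_pct - before_pct
    relative_change = 100.0 * point_change / before_pct
    return point_change, relative_change


points, relative = describe_change(20.0, 30.0)
print(f"Awareness rose from 20 percent to 30 percent, a gain of {points:.0f} percentage points.")
print(f"Relative to the 20 percent base, that is a {relative:.0f} percent increase ({points:.0f} / 20).")
```

Printing the division alongside the result is one way to show your math.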
The research industry loves answer scales. These scales typically range from three-point scales (e.g., “very true,” “somewhat true,” “not true”), up to five-point scales (e.g., the standard purchase-intent scale: “definitely buy,” “probably buy,” “might or might not buy,” “probably not buy,” “definitely not buy”), up to 11-point scales (0 to 10) or even 100- (1 to 100) or 101-point scales (0 to 100).
Scales allow us to more precisely turn words and phrases into numbers. Once you have the answers, however, you are faced with an immediate problem: How do you comprehend and present the results, since you have respondents choosing many different points on the scale? One way is to aggregate some of the points (or boxes) on the scale, and report a “top-two box” percentage, or “top-three box” percentage. This may be the greatest sin ever committed by the research industry, because it implies that all of the boxes are equal in value (that is, that the answers on the top end of the scale are all equal in value, when the whole premise of the scale is just the opposite). To illustrate, here are two sets of answers where the “top-two box” scores are equal, although it’s obvious the results are quite different:
This is an extreme example, but it clearly illustrates the folly of the top-two-box percentage (“definitely buy” percentage plus “probably buy” percentage). It’s obvious that Product B is much more likely to be purchased than Product A, but the top-two-box percentage equals 60 percent for both, indicating that the two products are equal in purchase propensity. The top-two-box percentage (and similar aggregates) should be banished from marketing research forever.
What’s a better way? Well, you could just use an average of the answer code values. In the above example, a “definitely buy” answer would have a value of 5, and a “probably buy” a value of 4, and so on. The weighted average (number of respondents choosing each answer multiplied by the corresponding answer code value, summed, and then divided by total number of respondents) would discriminate between Products A and B. So it’s a better measure than top-two-box, but what’s the rationale for assigning a value of 5 to “definitely buy” answers and a value of 4 to “probably buy”?
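Here is a rough Python sketch of that comparison. The two answer distributions are hypothetical, invented only so that both products land on a 60 percent top-two-box score; they are not taken from any actual study. The function names are likewise made up.

```python
# Answer codes on the standard purchase-intent scale:
# 5 = definitely buy, 4 = probably buy, 3 = might or might not buy,
# 2 = probably not buy, 1 = definitely not buy.
# Hypothetical counts of respondents choosing each answer (100 per product).
product_a = {5: 5, 4: 55, 3: 20, 2: 10, 1: 10}
product_b = {5: 55, 4: 5, 3: 20, 2: 10, 1: 10}


def top_two_box(counts):
    """Percent of respondents choosing 'definitely buy' or 'probably buy'."""
    return 100.0 * (counts[5] + counts[4]) / sum(counts.values())


def mean_code(counts):
    """Respondent-weighted average of the answer code values."""
    weighted_sum = sum(code * n for code, n in counts.items())
    return weighted_sum / sum(counts.values())


for name, counts in (("Product A", product_a), ("Product B", product_b)):
    print(f"{name}: top-two box {top_two_box(counts):.0f} percent, "
          f"average code {mean_code(counts):.2f}")
```

With these made-up numbers the top-two-box scores tie, while the average of the answer codes separates the two products.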
Answer codes tend to be sequential numbers to help the computer understand the answer chosen, but generally answer codes do not accurately reflect the relative value of the words in the answer scale. Is a “definitely buy” only worth 25 percent more than a “probably buy” answer? That’s what is suggested by the answer codes (5 divided by 4, less 1.00). A better approach is to assign values to the answer codes that best reflect the relative meaning of the points on the scale. So in the example above you might decide to count 100 percent of all “definitely buys” and only 60 percent of the “probably buys,” because you think the “definitely buy” is quite a bit more important than a “probably buy.”
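A judgment-weighted score of this kind might be sketched in Python as follows. The 100 percent and 60 percent weights come from the example above; giving the remaining three answers a weight of zero is an added assumption, and the two answer distributions are hypothetical numbers chosen so that both products have the same top-two-box score.

```python
# Judgment weights, in percent, for each answer code (5 = definitely buy ... 1 = definitely not buy).
# The zero weights for codes 3, 2 and 1 are an assumption made for this sketch.
judgment_weights = {5: 100, 4: 60, 3: 0, 2: 0, 1: 0}

# Hypothetical counts of respondents choosing each answer (100 per product).
product_a = {5: 5, 4: 55, 3: 20, 2: 10, 1: 10}
product_b = {5: 55, 4: 5, 3: 20, 2: 10, 1: 10}


def weighted_intent(counts, weights):
    """Percent of respondents counted as buyers after applying the judgment weights."""
    counted = sum(counts[code] * weights[code] for code in counts)
    return counted / sum(counts.values())


print(f"Product A: {weighted_intent(product_a, judgment_weights):.0f} percent")
print(f"Product B: {weighted_intent(product_b, judgment_weights):.0f} percent")
```

Keeping the weights in one visible table makes it easy to show the reader exactly which judgments were made, and to rerun the numbers if someone disagrees with them.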
Whoa! Wait a minute, you say. You are advocating the intervention of human judgment to change the results. That is true. Remember, however, that a series of human judgments gave you the original numbers, so it’s okay to make new human judgments to create new numbers (hopefully ones closer to the truth). Always explain to the reader what you are doing and why. Transparency is next to godliness.
Graphs and Tables
The purpose of a graph, chart or table is to communicate; its purpose is not to dazzle the reader with how smart, creative and complicated you are. The current fashion in corporate America is to jam as much confusing information as possible onto every graph, so that everyone will understand just how clever the presenter is.
Even if a complicated chart is clear to the presenter, he will never be able to explain it to senior-level executives. The gods created the whole universe out of electrons, protons and neutrons (just three things). Keep your graphs and charts as simple as possible.
The Findings Found
As you assemble the graphs, charts and tables that illustrate or support the most important findings, you will want to concisely state these findings atop the supporting chart or graph. Think of these “findings” as conclusions, as hypotheses: The finding should be the core, most important meaning of the data in the chart or table. The finding at the top of your chart or table should not be a recitation of the numbers in the chart or table.
That’s the whole purpose of the table or chart, to present the foundational data in an easy-to-understand format. No need to repeat the data in the finding.
Here is an example of a bad finding: “Awareness of Skippy Peanut Butter is 58 percent. Total awareness of Peter Pan Peanut Butter is 35 percent. Total awareness of Jif Peanut Butter is 21 percent, as shown in the table below.” This finding is merely a repetition of the data in the supporting table or graph. It adds nothing. It offers no conclusions or interpretations of what the data means.
Here is an example of a better finding: “Skippy Peanut Butter enjoys dominant brand awareness. Its total brand awareness is almost double the next brand, Peter Pan. This dominant brand awareness tends to create a positive halo around Skippy Peanut Butter, as demonstrated in later charts.”
The second finding is better than the first. It reaches the conclusion that Skippy Peanut Butter dominates brand awareness and points out an important implication of that dominance: a positive halo for the brand and its image.
The findings are the building blocks of your analytic report. Each finding should be concise and singular, and focus on a single, most important point or implication. In addition, each finding should have some supporting role for your final conclusions and recommendations.
Conclusions and Recommendations
Once you have assembled the foundation of building blocks, the findings, think deeply about what the findings mean overall, and write down the most important conclusions. The conclusions should number no more than five to 10. These conclusions either are findings or are based on the findings, and they support and provide the rationale for the recommendations to follow. The recommendations must address and answer, if possible, the original objectives and goals for the study.
Once the recommendations have addressed the study’s goals, then you might offer other recommendations. What’s the optimal target market? How should the brand be positioned? How can competitive threats be minimized? How can the product be improved? The recommendations section is where you pull all the pieces of the report together, and show your boss what a great analyst and writer you are.
by Jerry W. Thomas
Jerry W. Thomas is president and chief executive of Decision Analyst. He can be reached at 817-640-6166 or via email at
Decision Analyst is a leading international marketing research and analytical consulting firm. Along with its other services, the firm conducts multinational studies on an array of topics, spanning industries from packaged goods to high technology.
31 May 2011