Questions focus on the problem we are trying to solve, while answers are more closely associated with the measurement scale we are using to achieve our analysis.
Survey research, as a source of marketing information, addresses many topics of practical interest, including concept testing for new products, corporate image measurement, ad copy evaluation, purchase intentions, customer satisfaction, and so forth. Regardless of the research topic, useful data are obtained only when the researcher exercises care in making procedural decisions such as:
1. Defining what is to be measured
2. Deciding how to make the measurements
3. Deciding how to conduct the measuring operations
4. Deciding how to analyze the resulting data
Definitions and decisions play a significant role in scientific inquiry, especially in marketing research and the behavioral sciences.
In the first section of this chapter, we focus on conceptual and operational definitions and their use in research. Behavioral scientists are paying increasing attention to defining the concepts measured in their disciplines and to refining the operational definitions that specify how to measure and quantify the variables defining those concepts.
In the next section we discuss measurement scales and their relationship to the interpretation of statistical techniques. This section serves as useful background for the discussion of statistical techniques covered in later chapters. We then discuss the pragmatics of writing good questions.
The overall quality of a research project depends not only on the appropriateness and adequacy of its research design and sampling techniques, but also on the measurement procedures used. The third section of this chapter looks at measurement error and how we may control the reliability and validity in these measurements.
Definitions in Marketing Measurement
Marketers measure marketing program success as increased brand awareness, ad awareness, ratings of brand likeability and uniqueness, new product concept ratings and purchase intent, and customer satisfaction (Morgan, 2003). Researchers will often model these constructs.
Models are representations of reality and therefore raise the fundamental question of how well each model represents reality on all significant issues. The quality of a model is judged against the criteria of validity and utility. Validity refers to a model’s accuracy in describing and predicting reality, whereas utility refers to the value it adds to the making of decisions. A sales forecasting model that does not forecast sales with reasonable accuracy is probably worse than no sales forecasting model at all.
Model quality also depends on completeness and validity, two drivers of model accuracy. Managers should not expect a model to make decisions for them; instead, models should be viewed as one additional piece of information to help make decisions.
Clearly, managers will benefit most from models that are simple enough to understand and work with. But models used to help make multi-million-dollar decisions should be more complete than those used to make hundred-dollar decisions; the required sophistication of a model depends on its purpose. We measure the value of a model by its efficiency in helping us arrive at a decision: models should be used only if they help us arrive at results faster, with less expense, or with more validity.
Building Blocks for Measurement and Models
We cannot measure an attitude, a market share, or even sales, without first specifying how it is defined, formed, and related to other marketing variables. To better understand this, we must briefly study the building blocks of measurement theory: concepts, constructs, variables, operational definitions, and propositions.
Concepts and Constructs
A concept is a theoretical abstraction formed by a generalization about particulars. “Mass”, “strength”, and “love” are all concepts, as are “advertising effectiveness”, “consumer attitude”, and “price elasticity”. Constructs are also concepts, but they are observable, measurable, and are defined in terms of other constructs. For example, the construct “attitude” may be defined as “a learned tendency to respond in a consistent manner with respect to a given object of orientation.”
Variables
Researchers loosely call the constructs that they study variables. Variables are constructs in measured and quantified form. A variable can take on different values (i.e., it can vary).
Operational Definitions
We can talk about “consumer attitudes” as if we know what it means, but the term makes little sense until we define it in a specific, measurable way. An operational definition assigns meaning to a variable by specifying what is to be measured and how it is to be measured. It is a set of instructions defining how we are going to treat a variable. For example, the variable “height” could be operationally defined in a number of different ways, including measurement in inches with a precision ruler with the person (1) wearing shoes or (2) not wearing shoes, (3) by an altimeter or barometer, or (4) for a horse, by the number of “hands”.
As another example, “purchase intentions” for Brand X window cleaner might be operationally defined through the following multiattribute model:
P ≈ BI = ∑Ai × Bi

where

P = Purchase Behavior for Brand X Window Cleaner
BI = Purchase Intention toward Brand X Window Cleaner
Ai = Attitudes about Brand X Window Cleaner Attributes
Bi = Importance of Attributes of Brand X Window Cleaner
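A worked sketch of this multiattribute computation, BI = ∑Ai × Bi, with invented attribute names, ratings, and importance weights (none of these numbers come from the chapter):

```python
# Hypothetical attribute ratings (Ai) on a 1-7 scale and importance
# weights (Bi) for Brand X window cleaner; all values are illustrative.
attitudes = {"cleaning power": 6, "streak-free finish": 5, "scent": 3}
importance = {"cleaning power": 0.5, "streak-free finish": 0.3, "scent": 0.2}

# BI = sum over attributes i of Ai * Bi
bi = sum(attitudes[a] * importance[a] for a in attitudes)
print(bi)  # 6*0.5 + 5*0.3 + 3*0.2 = 5.1
```

Operationally, each Ai and Bi would itself come from a rating question, so the whole model rests on the operational definitions of its parts.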
Propositions
A proposition defines the relationships between variables, specifying both the variables influencing the relationship and the form of the relationship. It is not enough to simply state that the concept “sales” is a function of the concept “advertising”, such that S = f(Adv). Intervening variables must be specified, along with the relevant ranges for the effect, including where we would observe saturation effects and threshold effects, and the symbolic form of the relationship.
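To illustrate why the bare statement S = f(Adv) is underspecified, the sketch below commits to one hypothetical functional form that builds in both a threshold and a saturation ceiling; every constant and name in it is invented for illustration:

```python
import math

def sales(adv, base=100.0, ceiling=500.0, threshold=10.0, rate=0.02):
    """Hypothetical sales response to advertising spend: flat below a
    spending threshold, then rising with diminishing returns toward a
    saturation ceiling. All parameters are illustrative assumptions."""
    if adv <= threshold:
        return base  # threshold effect: spending too little has no effect
    # saturation effect: response approaches the ceiling asymptotically
    return base + (ceiling - base) * (1 - math.exp(-rate * (adv - threshold)))

print(sales(5))     # below threshold: baseline sales only
print(sales(1000))  # far past threshold: near the saturation ceiling
```

Writing the proposition this concretely forces the researcher to state where the threshold sits, where saturation begins, and what the symbolic form is.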
Integration into a Systematic Model
A proposition is quite similar to a model. A model is produced by linking propositions together to provide a meaningful explanation for a system or a process. When concepts, constructs, variables and propositions are integrated into a model for a research plan, we should conceptually ask the following questions:
Are concepts and propositions specified?
Are the concepts relevant to solving the problem at hand?
Are the principal parts of the concept clearly defined?
Is there consensus as to which concepts are relevant in explaining the problem?
Are the concepts properly defined and labeled?
Is the concept specific enough to be operationally reliable and valid?
Do clear assumptions made in the model link the concepts?
Are the limitations of the model stated?
Can the model explain and predict?
Can the model provide results for managerial decision making?
Can the model be readily quantified?
Are the outcomes of the model supported by common sense?
If the model does not meet the relevant criteria, it probably should be revised. Concept definitions may be made more precise; variables may be redefined, added, or deleted; operational definitions and measurements may be tested for validity; and/or mathematical forms revised.
Inaccuracies in Measurement
Before delving into measurement scales and question types, it is helpful to remember that measurements in marketing research are rarely “exact.” Inaccuracies in measurement arise from a variety of sources or factors. A portion of the variation among individual scores may represent true differences in what is being measured, while other variation may be error in measurement. Not all of these sources will necessarily be operative in any given research project, but the possible sources of variation in respondent scores can be categorized as follows:
- True differences in the characteristic or property
- Other relatively stable characteristics of individuals that affect scores (intelligence, extent of education, information processed)
- Transient personal factors (health, fatigue, motivation, emotional strain)
- Situational factors (rapport established, distractions that arise)
- Variations in administration of the measuring instrument, such as interviewers
- Sampling of items included in the instrument
- Lack of clarity (ambiguity, complexity, interpretation of words and context)
- Mechanical factors (lack of space to record response, appearance of instrument)
- Factors in the analysis (scoring, tabulation, statistical compilation)
- Variations not otherwise accounted for (chance), such as guessing an answer
In the ideal situation, variation within a set of measurements would represent only true differences in the characteristic being measured. For instance, a company wanting to measure attitudes toward a possible new brand name and trademark would like to feel confident that measurement differences concerning the proposed names represent only the individuals’ differences in this attitude. Obviously the ideal situation for conducting research seldom, if ever, exists. Measurements are often affected by characteristics of individual respondents such as intelligence, education level, and personality attributes. Therefore, the results of a study will reflect not only differences among individuals in the characteristic of interest but also differences in other characteristics of the individuals. Unfortunately, this type of situation cannot be easily controlled unless the investigator knows all relevant characteristics of the population members such that control can be introduced through the sampling process.
There are many influences in a measurement other than the true characteristic of concern—that is, there are many sources of potential error in measurement. Measurement error has a constant (systematic) dimension and a random (variable) dimension. If the error is truly random (it is just as likely to be greater than the true value as less than it), then the expected value of the sum of all errors for any single variable will be zero, making it less worrisome than nonrandom measurement error (Davis, 1997). Systematic error is present because of a flaw in the measurement instrument or in the research or sampling design. Unless the flaw is corrected, there is nothing the researcher can do to get valid results after the data are collected. These two subtypes of measurement error affect the validity and reliability of measurement, topics discussed later in this chapter. Now that we are aware of the conceptual building blocks and the errors in measurement that should be considered in developing measurement scales, we will consider the types of measurement and associated questions commonly used in marketing research today.
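A small simulation makes the random/systematic distinction concrete; the true score, noise level, and +3 bias below are all invented for illustration:

```python
import random

random.seed(7)
true_score = 50.0

# Random error: zero-mean noise around the true score; it averages out
# as the number of measurements grows.
random_meas = [true_score + random.gauss(0, 5) for _ in range(10000)]

# Systematic error: a constant +3 bias (e.g., from a flawed instrument);
# it never averages out, no matter how large the sample.
biased_meas = [true_score + 3 + random.gauss(0, 5) for _ in range(10000)]

print(sum(random_meas) / len(random_meas))  # close to 50
print(sum(biased_meas) / len(biased_meas))  # close to 53, not 50
```

This is why systematic error must be fixed in the design stage: collecting more data shrinks random error but leaves the bias untouched.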
Measurement Concept
Measurement can be defined as a way of assigning symbols to represent the properties of persons, objects, events, or states. These symbols should have the same relevant relationship to each other as do the things they represent. Another way of looking at this is that “measurement is the assignment of numbers to objects to represent amounts or degrees of a property possessed by all of the objects” (Torgerson, 1958, p. 19). If a characteristic, property, or behavior is to be represented by numbers, a one-to-one correspondence between the number system used and the various quantities (degrees) of that being measured must exist. There are three important characteristics or features of the real number series:
1. Order. Numbers are ordered.
2. Distance. Differences exist between the ordered numbers.
3. Origin. The series has a unique origin indicated by the number zero.
A scale of measurement allows the investigator to make comparisons of amounts and changes in the variable being measured. It is important to remember that it is the attributes or characteristics of objects we measure, not the objects themselves.
Primary Types of Scales
To many people, the term scale suggests such devices as a bathroom scale, pan balances, yard sticks, gasoline gauges, measuring cups, and similar instruments for finding length, weight, volume, and the like. We ordinarily tend to think about measurement in the sense of well-defined scales possessing a natural zero and constant unit of measurement. In the behavioral sciences (including marketing research), however, we must frequently settle for less-precise data. Scales can be classified into four major categories, designated as Nominal, Ordinal, Interval, and Ratio scales.
Each scale possesses its own set of underlying assumptions about order, distance, and origin, and about how well the numbers correspond with real-world entities. As the rigor of our conceptualization increases, we can upgrade our measurement scale. One example is the measurement of color: we may simply categorize colors (nominal scale), or measure the frequency of light waves (ratio scale).
The specification of scale is extremely important in all research, because the type of measurement scale dictates the specific analytical (statistical) techniques that are most appropriate to use in analyzing the obtained data.
Nominal Scales
Nominal scales are the least restrictive and, thus, the simplest of scales. They support only the most basic analyses. The nominal scale serves only as labels or tags to identify objects, properties or events. The nominal scale does not possess order, distance, or origin. For example, we can assign numbers to baseball players. We have a one-to-one correspondence between number and player and are careful to make sure that no two players receive the same number (and that no single player is assigned two or more numbers). The classification of supermarkets into categories that “carry our brand” versus those that “do not carry our brand” is a further illustration of the nominal scale.
It should be clear that nominal scales permit only rudimentary mathematical operations. We can count the stores that carry each brand in a product class and find the modal (highest number of mentions) brand carried. The usual statistical operations involving the calculations of means, standard deviations, etc. are not appropriate or meaningful for nominal scales.
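A minimal sketch of the kind of analysis a nominal scale supports, using the supermarket example (the store data are invented):

```python
from collections import Counter

# Nominal data: category labels for whether each store carries our brand
stores = ["carries", "does not carry", "carries", "carries", "does not carry"]

counts = Counter(stores)
print(counts.most_common(1)[0])  # modal category with its frequency

# Note: coding carries=1 / does not carry=0 and averaging gives 0.6,
# which is a proportion of stores, not an "average store" -- means of
# arbitrary nominal codes are not meaningful.
```

Counting and finding the mode exhaust the legitimate arithmetic here; any other statistic would depend on the arbitrary choice of labels.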
Ordinal Scales
Ordinal scales are ranking scales and possess only the characteristic of order. These scales require the ability to distinguish between objects according to a single attribute and direction. For example, a respondent may be asked to rank a group of floor polish brands according to “cleaning ability”. An ordinal scale results when we assign the number 1 to the highest-ranking polish, 2 to the second-highest ranking polish, and so on. Note, however, that the mere ranking of brands does not quantify the differences separating brands with regard to cleaning ability. We do not know if the difference in cleaning ability between the brands ranked 1 and 2 is greater than, less than, or equal to the difference between the brands ranked 2 and 3. In dealing with ordinal scales, statistical description can employ positional measures such as the median, quartile, and percentile, or other summary statistics that deal with order.
An ordinal scale possesses all the information of a nominal scale in the sense that equivalent entities receive the same rank. Also, like the nominal scale, arithmetic averaging is not meaningful for ranked data.
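A minimal illustration of permissible ordinal statistics, using the floor-polish ranking example (the brand labels are invented):

```python
from statistics import median

# Ordinal data: cleaning-ability ranks assigned to five polish brands
ranks = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

# Positional summaries are meaningful for ordinal data.
print(median(ranks.values()))  # the middle rank

# The gap between ranks 1 and 2 need not equal the gap between 2 and 3,
# so an arithmetic mean of ranks should not be read as an average
# "amount" of cleaning ability.
```

The median answers "which brand sits in the middle of the ordering" without assuming anything about the distances between ranks.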
Interval Scales
Interval scales possess a constant unit of measurement and permit one to make meaningful statements about differences separating two objects. This type of scale possesses the properties of order and distance, but the zero point of the scale is arbitrary. Among the most common examples of interval scaling are the Fahrenheit and Centigrade scales used to measure temperature, and various types of indexes like the Consumer Price Index. While an arbitrary zero is assigned to each temperature scale, equal temperature differences are found by scaling equal volumes of expansion in the liquid used in the thermometer. Interval scales permit inferences to be made about the differences between the entities to be measured (say, warmth), but we cannot meaningfully state that any value on a specific interval scale is a multiple of another.
An example should make this point clearer. It is not empirically correct to say that 50°F is twice as hot as 25°F. Converting from Fahrenheit to Centigrade, we find that the corresponding temperatures are 10°C and –3.9°C, which are not in the ratio 2:1. We can say, however, that differences between values are preserved across temperature scales as multiples of each other. That is, the difference 50°F – 0°F is twice the difference 25°F – 0°F. The corresponding differences on the Centigrade scale, 10°C – (–17.7°C) = 27.7°C and –3.9°C – (–17.7°C) = 13.8°C, are in the same 2:1 ratio.
Interval scales are unique up to a transformation of the form y = a + bx, b > 0. This means that one interval scale can be transformed into another by multiplying by a positive constant and adding a constant. For example, we can convert from Fahrenheit to Celsius using the formula:
TC = 5/9 (TF – 32)
Most ordinary statistical measures (such as arithmetic mean, standard deviation, and correlation coefficient) require only interval scales for their computation.
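The temperature example can be checked directly; this sketch shows that a linear transform y = a + bx preserves ratios of differences but not ratios of values:

```python
def f_to_c(tf):
    """Celsius from Fahrenheit: an interval-scale transform y = a + b*x."""
    return 5.0 / 9.0 * (tf - 32)

# Ratios of VALUES are not preserved: 50F / 25F = 2, but not in Celsius.
print(f_to_c(50) / f_to_c(25))  # 10 / -3.888..., nowhere near 2

# Ratios of DIFFERENCES are preserved under y = a + b*x:
r_f = (50 - 0) / (25 - 0)
r_c = (f_to_c(50) - f_to_c(0)) / (f_to_c(25) - f_to_c(0))
print(r_f, r_c)  # both equal 2.0
```

The additive constant a cancels when differences are taken, which is exactly why interval scales support means and standard deviations but not statements like "twice as hot."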
Ratio Scales
Ratio scales represent the elite of scales and contain all the information of lower-order scales and more besides. These are scales like length and weight that possess a unique zero point, in addition to equal intervals. All types of statistical operations can be performed on ratio scales.
An example of ratio-scale properties is that 3 yards is three times 1 yard. If transformed to feet, then 9 feet and 3 feet are in the same 3:1 ratio. It is easy to move from one scale to another merely by applying an appropriate positive multiplicative constant; this is the practice followed when changing from grams to pounds or from feet to inches.
Relationships Among Scales
The relationships among nominal, ordinal, interval, and ratio scales carry a practical implication: the marketing researcher who uses descriptive statistics (arithmetic mean, standard deviation) and tests of significance (t-test, F-test) should require that the data be (at least) interval-scaled.
From a purely mathematical point of view, you can obviously do arithmetic with any set of numbers—and any scale. What is at issue here is the interpretation and meaningfulness of the results. As we select more powerful measurement scales, our abilities to predict, explain, and otherwise understand respondent ratings also increase.
Table 10.1 Scales of Measurement
| Scale | Mathematical Group Structure | Permissible Statistics | Typical Elements |
| --- | --- | --- | --- |
| Nominal | Permutation group y = f(x), where f(x) means any one-to-one correspondence | Mode; contingency coefficient | Numbering of football players; assignment of type or model numbers to classes |
| Ordinal | Isotonic group y = f(x), where f(x) means any strictly increasing function | Median; percentile; order correlation; sign test; run test | Hardness of minerals; quality of leather, lumber, wool, etc.; pleasantness of odors |
| Interval | General linear group y = a + bx, b > 0 | Mean; average deviation; standard deviation; product-moment correlation; t-test; F-test | Temperature (Fahrenheit and centigrade); energy; calendar dates |
| Ratio | Similarity group y = cx, c > 0 | Geometric mean; harmonic mean; coefficient of variation | Length, width, density, resistance; pitch scale; loudness scale |

(Stevens, 1946, p. 678)
Basic Question and Answer Formats
Underlying every question is a basic reason for asking it. This reason reflects the construct to be measured, the problem to be solved or hypothesis to be tested. Constructing a question that reflects this reason will result in a higher probability that the desired response will be obtained. Table 10.2 shows nine different types of questions (based on the nature of content), the broad reason underlying asking each type of question, and some examples of each type.
Table 10.2 Basic Question Types
| Type of Question | Goal of Question | Positioning of Question |
| --- | --- | --- |
| Factual or behavioral | To get information. | Questions beginning with what, where, when, why, who, and how. |
| Explanatory | To get additional information or to broaden discussion. | How would that help? How would you go about doing that? What other things should be considered? |
| Attitudinal | To get perceptions, motivations, feelings, etc., about an object or topic. | What do you believe to be the best? How strongly do you feel about XYZ? |
| Justifying | To get proof to challenge old ideas and to get new ones. | How do you know? What makes you say that? |
| Leading | To introduce a thought of your own. | Would this be a possible solution? What do you think of this plan? |
| Hypothetical | To use assumptions or suppositions. | What would happen if we did it this way? If it came in blue, would you buy it today? |
| Alternative | To get a decision or agreement. | Which of these plans do you think is best? Is one or two o’clock best for you? |
| Coordinative | To develop common agreement. To take action. | Do we all agree that this is our next step? |
| Comparative | To compare alternatives or to get a judgment anchored by another item. | Is baseball more or less exciting to watch on TV than soccer? |
Based on this structure, and the information in Table 10.3, which deals with standard answer formats, we are able to distinguish four basic question/answer types:
1. Free-answer (open-ended text)
2. Choice answers: dichotomous, single choice and multiple choice (select k of n)
3. Rank order answers
4. Constant sum answers
Table 10.3 Standard Answer Formats Based on Task
| Measurement Scale* | Format Type | Description |
| --- | --- | --- |
| N, O, I | Select 1/n—pick-1 | The respondent is given a list of n options and is required to choose one option only. |
| N, O | Select k/n—pick-k | The respondent gets a set of n options to select from, but this time chooses up to k options (k ≤ n). |
| N, O | Select k1/k2/n—pick-and-pick | The respondent is asked to select k1 options in Category 1 and k2 options in Category 2. Each option can be selected in only one of the two categories. |
| N, I | Sort and rank | The respondent picks k items and allocates them into L buckets; items allocated to each bucket are then assigned ranks. |
| N, O | Rank k/n—rank | The respondent gets n options and is asked to rank the top k (k ≤ n). |
| N, O | Select k1/n and rank k2/k1—pick and rank | Similar to pick-k, but in addition to selecting k1 options from a list of n options, the respondent is then asked to rank some fraction, k2/k1, of those selected. |
| O, I | Integer rating | The respondent is asked to rate on a linear scale of 1 to n the description on the screen or accompanying prop card (for example, 1 for completely disagree to 5 for completely agree). Only integer responses are accepted. |
| O, I | Continuous rating | Similar to integer rating, except that the response can be any number (not necessarily an integer) within the range (for example, 5.2 on a scale of 0 to 10). |
| R | Constant sum | The respondent is provided with a set of attributes (5, 10, etc.) and is asked to distribute a total of p points across those attributes. |
| N, O | Yes/No | This question entails a yes/no answer and is, of course, a Select 1/2—pick-1 question type. |
| I | Integer—integer-# | The respondent is asked for a fact that can be expressed in integer form. A valid range can be provided for error checking. Example: age. |
| I, R | Real—real-# | Similar to integer-#, except that the answer expected is a real (not necessarily integer) number. A valid range can be provided for error checking. Example: income. |
| C | Character | The respondent types in a string of characters as a response. Example: name. |
| I | Multiple integer ratings | Identical to integer rating, except that multiple questions (classified as “options”) can appear on a single screen. Each question is answered and recorded separately. |
| I, R | Multiple real number ratings | Identical to continuous rating, except that multiple questions (classified as “options”) can appear on a single screen. Each question is answered and recorded separately. |
*Legend: N = Nominal, O = Ordinal, I = Interval, R = Ratio, C = Alpha-Numeric Text Characters
Free Answer or Open-Ended Text Questions/Answers
The free answer (or open-ended text question) has no fixed alternatives to which the answer must conform. The respondent answers in his or her own words and at the length he or she chooses, subject of course to any limitations imposed by the questionnaire itself. Interviewers are usually instructed to make a verbatim record of the answer.
While free-answer questions are usually shorter and less complex than multiple-choice and dichotomous questions, they place greater demands on the ability of the respondents to express themselves. As such, this form of question provides the opportunity for greater ambiguity in interpreting answers. To illustrate, consider the following verbatim transcript of one female respondent’s reply to the question:
What suggestions could you make for improving tomato juice?
“I really don’t know. I never thought much about it. I suppose that it would be nice if you could buy it in bottles because the can turns black where you pour the juice out after it has been opened a day or two. Bottles break, though.”
Did she have “no suggestion”, “suggest packaging in a glass container”, or “suggest that some way be found to prevent the can from turning black around the opening”?
One way to overcome some of these problems, at least in personal and telephone surveys is to have interviewers probe respondents for clarity (rather than additional information). One practitioner has gone so far as to suggest that questionnaires should clearly instruct interviewers to probe only once for additional information, and to continue to probe for clarity until the interviewer understands a respondent’s reply.
Compared with other question forms (see Exhibit 10.1), we may tentatively conclude that the free-answer question provides the lowest probability of the question being ambiguous, but the highest probability of the answer being ambiguous.
Exhibit 10.1 Open-Ended Questions and Answers
The advantages of the open-ended format are considerable, but so are its disadvantages (Sudman and Bradburn, 1982). In the hands of a good interviewer, the open format allows and encourages respondents to give their opinions fully and with as much nuance as they are capable of. It also allows respondents to make distinctions that are not usually possible with the fixed alternative formats, and to express themselves in language that is comfortable for them and congenial to their views. In many instances it produces vignettes of considerable richness and quotable material that will enliven research reports.
The richness of the material can be a disadvantage if there is need to summarize the data into simple response categories. Coding of free-response material is known as content analysis and is not only time consuming and costly, but also introduces some amount of coding error.
Open-ended questions also take somewhat more time and psychological work to answer than closed questions. They also require greater interviewer skill to recognize ambiguities of response and to probe and draw respondents out, particularly those who are reticent and not highly verbal, to make sure that they give answers that can be coded. Open-ended response formats may work better with telephone interviews, where a close supervision of interview quality can be maintained, although there is a tendency for shorter answers to be given on the telephone. No matter how well controlled the interviewers may be, however, factors such as carelessness and verbal facility will generate greater individual variance among respondents than would be the case with fixed alternative response formats.
Dichotomous and Multiple-Choice Answers
The select k of n format is the workhorse of survey building, and provides the general form for both dichotomous and multiple-choice answer types. Three general forms of questions are frequently used :
Select Exactly 1 of n Answers:
When selecting k = 1 of n, the type of answer scale depends on n, the number of answers. A dichotomous question has two fixed answer alternatives of the type “Yes/No”, “In favor/Not in favor”, “Use/Do not use”, and so on. The question quoted earlier, “Do you like the taste of tomato juice?”, is an example of a dichotomous question. Multiple-choice questions are simply an extension of the dichotomous question that have more answer points and often take the form of an ordered or interval measurement scale.
Traditional multiple-choice answers are also of the select 1 of n answer form, but have more than two available answers. For example, an agreement scale could have three, five, or seven available answers:
- Three answers: Agree/Neutral/Disagree
- Five answers: Strongly Agree/Agree/Neither/Disagree/Strongly Disagree
- Seven answers: Very Strongly Agree/Strongly Agree/Agree/Neither Agree nor Disagree/Disagree/Strongly Disagree/Very Strongly Disagree
As with all select 1 of n answers, the specific text associated with the answer options is variable and could measure many different constructs such as affect (liking), satisfaction, loyalty, purchase likelihood, and so forth.
Select Exactly k of n Answers
When questions are developed that accept or require multiple responses within a set of answers, the form “exactly k of n” or “as many as k of n” can be used. This general form asks the respondent to indicate that several answers meet the requirements of the question. In this case, the data collected would be categorical, or even loosely ordered if presence or absence of a characteristic is being measured (data are coded as 0 if not selected, and 1 if selected). This type of question might be . . .
A variable number of answers may also be appropriate, particularly where long lists of attributes or features are given. In these cases, the respondent is asked to select as many as k of the n possible answers, where k can be any number from 2 to n. For example, in the previous question, the respondent could select as many as three (one, two, or three) of 10 possible answers. The question might be reworded to read something like . . .
Please identify which service activities are most likely to be outsourced in the next 12 months (check all that apply).
Whitlark and Smith (2004) show an application of pick k of n data that asks respondents to pick a small number of attributes that they feel best describe a brand from a list of 10, 20, or even 30 attributes. Collecting the pick data is much faster than asking a respondent to rate brands with respect to a long list of attributes. In an online survey environment, respondents can quickly scan down columns or across screens and quickly complete the pick data task for a familiar brand, thereby saving time and reducing respondent fatigue and dropout rates.
Having people describe a brand by picking attributes from a list is a quick and simple way to assess brand performance and positioning. Whitlark and Smith (2004) show that when respondents are asked to pick from one third to one half of the viewed items, the pick k of n data can be superior to scaled data in terms of reliability and power to discriminate between attributes.
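A sketch of how pick k of n data might be coded and summarized; the brand attributes and responses below are invented, not taken from Whitlark and Smith (2004):

```python
# Hypothetical pick-up-to-3-of-5 brand description task.
attributes = ["effective", "gentle", "modern", "affordable", "trusted"]
picks = [  # each respondent's picked attributes (up to 3 of the 5)
    ["effective", "trusted", "affordable"],
    ["effective", "modern"],
    ["trusted", "effective", "gentle"],
]

# Code presence/absence as 0/1 and compute each attribute's pick rate:
# the share of respondents associating the attribute with the brand.
n = len(picks)
pick_rate = {a: sum(a in p for p in picks) / n for a in attributes}
print(pick_rate["effective"])  # picked by every respondent
print(pick_rate["modern"])     # picked by one of three respondents
```

Comparing pick rates across attributes (or across brands) is the basic positioning analysis this data supports.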
Rank-Order Questions/Answers
The next level of measurement rank-orders the answers and thereby increases the power of the measurement scale over categorical measurement by including the characteristic of order in the data. Whereas the categorical data associated with many dichotomous or multiple-choice items does not permit us to say that one item is greater than another, rank-order data allows for the analysis of differences. Rank-order questions use an answer format that requires the respondent to assign a rank position to all items, or a subset of items in the answer list. The first, second, and so forth up to the nth item would be ordered. Procedures for assigning position numbers can be very versatile, resulting in different types of questions that can be asked. Typical questions might include identifying preference rankings, or attribute associations from first to last, most recent to least recent or relative position (most, next most, and so forth, until either a set number of items is ordered or all items may be ordered).
When this type of question is administered online or using a CATI (computer-aided telephone interviewing) system, additional options for administration include randomization of the answer list and acceptance/validation of ties in the ranking. Randomizing the order of the answer list helps control for presentation-order bias. It is well established that in elections, being listed first on the ballot increases a candidate's chances of receiving votes.
Tied rankings are another issue to be considered for rank-order questions. When ties are permitted, several items may be evaluated as having the same rank. In general, this is not a good idea because it weakens the data. However, if ties truly exist, then the ranking should reflect this. Rank-order questions are generally a difficult type of question for respondents to answer, especially if the number of items to be ranked goes beyond five or seven.
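When ties are accepted, a common convention (not prescribed by the text) is to give tied items the average of the rank positions they occupy. A short Python sketch with hypothetical preference scores:

```python
def average_ranks(scores):
    """Assign rank positions (1 = best) to items, giving tied
    items the average of the positions they would occupy."""
    # Sort item keys by score, highest first
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Find the run of items tied at this score
        j = i
        while j < len(ordered) and scores[ordered[j]] == scores[ordered[i]]:
            j += 1
        # Positions i+1 .. j share the average rank
        avg = (i + 1 + j) / 2
        for item in ordered[i:j]:
            ranks[item] = avg
        i = j
    return ranks

# Two brands tied for second place share rank 2.5
print(average_ranks({"A": 9, "B": 7, "C": 7, "D": 2}))
```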
Constant Sum Questions/Answers
A constant sum question is a powerful question type that permits collection of ratio data, meaning that the data can express the relative value or importance of the options (option A is twice as important as option B). This type of question is used when you are relatively sure of the answer set (i.e., reasons for purchase) or want to evaluate a limited number of reasons that you believe are important. The following example of a constant sum question from Qualtrics uses sliding scales to select a sum of 100 points:
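Because constant-sum data is ratio-scaled, statements such as "option A is twice as important as option B" follow directly from the point allocations. A small illustration with hypothetical allocations:

```python
# Hypothetical 100-point constant-sum allocation across purchase reasons
points = {"price": 40, "quality": 20, "brand": 30, "service": 10}
assert sum(points.values()) == 100  # the constant-sum constraint

# Ratio comparisons are meaningful for this scale type:
# price received twice as many points as quality
ratio = points["price"] / points["quality"]
print(f"Price is {ratio:.0f}x as important as quality")
```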
Advanced Measurement and Scaling Concepts
Continuing our discussion of scales, we now focus on some of the more common scaling techniques and models. We focus on broad concepts of attitude scaling—the study of scaling for the measurement of managerial and consumer or buyer perception, preference, and motivation. All attitude (and other psychological) measurement procedures are concerned with having people—consumers, purchasing agents, marketing managers, or whomever—respond about certain stimuli according to specified sets of instructions. The stimuli may be alternative products or services, advertising copy themes, package designs, brand names, sales presentations, and so on. The response may involve judging which copy theme is more pleasing than another, which package design is more appealing than another, what mental images new brand names evoke, which adjectives best describe each salesperson, and so on.
Scaling procedures can be classified in terms of the measurement properties of the final scale (nominal, ordinal, interval, or ratio), the task that the subject is asked to perform, or in still other ways, such as whether the scale measures the subject, the stimuli, or both (Torgerson, 1958).
We begin with a discussion of various methods for collecting ordinal-scaled data (paired comparisons, rankings, ratings, etc.) in terms of their mechanics and assumptions regarding their scale properties. Then specific procedures for developing these actual scales are discussed. Techniques such as Thurstone Case V scaling, semantic differential, the Likert summated scale, and the Thurstone differential scale are illustrated. The chapter concludes with some issues and limitations of scaling.
Advanced Ordinal Measurement Methods
The variety of ordinal measurement methods includes a number of techniques:
Paired comparisons
Ranking procedures
Ordered-category sorting
Rating techniques
We discuss each of these data collection procedures in turn.
Paired Comparisons
As the name suggests, paired comparisons require the respondent to choose one of a pair of stimuli that “has more of”, “dominates”, “precedes”, “wins over”, or “exceeds” the other with respect to some designated property of interest. If, for example, six laundry detergent brands are to be compared for “sudsiness”, a full set of paired comparisons would involve n(n − 1)/2 = (6 × 5)/2 = 15 paired comparisons (if order of presentation is not considered). Respondents are asked which one of each pair has the most sudsiness.
A sample question format for paired comparisons is shown in Table 10.4. The order of presentation of the pairs and which item of a pair is shown first are typically determined and/or presented randomly. Consider the following hypothetical brand names (and numerical categories): Arrow (1), Zip (2), Dept (3), Advance (4), Crown (5), and Mountain (6).
Table 10.4 Example of the Paired Comparisons Question

Responses in original (unordered) form:

Brand      Arrow  Zip  Advance  Dept  Crown  Mountain
Arrow        X     0      1      1      1       1
Zip          1     X      1      1      1       1
Advance      0     0      X      0      0       0
Dept         0     0      1      X      0       0
Crown        0     0      1      1      X       1
Mountain     0     0      1      1      0       X

Brands reordered by the total number of times each brand exceeds the others:

Brand      Zip  Arrow  Crown  Mountain  Dept  Advance  Total
Zip         X     1      1       1        1      1       5
Arrow       0     X      1       1        1      1       4
Crown       0     0      X       1        1      1       3
Mountain    0     0      0       X        1      1       2
Dept        0     0      0       0        X      1       1
Advance     0     0      0       0        0      X       0
A cell value of 1 implies that a row brand exceeds the column brand, “0” otherwise.
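The tallying behind Table 10.4 can be sketched in a few lines of Python. The dominance matrix below transcribes the unordered table; summing each row gives the total "wins" used to reorder the brands:

```python
brands = ["Arrow", "Zip", "Advance", "Dept", "Crown", "Mountain"]

# wins[i][j] = 1 if the row brand beat the column brand (diagonal unused)
wins = [
    [0, 0, 1, 1, 1, 1],  # Arrow
    [1, 0, 1, 1, 1, 1],  # Zip
    [0, 0, 0, 0, 0, 0],  # Advance
    [0, 0, 1, 0, 0, 0],  # Dept
    [0, 0, 1, 1, 0, 1],  # Crown
    [0, 0, 1, 1, 0, 0],  # Mountain
]

# For n stimuli there are n(n - 1)/2 distinct pairs
n = len(brands)
print("pairs:", n * (n - 1) // 2)  # 15 for six brands

# Total wins per brand, sorted to give the rank order
totals = sorted(zip(brands, map(sum, wins)), key=lambda t: -t[1])
print(totals)
```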
Ranking Procedures
Ranking procedures require the respondent to order stimuli with respect to some designated property of interest. For example, instead of using the paired-comparison technique, respondents might have been asked to rank the detergents directly with respect to sudsiness. Similarly, ranking can be used to determine key attributes for services.
In a survey conducted by Subaru of America, new Subaru car purchasers were asked questions regarding the purchase and delivery processes. One question required ranking:
Ordered-Category Sorting
Pick-Group-Rank is one of a variety of data collection procedures that have as their purpose the assignment of a set of stimuli to a set of ordered categories. For example, if 15 varieties of laundry detergents represented the stimulus set, the respondent might be asked to complete the following task:
The pick-group-rank could be used with ordered categories to sort all of a large list of items, where there is:

1. Free versus forced assignment of names to grouping categories
2. Free versus forced assignment of stimuli to grouping categories
3. The assumption of equal intervals between category boundaries versus the weaker assumption of category boundaries that are merely ordered with regard to the attribute of interest
1. Construct definition—the construct being measured must be explicitly defined and the key dimensions identified
2. Item generation—statements must be generated describing actual behaviors that would illustrate specific levels of the construct for each dimension identified
3. Item testing—to unambiguously fit behavioral statements to dimensions
4. Scale construction—lay out the scale with behavioral statements as anchors
1. Should negative numbers be used?
2. How many categories should be included?
3. Related to the number of categories: should there be an odd number or an even number? That is, should a neutral alternative be provided?
4. Should the scale be balanced or unbalanced?
5. Is it desirable not to force a substantive response, by giving an opportunity to indicate “don’t know,” “no opinion,” or something similar?
6. What does one do about halo effects—that is, the tendency of raters to ascribe favorable property levels to all attributes of a stimulus object if they happen to like a particular object in general?
7. How does one examine raters’ biases—for example, the tendency to use extreme values or, perhaps, only the middle range of the response scale, or to overestimate the desirable features of the things they like (i.e., the generosity error)?
8. How should descriptive adjectives for rating categories be selected?
9. How should anchoring phrases for the scale’s origin be chosen?
1. Respondents’ subjective scale units may differ from one respondent to another, across testing occasions, or both.
2. Respondents’ subjective origins (zero points) may differ from one respondent to another, across occasions, or both.
3. Unit and origin may shift over stimulus items within a single occasion.
4. Subjective distance between stimuli may not equal one’s perception of the distance on the scale.
- Comparing corporate images, both among suppliers of particular products and against an ideal image of what respondents think a company should be
- Comparing brands and services of competing suppliers
- Determining the attitudinal characteristics of purchasers of particular product classes or brands within a product class, including perceptions of the country of origin for imported products
- Analyzing the effectiveness of advertising and other promotional stimuli toward changing attitudes
Techniques for Scaling Respondents
1. The researcher assembles a large number (e.g., 75 to 100) of statements concerning the public’s sentiments toward travel and vacations.
2. Each of the test items is classified by the researcher as generally “favorable” or “unfavorable” to the attitude under study. No attempt is made to scale the items; however, a pretest is conducted that involves the full set of statements and a limited sample of respondents. Ideally, the initial classification should be checked across several judges.
3. In the pretest the respondent indicates approval (or not) of every item, checking one of the following direction-intensity descriptors:
4. Each response is given a numerical weight (e.g., +2, +1, 0, -1, -2 or +1 to +5).
5. The individual’s total-attitude score is represented by the algebraic summation of weights associated with the items checked. In the scoring process, weights are assigned such that the direction of attitude—favorable to unfavorable—is consistent over items. For example, if a +2 were assigned to “strongly approve/agree” for favorable items, a +2 should be assigned to “strongly disapprove/disagree” for unfavorable items.
6. On the basis of the results of the pretest, the analyst selects only those items that appear to discriminate well between high and low total scorers. This may be done by first finding the highest and lowest quartiles of subjects on the basis of total score, and then comparing the mean differences on each specific item between these high and low groups (excluding the middle 50 percent of subjects).
7. The 20 to 25 items finally selected are those that have discriminated “best” (i.e., exhibited the greatest differences in mean values) between high versus low total scorers in the pretest.
8. Steps 3 through 5 are then repeated in the main study.
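Steps 3 through 5 of the summated-scale procedure can be sketched in Python. The item names, weights, and responses below are hypothetical; the key point is that weights are reversed for unfavorable items so that the direction of attitude stays consistent:

```python
# Direction-intensity weights for a five-point agree/disagree scale
WEIGHTS = {"strongly agree": 2, "agree": 1, "neutral": 0,
           "disagree": -1, "strongly disagree": -2}

# Hypothetical items: True = favorably worded, False = unfavorably worded
items = {"q1": True, "q2": False, "q3": True}

def summated_score(responses, items, weights=WEIGHTS):
    """Total Likert score: the weight is negated for unfavorable
    items so favorable attitudes always score positively."""
    total = 0
    for item, answer in responses.items():
        w = weights[answer]
        total += w if items[item] else -w
    return total

# "Strongly agree" with an unfavorable item contributes -2, so the
# favorable q1 (+2) and unfavorable q2 (-2) cancel out here
score = summated_score(
    {"q1": "strongly agree", "q2": "strongly agree", "q3": "neutral"},
    items)
print(score)  # 2 + (-2) + 0 = 0
```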
1. The items should be clear, well-written, and contain a single idea.
2. The scale must be appropriate to the population of people who use it, such as having an appropriate reading level.
3. The items should be kept short and the language simple.
4. Consider possible biasing factors and sensitive items.
1. American people should always buy American-made products instead of imports.
2. Only those products that are unavailable in the United States should be imported.
3. Buy American-made products. Keep America working.
4. American products first, last and foremost.
5. Purchasing foreign-made products is un-American.
6. It is not right to purchase foreign products.
7. A real American should always buy American-made products.
8. We should purchase products in America instead of letting other countries get rich off us.
9. It is always best to purchase American products.
10. There should be very little trading or purchasing of goods from other countries unless out of necessity.
11. Americans should not buy foreign products, because this hurts American business and causes unemployment.
12. Curbs should be put on all imports.
13. It may cost me in the long run, but I prefer to support American products.
14. Foreigners should not be allowed to put their products on our markets.
15. Foreign products should be taxed heavily to reduce their entry into the United States.
16. We should buy from foreign countries only those products that we cannot obtain within our own country.
17. American consumers who purchase products made in other countries are responsible for putting their fellow Americans out of work.
1. Strength of question wording. The wording of questions is a critical consideration when obtaining information from respondents. In one study, a question worded with “should”, “could”, and “might” was shown to three matched samples of respondents (Payne, 1951, pp. 8–9). Asked “Do you think anything should be done to make it easier for people to pay doctor or hospital bills?”, 82 percent replied “Yes”. For the sample shown the sentence with the word “could”, 77 percent replied “Yes”, and with “might”, 63 percent replied “Yes”. These three words are sometimes used as synonyms, and yet at the extreme, responses are 19 percentage points apart. Questions portraying a more descriptive and positive position may show a large difference in the evaluation score.
2. Avoid loaded or leading words or questions. Slight wording changes can produce great differences in results. “Could”, “should”, and “might” all sound almost the same, but may produce a 20% difference in agreement to a question (The Supreme Court could... should... might... have forced the breakup of Microsoft Corporation). Strong words that represent control or action, such as “prohibit”, produce similar results (Do you believe that Congress should prohibit insurance companies from raising rates?). Sometimes wording is just biased: You wouldn’t want to go to Rudolpho’s Restaurant for the company’s annual party, would you?
3. Framing effects. Information framing effects reflect the difference in response to objectively equivalent information depending upon the manner in which the information is labeled or framed. Levin, Schneider, and Gaeth (1998) and Levin et al. (2001) identify three distinct types of framing effects:
- Attribute framing effects occur when evaluations of an object or product are more favorable when a key attribute is framed in positive rather than negative terms.
- Goal framing effects occur when a persuasive message has different appeal depending on whether it stresses the positive consequences of performing an act to achieve a particular goal or the negative consequences of not performing the act.
- Risky choice framing effects occur when willingness to take a risk depends upon whether potential outcomes are positively framed (in terms of success rate) or negatively framed (in terms of failure rate).
Which type of potential framing effect should be of concern to the research designer depends upon the nature of the information being sought in a questionnaire. At the simplest level, if intended purchase behavior of ground beef were being sought, the question could be framed as “80 percent lean” or “20 percent fat.” This is an example of attribute framing. It should be obvious that this is potentially a pervasive effect in question design, and is something that needs to be addressed whenever it arises. More detailed discussion of these effects is given by Hogarth (1982).
4. Misplaced questions. Questions placed out of order or out of context should be avoided. In general, a funnel approach is advised: broad and general questions at the beginning of the questionnaire as a warm-up (What kind of restaurants do you most often go to?), then more specific questions, followed by more general, easy-to-answer questions (like demographics) at the end of the questionnaire.
5. Mutually non-exclusive response categories. Multiple-choice response categories should be mutually exclusive so that clear choices can be made. Non-exclusive answers frustrate the respondent and make interpretation difficult at best.
6. Nonspecific questions. Do you like orange juice? This is very unclear...do I like what? Taste, texture, nutritional content, vitamin C, the current price, concentrate, fresh squeezed? Be specific in what you want to know about. Do you watch TV regularly? (What is “regularly”?)
7. Confusing or unfamiliar words. Asking about caloric content, acrylamide, phytosterols, and other industry-specific jargon and acronyms is confusing. Make sure your audience understands your language level, terminology, and above all, what you are asking.
8. Non-directed questions. These give respondents excessive latitude. What suggestions do you have for improving tomato juice? The question is about taste, but the respondent may offer suggestions about texture, the type of can or bottle, mixing juices, or something related to use as a mixer or in recipes.
9. Forcing answers. Respondents may not want, or may not be able, to provide the information requested. Privacy is an important issue to most people. Questions about income, occupation, finances, family life, personal hygiene, and beliefs (personal, political, religious) can be too intrusive and rejected by the respondent.
10. Non-exhaustive listings. Do you have all of the options covered? If you are unsure, conduct a pretest using the "Other (please specify) __________" option. Then revise the question, making sure that you cover at least 90% of the respondent answers.
11. Unbalanced listings. Unbalanced scales may be appropriate for some situations and biased in others. When measuring alcohol consumption patterns, one study used a quantity scale that made the heavy drinker appear in the middle of the scale, with the polar ends reflecting no consumption and an impossible amount to consume. However, we expect all hospitals to offer good care and may use a scale of excellent, very good, good, fair. We do not expect poor care.
12. Double-barreled questions. What is the fastest and most convenient Internet service for you? The fastest is not necessarily the most convenient. The double-barreled question should be split into two questions.
13. Independent answers. Make sure answers are independent. For example, the question “Do you think of basketball players as independent agents or as employees of their team?” Some believe that, yes, they are both.
14. Long questions. Multiple-choice questions are the longest and most complex. Free-text answers are the shortest and easiest to answer. When you increase the length of questions and surveys, you decrease the chance of receiving a completed response.
15. Questions on future intentions. Yogi Berra (the famous New York Yankees baseball player) once said that making predictions is difficult, especially when they are about the future. Predictions are rarely accurate more than a few weeks, or in some cases months, ahead.
Validity and Reliability of Measurement
The content of a measurement instrument includes the subject, theme, and topics that relate to the characteristics being measured. However, the measuring instrument does not include all of the possible items that could have been included. When measuring complex psychological constructs such as perceptions, preferences, and motivations, hard questions must be asked to identify the items most relevant to solving the research problem:
1. Do the scales really measure what we are trying to measure?
2. Do subjects’ responses remain stable over time?
3. If we have a variety of scaling procedures, are respondents consistent in their scoring over those scales that purport to be measuring the same thing?
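The second and third questions concern reliability. One common summary of internal consistency, Cronbach's alpha, is not developed in this section and is included here only as an illustrative sketch, computed from a hypothetical respondents-by-items score matrix:

```python
def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(totals))."""
    k = len(rows[0])  # number of items

    def var(xs):
        # Population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[j] for row in rows]) for j in range(k)]
    total_var = var([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Perfectly consistent items (each respondent rates both items
# identically) give alpha = 1
print(cronbach_alpha([[1, 1], [3, 3], [5, 5]]))  # 1.0
```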
References
Albaum, G., Best, R., & Hawkins, D. (1981). Continuous vs. discrete semantic differential scales. Psychological Reports, 49, 83–86.
Bearden, W. O., & Netemeyer, R. G. (1999). Handbook of Marketing Scales (2nd ed.). Thousand Oaks, CA: Sage.
Churchill, G. A., Jr., & Peter, J. P. (1984, November). Research design effects on the reliability of rating scales: A meta-analysis. Journal of Marketing Research, 21, 360–375.
Coombs, C. H. (1964). A Theory of Data. New York: Wiley.