In this chapter we discuss techniques that do not require data to be partitioned into criterion and predictor variables. Rather, it is the entire set of interdependent relationships that is of interest. We discuss factor analysis as a methodology that identifies the commonality existing in sets of variables; this methodology is useful for identifying consumer lifestyle and personality types. We also discuss cluster analysis, which groups objects (such as consumers) according to their similarity across a set of variables.
Finally, we discuss two sets of multivariate techniques, multidimensional scaling and conjoint analysis, that are particularly well suited (and were originally developed) for measuring human perceptions and preferences. Multidimensional scaling methodology is closely related to factor analysis, while conjoint analysis uses a variety of techniques (including analysis-of-variance designs and regression analysis) to estimate parameters; both techniques are related to psychological scaling (discussed in Chapter 9). The use of both multidimensional scaling and conjoint analysis in marketing is widespread.
An Introduction To The Basic Concepts Of Factor Analysis
Factor analysis is a generic name given to a class of techniques whose purpose often consists of data reduction and summarization. Used in this way, the objective is to represent a set of observed variables, persons, or occasions in the form of a smaller number of hypothetical, underlying, and unknown dimensions called factors.
Factor analysis operates on the data matrix. The form of the data matrix can be flipped (transposed) or sliced to produce different types, or modes, of factor analysis. The most widely used mode of factor analysis is the R-technique (relationships among items or variables are examined), followed distantly by the Q-technique (persons or observations are examined). These, together with other modes, are identified in Exhibit 14.1. “Creative” marketing researchers may find S- and T-techniques helpful when analyzing purchasing behavior or advertising recall data. The P- and O-techniques might be appropriate for looking at the life cycle of a product class, or perhaps even changes in demographic characteristics of identified market segments.
Exhibit 14.1 Modes of Factor Analysis
Six distinct modes of factor analysis have been identified (Stewart, 1981, p. 53). The alternative modes of factor analysis can be portrayed graphically. The original data set is viewed as a variables/persons/occasions matrix (a). R-type and Q-type techniques deal with the variables/persons dichotomy (b). In contrast, P-type and O-type analyses are used for the occasions/variables situation, and S-type and T-type are used when the occasions/persons relationship is of interest (c).
1. The analyst is interested in examining the strength of the overall association among variables, in the sense that a smaller set of factors (linear composites of the original variables) may be able to preserve most of the information in the full data set. Often one’s interest will stress description of the data rather than statistical inference.

2. No attempt is made to divide the variables into criterion versus predictor sets.

3. The models typically assume that the data are interval scaled.
We use a numerical example to illustrate the basic ideas of factor analysis. A grocery chain was interested in the attitudes (in the form of images) that customers and potential customers had of its stores. A survey of 169 customers was conducted to assess images. The information obtained included 14 items that were rated using a seven-category semantic differential scale. These items are shown in Table 14.1. The resulting data set is a matrix of 169 rows (respondents) by 14 columns (semantic differential scales). These data are analyzed using R-type factor analysis.
Figure 14.1 The Concept of Factor Analysis
Inconvenient location—Convenient location
Low-quality products—High-quality products
Modern—Old-fashioned
Unfriendly clerks—Friendly clerks
Sophisticated customers—Unsophisticated customers
Cluttered—Spacious
Fast check-out—Slow check-out
Unorganized layout—Organized layout
Enjoyable shopping experience—Unenjoyable shopping experience
Bad reputation—Good reputation
Good service—Bad service
Unhelpful clerks—Helpful clerks
Good selection of products—Bad selection of products
Dull—Exciting
Identifying The Factors
If we now input the raw data into a factor analysis program, the program computes the correlations between the variables and carries out the analysis. Some relevant concepts and definitions for this type of analysis are presented in Exhibit 14.2.
A factor analysis of the 14 grocery-chain observed variables produces a smaller number of underlying dimensions (factors) that account for most of the variance. It may be helpful to characterize each of the 14 original variables as having an equal single unit of variance that is redistributed to 14 underlying dimensions or factors. In every factor analysis solution, the number of input variables equals the number of common factors plus the number of unique factors to which the variance is redistributed. In factor analysis, the analysis first determines how many of the 14 underlying dimensions or factors are common, and then the common factors are interpreted.
Exhibit 14.2 Some Concepts and Definitions of R-Type Factor Analysis
Factor Analysis: A set of techniques for finding the underlying relationships between many variables and condensing the variables into a smaller number of dimensions called factors.

Factor: A variable or construct that is not directly observable, but is developed as a linear combination of observed variables.

Factor Loading: The correlation between a measured variable and a factor. It is computed by correlating factor scores with observed manifest variable scores.

Factor Score: A value for each factor that is assigned to each individual person or object from which data was collected. It is derived from a summation of the derived weights applied to the original data variables.

Communality (h²): The common variance of each variable summarized by the factors, or the amount (percent) of each variable that is explained by the factors. The uniqueness component of a variable’s variance is 1 − h².

Eigenvalue: The sum of squares of variable loadings of each factor. It is a measure of the variance of each factor, and if divided by the number of variables (i.e., the total variance), it is the percent of variance summarized by the factor.
Table 14.2 identifies the proportion of variance associated with each of the 14 factors produced by the analysis, where the factors were extracted by principal components analysis. Principal components analysis, one of the alternative methods of factoring, produces linear combinations of the observed variables that are orthogonal to (i.e., independent of) each other; the first principal component accounts for the largest amount of variance in the data, the second for the second largest, and so on. It is the most conservative method. For a more detailed discussion of the alternative methods, see Kim and Mueller (1978a, 1978b). The second column of the table reports the eigenvalues.
Computed as the sum of the squared correlations between the variables and a factor, the eigenvalues are a measure of the variance associated with that factor. The eigenvalues reported in Table 14.2 measure the redistribution of the 14 units of variance from the 14 original variables to the 14 factors. We observe that factors 1, 2, 3, and 4 account for the major portion (66.5 percent) of the variance in the original variables. In Figure 14.2, a scree plot depicts the rapid decline in variance accounted for as the number of factors increases. This chart graphs the eigenvalues for each factor and is a useful visual tool for determining the number of significant factors to retain. The shape of the curve suggests that little is added by recognizing more than four factors in the solution (the additional factors will be unique to a single variable).
Figure 14.2 Scree Plot for Grocery Chain Data Factors
An accepted rule-of-thumb states that if a factor has an associated eigenvalue greater than or equal to 1.0, then the factor is “common” and a part of the solution. This rule-of-thumb is closely aligned with the intuitive decision rules associated with the scree chart. When we observe an eigenvalue less than 1.0, the factor accounts for less variance than was input by a single input variable.
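For readers who wish to see the mechanics, the eigenvalue extraction and the eigenvalue-greater-than-or-equal-to-1.0 retention rule can be sketched in a few lines of Python. The random ratings below are hypothetical stand-ins for the grocery-chain survey matrix, which is not reproduced here.

```python
import numpy as np

# Hypothetical stand-in for the 169-respondent x 14-item ratings matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(169, 14))

# R-type analysis starts from the correlation matrix of the 14 items.
R = np.corrcoef(X, rowvar=False)

# Eigenvalues redistribute the 14 units of input variance across 14 factors.
eigenvalues = np.linalg.eigvalsh(R)[::-1]        # sorted largest first
pct_variance = eigenvalues / eigenvalues.sum()   # proportion per factor

# Eigenvalue >= 1.0 rule: retain factors accounting for at least as much
# variance as a single input variable contributed.
n_common = int((eigenvalues >= 1.0).sum())
```

Because each standardized variable contributes exactly one unit of variance, the eigenvalues always sum to the number of variables (here, 14).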
Table 14.2 Factor Eigenvalues and Variance Explained for Grocery Chain Study
The interpretation of the factors is subjectively based on the pattern of correlations between the variables and the factors. The factor loadings provide the basis for interpreting the factors; those variables having the highest loadings contribute most to the factor and thereby should receive the most weight in the interpretation of the factor.
In factor analysis two solutions typically are obtained. The initial solution is based on certain restrictions: (a) there are k common factors; (b) underlying factors are orthogonal (i.e., uncorrelated or independent) to each other; and (c) the first factor accounts for as much variance as possible, the second factor for as much of the residual variance as possible left unexplained by the first factor, and so on (Kim & Mueller, 1978a). The second solution is accomplished through rotation aimed at getting loadings for the variables that are either near one or near zero for each factor. The most widely used rotation is called varimax, a method of rotation which leaves the factors uncorrelated. This rotation maximizes the variance of a column of the factor loading matrix, thus simplifying the factor structure and making the factors more interpretable.
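The varimax rotation itself can be sketched compactly. The routine below is a standard SVD-based implementation (not the particular program used for the grocery-chain analysis), and the loading matrix is hypothetical; it illustrates that an orthogonal rotation leaves the factors uncorrelated and each variable's communality unchanged.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Varimax rotation of a (variables x factors) loading matrix.

    SVD-based iteration; the rotation matrix stays orthogonal, so the
    factors remain uncorrelated and the row sums of squared loadings
    (communalities) are preserved.
    """
    p, k = loadings.shape
    T = np.eye(k)                    # accumulated rotation matrix
    d = 0.0
    for _ in range(max_iter):
        Lr = loadings @ T
        # Gradient of the varimax criterion (Kaiser normalization omitted)
        B = loadings.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(B)
        T = U @ Vt
        if s.sum() < d * (1 + tol):  # criterion stopped improving
            break
        d = s.sum()
    return loadings @ T, T

# Hypothetical unrotated loadings for 4 variables on 2 factors.
L = np.array([[0.8, 0.3],
              [0.7, 0.4],
              [0.2, 0.9],
              [0.3, 0.8]])
rotated, T = varimax(L)
```

After rotation, each variable tends to load highly on one factor and near zero on the others, which is exactly the simple structure sought for interpretation.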
In Table 14.3, Factor 1 is identified by four variables. The major contributions are made by the variables “Quality of products,” “Reputation,” “Selection of products,” and “Modernism.” We might interpret this factor as the construct up-to-date quality products.
Factor 2 is identified by three variables: “Sophistication of customers,” “Speed of checkout,” and “Dull/Exciting.” This factor might be interpreted as the construct fast and exciting for sophisticated customers. Factor 3 is explained by the variables “Friendliness of clerks,” “Cluttered/Spacious,” and “Layout.” One interpretation of this factor is that it represents the construct of friendliness of store. Finally, the last factor is defined by five variables. These all might be a reflection of satisfaction with the shopping experience.
Table 14.3 Varimax Rotated Factor Loading Matrix for Grocery Chain Data Factor
Overall, it should be obvious that more than one interpretation may be possible for any given factor. Moreover, a factor may not be interpretable in any substantive sense. This may or may not be a problem, depending upon the objective of the factor analysis. If the analysis is done for data-reduction purposes and the results will be used in a further analysis (such as multiple regression or discriminant analysis), the lack of a substantive interpretation may not be critical. One use of factor analysis is to identify those variables that reflect underlying dimensions or constructs. Once these are identified, the researcher can select one or more original variables for each underlying dimension to include in a subsequent multivariate analysis. This ensures that all underlying or latent dimensions are included in the analysis.
Factor Scores
Once the underlying factors are identified, the resulting factors or constructs are often interpreted with respect to the individual respondents. Simply stated, we would like to know how each respondent scores on each factor. Does the respondent have high scores on the up-to-date quality products and friendliness of store constructs? In general, since a factor is a linear combination (or linear composite) of the original scores (variable values), it can be shown as
Fi = a1X1 + a2X2 + … + anXn

where Fi is the factor score for the ith factor, the an are weights (factor loadings for the n variables), and the Xn are respondent i’s standardized variable scores.
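The linear composite can be computed directly from this formula. The weights and ratings below are hypothetical stand-ins; in practice the factor-score coefficients are produced by the factor analysis program.

```python
import numpy as np

# Hypothetical raw 1-7 ratings (169 respondents x 14 items) and
# hypothetical factor-score weights a1 ... a14 for one factor.
rng = np.random.default_rng(1)
ratings = rng.uniform(1, 7, size=(169, 14))
weights = rng.uniform(-0.5, 0.5, size=14)

# Standardize each variable, then sum the weighted standardized scores
# to obtain one factor score Fi per respondent.
Z = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0)
factor_scores = Z @ weights
```

Each respondent ends up with one score per retained factor, which is what gets merged back onto the data file for later analyses.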
Most factor analysis computer programs produce these factor score summates and merge them with the original data file. Augmenting the data set with factor scores enables the analyst to easily prepare descriptive or predictive analyses that segment respondents scoring high on a given factor. In short, factor scores (rather than original data values) can be used in subsequent analysis.
Correspondence Analysis
Correspondence analysis can be viewed as a special case of canonical correlation analysis, one that is analogous to a principal components factor analysis for nominal data. Canonical correlation, as we have just seen, examines the relations between two sets of continuous variables; correspondence analysis examines the relations between the categories of two discrete variables. Correspondence analysis can be applied to many forms of contingency-table data, including frequency counts, associative data (pick k of n), and dummy variables. The analysis develops ratio-scaled interpoint distances between the row and column categories, and these distances support accurate and useful positioning maps.
Correspondence analysis is often used in positioning and image studies where the researcher wants to explore the relationships between brands, between attributes, and between brands and attributes. In strategic terms, the marketing researcher may want to identify (a) closely competitive brands, (b) important attributes, (c) how attributes cluster together, (d) a brand’s competitive strengths, and most importantly (e) ideas for improving a brand’s competitive position (Whitlark & Smith, 2001).
According to Clausen (1998, p. 1), the main purpose of correspondence analysis is twofold:

1. To reveal the relationships in a complex set of variables by replacing the data with a simpler data matrix without losing essential information.

2. To visually display the points in space, which helps interpretation. Correspondence analysis analyzes the association between two or more categorical variables and represents the categorical marketing research data with a two- or three-dimensional map.
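The core computation can be sketched as a singular value decomposition of the standardized residuals from the independence model of the contingency table. The brand-by-attribute frequency table below is hypothetical; row and column categories end up as coordinates in the same map.

```python
import numpy as np

# Hypothetical counts of respondents associating each of 3 attributes
# (columns) with each of 3 brands (rows).
N = np.array([[30., 10.,  5.],
              [ 8., 25., 12.],
              [ 5., 15., 40.]])

P = N / N.sum()                      # correspondence matrix
r = P.sum(axis=1)                    # row masses
c = P.sum(axis=0)                    # column masses

# Standardized residuals from the independence model, then an SVD.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, s, Vt = np.linalg.svd(S)

# Principal coordinates for a two-dimensional positioning map.
row_coords = (U[:, :2] * s[:2]) / np.sqrt(r)[:, None]
col_coords = (Vt.T[:, :2] * s[:2]) / np.sqrt(c)[:, None]
```

The squared singular values sum to the table's total inertia (the chi-square statistic divided by the grand total), so the map's dimensions can be described by the share of association they capture.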
Figure 14.3 Example Correspondence Analysis Map of Logistical Services Providers
Correspondence analysis is a very helpful and interesting analysis tool that provides meaning and interpretation to large, complex categorical data sets. A more detailed explanation of this technique can be found in the excellent works by Clausen (1998), Greenacre (1993), and Carroll, Green, and Schaffer (1986, 1987).
Basic Concepts Of Cluster Analysis
Like factor analysis, clustering methods are most often applied to object-by-variable matrices. However, rather than focusing on the similarity of variables as in factor analysis, the usual objective of cluster analysis is to separate objects (or people) into groups such that the similarity of objects within each group is maximized while the differences between groups are maximized. Cluster analysis is thus concerned ultimately with classification, and its techniques are part of a field of study called numerical taxonomy (Sokal & Sneath, 1963; Sneath & Sokal, 1973). Cluster analysis can also be used to (a) investigate useful conceptual schemes derived from grouping entities; (b) generate hypotheses through data exploration; and (c) attempt to determine whether types defined through other procedures are present in a data set (Aldenderfer & Blashfield, 1984). Thus, cluster analysis can be viewed as a set of techniques designed to identify objects, people, or variables that are similar with respect to some criteria or characteristics. As such, it seeks to describe so-called natural groupings, as described in Exhibit 14.3.
Exhibit 14.3 Clustering for Segmentation
From a marketing perspective, it should be made clear that a major application of cluster analysis is segmentation. To illustrate, consider a financial services company that wanted to do a segmentation study among its sales force of dealers/agents (Swint, 1994/1995). The objective was to identify the characteristics of “high producers” and “mediocre producers” of sales revenue. The desire was to profile the dealers/agents and segment them with respect to motivations, needs, work styles, beliefs, and behaviors. The data were analyzed using cluster analysis, and a six-cluster solution emerged. The six clusters were then subjected to discriminant analysis to determine how well the individual clustering attributes actually discriminated between the segments. The end result of all these analyses was six well-defined clusters that identified the producer segments.
The type of clustering procedure that we shall discuss assigns each respondent (object) to one and only one class. Objects within a class are usually assumed to be indistinguishable from one another. Thus, in cluster analysis we assume that the underlying structure of the data involves an unordered set of discrete classes. In some cases we may also view these classes as hierarchical in nature, with some classes divided into subclasses.
Primary Questions
Clustering procedures can be viewed as preclassificatory in the sense that the analyst has not used prior information to partition the objects (rows of the data matrix) into groups. We note that partitioning is performed on the objects rather than the variables; thus, cluster analysis deals with intact data (in terms of the variables). Moreover, the partitioning is not performed a priori but is based on the object similarities themselves. Thus, the analyst is assuming that clusters exist in the data. In using cluster analysis, several caveats should be kept in mind:
- Most cluster-analysis methods are relatively simple procedures that are usually not supported by an extensive body of statistical reasoning.

- Cluster-analysis methods have evolved from many disciplines, and the inbred biases of these disciplines can differ dramatically.

- Different clustering methods can (and do) generate different solutions from the same data set.

- The strategy of cluster analysis is structure-seeking, although its operation is structure-imposing.
Given that no information on group definitions is available in advance, we can identify four important considerations in selecting (or developing) a cluster analysis algorithm. We must decide:
1. What measure of inter-object similarity is to be used, and how is each variable to be weighted in the construction of such a summary measure?

2. After inter-object similarities are obtained, how are the classes of objects to be formed?

3. After the classes have been formed, what summary measures of each cluster are appropriate in a descriptive sense—that is, how are the clusters to be defined?

4. Assuming that adequate descriptions of the clusters can be obtained, what inferences can be drawn regarding their statistical reliability?
Choice of Proximity Measure
The choice of a proximity, similarity, or resemblance measure (all three terms will be used synonymously here) is an interesting problem in cluster analysis. The concept of similarity always raises the question: Similarity with respect to what? Proximity measures are viewed in relative terms—two objects are similar, relative to the group, if their profiles across variables are close or if they share many aspects in common, relative to those which other pairs share in common.
Most clustering procedures use pairwise measures of proximity. The choice of which objects and variables should be included in the analysis, and how they should be scaled, is largely a matter of the researcher’s judgment. The possible measures of pairwise proximity are many. Generally speaking, these measures fall into two classes: (a) distance-type measures (e.g., Euclidean distance); and (b) matching-type measures. A simple application illustrating the nature of cluster analysis using a distance measure is shown in Exhibit 14.4.
Exhibit 14.4 A Simple Example of Cluster Analysis
We use a proximity measure, based on Euclidean distances between any two branches. Branches 2 and 10 appear to be the closest together. The first cluster is formed by finding the midpoint between branches 2 and 10 and computing the distance of each branch from this midpoint (this is known as applying the nearest-neighbor algorithm). The two closest branches (6 and 8) are then added to give the desired-size cluster. The other clusters are formed in a similar manner. When more than two dimensions (that is, characteristics) are involved, the algorithms become more complex and a computer program must be used for measuring distances and for performing the clustering process.
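The exhibit's procedure can be sketched in a few lines; the two-characteristic branch profiles below are hypothetical stand-ins for the exhibit's figures, which are not reproduced here.

```python
import numpy as np

# Hypothetical two-characteristic profiles for 10 bank branches.
rng = np.random.default_rng(2)
branches = rng.uniform(0, 10, size=(10, 2))

# Euclidean distances between every pair of branches.
diff = branches[:, None, :] - branches[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
np.fill_diagonal(dist, np.inf)            # ignore self-distances

# The two closest branches seed the first cluster ...
i, j = np.unravel_index(np.argmin(dist), dist.shape)

# ... then the two branches nearest the pair's midpoint are added,
# giving a four-branch cluster (the nearest-neighbor idea in the exhibit).
midpoint = (branches[i] + branches[j]) / 2
d_mid = np.sqrt(((branches - midpoint) ** 2).sum(axis=1))
d_mid[[i, j]] = np.inf                    # already in the cluster
cluster = sorted([i, j, *np.argsort(d_mid)[:2]])
```

With more than two characteristics the same logic applies; only the distance computation grows, which is why a computer program is used in practice.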
Selecting the Clustering Methods
Once the analyst has settled on a pairwise measure of profile similarity, some type of computational routine must be used to cluster the profiles. A large variety of such computer programs already exist, and more are being developed as interest in this field increases. Each clustering program tends to maintain a certain individuality, although some common characteristics can be drawn out. The following categories of clustering methods are based, in part, on the classification of Ball and Hall (1964):
1. Dimensionalizing the association matrix. These approaches use principal-components or other factor-analytic methods to find a dimensional representation of points from interobject association measures. Clusters are then developed on the basis of grouping objects according to their pattern of component scores.

2. Nonhierarchical methods. These methods start right from the proximity matrix and can be characterized in three ways:

   a. Sequential threshold. In this case a cluster center is selected and all objects within a prespecified distance threshold value are grouped. Then a new cluster center is selected and the process is repeated for the unclustered points, and so on. (Once points enter a cluster, they are removed from further processing.)

   b. Parallel threshold. This method is similar to the preceding method, except that several cluster centers are selected simultaneously and points within a distance threshold level are assigned to the nearest center; thresholds can then be adjusted to admit fewer or more points to clusters.

   c. Optimizing partitioning. This method modifies categories (a) or (b) in that points can later be reassigned to clusters on the basis of optimizing some overall criterion measure, such as average within-cluster distance for a given number of clusters.

3. Hierarchical methods. These procedures are characterized by the construction of a hierarchy or tree-like structure. In some methods each point starts out as a unit (single-point) cluster. At the next level the two closest points are placed in a cluster. At the following level a third point joins the first two, or else a second two-point cluster is formed, based on various criterion functions for assignment. Eventually all points are grouped into one larger cluster. Variations on this procedure involve the development of a hierarchy from the top down. At the beginning the points are partitioned into two subsets based on some criterion measure related to average within-cluster distance. The subset with the highest average within-cluster distance is next partitioned into two subsets, and so on, until all points eventually become unit clusters.
While the above classes of programs are not exhaustive of the field, most of the more widely used clustering routines can be classified as falling into one (or a combination) of the above categories. Criteria for grouping include such measures as average within-cluster distance and threshold cutoff values. The fact remains, however, that even the optimizing approaches achieve only conditional optima, since an unsettled question in this field is how many clusters to form in the first place.
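As an illustration of the bottom-up hierarchical logic in category 3, a bare-bones single-linkage agglomeration can be sketched as follows. A real analysis would rely on a library routine; this only shows how the tree is built, merging the two closest clusters at each level until one cluster remains. The points are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
pts = rng.uniform(size=(6, 2))            # six hypothetical objects

def single_link(a, b):
    # Single linkage: distance between the closest pair of points,
    # one drawn from each cluster.
    return min(np.linalg.norm(pts[i] - pts[j]) for i in a for j in b)

clusters = [[i] for i in range(len(pts))]  # every point starts as a unit cluster
merges = []                                # record of the hierarchy
while len(clusters) > 1:
    # Find and merge the pair of clusters with the smallest linkage distance.
    d, a, b = min((single_link(clusters[a], clusters[b]), a, b)
                  for a in range(len(clusters))
                  for b in range(a + 1, len(clusters)))
    merges.append((clusters[a], clusters[b], d))
    clusters[a] = clusters[a] + clusters[b]
    del clusters[b]
```

The recorded merge distances form the levels of the dendrogram; for single linkage they never decrease as the tree is built.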
A Product-Positioning Example of Cluster Analysis
Cluster analysis can be used in a variety of marketing research applications. For example, companies are often interested in determining how their products are positioned in terms of competitive offerings and consumers’ views about the types of people most likely to own the product.
For illustrative purposes, Figure 14.4 shows the result of a hypothetical study conducted for seven sport cars, six types of stereotyped owners, and 13 attributes often used to describe cars.
Figure 14.4 Complete-Linkage Analysis of Product-Positioning Data
Studies of this type enable the marketing researcher to observe the interrelationships among several types of entities—cars, attributes, and owners. This approach has several advantages. For example, it can be applied to alternative advertisements, package designs, or other kinds of communications stimuli. That is, the respondent could be shown blocks of advertising copy (brand unidentified) and asked to provide degree-of-belief ratings that the brand described in the copy possesses each of the n features.
Similarly, in the case of consumer packaged goods, the respondent could be shown alternative package designs and asked for degree-of-belief ratings that the contents of the package possess various features. In either case one would be adding an additional set (or sets) of ratings to the response sets described earlier. Hence, four (or more) classes of items could be represented as points in the cluster analysis.
Foreign Market Analysis
Companies considering entering foreign markets for the first time, as well as those considering expanding from existing to new foreign markets, have to do formal market analysis. Often a useful starting point is to work from a categorization schema of potential foreign markets. Cluster analysis can be useful in this process.
To illustrate, we use the study by Green and Larsen (1985). In this study, 71 nations were clustered on the basis of selected economic characteristics and economic change. The specific variables used were (a) growth in Gross Domestic Product; (b) literacy rate; (c) energy consumption per capita; (d) oil imports; and (e) international debt. Variables a, d, and e were measured as the change occurring during a specified time period.
Computer Analyses
There are many computer programs available for conducting cluster analysis. Most analysis packages have one or more routines. Smaller, more specialized packages (such as PCMDS) that include cluster routines are also available. Finally, some academicians have developed their own cluster routines, which they typically make available to other academicians at no charge.
Multidimensional Scaling (MDS) Analysis
Multidimensional scaling is concerned with portraying psychological relations among stimuli—either empirically-obtained similarities, preferences, or other kinds of matching or ordering—as geometric relationships among points in a multidimensional space. In this approach one represents psychological dissimilarity as geometric distance. The axes of the geometric space, or some transformation of them, are often (but not necessarily) assumed to represent the psychological bases or attributes along which the judge compares stimuli (represented as points or vectors in his or her psychological space).
Figure 14.5 Nonmetric MDS of 10 U.S. Cities
I. Geographic locations of ten U.S. cities
In this example, where the actual distance data are considered, it turns out that metric MDS methods can find, for all practical purposes, an exact solution (Panel I). What is rather surprising is that, even after downgrading the numerical data to ranks, nonmetric methods can also achieve virtually perfect recovery.
Panel IV shows the results of applying a nonmetric algorithm to the ranks of the 45 numbers in Panel III. Thus, even with only rank-order input information, the recovery of the original locations is almost perfect.
We should quickly add, however, that neither the metric nor nonmetric MDS procedures will necessarily line up the configuration of points in a North-South direction; all that the methods try to preserve are relative distances. The configuration can be arbitrarily rotated, translated, reflected, or uniformly stretched or shrunk by so-called configuration congruence or matching programs, so as to best match the target configuration of Panel I. None of these operations will change the relative distances of the points.
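The invariance of relative distances under these operations is easy to verify numerically; the configuration below is arbitrary, and the rotation and translation are hypothetical choices.

```python
import numpy as np

# An arbitrary 10-point, 2-D configuration.
rng = np.random.default_rng(4)
config = rng.normal(size=(10, 2))

def interpoint(X):
    # Matrix of Euclidean distances between all pairs of points.
    d = X[:, None, :] - X[None, :, :]
    return np.sqrt((d ** 2).sum(axis=-1))

theta = 0.7                                   # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
moved = config @ R.T + np.array([5.0, -3.0])  # rotate, then translate

D_before = interpoint(config)
D_after = interpoint(moved)
```

The coordinates change completely, yet every interpoint distance is unchanged, which is exactly why an MDS solution can be freely matched to a target configuration.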
Psychological Versus Physical Distance
The virtues of MDS methods lie not in the scaling of physical distances but rather in the scaling of psychological distances, often called dissimilarities. In MDS we assume that individuals act as though they have a type of “mental map” (not necessarily visualized or verbalized), so that they view pairs of entities near each other as similar and pairs far from each other as dissimilar. Depending on the relative distances among pairs of points, varying degrees of dissimilarity can be imagined.
We assume that the respondent is able to provide either numerical measures of his or her perceived degree of dissimilarity for all pairs of entities, or, less stringently, ordinal measures of dissimilarity. If so, we can use the methodology of MDS to construct a physical map in one or more dimensions whose interpoint distances (or ranks of distances, as the case may be) are most consistent with the input data.
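When numerical dissimilarities are available, the map construction can be sketched with classical (Torgerson) metric scaling: double-center the squared dissimilarities and extract the leading eigenvectors. The points below are hypothetical; with exact two-dimensional distances as input, as in the cities example, the configuration is recovered up to rotation, reflection, and translation.

```python
import numpy as np

# Hypothetical "true" 2-D locations and the resulting dissimilarity matrix.
rng = np.random.default_rng(5)
true_pts = rng.normal(size=(8, 2))
diff = true_pts[:, None, :] - true_pts[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))         # observed dissimilarities

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
B = -0.5 * J @ (D ** 2) @ J                   # double-centered squared distances

vals, vecs = np.linalg.eigh(B)
order = np.argsort(vals)[::-1]                # largest eigenvalues first
coords = vecs[:, order[:2]] * np.sqrt(vals[order[:2]])

# Distances in the recovered map reproduce the input dissimilarities.
rec = coords[:, None, :] - coords[None, :, :]
D_hat = np.sqrt((rec ** 2).sum(axis=-1))
```

Nonmetric MDS replaces this closed-form step with an iterative search that preserves only the rank order of the dissimilarities, which is why rank-only input can still yield nearly perfect recovery.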
This model does not explain perception. Quite the contrary, it provides a useful representation of a set of subjective judgments about the extent to which a respondent views various pairs of entities as dissimilar. Thus, MDS models are representations of data rather than theories of perceptual processes.
Classifying MDS Techniques
Many different kinds of MDS procedures exist. Accordingly, it seems useful to provide a set of descriptors by which the methodology can be classified. These descriptors are only a subset of those described by Carroll and Arabie (1998) and Green, Carmone, and Smith (1989).
1. Mode: A mode is a class of entities, such as respondents, brands, use occasions, or attributes of a multiattribute object.

2. Data array: The number of ways that modes are arranged. For example, in a two-way array of single-mode dissimilarities, the entities could be brand-brand relationships, such as a respondent’s rating of the ijth brand pair on a 1–9 point scale, ranging from 1 (very similar) to 9 (very different). Hence, in this case, we have one-mode, two-way data on judged dissimilarities of pairs of brands.

3. Type of geometric model: Either a distance model or a vector (projection) model, the latter represented by a combination of points and vectors.

4. Number of different sets of plotted points (or vectors): One, two, or more than two.

5. Scale type: Nominal-, ordinal-, interval-, or ratio-scaled input data.
Data Mode/Way
In marketing research most applications of MDS entail either single-mode, two-way data or two-mode, two-way data. Single-mode, two-way data are illustrated by input matrices that are square and symmetric, in which the I(I − 1)/2 distinct pairs of entities (e.g., brands) are often judgment data expressing relative similarity or dissimilarity on some type of rating scale. The data collection instructions can ask the respondent to make judgments that produce data representing pairwise similarity, association, substitutability, closeness to, affinity for, congruence with, co-occurrence with, and so on. Typically, only the I(I − 1)/2 pairs are evaluated and entered into the lower or upper half of the matrix; this suffices because a symmetric relationship is assumed to exist. MDS solutions based on single-mode, two-way input data lead to what are often called simple spaces—that is, a map that portrays only the set of I points, as was shown in Figure 14.5. Pairs of points close together in this geometric space are presumed to exhibit high subjective similarity in the eyes of the respondent.
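The bookkeeping implied by judging only the I(I − 1)/2 distinct pairs can be sketched as follows, with hypothetical 1–9 dissimilarity ratings for five brands.

```python
import numpy as np

I = 5                                   # number of brands
n_pairs = I * (I - 1) // 2              # distinct pairs a respondent judges

# Hypothetical 1-9 dissimilarity ratings, one judgment per pair.
rng = np.random.default_rng(6)
ratings = rng.integers(1, 10, size=n_pairs)

# Fill the lower half of the square matrix, then mirror it: symmetry is
# assumed, and the diagonal (self-dissimilarity) stays zero.
D = np.zeros((I, I))
D[np.tril_indices(I, k=-1)] = ratings
D = D + D.T
```

The completed symmetric matrix D is exactly the kind of single-mode, two-way input an MDS routine expects.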
Another popular form of marketing research data entails input matrices that represent two-mode, two-way relationships, such as the following six examples:

1. A set of I judges provide preference ratings of J brands

2. Average scores (across respondents) of J brands rated on I attributes

3. The frequency (across respondents) with which J attributes are assumed to be associated with I brands

4. The frequency (across respondents) with which respondents in each of I brand-favorite groups pick each of J attributes as important to their brand choice

5. The frequency (across respondents) with which each of J use occasions is perceived to be appropriate for each of I brands

6. The frequency (across respondents) with which each of J problems is perceived to be associated with using each of I brands.
These geometric spaces are often called joint spaces in that two different sets of points (e.g., brands and attributes) are represented in the MDS map. In Figure 14.6 we observe that brands are positioned in 3-dimensional space as points and the attributes are vectors extending a distance of 1 unit away from the origin. The brands project onto each attribute vector to define their degree of association with that attribute. The further out on the vector the brand projects, the stronger the association. In some cases three or more sets of entities may be scaled.
Figure 14.6 MDPREF Joint Space MDS Map
In applications of single-mode, two-way data the entities being scaled are almost always represented as points (as opposed to vectors). However, in the case of two-mode, two-way data, the two sets of entities might each be represented as points or, alternatively, one set may be represented as points while the other set is represented as vector directions. In the latter case the termini of the vectors are often normalized to lie on a common circumference around the origin of the configuration.
The point-point type of two-mode, two-way data representation is often referred to as an unfolding model (Coombs, 1964). If the original matrix consists of I respondents’ preference evaluations of J brands, then the resulting joint-space map has I respondents’ ideal points and J brand points. Brand points that are near a respondent’s ideal point are assumed to be highly preferred by that respondent. Although the original input data may be based on between-set relationships, if the simple unfolding model holds, one can also infer respondent-to-respondent similarities in terms of the closeness of their ideal points to each other. Brand-to-brand similarities may be analogously inferred, based on the relative closeness of pairs of brand points.
The point-vector model of two-mode, two-way data is a projection model in which one obtains respondent i’s preference scale by projecting the J brand points onto respondent i’s vector (Figure 14.5). Point-vector models also show ideal points, or points of “ideal preference”; this ideal point is located at the terminus, or end, of the vector. Projections are made by drawing a line so that it intersects the vector at a 90-degree angle. The farther out (toward vector i’s terminus) the projection is, the more preferred the brand is for that respondent.
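The projection operation just described reduces to a dot product with a unit-length vector. In the sketch below the brand coordinates and the respondent's vector are invented for illustration:

```python
import numpy as np

# J = 3 brand points in a hypothetical 2-D joint space
brand_points = np.array([[1.0, 2.0],
                         [3.0, 0.5],
                         [-1.0, 1.0]])

v = np.array([2.0, 1.0])
v = v / np.linalg.norm(v)                 # respondent i's vector, unit length

projections = brand_points @ v            # scalar projection of each brand onto v
preference_order = np.argsort(projections)[::-1]   # most to least preferred
print(preference_order)                   # [1 0 2]
```

Brands with larger projections (farther out toward the vector's terminus) are the more preferred ones for that respondent.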
Collecting Data for MDS
The content side of MDS—dimension interpretation, relating physical changes in products to psychological changes in perceptual maps—poses the most difficult problems for researchers. However, methodologists are developing MDS models that provide more flexibility than a straight dimensional application. For example, recent models have coupled the ideas of cluster analysis and MDS into hybrid models of categorical-dimensional structure. Furthermore, conjoint analysis, to be discussed next, offers high promise for relating changes in the physical (or otherwise controlled) aspects of products to changes in their psychological imagery and evaluation. Typically, conjoint analysis deals with preference (and other dominance-type) judgments rather than similarities. However, more recent research has extended the methodology to similarities judgments.
On the input side, several issues arise concerning data collection methods. The four most commonly used methods of collecting similarity data in MDS studies are paired comparisons, conditional rankings, triadic combinations, and sorting tasks.
When subjects perform a similarity (or dissimilarity) judgment task, they may experience increasing fatigue and boredom (Bijmolt and Wedel, 1995, p. 364).
Bijmolt and Wedel examined the effect of the alternative data collection methods on fatigue, boredom and other mental conditions. They showed that when collecting data, conditional rankings and triadic combinations should be used only if the stimulus set is relatively small, and in situations where the maximum amount of information is to be extracted from the respondents. If the stimulus set is relatively large, sorting and paired comparisons are better suited for collecting similarity data. Which of these two to use will depend on characteristics of the application, such as number of stimuli and whether or not individual-level perceptual maps are desired.
Marketing Applications of MDS
MDS studies have been used in a variety of situations to help marketing managers see how their brand is positioned in the minds of consumers, vis-à-vis competing brands. Illustrations include (a) choosing a slogan for advertising a soft drink, (b) the relationship between physical characteristics of computers and perceptions of users and potential users, (c) effectiveness of a new advertising campaign for a high-nutrition brand of cereal, (d) positioning in physicians’ minds of medical magazines and journals, and (e) positioning of new products and product concepts. There is no shortage of applications in real-world marketing situations.
Current research activity in MDS methods, including the increasing use of correspondence analyses for representing nominal data (Hoffman & Franke, 1986; Carroll, Green, & Schaffer, 1986; Whitlark & Smith, 2003), shows few signs of slowing down. In contrast, industry applications for the methods still seem to be emphasizing the graphical display and diagnostic roles that characterized the motivation for developing these techniques in the first place. The gap between theory and practice appears to be widening. A comprehensive overview of the developments in MDS is provided by Carroll and Arabie (1998).
Fundamentals Of Conjoint Analysis
Conjoint analysis is one of the most widely used advanced techniques in marketing research. It is a powerful tool that allows the researcher to predict choice share for evaluated stimuli such as competitive brands. When using this technique the researcher is concerned with the identification of utilities—values used by people making tradeoffs and choosing among objects having many attributes and/or characteristics.
There are many methodologies for conducting conjoint analysis, including two-factor-at-a-time tradeoff, full profile, Adaptive Conjoint Analysis (ACA), choice-based conjoint, self-explicated conjoint, hybrid conjoint, and Hierarchical Bayes (HB). In this chapter, two of the most popular methodologies are discussed: the full-profile and self-explicated models.
Conjoint analysis, like MDS, concerns the measurement of psychological judgments, such as consumer preferences. The stimuli to be presented to the respondent are often designed beforehand according to some type of factorial structure. In full-profile conjoint analysis, the objective is to decompose a set of overall responses to a set of stimuli (product or service attribute descriptions) so that the utility of each attribute describing the stimulus can be inferred from the respondent’s overall evaluations of the stimuli. As an example, a respondent might be presented with a set of alternative product descriptions (automobiles). The automobiles are described by their stimulus attributes (level of gas mileage, size of engine, type of transmission, etc.). When choice alternatives are presented, choice or preference evaluations are made. From this information, the researcher is able to determine the respondent’s utility for each stimulus attribute (i.e., what is the relative value of an automatic versus a five-speed manual transmission). Once the utilities are determined for all respondents, simulations are run to determine the relative choice share of a competing set of new or existing products.
Conjoint analysis models are constrained by the amount of data required in the data collection task. Managers demand models that define products with increasingly more stimulus attributes and levels within each attribute. Because more detail increases the size, complexity, and time of the evaluation task, new data collection methodologies and analysis models are continually being developed.
One early conjoint data collection method presented a series of attribute-by-attribute (two attributes at a time) tradeoff tables where respondents ranked their preferences of the different combinations of the attribute levels. For example, if each attribute had three levels, the table would have nine cells and the respondents would rank their tradeoff preferences from 1 to 9. The two-factor-at-a-time approach makes few cognitive demands of the respondent and is simple to follow . . . but it is both time-consuming and tedious. Moreover, respondents often lose their place in the table or develop some stylized pattern just to get the job done. Most importantly, however, the task is unrealistic in that real alternatives do not present themselves for evaluation two attributes at a time.
For the last 30 years, full-profile conjoint analysis has been a popular approach to measure attribute utilities. In the full-profile conjoint task, different product descriptions (or even different actual products) are developed and presented to the respondent for acceptability or preference evaluations. Each product profile is designed as part of a fractional factorial experimental design that evenly matches the occurrence of each attribute with all other attributes. By controlling the attribute pairings, the researcher can estimate the respondent’s utility for each level of each attribute tested.
A third approach, Adaptive Conjoint Analysis, was developed to handle larger problems that required more descriptive attributes and levels. ACA uses computer-based interviews to adapt each respondent’s interview to the evaluations provided by each respondent. Early in the interview, the respondent is asked to eliminate attributes and levels that would not be considered in an acceptable product under any conditions. ACA next presents attributes for evaluation and finally full profiles, two at a time, for evaluation. The choice pairs are presented in an order that increasingly focuses on determining the utility associated with each attribute.
A fourth methodology, choice-based conjoint, requires the respondent to choose a preferred full-profile concept from repeated sets of 3–5 concepts. This choice activity simulates an actual buying situation, thereby giving the respondents a familiar task that mimics actual shopping behavior.
The self-explicated approach to conjoint analysis offers a simple but robust alternative that does not require the development or testing of full-profile concepts. Rather, the conjoint factors and levels are presented to respondents for elimination if not acceptable in products under any condition. The levels of the attributes are then evaluated for desirability. Finally, the relative importance of attributes is derived by dividing 100 points among the most desirable levels of the attributes. The respondent’s reported attribute-level desirabilities are weighted by the attribute importances to provide utility values for each attribute level. This is done without the regression analysis or aggregated solution required in many other conjoint approaches. This approach has been shown to provide results equal or superior to full-profile approaches, and requires less rigorous evaluations from respondents.
Most recently, academic researchers have focused on an approach called Hierarchical Bayes (HB) to estimate attribute level utilities from choice data. HB uses information about the distribution of utilities from all respondents as part of the procedure to estimate attribute level utilities for each individual. This approach again allows more attributes and levels to be estimated with smaller amounts of data collected from each individual respondent.
An Example of Full-Profile Conjoint Analysis
In metric conjoint analysis, the solution algorithm involves a dummy variable regression analysis in which the respondent’s preference ratings of the product profile (service or other item) being evaluated serve as the dependent (criterion) variable, and the independent (predictor) variables are represented by the various factorial levels making up each stimulus. In the nonmetric version of conjoint analysis, the dependent (criterion) variable represents a ranking of the alternative profiles and is only ordinal-scaled. The full-profile methods for collecting conjoint analysis data will be illustrated to show how conjoint data are obtained.
The multiple-factor approach illustrated in Figure 14.7 consists of sixteen cards, each made up according to a special type of factorial design. The details of each card are shown on the left side of Figure 14.6.
- Definitely like

- Neither definitely like nor dislike

- Definitely dislike
The respondent is typically first asked to sort the 16 cards into three piles corresponding to these categories. The criterion variable is usually some kind of preference or purchase-likelihood rating. Following the sort, the respondent takes the first pile and ranks the cards in it from most to least liked, and similarly for the second and third piles. By means of this two-step procedure, the full set of 16 cards is eventually ranked from most liked to least liked.
While it would be easier for the respondent to rate each of the 16 profiles on a 1–10 rating scale, the inability or unwillingness of respondents to conscientiously differentiate among all of the profiles typically results in end-piling of ratings, where many profiles incorrectly receive the same score values.
Again, the analytical objective is to find a set of part-worths or utility values for the separate attribute (factor) levels so that, when these are appropriately added, one can find a total utility for each combination or profile. The part-worths are chosen so as to produce the highest possible correspondence between the derived ranking and the original ranking of the 16 cards. While the two-factor-at-a-time and the multiple-factor approaches, as just described, assume only ranking-type data, one could just as readily ask the respondent to state his or her preferences on (say) an 11-point equal-interval rating scale, ranging from like most to like least. Moreover, in the multiple-factor approach, a 0-to-100 rating scale, representing likelihood of purchase, also could be used.
Figure 14.8 Product Descriptions for Conjoint Analysis (Allergy Medication)
| Card | Efficacy | Endorsements | Superiority | Gardening |
|------|----------|--------------|-------------|-----------|
| 1 | No med more effective | Most recom. by allergists | Less sedating than Benadryl | Won't quit on you |
| 2 | No med works faster | Most recom. by allergists | Rec. 2:1 over Benadryl | Brand used by millions |
| 3 | Relief all day | Most recom. by allergists | Relief 2x longer than Benadryl | Relieves allergy symptoms |
| 4 | Right Formula | Most recom. by allergists | Leading long acting OTC | Enjoy relief while gardening |
| 5 | No med more effective | Most recom. by pharmacist | Rec. 2:1 over Benadryl | Enjoy relief while gardening |
| 6 | No med works faster | Most recom. by pharmacist | Less sedating than Benadryl | Relieves allergy symptoms |
| 7 | Relief all day | Most recom. by pharmacist | Leading long acting OTC | Brand used by millions |
| 8 | Right Formula | Most recom. by pharmacist | Relief 2x longer than Benadryl | Won't quit on you |
| 9 | No med more effective | Nat. Gardening Assoc. | Relief 2x longer than Benadryl | Brand used by millions |
| 10 | No med works faster | Nat. Gardening Assoc. | Leading long acting OTC | Won't quit on you |
| 11 | Relief all day | Nat. Gardening Assoc. | Less sedating than Benadryl | Enjoy relief while gardening |
| 12 | Right Formula | Nat. Gardening Assoc. | Rec. 2:1 over Benadryl | Relieves allergy symptoms |
| 13 | No med more effective | Prof. Gardeners (Horticult.) | Leading long acting OTC | Relieves allergy symptoms |
| 14 | No med works faster | Prof. Gardeners (Horticult.) | Relief 2x longer than Benadryl | Enjoy relief while gardening |
| 15 | Relief all day | Prof. Gardeners (Horticult.) | Rec. 2:1 over Benadryl | Won't quit on you |
| 16 | Right Formula | Prof. Gardeners (Horticult.) | Less sedating than Benadryl | Brand used by millions |
As may be surmised, the multiple-factor evaluative approach makes greater cognitive demands on the respondent, since the full set of factors appears each time. In practice, if more than six or seven factors are involved, this approach is often modified to handle specific subsets of interlinked factors across two or more evaluation tasks.
Consider the situation in which a manufacturer of over-the-counter allergy medication is interested in measuring consumers’ tradeoffs among the four attributes identified in Figure 14.6.
Figure 14.9 shows a table of the resulting utility values for each of the attribute levels derived for one respondent. These values can be obtained from an ordinary multiple regression program using dummy-variable coding. All one needs to do to estimate the respondent’s utility score for a given concept profile is to add each separate value (the regression coefficient) for each component of the described combination. (The regression’s intercept term may be added in later if there is interest in estimating the absolute level of purchase interest.) For example, to obtain the respondent’s estimated evaluation of card 1, one sums the part-worths of its four attribute levels.
In this instance we obtain an almost perfect prediction of the person’s overall response to card 1. Similarly, we can find the estimated total evaluations for the other 15 options and compare them with the respondent’s original evaluations. The regression technique guarantees that the (squared) prediction error between estimated and actual responses will be minimized. The information in Figure 14.8 also permits the researcher to find estimated evaluations for all combinations, including the 256 − 16 = 240 options never shown to the respondent. Moreover, all respondents’ separate part-worth functions (as illustrated for the average of all respondents in Table 14.6) can be compared in order to see whether various types of respondents (e.g., high-income versus low-income respondents) differ in their separate attribute evaluations. In short, while the respondent evaluates complete bundles of attributes, the technique solves for a set of part-worths—one for each attribute level—that are imputed from the overall tradeoffs.
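The mechanics of recovering part-worths by dummy-variable regression can be sketched as follows. The 16-run design mirrors the four-attribute, four-level card layout above; the "true" part-worth values and the noise-free ratings are invented purely for illustration.

```python
import numpy as np

# 16 profiles, 4 attributes, levels coded 0-3 (an orthogonal main-effects design,
# laid out like the cards in Figure 14.8)
design = np.array([
    [0, 0, 0, 0], [1, 0, 1, 1], [2, 0, 2, 2], [3, 0, 3, 3],
    [0, 1, 1, 3], [1, 1, 0, 2], [2, 1, 3, 1], [3, 1, 2, 0],
    [0, 2, 2, 1], [1, 2, 3, 0], [2, 2, 0, 3], [3, 2, 1, 2],
    [0, 3, 3, 2], [1, 3, 2, 3], [2, 3, 1, 0], [3, 3, 0, 1],
])

def dummy_code(design):
    """Code each 4-level attribute as 3 dummies (level 0 is the baseline)."""
    cols = [np.ones(len(design))]                       # intercept
    for a in range(design.shape[1]):
        for level in (1, 2, 3):
            cols.append((design[:, a] == level).astype(float))
    return np.column_stack(cols)

# Hypothetical "true" part-worths; baseline level of each attribute is 0
true_pw = np.array([5.0,                # intercept
                    0.4, -0.2, 0.1,     # efficacy levels 1-3
                    -0.7, -1.2, -1.4,   # endorsement levels 1-3
                    -0.1, 0.2, -0.5,    # superiority levels 1-3
                    0.3, 0.2, -0.2])    # gardening levels 1-3

X = dummy_code(design)
ratings = X @ true_pw                   # simulated overall ratings of the 16 cards
est_pw, *_ = np.linalg.lstsq(X, ratings, rcond=None)
print(np.allclose(est_pw, true_pw))     # the design lets regression recover them
```

Summing the relevant entries of `est_pw` then predicts the rating of any of the 256 possible profiles, including the 240 never shown.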
Table 14.6 Average Utilities of All Respondents
| Attribute | Importance % | Level 1 | Level 2 | Level 3 | Level 4 |
|-----------|--------------|---------|---------|---------|---------|
| Efficacy | 14.63 | Effective 2.48 | Faster 2.37 | Relief 2.10 | Formula 2.18 |
| Endorsements | 54.24 | Allergist 3.83 | Pharmacist 3.18 | Nat. Garden 2.57 | Prof. Garden 2.41 |
| Superiority | 17.76 | Less 2.78 | Recommend 2.63 | Relief 2.71 | Leading 2.31 |
| Gardening | 13.37 | Quit 2.39 | Enjoy 2.74 | Millions 2.58 | Relief 2.54 |
These part-worths can then be combined in various ways to estimate the evaluation that a respondent would give to any combination of interest. It is this high leverage between options that are actually evaluated and those that can be evaluated (after the analysis) that makes conjoint analysis a useful tool. It is clear that the full-profile approach requires much sophistication in developing the profiles and performing the regression analyses to determine the utilities. We will now consider the self-explicated model as an approach that provides results of equal quality, but does so with a much easier design and data collection task.
Self-Explicated Conjoint Analysis
The development of fractional factorial designs and the dummy-variable regression required for each respondent place a burden on researcher and respondent alike, especially when the number of factors and levels requires that a large number of profiles be presented to the respondent.
The self-explicated model provides a simple alternative, producing utility score estimates equal or superior to those of the ACA or full-profile regression models. The self-explicated model is based theoretically on the multi-attribute attitude models that combine attribute importance with attribute desirability to estimate overall preference.
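A common algebraic form of such a multi-attribute preference model (the notation here is supplied for exposition) is

```latex
U_j \;=\; \sum_{i=1}^{n} w_i \, d_{ij}
```

where $U_j$ is the overall utility of profile $j$, $w_i$ is the constant-sum importance weight of attribute $i$, and $d_{ij}$ is the rated desirability of the level of attribute $i$ appearing in profile $j$.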
The Self Explicated Data Collection Task
Initially, all attribute levels are presented to respondents for evaluation, to eliminate any levels that would not be acceptable in a product under any conditions. Next, the list of attribute levels is presented and each level is evaluated for desirability (0–10 scale). Finally, based on these evaluations, the most desirable levels of all attributes are presented in a constant-sum question where the relative importances of the attributes are evaluated. Using this information, the attribute importance scores are used to weight the standardized attribute-level scores, thereby producing self-explicated utility values for each attribute level. This is done for each respondent and does not require a fractional factorial design or regression analysis.
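The weighting step above can be sketched in a few lines for one respondent. The attribute names, 0–10 desirability ratings, and constant-sum importance weights are all invented for illustration:

```python
import numpy as np

# One respondent's desirability ratings (0-10) per level of each attribute
desirability = {
    "efficacy":    [9, 7, 6, 5],
    "endorsement": [8, 6, 4, 3],
}
# Constant-sum importances (100 points divided among the attributes)
importance = {"efficacy": 60, "endorsement": 40}

def self_explicated_utilities(desirability, importance):
    """Weight standardized level desirabilities by attribute importance."""
    utilities = {}
    for attr, ratings in desirability.items():
        scaled = np.array(ratings) / 10.0        # standardize desirabilities to 0-1
        utilities[attr] = importance[attr] * scaled
    return utilities

u = self_explicated_utilities(desirability, importance)
# The utility of any profile is the sum of its chosen levels' utilities,
# e.g. efficacy level 1 with endorsement level 2:
profile_utility = u["efficacy"][0] + u["endorsement"][1]
print(profile_utility)   # 60*0.9 + 40*0.6 = 78.0
```

No design matrix or regression is needed; the same summation then feeds a choice simulator just as full-profile part-worths would.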
As with the full-profile model, these scores can be summed and simulations run to obtain a score for any profile of interest. This simple self-reporting approach is easier for the respondent to complete and straightforward in terms of determining the importance or desirability of attributes and attribute levels (Srinivasan, 1997). An easy-to-use online implementation of the self-explicated model is found at www.qualtrics.com. For this implementation, the conjoint analysis is automatically developed after the attribute-level descriptors are entered into the question builder.
Conjoint Reliability and Validity Checks
Irrespective of the method used to carry out a conjoint analysis, it is useful to include the following ancillary analyses: (a) test-retest reliability; (b) a comparison of actual utilities with those of random respondents; and (c) an internal validity check on model-based utilities.
Test-retest reliability can be assessed by including a few replicate judgments (drawn from the original set of 16) at a later stage in the interview. The purpose is to see whether the judgments are correlated highly enough, on a test-retest basis, to justify the analysis of the respondent’s data.
An internal validity check could, in the case of the allergy medication example, be carried out by collecting a few new evaluations (drawn randomly from the 240 stimulus combinations not utilized in Figure 14.8). These constitute a hold-out sample. Their rank order is to be predicted by the part-worths developed from the calibration sample of 16 combinations. Internal validity checks could be conducted in similar manner for other conjoint methods.
Other Models
So far our discussion has centered on the most widely applied conjoint model: a main-effects model using rankings or ratings. Other models are available that permit some or all two-factor interactions to be measured (as well as main effects). Interactions occur when attribute levels combine to produce a differential effect. For example, this often happens in food products, where combinations of attribute levels seemingly produce more acceptable products than would be predicted from the individual attribute levels considered alone (oatmeal-raisin cookies vs. oatmeal cookies or raisin cookies). These models again make use of various types of fractional factorial designs or combinations of attributes. Specialized computer programs have been designed to implement them. In short, users of conjoint analysis currently have a highly flexible set of models and data collection procedures to choose from.
Other Aspects Of Conjoint Analysis
1. Using one of the variety of data collection procedures just described, obtain sufficient data at the individual-respondent level to estimate the part-worths of each person’s utility function.

2. Relate the respondent’s attribute-level part-worth data to other respondent background data in an effort to identify possible market segments based on similarities in part-worth functions.

3. Compose a set of product configurations that represent feasible competitive offerings. These product profiles are entered into a consumer choice simulator, along with the earlier computed individual utility functions.

4. Use the respondent’s individual part-worth function to compute the utility for each of the competing profiles in the choice simulator. The respondent is then assumed to choose the profile with the highest utility (i.e., the choice process is deterministic).
Use of Visual Aids in Conjoint Analysis
Another problem in the application of conjoint measurement is the pragmatic one of getting fairly complex concepts across to the respondent. Verbal descriptions of the type shown in Figure 14.8 are not only difficult for the respondent to assimilate, but also introduce unwanted perceptual differences. For example, if conjoint analysis were used to test designs for automobiles, two respondents might have quite different perceptions of car length and car roominess if verbal descriptions alone were used.
Wherever possible, visual props can help to transmit complex information more easily and uniformly than verbal description. As an illustration of the value of visual props, mention can be made of a study involving styling designs for future compact cars. In the course of preparing the questionnaire, rather complex experimental factors such as overall size and interior layout, trunk size and fuel-tank capacity, exterior and interior width, and interior spaciousness and visibility had to be considered. To provide quick and uniform treatment of these style factors, visual props were prepared, as illustrated for two of the attributes in Figure 14.10. (These can be projected on screens in full view of the respondents during the interview or made part of the questionnaire itself.)
Figure 14.10 Illustrations of Visual Props Used in Conjoint Analysis
Strategic simulations are also relatively easy to construct from conjoint analysis data by simply including each individual respondent’s utility function in a computerized choice model. Various combinations of factor levels can then be tried out to see what their share of choices would be under different assumptions regarding competitive offerings and total market demand.
The simulators can employ a variety of consumer choice procedures, ranging from having each consumer simply select the alternative with the highest utility to more elaborate probability of choice rules, where probability is related to utility differences in the set of alternatives under evaluation
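A minimal choice-simulator sketch contrasting the two rules just described: deterministic first choice versus a logit-type probability-of-choice rule in which probability grows with utility. The respondent-by-profile utility matrix is invented for illustration:

```python
import numpy as np

# Utilities of 3 competing profiles for 3 respondents (rows = respondents)
utilities = np.array([[4.0, 3.5, 1.0],
                      [2.0, 3.0, 2.5],
                      [1.0, 4.0, 3.8]])

# First-choice rule: each respondent deterministically picks the
# highest-utility profile; shares are the fractions of respondents choosing each
first_choice = np.argmax(utilities, axis=1)
fc_share = np.bincount(first_choice, minlength=3) / len(utilities)

# Logit-type rule: choice probability proportional to exp(utility),
# so utility differences drive probability of choice
expu = np.exp(utilities)
probs = expu / expu.sum(axis=1, keepdims=True)
logit_share = probs.mean(axis=0)

print(fc_share)       # deterministic shares
print(logit_share)    # smoother, probability-based shares
```

Rerunning the simulator with modified rows of `utilities` (i.e., altered profiles) shows how share of choices shifts under different competitive offerings.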
Applications of Conjoint Analysis
Conjoint analysis is most conducive to predicting choice behavior when the product or service involves a relatively high resource commitment and tends to be “analyzable” by the purchaser (e.g., banking or insurance services, industrial products). Conjoint analysis has already been applied to a wide variety of problems in product design, price elasticity of demand, transportation service design, and the like. Table 14.6 shows a representative list of applications. As can be noted, areas of application cover the gamut: products and services, as well as consumer, industrial, and institutional markets.
Recent Developments in Conjoint Analysis
Conjoint analysis has become a highly popular technique in a relatively short time. Researchers estimate that business firms’ use of conjoint analysis entails several thousand studies each year. With statistical software and conjoint data collection algorithms built into online survey tools (Qualtrics.com), conjoint methodology is easily accessed by any interested user.
Software developments in data collection and analysis likewise make it easy to find orthogonal main effects plans. Conjoint methodology has also been extended to encompass use occasion and situation dependence in a series of dual-conjoint designs, called componential segmentation.
Perhaps the most interesting extension of the methodology, however, is the recent application of conjoint to the design of “optimal” products and product lines. Thus, it is feasible to extend conjoint beyond the simulation stage (where one finds the best of a limited set of options) to encompass the identification of the best product (or line) over the full set of possibilities. These may number in the hundreds of thousands or even the millions. In sum, conjoint methodology, like MDS, appears to be moving into the product-design-optimization arena, a most useful approach from a pragmatic managerial viewpoint.
Still, conjoint analysis, like MDS, has a number of limitations. For example, the approach assumes that the important attributes of a product or service can all be identified and that consumers behave rationally as though all tradeoffs are being considered. In some products where imagery is quite important, consumers may not evaluate a product analytically, or, even if they do, the tradeoff model may be only a gross approximation to the actual decision rules that are employed.
In short, MDS and conjoint are still maturing both as techniques that provide intellectual stimulation and as practical tools for product positioning, segmentation, and strategic planning.
Summary
Chapter 14 has focused on four multivariate techniques: factor analysis, cluster analysis, multidimensional scaling, and conjoint analysis.
The factor-analytic method stressed in this chapter was principal-components analysis. This procedure has the property of selecting sets of weights to form linear combinations of the original variables such that the variance of the obtained component scores is (sequentially) maximal, subject to each linear combination’s being orthogonal to previously obtained ones.
The principal components model was illustrated on a set of data from a study conducted by a grocery chain.
Cluster analysis was described in terms of three general questions: (a) selecting a proximity measure; (b) algorithms for grouping objects; and (c) describing the clusters. In addition, an application of clustering was briefly described
MDS methods are designed to portray subjective similarities or preferences as points (or vectors) in some multidimensional space. Psychological distance is given a physical distance representation. We discussed metric and nonmetric MDS methods, and ideal-point and vector preference models. A variety of applications were described to give the reader some idea of the scope of the methodology.
Conjoint analysis was described along similar lines. We first discussed the primary ways of collecting tradeoff data and then showed how such data are analyzed via multiple regression with dummy predictor variables. The importance of fractional factorial designs was discussed, as well as other practical problems in the implementation of conjoint analysis. We next turned to some illustrative applications of conjoint analysis, including the design of new products and services. We then presented a brief description of future developments that could serve to increase the flexibility of conjoint methodology
This chapter, together with Chapter 13, covers the major multivariate analysis techniques and has included brief discussions of lesser-used techniques. We have not discussed such extensions as canonical correlation of three or more sets of variables or tests for the equality of sums of squares and cross-products matrices. Other advanced but related procedures, such as moderated regression, multiple-partial correlation, discriminant analysis with covariate adjustment, and factorial discriminant analysis, to name a few, have been omitted from discussion.
We have discussed the principal assumption structure of each technique, appropriate problems for applying it, and sufficient numerical applications to give the reader a feel for the kind of output generated by each program.
Our coverage of so vast and complex a set of methods is limited in depth as well as breadth. The fact remains, however, that marketing researchers of the future will have to seek grounding in multivariate methodology, if current research trends are any indication. This grounding will probably embrace three facets: (a) theoretical understanding of the techniques; (b) knowledge of the details of appropriate computer algorithms for implementing the techniques; and (c) a grasp of the characteristics of substantive problems in marketing that are relevant for each of the methods.
References
Aldenderfer, M. S., & Blashfield, R. K. (1984). Cluster analysis. Beverly Hills, CA: Sage.
Ball, G. H., & Hall, D. J. (1964, July). Background information on clustering techniques. (Working paper.) Menlo Park, CA: Stanford Research Institute.
Bijmolt, T. H. A., & Wedel, M. (1995). The effects of alternative methods of collecting similarity data for multidimensional scaling. International Journal of Research in Marketing, 12, 363–371.
Carroll, J. D., & Arabie, P. (1980). Multidimensional scaling. In M. R. Rosenzweig & L. W. Porter (Eds.), Annual review of psychology. Palo Alto, CA: Annual Reviews.
Carroll, J. D., & Arabie, P. (1998). Multidimensional scaling. In M. H. Birnbaum (Ed.), Handbook of perception and cognition. Volume 3: Measurement, judgment and decision-making.
Carroll, J. D., Green, P. E., & Schaffer, C. M. (1986). Interpoint distance comparisons in correspondence analysis. Journal of Marketing Research, 23, 271–280.
Clausen, S. E. (1998). Applied correspondence analysis: An introduction. Thousand Oaks, CA: Sage.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Green, P. E., Carmone, F. J., & Smith, S. M. (1989). Multidimensional scaling: Concepts and applications. Boston: Allyn and Bacon.
Green, R. T., & Larsen, T. L. (1985, May). Export markets and economic change. (Working paper 84/85-5-2). Austin: Department of Marketing Administration, University of Texas at Austin.
Greenacre, M. J. (1993). Correspondence analysis in practice. London: Academic Press.
Hoffman, D. L., & Franke, G. R. (1986, August). Correspondence analysis: Graphical representation of categorical data in marketing research. Journal of Marketing Research, 23, 213–217.
Johnson, S. C. (1967, September). Hierarchical clustering schemes. Psychometrika, 32, 241–254.
Kim, J.-O., & Mueller, C. W. (1978b). Factor analysis: Statistical methods and practical issues. Beverly Hills, CA: Sage.
Kruskal, J. B., & Wish, M. (1978). Multidimensional scaling. Beverly Hills, CA: Sage.
Smith, S. M. PC-MDS statistical software. Provo, UT: Brigham Young University.
Sneath, P. H. A., & Sokal, R. R. (1973). Numerical taxonomy. San Francisco: W.H. Freeman.
Sokal, R. R., & Sneath, P. H. A. (1963). Principles of numerical taxonomy. San Francisco: W.H. Freeman.
Srinivasan, V. (1997, May). Surprising robustness of the self-explicated approach to customer preference structure measurement. Journal of Marketing Research, 34, 286–291.
Whitlark, D., & Smith, S. (2001, Summer). Using correspondence analysis to map relationships: It’s time to think strategically about positioning and image data. Marketing Research, 2, 23–27.
Whitlark, D., & Smith, S. M. (2003). How many attributes does it take to describe a brand? What is the right question when collecting associate data? Unpublished paper, Graduate School of Management, Brigham Young University, Provo, UT.
Whitlark, D., & Smith, S. (2004). Measuring brand performance using online questionnaires: Advantages and issues with “pick any” data. (Research Paper.) Provo, UT: Graduate School of Management, Brigham Young University.