Scientific research is directed at the inquiry and testing of alternative explanations of what appears to be fact. For behavioral researchers, this scientific inquiry translates into a desire to ask questions about the nature of relationships that affect behavior within markets. It requires a willingness to formulate hypotheses capable of being tested to determine (1) what relationships exist, and (2) when and where these relationships hold.
The first stage in the analysis process includes editing, coding, and making initial counts of responses (tabulation and cross-tabulation). In the current chapter, we extend this first stage to include the testing of relationships, the formulation of hypotheses, and the making of inferences.
In formulating hypotheses the researcher uses “interesting” variables, and considers their relationships to each other, to find suggestions for working hypotheses that may or may not have been originally considered. In making inferences, conclusions are reached about the variables that are important, their parameters, their differences, and the relationships among them. A parameter is a summarizing property of a collectivity—such as a population—when that collectivity is not considered to be a sample (Mohr, 1990, p.12).
Although the sequence of procedures (a) formulating hypotheses, (b) making inferences, and (c) estimating parameters is logical, in practice these steps tend to merge and do not always follow in order. For example, the initial results of the data analysis may suggest additional hypotheses that in turn require more and different sorting and analysis of the data. Similarly, not all of the steps are required in every project; the study may be exploratory in nature, which means that it is designed more to formulate the hypotheses to be examined in a more extensive project than to make inferences or estimate parameters.
An Overview Of The Analysis Process
The overall process of analyzing and making inferences from sample data can be viewed as a process of refinement that involves a number of separate and sequential steps that may be identified as part of three broad stages:

1. Tabulation: identifying appropriate categories for the information desired, sorting the data by categories, making the initial counts of responses, and using summarizing measures to provide economy of description and thereby facilitate understanding.

2. Formulating additional hypotheses: using the inductions derived from the data concerning the relevant variables, their parameters, their differences, and their relationships to suggest working hypotheses not originally considered.

3. Making inferences: reaching conclusions about the variables that are important, their parameters, their differences, and the relationships among them.
The Data Tabulation Process
Seven steps are involved in the process of data tabulation:

1. Categorize. Define appropriate categories for coding the information collected.

2. Edit and Code Data. Assign codes to the respondents' answers.

3. Create the Data File. Enter the data into the computer and create a data file.

4. Error Checking and Handling Missing Data. Check the data file for errors by performing a simple tabulation analysis to identify errors in coding or data entry. Once errors are identified, data may be edited or recoded to collapse, combine, or delete responses or categories.

5. Generate New Variables. New variables may be computed by data manipulations that multiply, sum, or otherwise transform variables.

6. Weight Data Subclasses. Weights are often used to adjust the proportionate representation of sample subgroups so that they match the proportions found in the population.

7. Tabulate. Summarize the responses to each variable included in the analysis.
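The counting and weighting steps above can be sketched in a few lines of Python. This is only an illustration: the response codes and the subgroup weight used here are hypothetical.

```python
from collections import Counter

# Hypothetical coded responses for one question (1 = "agree" ... 5 = "disagree")
responses = [1, 2, 2, 3, 1, 5, 2, 4, 3, 2]

# Step 7: tabulate -- count the number of responses in each category
tabulation = Counter(responses)

# Step 6: weight a subclass -- suppose respondents in category 5 are
# under-represented and should count double (illustrative weight only)
weights = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 2.0}
weighted = {cat: n * weights[cat] for cat, n in tabulation.items()}

print(tabulation[2])   # 4 responses coded 2
print(weighted[5])     # 1 response * weight 2.0 = 2.0
```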
As simple as these steps are from a technical standpoint, data management is most important in assuring a quality analysis and thereby merits an introductory discussion. A more in-depth discussion of survey-based data management is provided by Fink (2003, Chap. 1).
Defining Categories
The raw input to most data analyses consists of the basic data matrix, as shown in Table 11.1. In most data matrices, each row contains a respondent’s data and the columns identify the variables or data fields collected for the respondent. The analyses of a column of data might include a tabulation of data counts in each of the categories or the computation of the mean and standard deviation. This analysis is often done simply because we want to summarize the meaning of the entire column of values. In so doing we often (willingly) forgo the full information provided by the data in order to understand some of its basic characteristics, such as central tendency, dispersion, or categories of responses. Because we summarize the data and make inferences from it, it is doubly important that the data be accurate.
Tabulation of any sizable array of data often requires that responses be grouped into categories or classes. The identification of response categories early in the study has several advantages. Ideally, it forces the analyst to consider all possible interpretations and responses to the questionnaire. It often leads to improvements in the questionnaire or observation forms. It permits more detailed instruction of interviewers and results in higher consistency in interpreting responses. Editing problems are also reduced.
The definition of categories allows for identification of the database columns and values assigned to each question or variable, and indicates the values assigned to each response alternative. Depending on the data collection method, data code sheets can be prepared and precoded. Data files are often formatted as comma-separated values (CSV) files, meaning that each variable appears in the same relative position for each respondent, with a comma separating the variables. The major data analysis software programs read data files and then display them in a spreadsheet-like database (see Table 11.1). Often the data are entered directly into a Microsoft Excel spreadsheet for import into the statistical program to be used for analysis. Where data are not collected and formatted electronically, pre-coding of printed questionnaires eliminates transcription, thereby decreasing both processing errors and costs. Most of today's computer-based software for telephone (CATI) or internet surveys (Qualtrics.com) automates this entire process. These programs not only define the question categories in the database but also automatically build the database and record the completed responses as they are submitted. The data may then be analyzed online, exported to Microsoft Excel™, or imported into a dedicated statistical analysis program such as PASW (formerly known as SPSS). Response categories are coded from 1 for the first category to the highest value for the last category. Category values can be recoded to assign different numbers as desired by the researcher.
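As an illustration, a pre-coded CSV data file of the kind described can be read with Python's standard csv module. The variable names and codes below are hypothetical, not taken from Table 11.1.

```python
import csv
import io

# Hypothetical pre-coded data: one row per respondent,
# columns are the variables defined in the codebook
raw = "resp_id,q1,q2\n1,2,5\n2,3,1\n3,2,4\n"

# DictReader maps each row to {column name: value}
reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

print(len(rows))        # 3 respondents
print(rows[0]["q1"])    # first respondent's code for q1 -> "2"
```

In practice the same call would read a file object opened with `open(...)` rather than an in-memory string.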
As desirable as the early definition of categories is, it can sometimes only be done after the data have been collected. This is usually the case when open-end text questions, unstructured interviews, and projective techniques are used.
The selection of categories is controlled by both the purposes of the study and the nature of the responses. Useful classifications meet the following conditions:

1. Similarity of response within the category. Each category should contain responses that, for purposes of the study, are sufficiently similar that they can be considered homogeneous.

2. Differences of responses between categories. Differences in category descriptions should be great enough to disclose any important distinctions in the characteristic being examined.

3. Mutually exclusive categories. There should be an unambiguous description of categories, defined so that any response can be placed in only one category.

4. Exhaustive categories. The classification schema should provide categories for all responses.
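Conditions 3 and 4 can be enforced mechanically when the categories are numeric ranges: a sorted list of boundaries defines classes that are mutually exclusive and exhaustive by construction. A minimal sketch, using hypothetical age brackets:

```python
import bisect

# Hypothetical age brackets; the boundaries define mutually exclusive,
# exhaustive categories: every age falls in exactly one class
boundaries = [18, 35, 55]                       # cut points
labels = ["under 18", "18-34", "35-54", "55 and over"]

def categorize(age):
    # bisect_right finds the one class the value belongs to
    return labels[bisect.bisect_right(boundaries, age)]

print(categorize(17))   # under 18
print(categorize(35))   # 35-54
```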
The use of extensive open-end questions often provides rich contextual and anecdotal information, but is a practice often associated with fledgling researchers. Open-end questions, of course, have their place in marketing research. However, the researcher should be aware of the inherent difficulties in questionnaire coding and tabulation, not to mention their tendency to be more burdensome to the respondent. All of this is by way of saying that any open-end question should be carefully checked to see if a closed-end question (i.e., check the appropriate box) can be substituted without doing violence to the intent of the question. Obviously, sometimes this substitution should not be made.
Editing and Coding
Editing is the process of reviewing the data to ensure maximum accuracy and clarity. This applies to the editing of the collection forms used for pretesting as well as those for the full-scale project. Careful editing during the pre-test process will often catch misunderstandings of instructions, errors in recording, and other problems so as to eliminate them for the later stages of the study. Early editing has the additional advantage of permitting the questioning of interviewers while the material is still relatively fresh in their minds. Obviously, this has limited application for printed questionnaires, though online or CATI surveys can be edited even when data is being collected.
Editing is normally centralized so as to ensure consistency and uniformity in treatment of the data. If the sample is not large, a single editor usually edits all the data to reduce variation in interpretation. In those cases where the size of the project makes the use of more than one editor mandatory, it is usually best to assign each editor a different portion of the data collection form to edit. In this way the same editor edits the same items on all forms, an arrangement that tends to improve both consistency and productivity.
Typically, interviewer and respondent data are monitored to ensure that data requirements are fulfilled. Each collection form should be edited to ensure that data quality requirements are fulfilled. Regarding data obtained by an interviewer (and to an extent self-report), the following should be specifically evaluated:

1. Legibility of entries. Obviously the data must be legible in order to be used. Where an entry is not legible, it may be possible to infer the response from other data collected; however, where any real doubt exists about the meaning of the data, it should not be used.

2. Completeness of entries. On a fully structured collection form, the absence of an entry is ambiguous. It may mean that the respondent could not or would not provide the answer, that the interviewer failed to ask the question, or that there was a failure to record collected data.

3. Consistency of entries. Inconsistencies raise the question of which response is correct. (If a respondent family is indicated as being a non-watcher of game shows, for example, and a later entry indicates that they watched Wheel of Fortune twice during the past week, an obvious question arises as to which is correct.) Discrepancies may be cleared up by questioning the interviewer or by making callbacks to the respondent. When discrepancies cannot be resolved, discarding both entries is usually the wisest course of action.

4. Accuracy of entries. An editor should keep an eye out for any indication of inaccuracy in the data. Of particular importance is the detection of any repetitive response patterns in the reports of individual interviews. Such patterns may well be indicative of systematic interviewer or respondent bias or dishonesty.
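The consistency check in item 3 is easy to automate once the data are in a file. A sketch using hypothetical records and field names (the game-show example above):

```python
# Hypothetical respondent records: flag the inconsistency described above
# (a self-reported non-watcher with game-show viewing entries)
records = [
    {"id": 1, "watches_game_shows": "no",  "episodes_last_week": 2},
    {"id": 2, "watches_game_shows": "yes", "episodes_last_week": 3},
    {"id": 3, "watches_game_shows": "no",  "episodes_last_week": 0},
]

# Respondents whose entries contradict each other
inconsistent = [r["id"] for r in records
                if r["watches_game_shows"] == "no" and r["episodes_last_week"] > 0]

print(inconsistent)   # [1] -- queue for callback, or discard both entries
```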
Coding is the process of assigning respondents' answers to data categories; numbers are assigned to identify the answers with their categories. Pre-coding refers to the practice of assigning codes to categories. Sometimes these codes are printed on structured questionnaires and observation forms before the data are collected. Using these predefined codes, the interviewer is able to code each response while interpreting it and marking the category into which it should be placed.
Post-coding is the assignment of codes to responses after the data are collected, and is most often required when responses are reported in an unstructured format (open-ended text or numeric input). Careful interpretation and good judgment are required to ensure that the meaning of the response and the meaning of the category are consistently and uniformly matched.
When not using CATI or online data collection technologies, a formal coding manual or codebook is often created and made available to those who will be entering or analyzing the data. The codebook used for a study of supermarkets in the United States is shown in Figure 11.1 as an illustration.
Like good questionnaire construction, good coding requires training and supervision. The editor-coder should be provided with written instructions, including examples. He or she should be exposed to the interviewing of respondents and become acquainted with the process and problems of collecting the data, thus providing aid in its interpretation. The coder also should be aware of the computer routines that are expected to be applied, insofar as they may require certain kinds of data formats.
Whenever possible (and when cost allows) more than one person should do the coding, specifically the post-coding. By comparing the results of the various coders, a process known as determining inter-coder reliability, any inconsistencies can be brought out. In addition to the obvious objective of eliminating data coding inconsistencies, the need for recoding sometimes points to the need for additional categories for data classification and may sometimes mean that there is a need to combine some of the categories. Coding is an activity that should not be taken lightly. Improper coding leads to poor analyses and may even constrain the types of analysis that can be completed.
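Inter-coder reliability is often summarized with Cohen's kappa, which corrects the coders' observed agreement for the agreement expected by chance. A minimal stdlib sketch, with hypothetical category assignments from two coders:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Inter-coder reliability: observed agreement corrected for chance."""
    n = len(coder_a)
    # Proportion of items on which the two coders agree
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal category frequencies
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical post-coding of ten open-end answers by two coders
a = [1, 2, 2, 3, 1, 1, 2, 3, 3, 2]
b = [1, 2, 2, 3, 1, 2, 2, 3, 1, 2]
print(round(cohens_kappa(a, b), 3))   # 0.692
```

Values near 1 indicate consistent coding; low values signal that the category definitions or coder instructions need revision.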
Qualtrics has an interesting feature that uses a “wizard” to take a survey by selecting random choices and following the various logic paths available. The resulting test data conforms to the “sample size” specified by the researcher and the pre-specified logic and coding can be checked for errors that the researcher has made.
Tabulation may be thought of as the final step in the data collection process and the first step in the analytical process. Tabulation is simply the counting of the number of responses in each data category (often a single column of the data matrix contains the responses to all categories).
The most basic is the simple tabulation, often called the marginal tabulation and familiar to all students of elementary statistics as the frequency distribution. A simple tabulation or distribution consists of a count of the number of responses that occur in each of the data categories that comprise a variable. A cross-tabulation involves the simultaneous counting of the number of observations that occur in each of the data categories of two or more variables. An example is given in Table 11.2. We shall examine the use of cross-tabulations in detail later in the chapter. A cross-tabulation is one of the more commonly employed and useful forms of tabulation for analytical purposes.
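Because a cross-tabulation is simply a simultaneous count over two variables, it can be sketched with a counter over value pairs. The patron data below are hypothetical:

```python
from collections import Counter

# Hypothetical data matrix extract: (patron_type, rating) for each respondent
pairs = [("first-time", "high"), ("repeat", "high"), ("repeat", "low"),
         ("first-time", "low"), ("repeat", "high"), ("first-time", "high")]

# Simultaneous count of observations in each cell of the two-variable table
crosstab = Counter(pairs)

print(crosstab[("repeat", "high")])   # 2
```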
The flexibility and ease of conducting computer analysis increase the importance of planning the tabulation analysis. There is a common tendency for the researcher to decide that, because cross-tabulations (and correlations) are so easily obtained, large numbers of tabulations should be run. Not only is this methodologically unsound, but in commercial applications it is often costly in analyst time as well. For 50 variables, for example, there are 1,225 different two-variable cross-tabulations that can be made. Only a few of these are potentially of interest in a typical study.
Formulating Hypotheses
As a beginning point in the discussion of hypothesis testing, we ask: what is a hypothesis? A hypothesis is an assertion that variables (measured concepts) are related in a specific way such that this relationship explains certain facts or phenomena. From a practical standpoint, hypotheses may be developed to solve a problem, answer a question, or imply a possible course of action. Outcomes are predicted if a specific course of action is followed. Hypotheses must be empirically testable. A hypothesis is often stated as a research question when reporting either the purpose of the investigation or the findings. The hypothesis may be stated informally as a research question, or more formally as an alternative hypothesis, or in a testable form known as a null hypothesis. The null hypothesis makes a statement that no difference exists (see Pyrczak, 1995, pp. 75-84).
Research questions state in layman's terms the purpose of the research, the variables of interest, and the relationships to be examined. Research questions are not empirically testable, but aid in the important task of directing and focusing the research effort. To illustrate, a sample research question is developed in the following scenario:
Exhibit 11.1 Development of a Research Question for Mingles
Mingles is an exclusive restaurant specializing in seafood prepared with a light Italian flair. Barbara C., the owner and manager, has attempted to create an airy contemporary atmosphere that is conducive to conversation and dining enjoyment. In the first three months, business has grown to about 70 percent of capacity during dinner hours.
Barbara wants to track customer satisfaction with the Mingles concept, the quality of the service, and the value of the food for the price paid. To implement the survey, a questionnaire was developed using a five-point expectations scale with items scaled as values from -2 to +2. Among the hypotheses to be tested are:

1. Comparing two sample groups: H0: There is no difference in the value of the food for the price paid as perceived by first-time patrons and repeat patrons. This is tested by a t-test of the difference in means between two patron groups.

2. Predicting intention to return to Mingles: H0: The perceived quality of service is not related to the likelihood of returning to Mingles. This is a regression analysis problem that uses quality of service to predict likelihood of returning to Mingles.
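The first hypothesis could be tested with a pooled-variance two-sample t statistic. A stdlib sketch; the scores below are hypothetical, not actual Mingles data:

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(x, y):
    """Pooled-variance t statistic for H0: no difference in group means."""
    nx, ny = len(x), len(y)
    # Pooled estimate of the common variance
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))

# Hypothetical -2..+2 value-for-price scores for the two patron groups
first_time = [1, 0, 2, 1, -1, 1, 0, 2]
repeat     = [2, 1, 2, 2, 1, 0, 2, 1]

t = two_sample_t(first_time, repeat)
# Compare |t| with the tabled critical value for df = 14 at alpha = .05 (2.145)
print(round(t, 2))   # -1.39
```

Since |t| is below the critical value here, H0 would not be rejected for these illustrative data.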
Research Question
Purpose: Express the purpose of the research.
Example: What is the perception of Mingles customers regarding the price-value of the food?
Decision: None used.

Alternative Hypothesis
Purpose: The alternative hypothesis states the specific nature of the hypothesized relationship, i.e., that there is a difference. The alternative hypothesis is the opposite of the null hypothesis. The alternative hypothesis cannot be falsified, because a relationship hypothesized to exist may not have been verified but may in truth exist in another sample. (You can never reject an alternative hypothesis unless you test the population on all possible samples.)
Example: Mingles is perceived as having superior food value for the price when compared to the average evaluation.
Decision: Not tested, because we cannot reject. We may only accept that a relationship exists.

Null Hypothesis
Purpose: The null hypothesis is testable in the sense that the hypothesized lack of relationship can be tested. If a relationship is found, the null hypothesis is rejected. The null hypothesis states that there is no difference between groups (with respect to some variable) or that a given variable does not predict or otherwise explain an observed phenomenon, effect, or trend.
Example: There is no difference in perceived food value for the price for Mingles and the average evaluation.
Decision: We may reject a null hypothesis (find a relationship). We may only tentatively accept that no relationship exists.
The objectives and hypotheses of the study should be stated as clearly as possible and agreed upon at the outset. Objectives and hypotheses shape and mold the study; they determine the kinds of questions to be asked, the measurement scales for the data to be collected, and the kinds of analyses that will be necessary. However, a project will usually turn up new hypotheses, regardless of the rigor with which it was planned and developed. New hypotheses are continually suggested as the project progresses from data collection through the final interpretation of the findings.
In Chapter 2 it was pointed out that when the scientific method is strictly followed, hypothesis formulation must precede the collection of data. This means that according to the rules for proper scientific inquiry, data suggesting a new hypothesis should not be used to test it. New data must be collected prior to testing a new hypothesis.
In contrast to the strict procedures of the scientific method, where hypotheses formulation must precede the collection of data, actual research projects almost always formulate and test new hypotheses during the project. It is both acceptable and desirable to expand the analysis to examine new hypotheses to the extent that the data permit. At one extreme, it may be possible to show that the new hypotheses are not supported by the data and that no further investigation should be considered. At the other extreme, a hypothesis may be supported by both the specific variables tested and by other relationships that give similar interpretation. The converging results from these separate parts of the analysis strengthen the case that the hypothesized relationship is correct. Between these extremes of nonsupport-support are outcomes of indeterminacy: the new hypothesis is neither supported nor rejected by the data. Even this result may indicate the need for an additional collection of information.
In a position yet further removed from the scientific method, Selvin and Stuart (1966) convincingly argue that in survey research it is rarely possible to formulate precise hypotheses independently of the data. This means that most survey research is essentially exploratory in nature. Rather than having a single pre-designated hypothesis in mind, the analyst often works with many diffuse variables that provide a slightly different approach and perspective on the situation and problem. The added cost of an extra question is so low that the same survey can be used to investigate many problems without increasing the total cost. However, researchers must resist the syndrome of "just one more question". Often, the one more question escalates into many more questions of the type "it would be nice to know", which can be unrelated to the research objectives.
In a typical survey project, the analyst may alternate between searching the data (analyzing) and formulating hypotheses. Obviously, there are exceptions to all general rules and phenomena. Selvin and Stuart (1966), therefore, designate three practices of survey analysts:

1. Snooping. The process of searching through a body of data and looking at many relations in order to find those worth testing (that is, there are no pre-designated hypotheses).

2. Fishing. The process of using the data to choose which of a number of pre-designated variables to include in an explanatory model.

3. Hunting. The process of testing from the data all of a pre-designated set of hypotheses.
This investigative approach is reasonable for basic research but may not be practical for decisional research. Time and resource pressures seem to require that directed problem solving be the focus of decision research. Rarely can the decision maker afford the luxury of dredging through the data to find all of the relationships that may be present. Again, it simply reduces to the question of cost versus value.
Making Inferences
Testing hypotheses is the broad objective that underlies all decisional research. Sometimes the population as a whole can be measured and profiled in its entirety. Often, however, we cannot measure everyone in the population but instead must estimate the population using a sample of respondents drawn from the population. In this case we estimate the population “parameters” using the sample “statistics”. Thus, in both estimation and hypothesis testing, inferences are made about the population of interest on the basis of information from a sample.
We often will make inferences about the nature of the population and ask a multitude of questions, such as: Does the sample's mean satisfaction differ from the mean of the population of all restaurant patrons? Does the magnitude of the observed differences between categories indicate that actual differences exist, or are they the result of random variations in the sample?
In other studies, it may be sufficient to simply estimate the value of certain parameters of the population, such as the amount of our product used per household, the proportion of stores carrying our brand, or the preferences of housewives concerning alternative styles or package designs of a new product. Even in these cases, however, we would want to know about the underlying associated variables that influence preference, purchase, or use (color, ease of opening, accuracy in dispensing the desired quantity, comfort in handling, etc.), and if not for purposes of the immediate problem, then for solving later problems. In yet other case studies, it might be necessary to analyze the relationships between the enabling or situational variables that facilitate or cause behavior. Knowledge of these relationships will enhance the ability to make reliable predictions, when decisions involve changes in controllable variables.
The Relationship Between a Population, a Sampling Distribution, and a Sample
In order to simplify the example, suppose there is a population consisting of only five persons. On a specific topic, these five persons have a range of opinions that are measured on a 7-point scale ranging from very strongly agree to very strongly disagree. The frequency distribution of the population is shown in the bar chart of Figure 11.2.
Exhibit 11.2 Population, Sample, and Sampling Distribution
1. A Type I error occurs when we incorrectly conclude that a difference exists. This is expressed as α, the probability that we will incorrectly reject H0, the null hypothesis (sometimes called the hypothesis of no difference).

2. A Type II error occurs when we accept a null hypothesis when it is in reality false (we find no difference when a difference really does exist).

3. Confidence level: we correctly retain the null hypothesis (we could also say it is tentatively accepted, or that it could not be rejected). This is equal to the area under the normal curve less the area occupied by α, the significance level.

4. The power of the test is the ability to reject the null hypothesis when it should be rejected (when false). To increase power, researchers may choose an α of .10; alternatively, sample size may be increased. Increasing sample size is the preferred option for most market researchers.
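The trade-off in item 4 can be illustrated with a normal-approximation power calculation for a two-sided test of a mean. The effect size, standard deviation, and sample sizes below are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided(effect, sd, n, alpha=0.05):
    """Approximate power of a two-sided z test of a mean (normal theory)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)        # e.g. 1.96 for alpha = .05
    shift = effect / (sd / sqrt(n))          # true effect in standard-error units
    # Probability the test statistic falls in either rejection region
    return 1 - z.cdf(z_crit - shift) + z.cdf(-z_crit - shift)

# Hypothetical: detect a 0.2-point shift on a 5-point scale, sd = 1
print(round(power_two_sided(0.2, 1.0, 100), 2))   # modest power with n = 100
print(round(power_two_sided(0.2, 1.0, 400), 2))   # larger n raises power
```

Running the calculation at several candidate sample sizes before fieldwork shows directly how much sample the stated numerical objectives require.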
Selecting Tests Of Statistical Significance
Up to this point, we have considered data analysis at a descriptive level. It is now time to introduce ways to test whether the association observed is statistically significant. In many cases this involves testing hypotheses concerning group means. These tests are performed on interval or ratio data using what are known as "parametric tests," which include such techniques as the F, t, and z tests. Often, however, we have only nominal or loosely ordinal data and are not able to meet the rigid assumptions of a parametric test. Cross-tabulation analysis with the χ2 test is often used for hypothesis testing in these situations. The χ2 statistic is from the family of nonparametric methods.
Nonparametric methods are often called distribution-free methods because the inferences are based on a test statistic whose sampling distribution does not depend upon the specific distribution of the population from which the sample is drawn (Gibbons, 1993, p. 2). Thus, the methods of hypothesis testing and estimation are valid under much less restrictive assumptions than classical parametric techniques, which require, for example, independent random samples drawn from normal distributions with equal variances and interval-level measurement. Nonparametric techniques are appropriate for many marketing applications, where measurement is often at an ordinal or nominal level.
There are many parametric and nonparametric tests. The one that is appropriate for analyzing a set of data depends on (1) the level of measurement of the data, (2) the number of variables involved, and (3) for multiple variables, how they are assumed to be related.
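As a sketch of a nonparametric test, the χ2 test of independence can be computed directly from a 2x2 cross tabulation. The counts below are hypothetical:

```python
# Chi-square test of independence on a 2x2 cross tabulation.
# Hypothetical counts: patron type (rows) by satisfaction (columns).
table = [[30, 20],
         [10, 40]]

row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
n = sum(row_tot)

# Sum of (observed - expected)^2 / expected over the four cells,
# where expected = row total * column total / n under independence
chi2 = sum((table[i][j] - row_tot[i] * col_tot[j] / n) ** 2
           / (row_tot[i] * col_tot[j] / n)
           for i in range(2) for j in range(2))

# Tabled critical value for df = (2-1)(2-1) = 1 at alpha = .05 is 3.841
print(round(chi2, 2), chi2 > 3.841)   # 16.67 True -- reject independence
```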
The appropriate statistic depends on the level of measurement (nominal, ordinal, or interval/parametric):

Measure of central tendency: Mode (nominal); Median (ordinal); Mean (interval).

Measure of dispersion: None (nominal); Percentile (ordinal); Standard deviation (interval).

One-sample test of statistical significance: Binomial test (nominal); Kolmogorov-Smirnov one-sample test or one-sample runs test (ordinal); t-test or z-test (interval).
Exhibit 11.4 Ignoring Statistical Power
Much has been written about product failure, and about the failure of advertising campaigns. But, except in rare instances, very little has been said about research failure, or research that leads to incorrect conclusions. Yet research failure can occur even when a study is based on an expertly designed questionnaire, good field work, and sophisticated analysis. The flaw may be inadequate statistical power.
Mistaking chance variation for a real difference is one risk, called Type I error, and the 95% criterion largely eliminates it. By doing so, we automatically incur a high risk of Type II error: mistaking a real difference for chance variation. The ability of a sample to guard against Type II error is called statistical power.
1. The concept is more complicated than statistical significance or confidence limits.

2. Consideration of statistical power would often indicate that we need larger (more costly) samples than we are now using.

3. Numerical objectives must be specified before a research budget is fixed.
Parametric And Non-Parametric Analysis
Figure 11.5 Sampling Distribution of the Mean (μx̄)
If we expand this analysis to construct a confidence interval around the mean of a sample rather than the mean of a population, we must rely on a sampling distribution to define our normal distribution. The sampling distribution is defined by the means of all possible samples of size n. Recall that the population mean μ is also the mean of the normally distributed sampling distribution. Whereas μ ± zσ describes the confidence interval for a population, the value x̄ ± z(s/√n) describes the confidence interval for the sampling distribution. This is the probability that the specified area around the sample mean covers the population mean. It is interesting to note that because n, the sample size, is included in the computation of the "standard error," we may estimate the population mean with any desired degree of precision, simply by having a large enough sample size.
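The interval x̄ ± z(s/√n) can be sketched with the standard library; the sample values below are hypothetical, and s is assumed to estimate the population standard deviation:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample; 95% confidence interval for the population mean
sample = [4.1, 3.8, 4.5, 4.0, 3.9, 4.2, 4.4, 3.7, 4.3, 4.1]

x_bar, s, n = mean(sample), stdev(sample), len(sample)
z = NormalDist().inv_cdf(0.975)       # 1.96 for 95% confidence
half_width = z * s / sqrt(n)          # the standard error shrinks as n grows

print(round(x_bar - half_width, 2), round(x_bar + half_width, 2))
```

Quadrupling n would halve the interval width, which is the precision argument made above.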
Univariate Hypothesis Testing of Means
Where the Population Variance is Known
Researchers often desire to test a sample mean to determine if it is the same as the population mean. The z statistic describes probabilities of the normal distribution and is the appropriate tool to test the difference between μ, the mean of the sampling distribution, and x̄, the sample mean, when the population variance is known. The z statistic may, however, be used only when the following conditions are met:
1. Individual items in the sample must be drawn in a random manner.

2. The population must be normally distributed. If this is not the case, the sample must be large (>30), so that the sampling distribution is normally distributed.

3. The data must be at least interval scaled.

4. The variance of the population must be known.
1. The null hypothesis (H0) is specified: there is no difference between μ and x̄; any observed difference is due solely to sample variation.

2. The alpha risk (Type I error) is established (usually .05).

3. The z value is calculated by the appropriate z formula: z = (x̄ - μ) / (σ/√n).

4. The probability of the observed difference having occurred by chance is determined from a table of the normal distribution (Appendix B, Table B-1).

5. If the probability of the observed difference having occurred by chance is greater than the alpha used, then H0 cannot be rejected and it is concluded that the sample mean is drawn from a sampling distribution of the population having mean μ.
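The five steps can be sketched as follows. The sample mean, assumed population mean, σ, and n are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def z_test(x_bar, mu, sigma, n, alpha=0.05):
    """Steps 2-5 above: compute z, look up the probability, compare to alpha."""
    z = (x_bar - mu) / (sigma / sqrt(n))        # step 3: the z formula
    p = 2 * (1 - NormalDist().cdf(abs(z)))      # step 4: two-tailed probability
    return z, p, p < alpha                      # step 5: reject H0?

# Hypothetical: sample mean 4.3 vs. assumed population mean 4.0,
# known sigma = 1.2, n = 100
z, p, reject = z_test(4.3, 4.0, 1.2, 100)
print(round(z, 2), round(p, 4), reject)   # 2.5 0.0124 True
```

Here NormalDist().cdf plays the role of the table of the normal distribution.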
With a probability: P(t = 2.00, df = n - 1 = 224) = .951 (two-tailed test); 1 - P(t = 2.00) = .049

Confidence interval: 1.96(s/√n) = 1.176
We therefore reject the null hypothesis. Figure 11.7 shows these results.
As n becomes large (≥30), the t distribution approaches the normal distribution, which is reached at the limit (n = ∞). The t-statistic is widely used in both univariate and bivariate market research analyses, due to its relaxed assumptions relative to the z-statistic, as follows:
1. Individual items in the sample are drawn at random.

2. The population must be normally distributed. If not, the sample must be large (>30).

3. The data must be at least interval scaled.

4. The population variance is not known exactly, but is estimated by the variance of the sample.
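Under these conditions the t statistic replaces σ with the sample standard deviation s. A sketch with hypothetical satisfaction scores tested against an assumed mean:

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu):
    """t statistic when sigma is unknown and estimated by s (condition 4)."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / sqrt(n))

# Hypothetical scores tested against an assumed population mean of 3.0
scores = [3.4, 3.1, 2.9, 3.6, 3.2, 3.5, 3.0, 3.3]

t = one_sample_t(scores, 3.0)
# With df = 7, compare |t| to the tabled critical value (2.365 at alpha = .05)
print(round(t, 2))   # 2.89 -- reject H0 for these illustrative data
```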
1. Determining the significance of sample deviations from an assumed theoretical distribution; that is, does a certain model fit the data? This is typically called a goodness-of-fit test.

2. Determining the significance of the observed associations found in the cross tabulation of two or more variables. This is typically called a test of independence (discussed in Chapter 12).
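A goodness-of-fit test of the first kind can be sketched with the standard library. The observed counts and the assumed uniform model are hypothetical:

```python
# Goodness-of-fit: do the observed counts match an assumed uniform model?
# Hypothetical: 120 respondents choosing among four package designs.
observed = [36, 28, 25, 31]
expected = [sum(observed) / 4] * 4     # uniform model: 30 per design

# Sum of (observed - expected)^2 / expected over the categories
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Tabled critical value for df = 4 - 1 = 3 at alpha = .05 is 7.815
print(round(chi2, 2), chi2 > 7.815)   # 2.2 False -- the model fits
```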