Discriminant Analysis

From QualtricsWiki


A Note on the Interpretation and Analysis of the Linear Discriminant Model for Prediction and Classification
Scott M. Smith, Ph.D.

Introduction

While not on the list of "hot" marketing topics, discriminant analysis is a much-valued tool for market segmentation. Over the years, the estimation of the linear discriminant function has received considerable theoretical attention, both in the marketing literature (Dillon 1979; Dillon and Schiffman 1978; Crask and Perreault 1977; Morrison 1969; Frank, Massy, and Morrison 1965) and in mathematical statistics (Randles, Broffitt, Ramberg, and Hogg 1978; McLachlan 1977; Krzanowski 1975; Fisher and Van Ness 1973; Lachenbruch and Mickey 1968).

This concern for estimation has most often focused on the precision with which the discriminant function correctly classifies sets of observations, rather than on methods to better optimize the function itself. Specifically, methodological research has evaluated such areas as the influence of variable selection (Goldstein and Rabinowitz 1975; Urbakh 1971), bias in categorization (Krishnaswami and Nath 1968; Lachenbruch 1967; McLachlan 1974), and the validation of rules for classifying sets of observations (Dillon and Goldstein 1978; Hills 1966).

This focus on the classification ability of the linear discriminant function has obscured, and even confused, the fact that two distinct purposes and procedures for conducting discriminant analysis exist. The first procedure, discriminant predictive analysis, is used to optimize the predictive functions. The second procedure, discriminant classification analysis, uses the predictive functions derived in the first procedure either to classify fresh sets of data of known group membership, thereby validating the predictive function, or, if the function has previously been validated, to classify new sets of observations of unknown group membership.

The prediction and classification procedures referenced in the preceding paragraph need to be defined. In the prediction procedure, t linear discriminant functions are derived from a set of weighted independent variables. These t functions maximally discriminate among the t levels of the dependent variable, thus providing a predictive measure of each subject's group membership. Discriminant analysis conducted for predictive purposes is based on an initial set of observations whose group membership is known.

This discriminant procedure is commonly coupled with an analysis that classifies the initial data set. However, it is important to note that discriminant analysis for predictive purposes (i.e., prediction of data having known group membership) involves only the derivation of the linear discriminant function, not the classification of subjects. The purpose of classifying observations of known grouping is merely to see how well the derived function predicts group membership for the very subjects from which it was derived. The classification procedure associated with the predictive analysis may thus be thought of as a baseline analysis that establishes a standard of comparison for future discriminant classification analysis. This baseline classification analysis produces a t x t confusion matrix that compares predicted versus actual group membership; the confusion matrix is one measure of how well the derived functions predict group membership. Again, discriminant classification analysis stands in sharp contrast to predictive analysis.

Discriminant analysis conducted for predictive purposes formulates a linear discriminant function describing the importance of the independent variables in differentiating observations of known group membership. Discriminant analysis conducted for classification purposes validates the predictive discriminant function as a means of classifying fresh observations of unknown group membership sampled from the same populations. If the predictive function has previously been validated, the classification analysis is purely for classification purposes, and the discriminant function used for classification is neither derived nor at issue.

The central research objectives by which discriminant analysis is most often evaluated are to maximize either the discriminating power of the predictive function or the overall correct classification within the confusion matrix. Although these objectives may lead to maximum values, they are often less than optimal when expressed in terms of specific research hypotheses. The results of the classification analysis must be evaluated in light of the specific research objectives if optimization, rather than mere maximization, is to result. Specifically, we may ask whether classification of the group of interest is maximized. Overall classification may be maximized at the expense of less-than-maximal classification of the group of interest, especially if that group is a small proportion of the total number of observations classified, as is often the case when classifying members of a market segment. The ancillary questions that must be answered are:

  1. Where do the expected and observed classifications differ? and
  2. What statistical significance lies in the deviation of observed from expected classification?

Given that the distinction between discriminant analysis as used for prediction and as used for classification has been made, the objectives of this paper are threefold: first, to review the discriminant model as a predictive tool; second, to expand on the interpretation of the predictive analysis; and third, to consider a series of analyses that may be used to statistically test the results of the classification analysis as presented in the confusion matrix.

DISCRIMINANT ANALYSIS FOR PREDICTION

Discriminant analysis is based on the familiar linear model, written in matrix notation as D_t = X\lambda_t, which may be expanded to:

(1) D_t = \lambda_{t0} + \lambda_{t1}x_1 + \lambda_{t2}x_2 + \dots + \lambda_{tp}x_p

where,

D_t = the predicted discriminant score for group t

t = the number of groups differentiated by the t discriminant functions

X = the matrix of measured values x_1, \dots, x_p of the p independent variables used to predict group membership

\lambda_t = (\lambda_{t0}, \lambda_{t1}, \lambda_{t2}, \dots, \lambda_{tp}) = the vector of weights associated with the p variables that predict category t


When conducted for predictive purposes, discriminant analysis maximizes the amount of subject variance explained by the linear function. This maximization procedure is the rule for all procedures that comprise the family of general linear models (regression, principal components, and canonical analysis). Discriminant analysis uses a set of p variables with associated weights (\lambda_{tp}) that are derived in a best-fit, linear unbiased fashion to predict the score of the dependent variable, D. These discriminant scores are predictors of group membership that can be used to classify groups of observations of either known or unknown group membership. Discriminant analysis may be viewed as an eigenvalue problem, no different from the eigenvalue problem encountered in solving for the characteristic roots of any set of linear equations. To solve for the characteristic roots, we successively maximize the ratio of the between-groups sum of squares to the within-groups sum of squares for each \Lambda_t:

\Lambda_t = \frac{SS_a(W_t)}{SS_w(W_t)} = \frac{V_t' A V_t}{V_t' W V_t}

where,

A = SS_a = the between-groups (among-groups) sums of squares and cross-products (SSCP) matrix
W = SS_w = the pooled within-groups SSCP matrix
V_t = the eigenvector of weights associated with the characteristic root \Lambda_t (\Lambda_1 being the first, largest root)
W_t = the vector of discriminant scores on eigenvector V_t

The vector of characteristic roots \Lambda is derived from the matrix equation

(2) (A - \Lambda_t W)V_t = 0

Excluding the trivial solution V_t = 0 and premultiplying the equation by W^{-1} (the inverse of the within-groups SSCP matrix) produces the characteristic equation

|W^{-1}A - \Lambda I| = 0

It is this characteristic equation that is solved for \Lambda_t and V_t. Once \Lambda_t and V_t are determined, the prediction of D_t is routine, since all values in the predictive formula (1) are known.
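As a minimal sketch of the eigenvalue formulation, the code below assumes NumPy and a small simulated two-group data set (not the insurance data). It builds the between-groups matrix A and the pooled within-groups matrix W from raw scores, then finds the roots of |W^{-1}A - \Lambda I| = 0 numerically. The function names `sscp_matrices` and `discriminant_roots` are illustrative only.

```python
import numpy as np

def sscp_matrices(X, y):
    """Between-groups (A) and pooled within-groups (W) SSCP matrices."""
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    A = np.zeros((p, p))
    W = np.zeros((p, p))
    for g in np.unique(y):
        Xg = X[y == g]
        mg = Xg.mean(axis=0)
        d = (mg - grand_mean).reshape(-1, 1)
        A += len(Xg) * (d @ d.T)   # between-groups contribution of group g
        C = Xg - mg
        W += C.T @ C               # within-group contribution of group g
    return A, W

def discriminant_roots(X, y):
    A, W = sscp_matrices(X, y)
    # Solve |W^-1 A - Lambda I| = 0; W^-1 A is not symmetric, so use eig.
    roots, vecs = np.linalg.eig(np.linalg.solve(W, A))
    roots, vecs = roots.real, vecs.real   # the roots are real in theory
    order = np.argsort(roots)[::-1]       # largest (first) root first
    return roots[order], vecs[:, order], A, W

# Simulated data: two groups of 40 observations on 3 variables (assumption).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0, 0.0], 1.0, (40, 3)),
               rng.normal([1.0, 0.5, 0.0], 1.0, (40, 3))])
y = np.array([1] * 40 + [2] * 40)

roots, vecs, A, W = discriminant_roots(X, y)
v1 = vecs[:, 0]
ratio = (v1 @ A @ v1) / (v1 @ W @ v1)   # V'AV / V'WV for the first root
print(roots.round(4), round(ratio, 4))
```

With two groups, A has rank one, so only the first root is non-zero; the check that V_1'AV_1 / V_1'WV_1 equals \Lambda_1 mirrors the ratio being maximized.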

The basic question answered in the predictive analysis is: given that groups of observations exist, can we develop t functions that maximally discriminate, or explain the difference, between the groups? This type of problem is common to most areas of marketing research. However, the best examples occur where a clear distinction between the t nominally scaled groups exists. One such example is provided by Evans (1959), who used personality variables as predictors of past brand purchases for Ford and Chevrolet car owners. A discriminant analysis conducted in this problem setting would attempt to differentiate, on a post hoc basis, the brand choice behavior of Ford and Chevrolet owners. That is, given two previously identified groups of car owners, can a predictive function be formulated from the independent variables to explain this difference? Albaum and Hawkins (1979) provide another example, in which a predictive analysis was used to differentiate a sample of fixed-rate and variable-rate mortgage holders. Again, group membership was known prior to the analysis, the sole purpose of which was to derive the predictive function. A predictive analysis is possible in many situations where a prior designation of groups exists (e.g., product purchasers versus non-purchasers; heavy-half versus light-half market segments; innovators versus non-innovators; successful versus unsuccessful new product ideas). Again, the research objective is to predict using the set of independent variables, not to classify consumers of unknown group membership.

Linear model users are often disappointed when a model that predicts group membership well for the original set of objects becomes at best marginal when applied to fresh data drawn from the same population. This often happens because predictive models capitalize on chance, so the function may predict group membership in the initial data set far better than in any other sample that could be drawn. As Mosteller and Tukey (1968) put it, "testing the procedure on the data that gave it birth is almost certain to overestimate performance. For the optimizing process that chose it from among many possible procedures will have made the greatest use possible of any and all idiosyncrasies of those particular data. Sometimes we say that optimization capitalizes on chance." Optimization based on chance creates a degree of fit, but in the case of the predictive analysis this fit may be upwardly biased and not representative of the real world (Morrison 1969). Thus, while the predictive analysis explains differences between the t groups described in the current data sample, it does not validate the model as explaining differences in the population as a whole.
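The optimism described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: NumPy, simulated data, and a two-group Fisher rule with a midpoint cutoff (the rule discussed in the classification section). It compares the hit rate from reclassifying the data that produced the function (resubstitution) with a leave-one-out estimate of the kind studied by Lachenbruch and Mickey (1968), in which each observation is classified by a function derived without it. All function names are hypothetical.

```python
import numpy as np

def fit_two_group(X, y):
    """Fisher two-group rule: weights w = S^-1 (m1 - m2), cutoff c at the midpoint."""
    X1, X2 = X[y == 1], X[y == 2]
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled within-groups covariance matrix
    S = ((X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)) / (len(X) - 2)
    w = np.linalg.solve(S, m1 - m2)
    c = 0.5 * (m1 + m2) @ w
    return w, c

def predict(w, c, X):
    # Score at or above the midpoint -> group 1, otherwise group 2
    return np.where(X @ w >= c, 1, 2)

def resubstitution_rate(X, y):
    """Hit rate when the function classifies the data that produced it."""
    w, c = fit_two_group(X, y)
    return np.mean(predict(w, c, X) == y)

def leave_one_out_rate(X, y):
    """Hit rate when each case is classified by a function fitted without it."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        w, c = fit_two_group(X[keep], y[keep])
        hits += predict(w, c, X[i:i + 1])[0] == y[i]
    return hits / len(X)

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))   # pure noise: no real group difference
y = np.repeat([1, 2], 30)
print(resubstitution_rate(X, y), leave_one_out_rate(X, y))
```

On pure-noise data the resubstitution rate typically sits above one-half while the leave-one-out rate hovers near it; the exact figures depend on the random seed.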

Consider one final example of the predictive analysis. Two groups of customers are defined on an a priori basis, these being (1) purchasers, and (2) non-purchasers of an accident insurance product.

The objective of the predictive analysis is to develop an equation that maximally discriminates between the two purchase groups using p independent demographic and socioeconomic variables. If we restate this objective in terms of prediction and validation, we desire to develop a set of discriminant functions that both discriminate between the two sample groups (prediction) and are generalizable as a valid tool for classifying potential customers in the future (classification).

For the purchaser and non-purchaser groups of the accident insurance product, the discriminant functions are expressed:

(3) D_1 = -19.59x_0 + 10.26x_1 + 8.84x_2 + 2.69x_3
    D_2 = -12.02x_0 + 7.87x_1 + 7.33x_2 + 1.87x_3

where: x_0 = constant
x_1 = marital status
x_2 = age
x_3 = occupation

Thus, given the equation and the observed values x_p, the value D_t can be derived.
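A minimal sketch of scoring one customer with the functions in (3), in plain Python. It assumes that x_0 = 1 (the constant term), that D_1 belongs to purchasers and D_2 to non-purchasers, and that an observation goes to the group with the larger score. The input values are hypothetical, since the coding of marital status, age, and occupation is not given.

```python
# Coefficients from equation (3), ordered (x0, x1, x2, x3); x0 is the constant = 1.
# Assumption: D1 is the purchaser function and D2 the non-purchaser function.
FUNCTIONS = {
    "purchaser":     [-19.59, 10.26, 8.84, 2.69],
    "non-purchaser": [-12.02, 7.87, 7.33, 1.87],
}

def scores(x1, x2, x3):
    """Discriminant score D_t for each group, given hypothetical coded values."""
    x = [1.0, x1, x2, x3]
    return {group: sum(b * v for b, v in zip(coefs, x))
            for group, coefs in FUNCTIONS.items()}

def classify(x1, x2, x3):
    """Assign the observation to the group with the largest score."""
    s = scores(x1, x2, x3)
    return max(s, key=s.get)

print(scores(1, 1, 1), classify(1, 1, 1))
print(scores(2, 2, 2), classify(2, 2, 2))
```

With all three inputs equal to 1 the scores are about 2.20 (purchaser) and 5.05 (non-purchaser), so the rule assigns non-purchaser; with all inputs equal to 2 they are about 23.99 and 22.12, so it assigns purchaser.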

The functions that discriminate between the purchasers and non-purchasers of the accident insurance product (3) were derived in a stepwise analysis that employed the Wilks' lambda statistic to determine which independent variables should be included in the discriminant function. The Wilks' lambda criterion maximally discriminates between the t groups by maximizing the multivariate F ratio in the tests of differences between the t group means.

The derived discriminant coefficients may be interpreted as indicative of the importance of the respective p independent variables entered into the discriminant analysis. Although these coefficients indicate importance, they are not appropriate for assessing the relative importance or discriminatory power of the variables, i.e., the proportion of total discriminating power attributable to a specific variable. Relative importance of the independent variables entered in the predictive function is defined in part by:

(4) I_p = |\lambda_p(\bar{x}_{p1} - \bar{x}_{p2})|

where: I_p = the importance of the pth variable

\lambda_p = the unstandardized discriminant coefficient for the pth variable

\bar{x}_{pt} = the mean of the pth variable for the tth group (Mosteller and Wallace 1963).

To convert I_p, the importance measure for the pth variable, into a relative importance score, I_p must be expressed relative to the sum of the importance values of all variables. The relative importance of the pth variable, R_p, is expressed for the insurance purchasers as (Awh and Waters 1974):

(5) R_p = \frac{I_p}{\sum_{p=1}^{3} I_p}


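These R_p values, computed for the purchasers of the accident insurance product, are:

Variable             | Function 1: \bar{x}_{p1} | \lambda_p | I_p  | R_p | Function 2: \bar{x}_{p2} | \lambda_p | I_p  | R_p
x_1 (marital status) | 1.79                     | 10.26     | 4.53 | .53 | 1.35                     | 7.87      | 3.47 | .53
x_2 (age)            | 1.76                     | 8.84      | 2.49 | .29 | 1.48                     | 7.33      | 2.07 | .31
x_3 (occupation)     | 1.97                     | 2.69      | 1.53 | .18 | 1.40                     | 1.87      | 1.07 | .16
Sum                  |                          |           | 8.5  |     |                          |           | 6.6  |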
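The tabled I_p values can be reproduced, to rounding, by applying (4) to the difference between the two group means in each row, for example 10.26 × (1.79 - 1.35) ≈ 4.51. The plain-Python sketch below does this and then applies (5). It assumes that the \bar{x}_{p1} and \bar{x}_{p2} columns hold the means of the two groups, an interpretation inferred from the arithmetic rather than stated in the text; small differences from the table (4.51 versus 4.53) presumably reflect rounding of the published means.

```python
# Tabled means and coefficients, ordered x1 (marital status), x2 (age), x3 (occupation).
mean_g1 = [1.79, 1.76, 1.97]   # \bar{x}_{p1} column
mean_g2 = [1.35, 1.48, 1.40]   # \bar{x}_{p2} column
coef_f1 = [10.26, 8.84, 2.69]  # lambda_p, function 1
coef_f2 = [7.87, 7.33, 1.87]   # lambda_p, function 2

def importance(coefs, m1, m2):
    """Equation (4): I_p = |lambda_p (mean_p1 - mean_p2)|."""
    return [abs(b * (a - c)) for b, a, c in zip(coefs, m1, m2)]

def relative_importance(imp):
    """Equation (5): R_p = I_p / sum of all I_p."""
    total = sum(imp)
    return [i / total for i in imp]

i1 = importance(coef_f1, mean_g1, mean_g2)
i2 = importance(coef_f2, mean_g1, mean_g2)
r1 = relative_importance(i1)
r2 = relative_importance(i2)
print([round(v, 2) for v in i1], [round(v, 2) for v in r1])
print([round(v, 2) for v in i2], [round(v, 2) for v in r2])
```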


Once the meaning of the prediction function is clear, the predictive function can be used to classify observations of either known or unknown group membership.

CLASSIFICATION OF OBSERVATIONS FROM INITIAL AND NEW DATA SETS

Discriminant analysis conducted for predictive purposes uses an initial data set having known group membership both to derive the discriminant function and to predict group classification. This classification of observations is but an extension of the predictive discriminant analysis, in that the predictive discriminant scores, D_{it}, form the basis of the decision rule used to classify this same set of objects into the t groups.

In contrast to the classification of the initial data set, where group membership is known, the same decision rule may be applied to other sets of data. However, when we classify data sets other than the initial set from which the predictive analysis was conducted, we are no longer engaged in predictive discriminant analysis, but rather in discriminant classification analysis. It is critical that this distinction be clear. Predictive discriminant analysis does not require that validation procedures be implemented, since derivation of an optimal discriminant function is the only relevant issue. However, if fresh sets of data with either known or unknown grouping are classified, then the discriminant function must be validated as generalizable to these data sets. The following discussion of the methodology for classification, and for extending the classification analysis, applies equally well to predictive and classification analyses, since the classification methodology is the same in both cases.

The predictive analysis explained above demonstrated the source of the derived discriminant scores, D_t, that are used to classify observations. To demonstrate the classification procedure, we must first recognize that the two p-dimensional populations of our example are described by the discriminant function

D_t = x'\Sigma^{-1}(\bar{x}_1 - \bar{x}_2)

where x is the vector of an observation's values on the p variables and \Sigma is the pooled within-groups covariance matrix. Values D_{it} are computed for each of the i observations so as to form t distributions of values, one per group; the groups' sample means, or centroids, in p-dimensional space are designated \bar{x}_1 and \bar{x}_2. For the example problem, the classification analysis determines whether observation i belongs to population one or two. Using the midpoint C between the two groups, the correct classification for D_{it} may be determined by selecting the appropriate decision alternative:

C = \frac{1}{2}(\bar{x}_1 + \bar{x}_2)'\Sigma^{-1}(\bar{x}_1 - \bar{x}_2)

Classify observation i as coming from population one if

x'\Sigma^{-1}(\bar{x}_1 - \bar{x}_2) - \frac{1}{2}(\bar{x}_1 + \bar{x}_2)'\Sigma^{-1}(\bar{x}_1 - \bar{x}_2) \geq 0

Otherwise, classify i as population two. Alternatively, the classification rule may be defined as:

D_i = x'\Sigma^{-1}(\bar{x}_1 - \bar{x}_2)

where no correction for the midpoint is made. In this case, the decision criterion is to compute the value D_{it} for each of the t functions and classify the observation into the group that has the largest discriminant score. This computational form is the basis for most algorithms found in the standard statistical packages.
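To show that the midpoint rule and the largest-score rule lead to the same decision, here is a NumPy sketch on simulated data (an assumption; no raw data accompany the example). It replaces \Sigma with the sample pooled covariance matrix S and uses per-group scores of the form x'S^{-1}\bar{x}_t - \frac{1}{2}\bar{x}_t'S^{-1}\bar{x}_t, the standard equal-prior linear classification function. Whether the functions in (3) were built exactly this way, or with prior probabilities, is not stated in the text.

```python
import numpy as np

rng = np.random.default_rng(3)
X1 = rng.normal([0.0, 0.0, 0.0], 1.0, (30, 3))   # simulated population one
X2 = rng.normal([1.0, 0.5, -0.5], 1.0, (25, 3))  # simulated population two
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Pooled within-groups covariance matrix S (sample estimate of Sigma)
S = ((X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)) / (len(X1) + len(X2) - 2)
S_inv = np.linalg.inv(S)

X = np.vstack([X1, X2])          # observations to classify
w = S_inv @ (m1 - m2)            # discriminant weights S^-1 (m1 - m2)
d = X @ w                        # D_i = x' S^-1 (m1 - m2)
C = 0.5 * (m1 + m2) @ w          # midpoint cutoff C
midpoint_group = np.where(d - C >= 0, 1, 2)

def group_score(X, m):
    """Linear classification score for one group: x'S^-1 m - 1/2 m'S^-1 m."""
    return X @ S_inv @ m - 0.5 * m @ S_inv @ m

scores = np.column_stack([group_score(X, m1), group_score(X, m2)])
largest_group = scores.argmax(axis=1) + 1   # group with the largest score

print(np.mean(midpoint_group == largest_group))
```

Because the difference between the two group scores equals x'S^{-1}(\bar{x}_1 - \bar{x}_2) - C algebraically, the two rules can disagree only on exact ties.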

When group membership is known, the classification rules described above are commonly used, in both predictive and classification analyses, to develop a t x t matrix designated a confusion matrix. Although this confusion matrix shows the frequency of correct and incorrect classification resulting from the decision rule, it has not traditionally been subjected to the further analysis necessary to test for the presence of specific relationships or even for overall significance. (Note that if a classification analysis is run on objects of unknown grouping, a confusion matrix cannot be constructed, which shows the critical nature of the validation analysis.)
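A minimal plain-Python sketch of building a t x t confusion matrix, with row percentages, from paired actual and predicted labels. The function name and the short label lists are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows = actual group, columns = predicted group; returns counts and row %."""
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    row_pct = [[100.0 * c / sum(row) if sum(row) else 0.0 for c in row]
               for row in matrix]
    return matrix, row_pct

actual    = ["buy", "buy", "buy", "no", "no", "no", "no"]
predicted = ["buy", "buy", "no", "buy", "no", "no", "no"]
matrix, row_pct = confusion_matrix(actual, predicted, ["buy", "no"])
print(matrix)  # [[2, 1], [1, 3]]
```

The diagonal cells hold the correct classifications; their sum divided by the total number of cases gives the overall hit rate.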

Confusion-Matrix Analysis

The computation of the confusion matrix has traditionally ended the discriminant analysis procedure. However, the confusion matrix, when viewed as a contingency table, is subject to a variety of analyses that may be directed toward unanswered questions. Specifically, given the level of observed correct classification:

  1. What level of overall classification is expected from chance alone, and is this classification significantly different from observed classification? (An analysis of the aggregate confusion matrix)
  2. Which groups are best classified by the Discriminant function, and is each respective group classified significantly better than expected by chance alone? (Analysis of individual rows of the confusion matrix)
  3. Within each group, does the proportion of subjects correctly classified or misclassified differ significantly from chance? (Analysis of individual cells of the confusion matrix)

Analysis Level I: The Aggregate Confusion Matrix

The confusion matrix derived from the analysis of the accident insurance purchasers was evaluated with respect to the above stated questions.

Figure 1
Confusion Matrix for Accident Insurance Purchasers
(cell entries: frequency, row %, chi-square contribution)

                      | Predicted Purchase   | Predicted Non-Purchase | Row Total (Row %)
Actual Purchase       | n = 22, 66.7%, 15.41 | n = 11, 33.3%, 6.46    | 33 (11.1%)
Actual Non-Purchase   | n = 66, 24.9%, 1.92  | n = 199, 75.1%, 0.80   | 265 (88.9%)
Column Total (Col. %) | 88 (29.5%)           | 210 (70.5%)            | 298 (100%)

Percent of Cases Correctly Classified = 221 / 298 = 74.16%
Chi-Square = 24.599, df = 1, Significance < .001

Overall correct classification was observed for 74.16% of all subjects surveyed. This observed classification was found to be significant at the .001 level (χ² = 24.59, df = 1; Press's Q = 69.58, df = 1). Thus, observed classification is significantly different from expected chance classification.
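The figures reported for Figure 1 can be checked with a short plain-Python sketch. The chi-square contributions use the usual expected counts (row total × column total / n). Q is taken here to be Press's Q, computed as (N - nK)² / [N(K - 1)] with N cases, n correctly classified, and K groups; with the Figure 1 counts this gives 20736 / 298 ≈ 69.58, the reported value. Function names are illustrative.

```python
def chi_square_contributions(table):
    """(observed - expected)^2 / expected for each cell of a contingency table."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return [[(table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
             for j in range(len(cols))] for i in range(len(rows))]

def press_q(table):
    """Press's Q = (N - n*K)^2 / (N (K - 1)), n = number correctly classified."""
    n_total = sum(map(sum, table))
    k = len(table)
    correct = sum(table[i][i] for i in range(k))
    return (n_total - correct * k) ** 2 / (n_total * (k - 1))

figure1 = [[22, 11],    # actual purchase: predicted purchase, non-purchase
           [66, 199]]   # actual non-purchase
contrib = chi_square_contributions(figure1)
chi2 = sum(map(sum, contrib))
hit_rate = (22 + 199) / 298
print([[round(c, 2) for c in row] for row in contrib], round(chi2, 2),
      round(press_q(figure1), 2), round(hit_rate, 4))
```

The computed chi-square is about 24.59, in line with the value quoted in the text.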

In addition to testing for overall significance of a single confusion matrix, tests may be used to differentiate alternative Discriminant models defining the same population. Operationally, this is done by selecting the function with the largest Q statistic, since this identifies the function with the greatest discriminating ability.

Analysis Level II: Tests of Group Differences

Morrison (1969) considered the question of how well variables discriminate by formulating a likelihood ratio to estimate chance classification. This estimate of chance classification is the basis for further tests of specific relations critical to a rigorous analysis. However, expected classification, or tests involving expected classification of specific groups, are rarely reported in the literature.

Morrison's likelihood analysis provides a criterion that may be used to compare the proportion of correctly classified observations with the proportion expected by chance. This proportion, designated the proportional chance criterion, or C_{pro} (Morrison 1969), is expressed as:

C_{pro} = p\alpha + (1 - p)(1 - \alpha) = (.295)(.111) + (.705)(.889) = .6594

where,

  • \alpha = the proportion of customers in the sample classified as purchasers
  • p = the true proportion of purchasers in the sample
  • (1 - \alpha) = the proportion of the sample classified as non-purchasers
  • (1 - p) = the true proportion of non-purchasers in the sample

This likelihood analysis states that 65.94% of the overall sample is expected to be correctly classified by chance alone. The proportional chance criterion, C_{pro}, has been used mainly as a point of reference for subjective evaluation (Morrison 1969), rather than as the basis of a statistical test of whether the expected proportion differs from the observed proportion correctly classified. Notable exceptions are found in Albaum, Best, and Hawkins (1975) and Smith (1979).

This relationship between chance and observed proportions can be tested using a Z statistic of the form:

Z = \frac{P_{cc} - C_{pro}}{\sqrt{C_{pro}(1 - C_{pro})/n}} = \frac{.7416 - .6594}{\sqrt{(.6594)(.3406)/298}} = 2.99

where P_{cc} is the proportion of observations correctly classified and C_{pro} = p\alpha + (1 - p)(1 - \alpha).

Thus, for the example problem, the difference between expected and actual overall correct classification is significant at the .01 level. This overall test of significance suggests that further analysis should be conducted to determine the source of the divergence from chance expectations.
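A plain-Python sketch of the proportional chance criterion and the Z test, computed here from the Figure 1 counts rather than from the rounded proportions. This gives C_pro ≈ .6594 and a Z of roughly 3.0; the 2.99 in the text comes from the rounded inputs. Function names are illustrative.

```python
from math import sqrt

def proportional_chance(alpha, p):
    """C_pro = p*alpha + (1 - p)(1 - alpha)."""
    return p * alpha + (1 - p) * (1 - alpha)

def z_vs_chance(observed, chance, n):
    """Z = (observed - chance) / sqrt(chance (1 - chance) / n)."""
    return (observed - chance) / sqrt(chance * (1 - chance) / n)

n = 298
alpha = 88 / n          # proportion classified as purchasers
p = 33 / n              # true proportion of purchasers
cpro = proportional_chance(alpha, p)
pcc = (22 + 199) / n    # proportion correctly classified
z = z_vs_chance(pcc, cpro, n)
print(round(cpro, 4), round(z, 2))
```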

Divergence may be present in any of the confusion matrix cells (i.e., purchasers or non-purchasers that are either correctly or incorrectly categorized), and thus each cell may be tested to determine whether its proportion differs from chance.

Analysis Level III: Classification and Misclassification Within Groups

The analysis to determine the source of deviation is conducted using the maximum chance criterion, designated C_{max} (Morrison 1969). C_{max} is the minimum expected correct classification for a selected group of interest. The computation of C_{max} is based on the assumption that all observations are categorized as coming from that group; e.g., if all 298 purchasers and non-purchasers were classified as purchasers, the maximum correct classification, C_{max}, would be expressed as:

C_{max} = \frac{\text{Total Purchasers}}{\text{Total Customers}} = \frac{33}{298} = .111

Because we are interested in the correct classification of insurance purchasers, the test of classification involves asking if the 66.67% correct insurance purchaser classification differs significantly from the 11.1% maximum expected chance classification. A Z statistic is used to test this relationship as shown for the example analysis.

Z_{11} = \frac{\text{Observed Correct Classification} - C_{max}}{\sqrt{C_{max}(1 - C_{max})/n}} = \frac{.667 - .111}{\sqrt{(.111)(.889)/33}} = 10.17*

* Significant at the .001 level.

This test may be conducted for the other cells in the confusion matrix:

Z_{12} = \frac{.333 - .111}{\sqrt{(.111)(.889)/33}} = 4.06
Z_{21} = \frac{.249 - .889}{\sqrt{(.889)(.111)/265}} = -33.16
Z_{22} = \frac{.751 - .889}{\sqrt{(.889)(.111)/265}} = -7.15
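The same Z formula, with C_max as the chance baseline and the row size as n, reproduces the four cell statistics. The sketch below feeds in the rounded proportions used in the text and, as in the text, compares row 1 against .111 and row 2 against .889. The Z_21 value comes out near -33.17, differing from the text's -33.16 only in the last digit.

```python
from math import sqrt

def z_cell(observed, c_max, n):
    """Z = (observed proportion - C_max) / sqrt(C_max (1 - C_max) / n)."""
    return (observed - c_max) / sqrt(c_max * (1 - c_max) / n)

cells = {
    "Z11": z_cell(0.667, 0.111, 33),    # purchasers classified as purchasers
    "Z12": z_cell(0.333, 0.111, 33),    # purchasers classified as non-purchasers
    "Z21": z_cell(0.249, 0.889, 265),   # non-purchasers classified as purchasers
    "Z22": z_cell(0.751, 0.889, 265),   # non-purchasers classified as non-purchasers
}
for name, value in cells.items():
    print(name, round(value, 2))
```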


Thus cell (1,1) shows that observed classification is significantly greater than would be expected from chance classification alone. The analysis of cells (1,2) and (2,1) shows that observed and expected misclassification results differ: purchasers are misclassified into cell (1,2) more often than expected by chance (Z12 = 4.06), and non-purchasers are misclassified into cell (2,1) less often than expected by chance (Z21 = -33.16). Thus the discriminant functions appear to shift the classification of subjects toward the purchaser categories, as demonstrated by significantly greater than expected classification in the upper row of the confusion matrix.

SUMMARY

Two objectives have been fulfilled by this paper. The first objective of the paper was to show that differences in the application and the requirements for discriminant analysis exist. These are often misinterpreted, especially with respect to the validation of the predictive Discriminant analysis. These differences are summarized as follows.

Stages of Analysis

Stage | Purpose | Requirements
Predictive discriminant analysis | Derive discriminant function using initial data set; no classification involved | Assumptions of linear discriminant model; no validation required
Classification analysis of initial data set of known groupings | Determine how well discriminant function classifies (biased) | No validation required
Classification analysis of new data set of known groupings | 1) Classify data using classification rule derived from predictive function; 2) may be part of validation analysis of initial predictive function | Validation required
Classification analysis of new data set of unknown groupings | 1) Classify data using classification rule derived from predictive function; 2) may be part of validation analysis of initial predictive function | Initial predictive function must have been previously validated


The second objective of this paper has been to demonstrate the increased rigor in discriminant analysis that can be achieved when data sets of known groupings are classified and the resulting confusion matrix is tested. The use of these techniques will enhance both the analysis and interpretation of the classification analysis, particularly when the predictive function is being validated as a tool for classification.

Clarifying the alternative uses of discriminant analysis, together with the possibility of increased rigor, will greatly enhance both the analysis and the interpretation of empirical and managerial problems.


REFERENCES

Albaum, G., R. Best, and D. Hawkins (1975), "Applying Discriminant Analysis to Unipolar Semantic Scaling Data," American Institute of Decision Sciences Western Meetings.

Albaum, G., and D. Hawkins (1979), "Differences between Consumers of Variable-Rate and Fixed-Rate Residential Mortgages," in Proceedings of the Association for Consumer Research, J.C. Olson et al., eds., San Francisco, California.

Awh, R.Y., and D. Waters (1974), "A Discriminant Analysis of Economic, Demographic, and Attitudinal Characteristics of Bank Charge-Card Holders: A Case Study," The Journal of Finance 29, pp. 973-980.

Crask, M.R., and W.D. Perreault, Jr. (1977), "Validation of Discriminant Analysis in Marketing Research," Journal of Marketing Research 14 (February), pp. 60-68.

Dillon, W.R. (1979), "The Performance of the Linear Discriminant Function in Non-Optimal Situations and the Estimation of Classification Error Rates: A Review of Recent Findings," Journal of Marketing Research 16 (August), pp. 370-391.

Dillon, W.R., and M. Goldstein (1978), "On the Performance of Some Multinomial Classification Rules," Journal of the American Statistical Association 73 (June), pp. 305-313.

Dillon, W.R., and L. Schiffman (1978), "Appropriateness of Linear Discriminant and Multinomial Classification Analysis in Marketing Research," Journal of Marketing Research 15 (February), pp. 103-112.

Evans, F.B. (1959), "Psychological and Objective Factors in the Prediction of Brand Choice: Ford vs. Chevrolet," Journal of Business 32 (October), pp. 340-369.

Fisher, L., and J.W. Van Ness (1973), "Admissible Discriminant Analysis," Journal of the American Statistical Association 68, pp. 603-607.

Frank, R.E., W.F. Massy, and D.G. Morrison (1965), "Bias in Multiple Discriminant Analysis," Journal of Marketing Research 2 (August), pp. 250-258.

Goldstein, M., and T. Rabinowitz (1975), "Selection of Variates for the Two Group Classification Problem," Journal of the American Statistical Association 70, pp. 776-781.

Hills, M. (1966), "Allocation Rules and their Error Rates," Journal of the Royal Statistical Society B 28, p. 1.

Krishnaswami, P., and R. Nath (1968), "Bias in Multinomial Classification," Journal of the American Statistical Association 63, pp. 298-303.

Krzanowski, W.J. (1975), "Discrimination and Classification Using Both Binary and Continuous Variables," Journal of the American Statistical Association 70, pp. 782-790.

Lachenbruch, P.A. (1967), "An Almost Unbiased Method of Obtaining Confidence Intervals for the Probability of Misclassification in Discriminant Analysis," Biometrics 23, pp. 639-645.

Lachenbruch, P.A., and M.R. Mickey (1968), "Estimation of Error Rates in Discriminant Analysis," Technometrics 10, pp. 1-11.

McLachlan, G.J. (1974), "Estimation of the Errors of Misclassification on the Criterion of Asymptotic Mean Square Error," Technometrics 16 (May), pp. 255-256.

McLachlan, G.J. (1977), "Estimating the Linear Discriminant Function from Initial Samples Containing a Small Number of Unclassified Observations," Journal of the American Statistical Association 72, pp. 403-406.

Morrison, D.G. (1969), "On Interpretation in Discriminant Analysis," Journal of Marketing Research 6 (May), pp. 156-163.

Mosteller, F., and J.W. Tukey (1968), "Data Analysis, Including Statistics," in The Handbook of Social Psychology, Vol. 2, G. Lindzey and E. Aronson, eds., Reading, MA: Addison-Wesley, pp. 80-203.

Mosteller, F., and D.L. Wallace (1963), "Inference in an Authorship Problem," Journal of the American Statistical Association 58 (June), pp. 275-309.

Randles, R.H., J.D. Broffitt, J.S. Ramberg, and R.V. Hogg (1978), "Discriminant Analysis Based on Ranks," Journal of the American Statistical Association 73, pp. 379-384.

Smith, S.M. (1979), "Product Aggregation as a Mediating Variable in the Segmentation of Consumer and Geographic Markets," Unpublished Doctoral Dissertation, Pennsylvania State University, University Park, PA.

Urbakh, V.U. (1971), "Linear Discriminant Analysis: Loss of Discriminating Power when a Variate is Omitted," Biometrics 27, pp. 531-534.