Guide
Cluster analysis, like reduced space analysis, is concerned with data matrices in which the variables have not been partitioned beforehand into criterion versus predictor subsets.
The most common use of cluster analysis is for classification. That is, subjects are separated into groups such that each subject is more like the other subjects in its group than it is like subjects outside the group. Cluster analysis is thus concerned ultimately with classification and represents a set of techniques that are part of the field of numerical taxonomy [Frank and Green, 1968; Punj and Stewart, 1983; Aldenderfer and Blashfield, 1984]. We will initially focus on clustering procedures that result in the assignment of each subject to one and only one class. Subjects within a class are usually assumed to be indistinguishable from one another. Thus, we assume that the underlying structure of the data involves an unordered set of discrete classes. In some cases we may also view these classes as hierarchical in nature, with some classes divided into subclasses.
Clustering procedures can be viewed as “preclassificatory” in the sense that the researcher has not used prior judgment to partition the subjects (rows of the data matrix). However, it is assumed that some of the objects are heterogeneous; that is, that “clusters” exist. This presupposition of different groups is based on commonalities within the set of independent variables. The assumption differs from that made in discriminant analysis or automatic interaction detection, where the dependent variable is used to formally define groups of objects and the distinction is not made on the basis of profile resemblance in the data matrix itself. Thus, given that no information on group definition is formally evaluated in advance, the major problems of cluster analysis can be stated as follows:
- What measure of intersubject similarity is to be used and how is each variable to be “weighted” in the construction of such a summary measure?
- After intersubject similarities are obtained, how are the classes to be formed?
- After the classes have been formed, what summary measures of each cluster are appropriate in a descriptive sense; that is, how are the clusters to be defined?
- Assuming that adequate descriptions of the clusters can be obtained, what inferences can be drawn regarding their statistical significance?
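The first three questions above can be illustrated with a minimal sketch. It assumes a weighted Euclidean distance on standardized variables as the intersubject measure, average-linkage merging as the grouping rule, and cluster centroids as the descriptive summary. These choices, the data values, and the function names are illustrative assumptions, not methods prescribed by the text. The question of statistical significance is not addressed here.

```python
import math

# Hypothetical data matrix: rows are subjects, columns are variables.
data = [
    [1.0, 2.0], [1.2, 1.9], [0.9, 2.2],
    [8.0, 9.0], [8.3, 8.7], [7.9, 9.4],
]
weights = [1.0, 1.0]  # assumed importance weight for each variable


def standardize(rows):
    """Rescale every column to mean 0 and population standard deviation 1."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
           for c, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def weighted_distance(a, b, w):
    """Weighted Euclidean distance between two subject profiles."""
    return math.sqrt(sum(wk * (x - y) ** 2 for x, y, wk in zip(a, b, w)))


def average_linkage(rows, w, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                total = sum(weighted_distance(rows[i], rows[j], w)
                            for i in clusters[p] for j in clusters[q])
                mean_d = total / (len(clusters[p]) * len(clusters[q]))
                if best is None or mean_d < best[0]:
                    best = (mean_d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return clusters


def centroid(rows, members):
    """Mean profile of a cluster, used as a simple descriptive summary."""
    return [sum(rows[i][k] for i in members) / len(members)
            for k in range(len(rows[0]))]


clusters = average_linkage(standardize(data), weights, n_clusters=2)
for members in clusters:
    print(sorted(members), centroid(data, members))
```

Changing the entries of `weights` changes how much each variable contributes to the distance, which is one way to see why the weighting question matters before any grouping is attempted.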
Our previous discussion of cluster analysis has tended to emphasize the tandem approach of dimensional and nominal (class-like) representation of data structures. In addition to the use of multidimensional scaling techniques for reduced space analysis, a number of other nonlinear approaches have been developed, including nonlinear factor analysis [McDonald, 1962], polynomial factor analysis [Carroll, 1969], and correspondence analysis [Carroll, Green and Schaffer, 1986]. Space does not permit anything but brief mention of this interesting work. We do consider in some detail, however, a combined qualitative-quantitative approach to an important problem in reduced space analysis: the interpretation of data structures.
As mentioned earlier, even a pure class structure, where class membership accounts for all of the information in the data, can be represented spatially. More commonly, however, we consider cluster analysis a more appropriate technique for characterizing such data. On the other hand, other data structures are inherently dimensional, so that measures of proximity are assumed to vary more or less continuously throughout the whole matrix of proximities. Pure typal and pure dimensional structures represent only two extremes. Since all proximity matrices that obey certain properties [Gower, 1966] can be represented spatially, it would seem of interest to consider data structures in terms of the restrictions placed on the points as they are arranged in that space. This motivation underlies many of the most recent developments in cluster analysis.
Torgerson [1965] was one of the first researchers to become interested in the problem of characterizing data as “mixtures” of discrete class and quantitative variables. Several varieties of such structures can be obtained:
- Data consisting of pure and unordered class structure. Dimensional representation of such data would consist of points at the n vertices of an (n-1)-dimensional simplex in which all interpoint distances are equal. For example, three classes could be represented by an equilateral triangle in two-space, four classes by a regular tetrahedron in three-space, and so on.
- Data consisting of concentrated masses of points, corresponding to classes, where interclass distances are unequal, thus implying the existence of latent dimensions underlying class descriptions.
- Data consisting of hierarchical sets of attributes where some classes are nested within other classes, e.g., cola and non-cola drinks within the diet-drink class.
- Data consisting of dimensional variables nested within discrete classes, e.g., sweet to non-sweet cereals within the class of “processed” shape (as opposed to “natural” shape) cereals.
- Data consisting of mixtures of ideal (mutually exclusive) classes so that one may find, for example, points in the interior of an equilateral triangle whose vertices represent three unordered classes.
- Data consisting of pure dimensional structure in which, theoretically, all of the space can be filled up by points.
Insert Figure 5-7.
While the above categorizations are neither exclusive nor exhaustive, they are illustrative of the variety of data structures that could be obtained in the analysis of “objective” data or subjective (similarities) data of the sort described in the preceding sections. From the viewpoint of cluster analysis, some of the above structures could produce elongated, parallel clusters in which the average intracluster distance need not be smaller than the intercluster distances. Moreover, one could have structures in which the clusters curve or twist around one another along some manifold embedded in a higher dimensional space [Shepard and Carroll, 1966].
Figure 5-8 shows four types of data structures as related to the above categories [Torgerson, 1965]. The first panel illustrates the case of three unordered discrete classes. The second panel illustrates the case of discrete class structure where class descriptors are assumed to be orderable. The third panel shows the case of three discrete classes and an orthogonal variable that is quantitative; points occur only along the solid lines of the prism. The fourth panel illustrates the case where objects are made up of mixtures of discrete classes plus an orthogonal quantitative dimension. In this case all objects lie on or within the boundaries of the prism, while “pure” cases lie along one of the three edges, at a location that depends on the degree of the quantitative variable each possesses.
Research in cluster analysis and related techniques is proceeding in new directions for dealing with heretofore intractable data structures. The continued development and refinement of interactive display devices should further these efforts by enabling the researcher to “visualize” various characteristics of the data array as a guide to the selection of appropriate grouping methods.
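The geometry of the pure class structure and of the mixture case described earlier can be checked numerically. The sketch below assumes one convenient embedding, in which each of n classes is coded as a one-hot coordinate vector. The n points then lie in an (n-1)-dimensional hyperplane and are mutually equidistant, and a mixture of classes is a convex combination of the vertices. The function names and the mixture proportions are hypothetical.

```python
import itertools
import math


def class_vertices(n):
    """One-hot coordinates for n unordered classes.

    The points are written in n-space but lie in the hyperplane where the
    coordinates sum to 1, so they span only n-1 dimensions.
    """
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mixture(vertices, proportions):
    """Convex combination of class vertices (proportions >= 0, summing to 1)."""
    if any(p < 0 for p in proportions) or abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must be nonnegative and sum to 1")
    dim = len(vertices[0])
    return [sum(p * v[d] for p, v in zip(proportions, vertices))
            for d in range(dim)]


# Three unordered classes: the vertices of an equilateral triangle.
vertices = class_vertices(3)
pair_distances = [distance(a, b)
                  for a, b in itertools.combinations(vertices, 2)]
print(pair_distances)  # every interclass distance is the same

# A hypothetical object that is a mixture of the three classes
# falls in the interior of the triangle.
mixed_point = mixture(vertices, [0.5, 0.3, 0.2])
print([distance(mixed_point, v) for v in vertices])
```

Unequal interclass distances, as in the second variety listed earlier, would show up here as a triangle that is no longer equilateral, which is what suggests latent dimensions underlying the classes.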
The key element of all clustering techniques discussed so far is the mutually exclusive and exhaustive nature of the clusters developed. While in most cases managers view segments as mutually exclusive and hierarchical in nature, cases do exist where segments are not mutually exclusive. Indeed, consumers may well fit into several segments. Overlapping clustering is a new clustering model that relaxes the exclusivity constraint of most other hierarchical and non-hierarchical cluster models.
As an example, in a cluster analysis of soft drink brands, Tab may be perceived as fitting into clusters identified as diet drink, cola, and used by women, whereas Diet Pepsi would fit into only the first two benefit clusters. Brands might also compete across product categories. V8 drink would compete against other vegetable/fruit drinks, as well as against soft drinks and even as a between-meals snack. A cluster analysis of toothpaste users might show that Aqua-Fresh toothpaste appeals to the fresh breath, decay prevention, and brighteners clusters, while Crest may appeal to only the decay prevention benefit cluster. Overlapping clustering simply allows such patterns of overlap to be considered. Arabie [1977], Shepard and Arabie [1979], Arabie and Carroll [1980], and Arabie, Carroll, DeSarbo and Wind [1981] outline methods for overlapping clustering, but point out that limitations do occur in practice.
First, it is difficult to develop an algorithm that effectively considers all possible cluster overlap options, especially if the sample size is large. Second, most overlapping clustering algorithms produce too many clusters with excessive overlap. A high degree of overlap results in poor configuration recovery; in other words, a model that fits well mathematically but is difficult to visualize from the data.
Shepard and Arabie [1979] provide a detailed explanation of their ADCLUS (for “additive clustering”) model. The ADCLUS model represents a set of m clusters which may or may not be overlapping. Each cluster is assigned a numerical weight, w_k, where k = 1, …, m. The similarity between any pair of points is predicted in the model as the sum of the weights of those clusters that contain the pair. Arabie and Carroll [1980] and Arabie, Carroll, DeSarbo and Wind [1981] further develop the ability to fit the ADCLUS model by presenting the MAPCLUS (for MAthematical Programming CLUStering) algorithm. This implementation appears to meet the need to place items in more than a single cluster. In addition, clusters may be added, deleted, or modified to produce constrained solutions [Carroll and Arabie, 1980], and the importance of new sets of clusters in explaining variance in the data can be estimated (in a regression sense).
Overlapping clustering is particularly important in applications where clusters are not mutually exclusive. This reality reflects the existence of multi-attribute decision rules in decision making behavior, divergent product application or use scenarios, and even joint decisions made by multiple users within the same household.
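The ADCLUS prediction rule described above can be sketched in a few lines. The example reuses the soft drink illustration: the cluster memberships of Tab and Diet Pepsi follow the text, while "Brand C", "Brand D", and all of the cluster weights are hypothetical values chosen only to show the arithmetic. The optional `constant` argument stands in for an additive constant and is set to zero here. Fitting the clusters and weights to observed similarities, which is what MAPCLUS does, is not attempted.

```python
from itertools import combinations

# Overlapping clusters: Tab belongs to all three, Diet Pepsi to the first two.
# "Brand C" and "Brand D" are hypothetical placeholders.
clusters = {
    "diet drink": {"Tab", "Diet Pepsi"},
    "cola": {"Tab", "Diet Pepsi", "Brand C"},
    "used by women": {"Tab", "Brand D"},
}

# Hypothetical cluster weights w_k (not estimated from any data).
cluster_weights = {"diet drink": 0.5, "cola": 0.3, "used by women": 0.2}


def predicted_similarity(a, b, clusters, weights, constant=0.0):
    """Sum the weights of every cluster that contains both a and b."""
    shared = sum(weights[name]
                 for name, members in clusters.items()
                 if a in members and b in members)
    return constant + shared


brands = sorted(set().union(*clusters.values()))
predictions = {
    (a, b): predicted_similarity(a, b, clusters, cluster_weights)
    for a, b in combinations(brands, 2)
}

for pair, value in predictions.items():
    print(pair, round(value, 3))
```

Because Tab and Diet Pepsi share two clusters, their predicted similarity is the sum of two weights, whereas a pair of brands with no cluster in common is predicted to have only the constant.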
This section has considered a companion objective of the scaling of similarities and preference data: the use of metric and nonmetric approaches in data reduction and taxonomy. We have pointed out that many of the multidimensional scaling programs can serve useful functions as types of nonmetric factoring procedures. Moreover, clustering procedures are often a helpful adjunct in data analysis when one desires to group objects (or variables) according to their relative similarity. We first discussed metric approaches to reduced space analysis, focusing more specifically on principal components. This was followed by brief descriptions of nonmetric analogues to factor analysis, including several of the algorithms originally discussed in the context of similarities and preference data. We then turned to a description of clustering methods and addressed the topics of association measures, grouping algorithms, cluster descriptions and statistical inference. This led to a presentation of some pilot research that used cluster analysis to examine the performance structure of the automobile market. We concluded the section with a description of the general problem of portraying data structures that consist of mixtures of categorical and dimensional variables and a discussion of the usefulness of overlapping clustering. (These sections have been taken from Multidimensional Scaling: Concepts and Applications by Paul E. Green, Wharton School)