
   
Multivariate Visualization Techniques

Visual exploration of multivariate data is of great interest in Statistics and Information Visualization. A number of methods have been proposed in both fields, ranging from the very useful to the quirky. This chapter introduces a few of the most established multivariate visualization techniques by example. The criterion for selection was generality, and suitability for the non-interactive and flat medium of paper. The effectiveness of the methods is compared, and evaluated relative to their limitations.

   
Running Example


 
Table 2.1: Details of the cars data table


country  model name                 mpg   weight  ratio  hp   disp.  cyl.
USA      Buick Estate Wagon         16.9  4.360   2.73   155  350    8
USA      Ford Country Squire Wagon  15.5  4.054   2.26   142  351    8
USA      Chevy Malibu Wagon         19.2  3.605   2.56   125  267    8
USA      Chrysler LeBaron Wagon     18.5  3.940   2.45   150  360    8
USA      Chevette                   30.0  2.155   3.70   68   98     4
Japan    Toyota Corona              27.5  2.560   3.05   95   134    4
Japan    Datsun 510                 27.2  2.300   3.54   97   119    4
USA      Dodge Omni                 30.9  2.230   3.37   75   105    4
Germany  Audi 5000                  20.3  2.830   3.90   103  131    5
Sweden   Volvo 240 GL               17.0  3.140   3.50   125  163    6
Sweden   Saab 99 GLE                21.6  2.795   3.77   115  121    4
France   Peugeot 694 SL             16.2  3.410   3.58   133  163    6
USA      Buick Century Special      20.6  3.380   2.73   105  231    6
USA      Mercury Zephyr             20.8  3.070   3.08   85   200    6
USA      Dodge Aspen                18.6  3.620   2.71   110  225    6
USA      AMC Concord D/L            18.1  3.410   2.73   120  258    6
USA      Chevy Caprice Classic      17.0  3.840   2.41   130  305    8
USA      Ford LTD                   17.6  3.725   2.26   129  302    8
USA      Mercury Grand Marquis      16.5  3.955   2.26   138  351    8
USA      Dodge St Regis             18.2  3.830   2.45   135  318    8
USA      Ford Mustang 4             26.5  2.585   3.08   88   140    4
USA      Ford Mustang Ghia          21.9  2.910   3.08   109  171    6
Japan    Mazda GLC                  34.1  1.975   3.73   65   86     4
Japan    Dodge Colt                 35.1  1.915   2.97   80   98     4
USA      AMC Spirit                 27.4  2.670   3.08   80   121    4
Germany  VW Scirocco                31.5  1.990   3.78   71   89     4
Japan    Honda Accord LX            29.5  2.135   3.05   68   98     4
USA      Buick Skylark              28.4  2.670   2.53   90   151    4
USA      Chevy Citation             28.8  2.595   2.69   115  173    6
USA      Olds Omega                 26.8  2.700   2.84   115  173    6
USA      Pontiac Phoenix            33.5  2.556   2.69   90   151    4
USA      Plymouth Horizon           34.2  2.200   3.37   70   105    4
Japan    Datsun 210                 31.8  2.020   3.70   65   85     4
Italy    Fiat Strada                37.3  2.130   3.10   69   91     4
Germany  VW Dasher                  30.5  2.190   3.70   78   97     4
Japan    Datsun 810                 22.0  2.815   3.70   97   146    6
Germany  BMW 320i                   21.5  2.600   3.64   110  121    4
Germany  VW Rabbit                  31.9  1.925   3.78   71   89     4


Several multivariate visualization techniques are put to the test in this chapter on the cars data table [Henderso81], which is reproduced in Table 2.1, and is also mentioned in Section B.1 of the appendices. This data table contains records of 38 cars manufactured in the period 1978-79, with the following attributes:

1. primary country of the manufacturer
2. model name
3. miles per gallon, a measure of petrol efficiency assessed on the race track
4. weight in thousands of lbs
5. drive ratio in the highest gear
6. horsepower
7. engine displacement in cubic inches
8. number of cylinders

The first attribute is measured on a nominal scale (see Section 1.2.3), the second is a label (see Section 1.2.8), the remaining attributes are quantitative (see Section 1.2.1). The task set out for each visualization method is that of bringing out the differences and similarities between cars on the basis of their drive parameters. We felt that including the first two attributes would prejudice this analysis, and cause cars from the same manufacturer or just the same country to appear more similar.

   
Parallel Coordinates

A single row $\vec{u}_i^T=(u_{i1},\ldots,u_{iq})$ of a data table with q attributes, measured on any scale apart from nominal (see Section 1.2.3), can be thought of as a point in a q-dimensional Cartesian coordinate system, with the coordinate on the $a$th axis given by $u_{ia}$. For q>3 such configurations of points cannot be directly visualized; the method of Parallel Coordinates overcomes this limitation by arranging the axes vertically, parallel to one another, and spacing them uniformly across the plane [Inselber85]. Point $\vec{u}_i$ is then represented by a polygonal line connecting its coordinate values on the parallel axes.

It is apparent from the parallel coordinates visualization of the cars data table in Figure 2.1(a) that the last three attributes are substantially correlated. Moreover, lines representing the individual rows cross over between the mpg, weight, drive ratio, and horsepower axes, suggesting that adjacent pairs of these attributes might be negatively correlated. Inverting the mpg and drive ratio axes leads to a much clearer visualization in Figure 2.1(b), which could be improved further by permuting the order of the axes. The need for such a high level of customisation is a major obstacle to visualizing many variables effectively with this method, and it is made worse if the number of observations, and hence lines, is large.
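The effect of inverting an axis can be illustrated with a small sketch. It assumes that every axis is rescaled to the range of its attribute (a common choice, not one stated in the text), and the helper names `polylines` and `crossings` are hypothetical. Only four rows of Table 2.1 are used, to keep the example short.

```python
# Four rows copied from Table 2.1: (mpg, weight, drive ratio, hp, disp., cyl.)
rows = {
    "Buick Estate Wagon": (16.9, 4.360, 2.73, 155, 350, 8),
    "Chevette": (30.0, 2.155, 3.70, 68, 98, 4),
    "Audi 5000": (20.3, 2.830, 3.90, 103, 131, 5),
    "VW Rabbit": (31.9, 1.925, 3.78, 71, 89, 4),
}

def polylines(rows, inverted=()):
    """Map each row to the heights (0..1) at which its line meets each axis.

    Assumption: each axis spans the minimum..maximum of its attribute.
    Axes whose index is listed in `inverted` are flipped upside down.
    """
    columns = list(zip(*rows.values()))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    result = {}
    for name, values in rows.items():
        heights = []
        for a, value in enumerate(values):
            h = (value - lows[a]) / (highs[a] - lows[a])
            heights.append(1.0 - h if a in inverted else h)
        result[name] = heights
    return result

def crossings(lines, a):
    """Count pairs of polygonal lines that cross between axis a and axis a + 1."""
    items = list(lines.values())
    count = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            before = items[i][a] - items[j][a]
            after = items[i][a + 1] - items[j][a + 1]
            if before * after < 0:  # the order of the two lines flips
                count += 1
    return count

original = polylines(rows)
modified = polylines(rows, inverted={0, 2})  # invert mpg and drive ratio
print(crossings(original, 0), crossings(modified, 0))
```

For these four rows every pair of lines crosses between the mpg and weight axes in the original layout, and none does once the mpg axis is inverted, which is the kind of decluttering seen in Figure 2.1(b).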

  
Figure 2.1: Parallel coordinates. (a) original; (b) correlated.


 
Figure 2.2: Andrews plot



   
Andrews Plot

In an Andrews plot each row $\vec{u}_i^T=(u_{i1},\ldots,u_{iq})$ of a data table with q attributes is represented by a line, similarly to parallel coordinates. In this case it is a curve defined by the following trigonometric function [Andrews72]:

\begin{displaymath}f_{\vec{u}_i}(t)=\underbrace{\frac{u_{i1}}{\sqrt{2}}+u_{i2}\sin(t)+u_{i3}\cos(t)+u_{i4}\sin(2t)+u_{i5}\cos(2t)+\ldots}_{\displaystyle q}
\end{displaymath} (2.1)

plotted over the interval $t\in(-\pi,\pi)$. It is recommended that the most important attributes are associated with the low frequency terms, as they determine the overall shape of the curve. This might entail an iterative and exploratory approach to determine a satisfactory assignment, in the same way as for parallel coordinates.

Let $\bar{\vec{u}}=\frac{1}{n}\sum_{i=1}^n\vec{u}_i$ denote the mean of the n rows $\vec{u}_i^T$ of the data table; function (2.1) preserves this mean:

\begin{displaymath}f_{\bar{\vec{u}}}(t)=\frac{1}{n}\sum_{i=1}^nf_{\vec{u}_i}(t)
\end{displaymath} (2.2)

so that the plot of $\bar{\vec{u}}$ is a pointwise average of the plots for individual rows. Another useful property of (2.1) is that it preserves, up to the constant factor $\pi$, the squared Euclidean distance $\Vert\vec{u}_i-\vec{u}_j\Vert^2$ between pairs of points in the q-dimensional space:

\begin{displaymath}\int_{-\pi}^{\pi}\left(f_{\vec{u}_i}(t)-f_{\vec{u}_j}(t)\right)^2dt=\pi\Vert\vec{u}_i-\vec{u}_j\Vert^2=\pi\sum_{a=1}^q\left(u_{ia}-u_{ja}\right)^2
\end{displaymath} (2.3)

Thus, close points will result in similar plots, and plots for distant points will be distinct. These features are useful for detecting clusters and outliers, and are shared with the parallel coordinates technique. Andrews plots have a number of other characteristics that are especially helpful in statistical analysis of the underlying data [Andrews72].
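Both properties can be checked numerically. The following is a minimal sketch, assuming the raw attribute values of two rows from Table 2.1 are used without any rescaling; the names `andrews` and `squared_l2_distance` are hypothetical. The integral is approximated with a midpoint rule, which is accurate here because the integrand is a trigonometric polynomial sampled over a full period.

```python
import math

def andrews(u, t):
    """Evaluate function (2.1) for row u at parameter t."""
    total = u[0] / math.sqrt(2)
    for a in range(1, len(u)):
        k = (a + 1) // 2                      # frequencies 1, 1, 2, 2, 3, ...
        trig = math.sin if a % 2 == 1 else math.cos
        total += u[a] * trig(k * t)
    return total

def squared_l2_distance(u, v, steps=2000):
    """Midpoint-rule approximation of the integral on the left of (2.3)."""
    h = 2 * math.pi / steps
    total = 0.0
    for s in range(steps):
        t = -math.pi + (s + 0.5) * h
        total += (andrews(u, t) - andrews(v, t)) ** 2
    return total * h

# (mpg, weight, drive ratio, hp, disp., cyl.) from Table 2.1
u = (16.9, 4.360, 2.73, 155, 350, 8)   # Buick Estate Wagon
v = (31.9, 1.925, 3.78, 71, 89, 4)     # VW Rabbit

lhs = squared_l2_distance(u, v)
rhs = math.pi * sum((a - b) ** 2 for a, b in zip(u, v))
print(lhs, rhs)   # the two values should agree up to rounding error

# Mean preservation: the curve of the mean is the mean of the curves.
mean = [(a + b) / 2 for a, b in zip(u, v)]
t = 0.7
print(andrews(mean, t), (andrews(u, t) + andrews(v, t)) / 2)
```

Because the attributes are on very different scales, the engine displacement dominates the distance in this raw form; whether and how to rescale the attributes beforehand is a separate choice that the sketch leaves open.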

Figure 2.2 is an Andrews plot of the cars data table. There seem to be two extreme clusters of cars. The remaining observations fall between the extremes, and form a loose cluster, which can be separated from the first two at t=-1 and t=2. Additional insight could be gained by plotting these clusters separately, and in fact it is recommended that no more than 10 points $\vec{u}_i$ are plotted at a time for a detailed examination [Andrews72].

   
Multidimensional Scaling

Like parallel coordinates and Andrews plots, Multidimensional Scaling can also be used to visualize multivariate data [Borg97,Cox94]. However, the original q axes and coordinates of points $\vec{u}_i=(u_{i1},\ldots,u_{iq})^T$ do not enter the visualization directly. Instead, a configuration of points $\vec{x}_i=(x_{i1},\ldots,x_{ip})^T$ is found in a space of lower dimension p<q, such that all inter-point distances $\Vert\vec{x}_i-\vec{x}_j\Vert$ match as closely as possible the original distances $\Vert\vec{u}_i-\vec{u}_j\Vert$. A two- or three-dimensional embedding is an obvious choice for visualization; higher values of p can be useful for statistical analysis. A more elaborate description of this method is presented in Chapter 3.

It might be helpful to envisage the process of multidimensional scaling in two dimensions as wrapping a surface - an elastic sheet - around points $\{\vec{u}_i\}$ in the original high dimensional space, and taking $\vec{x}_i$ as the projection of $\vec{u}_i$ onto this surface. In effect a non-linear mapping between the two configurations is established, and it is likely to be superior for purposes of visualization to rotating a rigid plane in the high dimensional space to find the closest fit to $\{\vec{u}_i\}$, a procedure known as Principal Components Analysis [Pearson01].
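The idea of matching inter-point distances can be sketched in a few lines. The example below uses plain gradient descent on the sum of squared distance errors, which is only one of several possible formulations and is not necessarily the algorithm described in Chapter 3. The five points, the function names, the step size, and the number of iterations are all assumptions made for illustration.

```python
import math
import random

def pairwise(points):
    """Matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(a, b) for b in points] for a in points]

def stress(target, config):
    """Sum of squared differences between embedded and original distances."""
    d = pairwise(config)
    n = len(config)
    return sum((d[i][j] - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def scale(target, start, steps=1000, rate=0.02):
    """Move a copy of `start` downhill on the stress and return it."""
    x = [list(point) for point in start]
    n, p = len(x), len(x[0])
    for _ in range(steps):
        d = pairwise(x)
        grad = [[0.0] * p for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j and d[i][j] > 0.0:
                    # derivative of (d_ij - target_ij)^2 with respect to x_i
                    coeff = 2.0 * (d[i][j] - target[i][j]) / d[i][j]
                    for k in range(p):
                        grad[i][k] += coeff * (x[i][k] - x[j][k])
        for i in range(n):
            for k in range(p):
                x[i][k] -= rate * grad[i][k]
    return x

# Five made-up points in q = 3 dimensions, embedded in p = 2 dimensions.
u = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
target = pairwise(u)
rng = random.Random(0)                      # fixed seed for repeatability
start = [[rng.uniform(-1, 1) for _ in range(2)] for _ in u]
x = scale(target, start)
print(stress(target, start), stress(target, x))
```

The second printed value should be smaller than the first: the final configuration reproduces the original distances more faithfully than the random starting layout, although a perfect match is generally impossible when p<q.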

 
Figure 2.3: Multidimensional scaling


 
Figure 2.4: Scatterplot matrix



A two-dimensional multidimensional scaling configuration for the cars data set is presented in Figure 2.3. Inspection of the corresponding Andrews plot in Section 2.3 led to the conclusion that there are three clusters of cars. These clusters are apparent from Figure 2.3, and can be readily verified to group cars with 8 cylinders on the left hand side of the figure, 6 and 5 in the middle, and 4 on the right. In effect a map is constructed that charts individual cars based on the overall similarity of their drive parameters - a Proximity Visualization, in other words.

Scatterplot Matrix

A scatterplot matrix is a collection of scatterplots organised analogously to a covariance matrix, with variable $a$ plotted against variable $b$ in the $a$th row and $b$th column of the matrix [Clevelan84]. The diagonal plots can show the distribution of individual variables, or simply be placeholders for variable names, as is the case for the scatterplot matrix representation of the cars data table in Figure 2.4. Individual scatterplots can reveal relationships between variables, for example linear ones, and the complete matrix can be useful for an initial exploration of a data set. However, the display becomes overwhelming with anything more than a few variables; the lack of a unified representation of the data is also a serious drawback.

Definite correlations between attributes of the cars data table can be seen in Figure 2.4. For example, the weight of a car is positively correlated with its horsepower, engine displacement, and number of cylinders, and negatively correlated with its drive ratio and miles per gallon. Thus, the decision in Section 2.2 to invert the parallel coordinate axes for the last two attributes was justified. The number of cylinders attribute stands out as having only four levels, and as separating most other attributes into distinct clusters.
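A numerical counterpart to reading correlations off the scatterplots is a matrix of correlation coefficients, laid out in the same way as the plots. The sketch below uses the Pearson coefficient, which is our choice for illustration and is not part of the original analysis, and only six rows of Table 2.1, so the values are indicative at best.

```python
import math

names = ["mpg", "weight", "ratio", "hp", "disp.", "cyl."]
# Six rows copied from Table 2.1, in the attribute order given by `names`.
rows = [
    (16.9, 4.360, 2.73, 155, 350, 8),   # Buick Estate Wagon
    (30.0, 2.155, 3.70, 68, 98, 4),     # Chevette
    (20.3, 2.830, 3.90, 103, 131, 5),   # Audi 5000
    (17.0, 3.140, 3.50, 125, 163, 6),   # Volvo 240 GL
    (26.5, 2.585, 3.08, 88, 140, 4),    # Ford Mustang 4
    (31.9, 1.925, 3.78, 71, 89, 4),     # VW Rabbit
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

columns = list(zip(*rows))
q = len(names)
# Entry [a][b] corresponds to the scatterplot in row a and column b.
matrix = [[pearson(columns[a], columns[b]) for b in range(q)] for a in range(q)]

for name, line in zip(names, matrix):
    print(f"{name:>6}", " ".join(f"{r:+.2f}" for r in line))
```

On this subset the weight row shows a strongly negative entry in the mpg column and a strongly positive entry in the hp column, in line with the trends visible in Figure 2.4.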

Iconographic Displays

In an iconographic display each icon or glyph represents a single row of a data table. Icons can be arranged in a grid, as in Figures 2.5(a) and 2.5(b), to enable a systematic assessment of similarities and differences between the rows, and also between the attributes. Alternatively, the position of glyphs in the plane can be driven by two of the attributes, providing their spatial interpretation is meaningful. An iconographic display can be combined with the corresponding proximity visualization, by using icons instead of labelled points, to give the resulting visual representation a degree of redundancy.


 
Figure 2.5: Iconographic visualizations. (a) star glyphs; (b) Chernoff faces.


Star Glyphs

A star is composed of equally spaced radii, as many as the number of attributes in the data table, stemming from the centre. The length of the rightmost spike is proportional to the value of the first attribute for a given row; the remaining attributes are assigned to their spikes counterclockwise in the same manner [Fienberg79]. The result of applying this prescription to the cars data table is shown in Figure 2.5(a).
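The construction of a single star can be sketched as follows. The text only states that spike lengths are proportional to attribute values; dividing each value by the largest value of its attribute in Table 2.1 is an assumption made here, and `star_vertices` is a hypothetical name.

```python
import math

def star_vertices(values, maxima):
    """Return the spike end points of a star glyph centred at the origin.

    Spike a points at angle 2*pi*a/q, so the first spike is the rightmost
    one and the others follow counterclockwise. Assumption: each length is
    the attribute value divided by that attribute's maximum.
    """
    q = len(values)
    vertices = []
    for a, (value, top) in enumerate(zip(values, maxima)):
        angle = 2 * math.pi * a / q
        length = value / top
        vertices.append((length * math.cos(angle), length * math.sin(angle)))
    return vertices

# Column maxima of Table 2.1: (mpg, weight, drive ratio, hp, disp., cyl.)
maxima = (37.3, 4.360, 3.90, 155, 360, 8)
mustang_ghia = (21.9, 2.910, 3.08, 109, 171, 6)   # Ford Mustang Ghia

for x, y in star_vertices(mustang_ghia, maxima):
    print(f"{x:+.3f} {y:+.3f}")
```

Joining consecutive end points, and the last one back to the first, gives the outline of the star.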

The clarity of a star display will suffer as the number of attributes increases, and grouping correlated attributes to provide smooth transitions between spikes might be beneficial. The similarity or dissimilarity of a pair of stars can be appreciated visually; however, gaining a proper overview of a large data table can become a tedious task. This sort of processing is best left to the computer, so that proximity between rows of a data table can be represented in a direct spatial form, as in Section 2.4.

Stars of Figure 2.5(a) can be classified into a few tight groups. This clustering would become more obvious with the aid of automatic or interactive sorting of stars, to bring the similar ones together. However, this will amount to carrying out multidimensional scaling, as pointed out earlier. An interesting observation is that roughly circular stars, e.g. the one for `Ford Mustang Ghia', appear in the middle of the proximity visualization of Figure 2.3; many more analogies can be found in both visualizations.

Chernoff Faces

Chernoff faces take advantage of the natural familiarity and recognition of human faces [Chernoff73]. Each facial feature represents one variable; obviously, some features are more prominent, and a possible assignment in the decreasing order of importance is:

In total, 15 attributes can be represented, and additional variables could be encoded by making faces asymmetric [Flury81]. The trouble is that the appearance of a face will vary with the order in which variables are assigned to facial features, and the perceived similarity of faces will be affected.

Figure 2.5(b) is a collection of Chernoff faces representing the rows of the cars data table. Only the first six facial features are used, and the rest are set to a neutral expression. Overall, faces are as effective at portraying similarities as star glyphs. The differences between rows can be detected; however, their magnitude is much more difficult to judge without detailed knowledge of the assignment of attributes to facial features. Also, it is no longer possible to tell which icons represent average or extreme observations, e.g. compare `Ford Mustang Ghia' and `VW Rabbit'. Additionally, Chernoff faces share the disadvantages of star glyphs, and are thus inferior.

Summary

The advantage of multidimensional scaling over other multivariate visualization techniques is that it is independent of the number of variables. As long as it is possible to ascertain the high dimensional distance between observations, by using dissimilarity coefficients of Section 1.2 for example, a low dimensional embedding can be found. The type of variables is also immaterial, and even heterogeneous data can be visualized with the aid of the general dissimilarity coefficient (1.8), including nominal variables, which elude other multivariate visualization methods.

The multidimensional scaling technique scales well with the number of observations, since labelled points or small icons constitute the visual representation of individual observations. Thus, identification of observations is provided, and their actual relationships are represented by proximity. With more observations, the density of icons will increase; however, their relative proximity will be unaffected. Therefore, an informative overview of the data set is presented, highlighting clusters and outliers. Interesting groups of observations can then be analysed separately with this or other multivariate visualization techniques.



© 2001 Wojciech Basalaj