next up previous
Next: Case Studies Up: Proximity Visualization of Abstract Data Previous: Large Scale Visualization

Subsections

   
Proximity Grid

This chapter is concerned with Proximity Grid - a visualization technique that allows information to be presented with high density. Each element of an abstract data collection is represented within a single cell of the grid, and thus considerable detail can be shown without overlap. The proximity relationships are preserved by clustering similar elements in the grid, and keeping dissimilar ones apart. Proximity Grid is thus a counterpart of Multidimensional Scaling in a discrete domain. Four algorithmic solutions to this problem are presented, and evaluated in terms of output quality and running time. A recommendation for algorithm selection is made based on the trade-off between these criteria.

Origins of the Problem

An experimental evaluation of MDS arrangements versus random grid arrangements, in the context of browsing a collection of images, demonstrated the superiority of the former for a basic search task [Rodden99a]. However, in a post-experimental questionnaire subjects reported overlap of image thumbnails in MDS arrangements to cause them difficulty in finding a target image. Some actually preferred the regularity of a random grid, because it made scanning the collection easier. An arrangement method that combined the best features of both configuration types - here referred to as Proximity Grid - was proposed.


 
Figure 5.1: 100 images of Kenya arranged by visual similarity. These images have been taken from the Corel stock photograph collection


\epsfig{file=images/kenya-kyst.eps, width=9.25cm}


(a) MDS arrangement generated by the KYST algorithm $(\sigma =0.0692)$



  \epsfig{file=images/kenya-swo.eps, width=9.25cm}


$12\times 12$ proximity grid generated by the SWO algorithm $(\sigma =0.0789)$



Because the MDS analysis does not take the size of configuration elements into account, its direct use for visualization may lead to partial or complete overlap of icons representing the objects from a collection. Figure 5.1(a) is an MDS visualization of an image collection, with many thumbnails having their detail obscured. By forcing icon centres to lie on a grid, a minimum amount of separation can be ensured, while still keeping similar icons (objects) together. The effect of transforming the MDS configuration to a proximity grid can be seen in Figure 5.1(b).

The problem of overlap is less serious in three dimensions. The ability to view a 3D visualization from different viewpoints aids in separating the icons, and thus alleviates the problem to a large extent. Also, the size of icons can be reduced to limit overlap, and compensated by the natural use of zooming and panning. These counter measures are not suitable for a flat representation, whose primary use is to provide an informative overview of a data collection. Therefore, we restrict our attention to two-dimensional grids, but note that the theory and algorithms can easily be extended to higher dimensionality.

Algorithms

The purpose of the algorithms described below is to position a set of n objects in a grid, such that similar objects occupy neighbouring cells, and are separated from dissimilar objects, where the dissimilarities between all pairs of objects are collected in matrix $\vec{\Delta}$, as in Chapter 3. The size of the grid, i.e. the number of cells, can be greater than the number of objects. As the number of spare cells rises, there is more freedom in placing the objects, and a more faithful representation of pairwise similarity can be achieved. In the limiting case the grid will become so sparse that it will be equivalent to a continuous MDS configuration, especially when taking the discrete nature of computer displays into account. Therefore, the accuracy of a grid algorithm at a certain level of grid density can be assessed against the corresponding continuous configuration.

The quality of a proximity grid $\vec{Y}\in\mathds{N}^2$, where $\mathds{N}$ denotes the set of natural numbers, can be established by evaluating one of the loss functions discussed in Section 3.2 for a given $\vec{Y}$ and $\vec{\Delta}$. An algorithm for generating proximity grids is essentially a numerical minimisation procedure for such a loss function. As a consequence of specifying this minimisation problem over a discrete domain - $\mathds{N}^2$ - the choice of a heuristic is limited, as any method relying on derivatives cannot be applied.

When using Stress (3.11) small dissimilarities will not be modelled faithfully, because even a large relative error will have negligible impact on the value of (3.11); see Section 3.2.4 for a discussion and comparison with Energy (3.13). This property is actually an advantage in the case of a discrete (grid) configuration, because similar objects cannot be closer than the cell separation, and will not be penalised for it. Consequently, we have chosen Stress as the basis of our grid algorithms. KYST% latex2html id marker 18999
\setcounter{footnote}{2}\fnsymbol{footnote} is a publicly available MDS program that works with Stress-1 (3.9) [Kruskal78b], which is equivalent to Stress after a suitable scaling transformation (see Section 3.2.3 for details). Hence, we used KYST for generating continuous MDS configurations corresponding to proximity grids.

   
Greedy

The idea behind this algorithm is to start with a continuous MDS configuration, and discretise it so that object locations are snapped to the grid. Naturally, it is possible for two objects to be mapped to the same grid cell, and many strategies could be employed to resolve this. Since we wanted to avoid introducing another numerical optimisation process, we adopted a greedy approach. This will only give an approximate solution, but the computational cost is low, and the amount of introduced error can be controlled by adjusting the grid size.

The location yi of each object i in the configuration is considered in turn. If the closest grid cell c to yi is unoccupied, i is allocated to it. Otherwise, a close by empty cell e is found by performing a spiral search starting from cell c. The second cell visited by the square spiral is the second closest cell to yi, taking the Euclidean distance into account. This determines the orientation of the spiral in an unbiased way. Figure 5.2 is an example of a spiral search when c is below and to the left of yi; the remaining three cases are analogous. If there are at least as many grid cells as objects this procedure is guaranteed to find e, though this might not be the closest empty cell to yi.


 
Figure 5.2: Example of spiral search in a grid. The grey triangle represents the set of all possible positions of point yi that would result in this particular orientation of the spiral. The black circle denotes the optimal cell c for this point


\epsfig{file=images/spiral-search.eps, width=8cm}



Once e is found three different strategies can be adopted, as illustrated in Figure 5.3:
1.
empty - i is allocated to e
2.
swap - the object allocated to c is relocated to e, and i is allocated to c
3.
bump - objects allocated to cells on the line from c to e are relocated outwards by one cell, and i is allocated to c. These cells can be determined by applying the line drawing algorithm of computer graphics.

 
Figure 5.3: Examples of different grid allocation strategies. The solid black circle denotes the optimal cell, and the empty circle shows the empty cell found during the spiral search
\epsfig{file=images/strategy-empty.eps, width=3.2cm} \epsfig{file=images/strategy-swap.eps, width=3.2cm}


\epsfig{file=images/strategy-bump.eps, width=4cm}

(a) empty (b) swap (c) bump



Note that a bump is equivalent to a swap when e is adjacent to c.

The order in which objects are considered is important, and should be independent of permutations of the input configuration. We define a complete weighted graph with a vertex for each object in the collection, and take each edge weight to be the dissimilarity between the corresponding pair of objects. The Minimum Spanning Tree (MST) calculated for this graph then defines such a unique ordering of objects (see Section 4.2.1). The shortest MST edges link the most closely related objects. Considering them first in the empty strategy will ensure that they occupy neighbouring cells. The same effect will be achieved by considering them last in the swap and bump strategies.

To identify the best strategy we have performed a rigorous evaluation, which is presented in Section 5.3.1. The bump strategy is significantly better in terms of Stress (3.11) for proximity grids that it generates. This can be attributed to its tendency of spreading error (3.3) over many objects, rather than positioning some optimally, and delegating potentially large error to others, as is the case for the other strategies. Henceforth, we refer to the greedy algorithm with the bump strategy as Greedy1.

Improved Greedy

One obvious area of improvement to Greedy1 is to replace the spiral search by an exact procedure to find the closest empty cell to the original object location yi. A brute force approach is to calculate the distance between yi and the centre of every cell in the grid, sort these distances in the ascending order, and pick the first corresponding cell that is empty. The same effect can be achieved by evaluating the distances lazily, by considering increasingly larger sub-grids around c. We start with the eight cells that immediately surround c, and enter them into a heap [Press92], with the heap condition based on their distance to yi, and the closest remaining cell to yi as the root. At each step the root cell r is examined, if it is empty the search is terminated; otherwise, it is removed from the heap. If r falls on the boundary of the current sub-grid, the sub-grid is enlarged in this direction by entering a new layer of cells into the heap. The heap condition is subsequently restored. We refer to this modified grid algorithm as Greedy2.

``Squeaky Wheel'' Optimization

Section 5.2.1 highlighted that the order in which objects are considered for inclusion in the grid by the greedy algorithm is important. In Greedy1 and Greedy2 we decided to derive this order from the data themselves, so that closely related objects would be given preferential treatment. In general, this will not give the best arrangement in terms of Stress (3.11), and there will be other orderings that will give better results. One could envisage a procedure that will continually permute the order of objects, run the greedy algorithm with it, and evaluate Stress in search of the minimum. An obvious candidate is a Genetic Algorithm [Goldberg89] (also see Section 3.3.4) or Simulated Annealing [Dowsland95] (also see Section 3.3.6). However, a much better use for these techniques is to search directly for the best allocation of objects to grid cells, as in Section 5.2.4, bypassing the greedy and continuous MDS steps.

``Squeaky Wheel'' Optimization (SWO) [Joslin99] is an effective technique for coupling a greedy algorithm with a reordering (reprioritisation) scheme. Once a solution is constructed, it can be evaluated to find objects that contribute the most to the value of the loss function, and these objects can be promoted in the sequence, so that they will be handled earlier and probably better on the subsequent greedy run. This construct/analyse/prioritise cycle continues for a set number of iterations, or until an acceptable solution is found. Typically, the sequence evolves so that easy to deal with objects sink to the back, whereas troublesome ones stay at the front.

In our SWO algorithm the construction phase is essentially the entire Greedy2, with the empty allocation strategy (see Section 5.2.1). The SWO algorithm will automatically achieve the effect of spreading the representation error (3.3), and an explicit mechanism in the form of the bump strategy will actually interfere with it. The initial sequence is the ascending MST order, which is the appropriate choice for the empty strategy (see Section 5.2.1).

Every object is subsequently analysed, and the distance from the centre of its grid cell c to the original continuous location yi constitutes its blame factor. If c is the closest cell to yi the blame factor is set to 0. The prioritisation phase consists of a partial bubble sort. On a single pass from the front of the sequence to the back, every object that has a positive blame factor is swapped with its predecessor, and its blame is reduced by 1. The net effect is that objects move forward in the sequence proportionally to their blame. The SWO cycle is repeated a fixed number of times (SWO.iterationlimit parameter), and the grid configuration with the lowest value of Stress (3.11) constitutes the solution.

   
Genetic Algorithm

For a brief introduction to Genetic Algorithms (GAs) the reader is referred to Section 3.3.4. It has been relatively straightforward to adapt the GA for continuous MDS to a Proximity Grid one, which is a clear demonstration of how generic this heuristic strategy is. The details of the selection scheme and that of forming a new generation of the chromosome population remain unchanged, and are given in the final three paragraphs of Section 3.3.4. The only exception is that the fitness of a chromosome is based on Stress (3.11) instead of Energy (3.13). The low-level issues of selecting the coding of the problem and associated crossover and mutation operators have to be reconsidered, and are discussed below.

An allocation of n objects to k grid cells can be compactly represented by a permutation of n cell identifiers, taken from an alphabet of cardinality k. Thus, a non-binary coding seems to be the most appropriate for this problem [Goldberg89]. A solution is encoded as a vector $\vec{v}$ of n cardinal numbers, where object i is allocated to cell vi, determined by counting cells sequentially from top left to bottom right. A pair of solutions $\vec{v}^{(p_1)}$ and $\vec{v}^{(p_2)}$ is recombined by applying the cycle crossover (CX) operator [Goldberg89], to yield a new pair of chromosomes $\vec{v}^{(c_1)}$ and $\vec{v}^{(c_2)}$. This operator ensures that the value v(c1)i of each child gene i comes from the first parent v(p1)i or the other v(p2)i, and that the whole solution is feasible, i.e. no two elements of $\vec{v}^{(c_1)}$ are the same; $\vec{v}^{(c_2)}$ is simply the complement of $\vec{v}^{(c_1)}$. Inheriting a gene from a particular parent might require another gene to be taken from the same parent, and so on, in order to meet the above conditions. This dependency is circular, and CX operates by copying each cycle from a randomly selected parent.

The most appropriate form of mutation to use here is exchange mutation [Reeves95]. An object i is drawn at random from the range [1,n], and a new cell a is randomly selected for it from the range [1,k]. If there exists j such that vj=a, i.e. cell a is already occupied, vj and vi exchange their values; otherwise vi simply becomes a. As argued in Section 3.3.4, the relative importance of crossover and mutation changes throughout the optimisation process, and an adaptive rate of mutation is likely to work best. Thus, whenever CX produces a pair of chromosomes that is identical to the input pair, exchange mutation is performed on both chromosomes. This form of mutation reintroduces diversity only when needed, and does not interfere with crossover otherwise.

Comparison

To evaluate the performance of each algorithm, we used a test bed consisting of 53 data collections: 2 dissimilarity matrices, 2 image collections, 6 graphs, and 43 data tables (see Section B.1 in the appendices for details). The number of objects in a collection ranged from 9 to 506, with the median value of 107. For each collection we applied KYST to calculate a continuous MDS configuration, representing a minimum of Stress (3.11). Since KYST is based on the steepest descent technique [Press92], and therefore can only be expected to find a local minimum, we decided to run it 10 times from a random initial configuration, and only use the final configuration with least Stress. These continuous configurations served as input to Greedy1, Greedy2, and SWO algorithms. Subsequently, these three algorithms and GA were set to generate proximity grids with varying levels of density for each data collection. Grids from 10 to 100% full at the 10% density increments were used, and the resulting Stress recorded for each. We only considered square grids, and so the size was rounded up to the nearest square number. For a small data collection this could result in identical grid sizes for different density levels, and we avoided duplicating the experiment in these cases.

Parameters of the algorithms were fixed to the following values:

   
Statistical Analysis

Our experiment consists of a group of related measurements, made for every data collection in the test bed. We decided to rely on the Friedman test [Siegel56] for the analysis, for the same reasons as given in Section 3.5. Before proceeding with the main experiment we had to establish which allocation strategy (see Section 5.2.1) is best, and we present our findings in the first subsection. The remaining subsections are concerned with the main experiment. Each experimental group consists of 40 measurements of Stress (3.11), one for each combination of grid density level and algorithm. There are three possible ways of grouping these measurements and structuring the analysis: by algorithm, by grid density, and complete.

Allocation Strategy Analysis

To determine whether empty, swap, or bump allocation strategy (see Section 5.2.1) is best to use with the greedy algorithm, we have implemented all three, and tested with the test bed of data collections. The denser the grid the more allocation clashes will occur, and require the intervention of a particular allocation strategy. Thus, we focused on the densest square grids possible: $m\times m$, where $m=\lceil\sqrt{n}\,\rceil$, and n is the number of objects to be positioned in the proximity grid.

Table 5.1 reports the results of the Friedman test with the null hypothesis H0 of no difference between strategies with regard to Stress (3.11), and the alternative hypothesis H1 of at least one strategy having a different ranking from the others. The values of both Friedman's chi-square statistic and Kendall's coefficient of concordance are statistically significant at a level exceeding 0.001, which allows H0 be rejected in favour of H1. Having done so, we can apply the Tukey Multiple Comparison procedure to test for significant differences between strategy rankings in pairs [Keselman77].

As can be gleaned from Table 5.2, all Tukey comparisons are significant at the 5% level, implying that there are definite differences between strategies, and allowing a complete ordering to be derived: bump has the lowest rank, meaning that it consistently achieves the lowest Stress, empty comes second, and swap performs the worst. Non-parametric tests will not establish what the actual differences are, however. A linear regression of bump measurements $\vec{m}^{(bump)}$ to ones for empty and swap in turn can provide an answer: $\vec{m}^{(i)}\approx a_i\vec{m}^{(bump)}$, where $\vec{m}^{(j)}$ is the vector of Stress measurements for all 53 data collections under condition j. The best least-squares fit is for aempty=1.05 and aswap=1.23, meaning that empty is 5% worse than bump, and swap is 23% worse, when considering 100% dense grids. These differences will gradually diminish as the grids get sparser, and disappear completely, when no allocation clashes occur, and there is no need to employ one of these strategies. Nonetheless, bump appears to be the best strategy, and we have based Greedy1 and Greedy2 on it.


 
Table 5.1: Friedman test results for allocation strategies


strategy rank sum1 mean rank2
empty 102.0 1.93
swap 147.5 2.78
bump 68.5 1.29

statistic  
$\chi_r^2$ 3 60.76
W 4 0.573


1 rank sum is the sum of the relative order in terms of Stress for a particular strategy over all 53 cases
2 mean rank is the corresponding rank sum divided by the number of cases
3 $\chi_r^2$ is Friedman's chi-square statistic; the critical value at p=0.001 level is 13.82
4 W is Kendall's coefficient of concordance; it ranges between 0 (no agreement between cases) to 1 (complete agreement); the critical value at p=0.001 level is 0.13


 
Table 5.2: Tukey Multiple Comparison results for allocation strategies


rank comparison difference1 Q 2
swap>bump 79.0 10.85
swap>empty 45.5 6.25
empty>bump 33.5 4.60


1 the difference in rank sums; see Table 5.1
2 Q is Tukey's statistic for multiple comparisons; the critical value at p=0.05 level is 3.31

Algorithm Analysis


There are 53 groups, one for each data collection, of 4 measurements of Stress (3.11) for the Greedy1, Greedy2, SWO, and GA algorithm respectively. Such a block of measurements is made for a particular level of grid density, and we analyse them in the descending order, starting with grids that are 100% full. The second row of Table 5.3 reports the results of the Friedman test with the null hypothesis H0 of no difference between algorithms with regard to Stress at 100% grid density, and the alternative hypothesis H1 of at least one algorithm having a different ranking from the others. The values of both Friedman's chi-square statistic and Kendall's coefficient of concordance are statistically significant at a level exceeding 0.001, which allows H0 be rejected in favour of H1. The Tukey Multiple Comparison procedure can now be performed to test for significant differences between algorithm rankings in pairs.


 
Table 5.3: Friedman test results for the algorithm grouping

density Greedy1 Greedy2 SWO GA $\chi_r^2$ 1 W 2
100% 3.57 3.42 2.00 1.02 141.55 0.890
90% 3.63 3.35 2.00 1.02 142.74 0.898
80% 3.70 3.28 2.00 1.02 144.49 0.909
70% 3.59 3.33 2.00 1.08 134.77 0.848
60% 3.66 3.28 1.92 1.13 134.90 0.848
50% 3.63 3.20 1.94 1.23 120.18 0.756
40% 3.72 3.13 1.92 1.23 125.71 0.791
30% 3.69 3.14 1.87 1.30 119.76 0.753
20% 3.48 2.82 1.81 1.89 65.76 0.414
10% 3.13 2.69 1.78 2.40 34.06 0.214


1 $\chi_r^2$ is Friedman's chi-square statistic; the critical value at p=0.001 level is 16.27
2 W is Kendall's coefficient of concordance; the critical value at p=0.001 level is 0.102
The figures in columns 2-5 are mean ranks across 53 observations

All pairwise comparisons, except for the one involving Greedy1 and Greedy2, are significant at the p<0.05 level, meaning that GA achieves the best performance at 100% grid density, followed by SWO, and the greedy algorithms are tied in last place. We decided to perform a one-tailed Wilcoxon signed-rank test, because it is considerably more powerful [Siegel56], to verify the tie of the greedy algorithms. It produced a p value of 0.055, which just misses our nominal 5% limit, and supports the Tukey comparison.

The analysis of the remaining grid densities is analogous. The results of the Friedman test in every case allow H0 be rejected in favour of H1 at the 0.001 significance level. The outcome of the Multiple Tukey comparisons is essentially the same for densities between 100 and 40%, i.e. GA always comes first, followed by SWO, and the greedy algorithms jointly in last place. However, the Wilcoxon test for each of the greedy pairs is significant at the 5% level, except for 70% density with p=0.06. Thus overall, there is a difference between the greedy algorithms, with Greedy2 performing significantly better.

For 30 and 20% density Tukey comparisons between GA and SWO are no longer significant at the 5% level. The comparison between Greedy1 and Greedy2 at 30% density is not significant either. However, the corresponding Wilcoxon tests are significant at the 5% level, and so the same ordering of the algorithms holds: GA first, then SWO, Greedy2, and Greedy1 last.

The situation is less clear at 10% density. The only significant Tukey comparisons are: Greedy1-SWO, Greedy1-GA, Greedy2-SWO. Additionally, the Wilcoxon test is significant at the 5% level for Greedy1-Greedy2, but is not significant for GA-SWO; the test for Greedy2-GA produces a p value of 0.061. If the last result is accepted, the following ordering can be inferred: SWO and GA are tied in first place, followed by Greedy2, and Greedy1 is last.


 
Figure 5.4: Linear regression comparison of grid algorithms. The closer a combination of algorithm and grid density is to 1 (KYST) the better


\epsfig{file=images/regression.eps, width=8.25cm}


To get a sense of the actual differences in performance between algorithms, we have fitted a linear regression to the measurements for each combination of algorithm and density. The values of Stress (3.11) for KYST, which can be seen to correspond to grid density of 0, thus serving as a suitable reference, were used as the model predictor:

\begin{displaymath}\vec{m}^{(\mathit{algorithm},\mathit{density})}\approx a_{\mathit{algorithm},\mathit{density}}\vec{m}^{(\mathit{KYST})}
\end{displaymath}

where $\vec{m}^{(i)}$ is the vector of Stress measurements for all 53 data collections under condition i. The value of ai that gives the best least-squares fit of the model is plotted in Figure 5.4 for each level of density, with separate series for each algorithm. The further ai is from 1 the greater the difference of the corresponding Stress measurements between condition i and KYST, and the value indicates the overall ratio. The chart is in good agreement with the statistical analysis conducted above. The performance of Greedy1 and Greedy2 is similar, with a slight advantage going to Greedy2, as confirmed by the Wilcoxon test. GA achieves the best performance, and is significantly better than SWO, which in turn is significantly better than the greedy algorithms. The performance of the various algorithms appears to converge around the 10% density level, and approximates that of KYST.

Grid Density Analysis

Now we consider Greedy1, Greedy2, SWO, and GA separately. For each we have a block with 53 groups of 10 measurements corresponding to different levels of grid density. Table 5.4 reports the results of four Friedman tests with the null hypothesis H0 of no difference between grid densities with regard to Stress (3.11), and the alternative hypothesis H1 of at least one density having a different ranking from the others. The values of both Friedman's chi-square statistic and Kendall's coefficient of concordance are statistically significant at a level exceeding 0.001 for each algorithm, which allows H0 be rejected in favour of H1. Moreover, the coefficient of concordance is very close to its maximum value of 1, implying that there is an almost complete agreement between the 53 cases, especially for Greedy1, Greedy2, and SWO. Once H0 has been rejected, post-hoc testing in the form of Tukey Multiple Comparisons can be applied, to identify significant differences between grid density rankings in pairs.


 
Table 5.4: Friedman test results for the grid density grouping


density Greedy1 Greedy2 SWO GA
100% 9.70 9.72 9.74 9.74
90% 8.88 8.86 8.88 8.82
80% 8.14 8.16 8.14 8.05
70% 7.07 7.05 7.05 6.88
60% 6.04 6.04 6.08 5.89
50% 5.04 5.06 5.09 5.13
40% 4.08 4.10 4.01 4.03
30% 3.04 3.00 3.02 2.72
20% 2.02 2.02 2.00 2.11
10% 1.00 1.00 1.00 1.64
$\chi_r^2$ 1 466.10 467.21 470.19 433.88
W 2 0.977 0.979 0.986 0.910


1 $\chi_r^2$ is Friedman's chi-square statistic; the critical value at p=0.001 level is 27.88
2 W is Kendall's coefficient of concordance; the critical value at p=0.001 level is 0.058
The figures in rows 2-11 are mean ranks across 53 observations

The outcome of the post-hoc test for Greedy1, Greedy2, and SWO is identical. All pairwise comparisons are significant at the 5% level, except for all adjacent density levels and $(100\%,80\%)$, $(90\%,70\%)$. However, the Wilcoxon test for each of these pairs shows the difference in ranking to be significant at a level exceeding 0.001. These results imply that there is a consistent gradation of performance for these algorithms depending on grid density, as can be witnessed from the observed ranks in columns 2-4 of Table 5.4, and the high value of Kendall's coefficient of concordance.

Tukey comparisons for all grid density pairs of the GA algorithm are significant at the 5% level, except for all adjacent density levels and $(100\%,80\%)$, $(70\%,50\%)$, $(60\%,40\%)$, and $(30\%,10\%)$. Again, the Wilcoxon test shows each of these differences to be significant at the p<0.001 level. Thus, the performance of GA also depends on grid density; however, for densities of 30% and below the last column of Table 5.4 demonstrates the relationship to be less consistent between all the data collections. This effect is due to the fact that GA is a stochastic minimisation heuristic, and can produce suboptimal solutions. Therefore, there are a number of cases where a test measurement is inferior to one for higher density. This situation is more likely for sparse grids, which put small constraints on the configuration, and the corresponding global minima of Stress are close. A solution to this problem would be to run the algorithm a number of times, as we do for KYST, and use the best result. However, we did not want to give an unfair advantage to GA over the other algorithms, especially when taking into account the necessary increase of the computational effort required to complete the experiment.

Complete Analysis

To complete the analyses we present the results for a block of 53 groups of all 40 measurements. Table 5.5 details the outcome of the Friedman test, with the mean ranks of the various combinations of algorithm and density shown in descending order. Unsurprisingly, the values of both Friedman's chi-square statistic and Kendall's coefficient of concordance are statistically significant at a level exceeding 0.001, allowing the null hypothesis of no difference between conditions with regard to Stress (3.11) to be rejected. There are too many possible pairs of conditions for a Tukey Multiple Comparison procedure to be carried out. However, it would not yield any new insights in addition to the previous two analyses.


 
Table 5.5: Friedman test results for the complete grouping


density algorithm mean rank
100% Greedy1 39.04
100% Greedy2 38.77
90% Greedy1 37.30
90% Greedy2 36.64
80% Greedy1 35.52
80% Greedy2 34.76
100% SWO 33.57
70% Greedy1 32.07
70% Greedy2 31.59
90% SWO 30.90
80% SWO 28.88
60% Greedy1 28.62
100% GA 28.06
60% Greedy2 27.91
70% SWO 25.58
90% GA 24.29
50% Greedy1 23.96
50% Greedy2 23.28
60% SWO 22.32
80% GA 21.65
40% Greedy1 19.05
50% SWO 18.66
70% GA 18.56
40% Greedy2 18.35
60% GA 15.57
40% SWO 14.88
30% Greedy1 13.84
50% GA 13.45
30% Greedy2 12.93
30% SWO 10.51
40% GA 10.39
20% Greedy1 8.63
20% Greedy2 7.84
30% GA 7.11
20% SWO 6.34
20% GA 5.75
10% GA 4.32
10% Greedy1 3.70
10% Greedy2 3.22
10% SWO 2.20
$\chi_r^2$ 1 1941.28
W 2 0.939


1 $\chi_r^2$ is Friedman's chi-square statistic; the critical value at p=0.001 level is 72.05
2 W is Kendall's coefficient of concordance; the critical value at p=0.001 level is 0.035


Greedy1 always appears before Greedy2 in Table 5.5, meaning that it has a higher overall rank, and therefore Stress, at each level of density. This is in agreement with the algorithm analysis. SWO at 100% density performs better than Greedy2 at 80%, the same relation holds for 90 and 70%, 70 and 60%, 60 and 50% respectively. At 40% density and below Greedy2 closes in on SWO, but always performs worse. Again, these observations are consistent with the algorithm analysis, and Figure 5.4 in particular. GA at 100% density is better than SWO at 80%; this relation also holds for 90 and 70%, 80 and 60%, 70 and 50%, 50 and 40%, 40 and 30% respectively. GA comes best at 20 and 30% density, but is the worst performer at 10%. The last observation does not agree with the algorithm analysis, which takes precedence as it is based on pairwise statistical tests with a documented confidence level.

There is a clear division of algorithmic complexity between the greedy algorithms, SWO, and GA. The time taken to execute the entire test is approximately 35 seconds, 2 hours 17minutes, and 80 days respectively% latex2html id marker 19628
\setcounter{footnote}{3}\fnsymbol{footnote}. This translates to a ratio of 0.004 to 1 to 800. The first number follows from the fact that SWO is equivalent to repeating a simplified version of Greedy2 1000 times (SWO.iterationlimit parameter), and considering that initialisation costs dominate the simple computation of Greedy1 and Greedy2. GA is almost three orders of magnitude slower than SWO because of the sheer number of Stress (3.11) evaluations needed for the population of solutions to converge.

Qualitative Evaluation

In Section 5.3.1 we have been able to systematically rank the algorithms based on their performance in terms of loss function (3.11). However, it is also important to show the extent to which proximity grids calculated by the algorithms differ visually, for if there is hardly any difference between them subjectively then the objective results are of little significance. We have selected a sample data collection from the test bed for this purpose. It is a graph with 16 vertices and 21 edges, which appears in Section B.1 of the appendices under the name network.

The dissimilarity between a pair of vertices in a graph is taken to be proportional to their graph theoretic distance - shortest path length (see Section 1.2.6) - so that adjacent vertices are the most similar of all pairs of vertices. Thus, a configuration with a low value of Stress (3.11) will have uniform edge lengths, and will preserve the graph topology well: Figure 5.12 is a drawing of the sample graph generated by KYST. However, the number of edge crossings, another important graph drawing aesthetic [Battista94], will not be minimised in general, as it might be in conflict with the previous criteria. There is some correlation between these aesthetics, since configurations with high Stress also exhibit a high degree of edge and vertex crossover, as illustrated by Figures 5.5 and 5.6.

 
Figure 5.5: $4\times 4$ proximity grids, 100% density. $\sigma(\mathit{Greedy1})=0.2173,\,\sigma(\mathit{Greedy2})=0.2257,\,\sigma(\mathit{SWO})=0.1246,\,\sigma(\mathit{GA})=0.095$


\begin{picture}(3,3)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...ine(1,-1){0.670017}}%
\put(2.10435,2.7913){\line(1,-2){0.7913}}%
\end{picture}

\begin{picture}(3,3)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...e(3,-1){2.55728}}%
\put(0.194145,2.87057){\line(3,-2){2.61171}}%
\end{picture} \begin{picture}(3,3)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...){\line(0,-1){0.533333}}%
\put(3,2.76667){\line(0,-2){1.53333}}%
\end{picture} \begin{picture}(3,3)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...7){\line(0,-1){0.533333}}%
\put(3,1.23333){\line(0,2){1.53333}}%
\end{picture}

(a)Greedy1
(b) Greedy2 (c) SWO (d) GA


 
Figure 5.6: $5\times 5$ proximity grids, 64% density. $\sigma(\mathit{Greedy1})=0.1632,\,\sigma(\mathit{Greedy2})=0.1345,\,\sigma(\mathit{SWO})=0.0566,\,\sigma(\mathit{GA})=0.0449$


\begin{picture}(4,4)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...4){\line(1,1){0.587521}}%
\put(3.29167,2){\line(1,0){0.416667}}%
\end{picture}

\begin{picture}(4,4)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...4){\line(1,1){0.587521}}%
\put(3.29167,2){\line(1,0){0.416667}}%
\end{picture} \begin{picture}(4,4)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...e(1,0){0.416667}}%
\put(3.20624,2.79376){\line(1,-1){0.587521}}%
\end{picture} \begin{picture}(4,4)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...\line(1,-1){0.587521}}%
\put(3,0.708333){\line(0,-1){0.416667}}%
\end{picture}


 
Figure 5.7: $6\times 6$ proximity grids, 44% density. $\sigma(\mathit{Greedy1})=\sigma(\mathit{Greedy2})=0.0478,\,\sigma(\mathit{SWO})=0.0371,\,\sigma(\mathit{GA})=0.0344$


\begin{picture}(5,5)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...{\line(1,0){0.3}}%
\put(4.24749,2.75251){\line(1,-1){0.505025}}%
\end{picture}

\begin{picture}(5,5)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...{\line(1,0){0.3}}%
\put(4.24749,2.75251){\line(1,-1){0.505025}}%
\end{picture} \begin{picture}(5,5)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...{\line(1,0){0.3}}%
\put(4.24749,2.75251){\line(1,-1){0.505025}}%
\end{picture} \begin{picture}(5,5)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...\line(1,0){0.3}}%
\put(4.24749,0.752513){\line(1,-1){0.505025}}%
\end{picture}


 
Figure 5.8: $7\times 7$ proximity grids, 33% density. $\sigma(\mathit{Greedy1})=\sigma(\mathit{Greedy2})=\sigma(\mathit{SWO})=0.0323,\,\sigma(\mathit{GA})=0.0298$


\begin{picture}(6,6)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...4){\line(1,1){0.422529}}%
\put(5.40833,3){\line(1,0){0.183333}}%
\end{picture}

\begin{picture}(6,6)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...4){\line(1,1){0.422529}}%
\put(5.40833,3){\line(1,0){0.183333}}%
\end{picture} \begin{picture}(6,6)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...4){\line(1,1){0.422529}}%
\put(5.40833,3){\line(1,0){0.183333}}%
\end{picture} \begin{picture}(6,6)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...){\line(1,-1){0.422529}}%
\put(5.40833,1){\line(1,0){0.183333}}%
\end{picture}


 
Figure 5.9: $8\times 8$ proximity grids, 25% density. $\sigma(\mathit{Greedy1})=\sigma(\mathit{Greedy2})=0.0339,\,\sigma(\mathit{SWO})=0.0313,\,\sigma(\mathit{GA})=0.0294$


\begin{picture}(7,7)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...e(1,1){0.340034}}%
\put(6.32998,3.67002){\line(1,-1){0.340034}}%
\end{picture}

\begin{picture}(7,7)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...e(1,1){0.340034}}%
\put(6.32998,3.67002){\line(1,-1){0.340034}}%
\end{picture} \begin{picture}(7,7)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...e(1,1){0.340034}}%
\put(6.32998,3.67002){\line(1,-1){0.340034}}%
\end{picture} \begin{picture}(7,7)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...e(0,-2){1.06667}}%
\put(6.32998,2.67002){\line(1,-1){0.340034}}%
\end{picture}


 
Figure 5.10: $9\times 9$ proximity grids, 20% density. $\sigma(\mathit{Greedy1})=\sigma(\mathit{Greedy2})=\sigma(\mathit{SWO})=0.0272,\,\sigma(\mathit{GA})=0.0261$


\begin{picture}(8,8)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...e(1,1){0.257538}}%
\put(7.37123,3.62877){\line(1,-1){0.257538}}%
\end{picture}

\begin{picture}(8,8)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...e(1,1){0.257538}}%
\put(7.37123,3.62877){\line(1,-1){0.257538}}%
\end{picture} \begin{picture}(8,8)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...e(1,1){0.257538}}%
\put(7.37123,3.62877){\line(1,-1){0.257538}}%
\end{picture} \begin{picture}(8,8)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0,1...
...ne(1,1){0.257538}}%
\put(6.46957,5.76521){\line(2,-1){1.06085}}%
\end{picture}


 
Figure 5.11: $13\times 13$ proximity grids, 9% density. $\sigma(\mathit{Greedy1})=\sigma(\mathit{Greedy2})=\sigma(\mathit{SWO})=0.0245,\,\sigma(\mathit{GA})=0.0229$


\begin{picture}(12,12)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0...
...e(2,2){0.927555}}%
\put(10.6783,5.66086){\line(2,-1){0.643452}}%
\end{picture}

\begin{picture}(12,12)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0...
...e(2,2){0.927555}}%
\put(10.6783,5.66086){\line(2,-1){0.643452}}%
\end{picture} \begin{picture}(12,12)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0...
...e(2,2){0.927555}}%
\put(10.6783,5.66086){\line(2,-1){0.643452}}%
\end{picture} \begin{picture}(12,12)(-0.5,-0.5)%
{ \color[gray]{0.75} \multiput(-0.5,-0.5)(0...
...ine(0,3){1.48333}}%
\put(10.6783,9.33914){\line(2,1){0.643452}}%
\end{picture}


 
Figure 5.12: Continuous MDS configuration. $\sigma=0.0209$

\begin{picture}(999,999)(-0.5,-0.5)%
\put(24,800){\makebox(0,0){\textsf{a}}}\p...
...128){68.1221}}%
\put(871.857,501.429){\line(178,-100){76.2857}}%
\end{picture}



 
Figure 5.13: $13\times 13$ proximity grid for the GA algorithm rotated with Procrustes Analysis to fit the MDS configuration

\begin{picture}(999,999)(-0.5,-0.5)%
\put(62,792){\makebox(0,0){\textsf{a}}}\p...
...,152){76.1672}}%
\put(892.749,488.867){\line(155,-57){45.5025}}%
\end{picture}



As the basis for our qualitative evaluation we are using the experimental setup of Section 5.3.1. Since the graph is small, there are only 7 distinct grid lengths at the 10% density increments: 4, 5, 6, 7, 8, 9, and 13; Figures 5.5-5.11 present the respective configurations for all algorithms. Visually Greedy1 and Greedy2 are very messy, with a lot of crossover and a poor preservation of topology for high densities. For sparse grids these algorithms produce similar or identical results to SWO. Generally, GA and SWO configurations are comparable, but the advantage goes to the former for spreading graph drawings across the grid, and hence making better use of available space. The continuous MDS configuration in Figure 5.12 is rotated to its principal axes (see Section 4.3), and thus elongated in the horizontal direction. Since SWO and the greedy algorithms use this configuration as a starting point, they are unable to significantly alter its shape. The grid drawings at the 10% density level, including the GA one when factoring out the rotation (see Figure 5.13), very closely resemble the continuous configuration, with small differences in Stress between all of them.


 
Table 5.6: Assessment of the sample graph


grid length Greedy1 Greedy2 SWO GA
4 23 30 4 4
5 16 13 2 4
6 4 4 3 1
7 2 2 2 2
8 3 3 2 2
9 2 2 2 2
13 2 2 2 2
rank total1 16 16 8 8

(a) number of edge and vertex crossovers


  grid length Greedy1 Greedy2 SWO GA
4 0.2173 0.2257 0.1246 0.0950
5 0.1632 0.1345 0.0566 0.0449
6 0.0478 0.0478 0.0371 0.0344
7 0.0323 0.0323 0.0323 0.0298
8 0.0339 0.0339 0.0313 0.0294
9 0.0272 0.0272 0.0272 0.0261
13 0.0245 0.0245 0.0245 0.0229
rank total 19 19 14 7

(b) Stress



1 the rank total for a given algorithm is the sum of its relative ranking over all grid lengths; if two or more algorithms are tied at a given grid length they all share the lowest available rank, and the next available rank is incremented by the number of tied algorithms less one, i.e. as if the tie had not occurred
The rank of a value is denoted in greyscale

Table 5.6(a) conveniently summarises the number of edge and vertex crossovers for all combinations of grid length and algorithm. The performance of both SWO and GA is very good, with very few crosses even for dense grids. They achieve an identical rank total, calculated by summing their respective ranking over all grid lengths. Greedy1 and Greedy2 also have the same rank total between them, however, they do not cope with dense grids well, and produce a large number of crosses. Table 5.6(b) summarises Stress for all configurations, which is indicative of how well graph topology is preserved. GA is consistently the best, closely followed by SWO. The results for Greedy1 and Greedy2 are similar, and they achieve the same rank total. For dense grids there is an enormous difference between these two pairs of algorithms, which gradually reduces, and diminishes completely for sparse grids. These quantitative results are in agreement with the impressions reported in the previous paragraph.

Discussion


 
Figure 5.14: Histogram of Stress minima calculated by KYST for all collections in the test bed


\epsfig{file=images/grid-histogram.eps, width=8.25cm}



The data collections contained in the test bed are varied in type, size, and complexity. Stress (3.11) is normalised for the size of a collection (see Section 3.2.3), and its value at a minimum will depend on the inherent dimensionality or complexity of data. Figure 5.14 is a histogram of the test results for KYST, with each measurement rounded to the nearest bucket. A high value of Stress is indicative of high dimensionality of data, and suggests that a two-dimensional representation is not very faithful. Such data is seldom found in practice, and the small number of data collections with Stress over 0.1 reflects this. Other Stress values are fairly evenly distributed, which indicates the test bed to be a representative sample from a population of small-to-medium sized data collections. The size of the sample is large enough to warrant the use of statistical inference, and allows the results to be generalised to the whole population.

Both objective and subjective evaluations identify a significant difference between the greedy (Greedy1, Greedy2) and iterative (SWO, GA) algorithms, and confirm an intuitive expectation that the more computational effort is put in the better the results are. Statistical inference demonstrates Greedy2 to be the better of the greedy algorithms, which again is to be expected, as an exact rather than an approximate method for finding the closest empty cell is used. GA generates the best proximity grids in terms of both objective and subjective quality, but is closely matched by SWO for sparser grids at a fraction of the computational cost.

Dense proximity grids provide visualizations with the highest information density, since there are none or very few empty areas. At the same time stringent constraints are imposed on the preservation of object proximity, and the use of a global optimisation heuristic, for example GA, is essential to find a good allocation of objects to cells. Grids of medium density (20-70%) achieve a useful amount of separation, and allow fairly large object representations with no overlap: see Figure 5.1(b) for an example of a 70% proximity grid. The increase in Stress (3.11) over a continuous MDS configuration is moderate, and a hybrid technique like SWO is capable of matching global optimisation for quality. A sparse grid is not a very large departure from the corresponding continuous configuration, and a greedy approach is capable of transforming the latter to the former. Alternatively, a hybrid technique can be run for a small number of iterations, to overcome any potential misallocation. Sparse grids only give a limited amount of separation, but nonetheless can be useful for configurations with complete overlap or clumps of points.

Obviously, separation between icons can be increased by enlarging the canvas area, while keeping the icon size fixed. However, GA will use available space on a given canvas most efficiently, i.e. require partition into the smallest number of cells. SWO will need a larger grid size to provide an equally accurate representation of proximity relationships; the sparsest grid will be attributed to the greedy algorithm, in general. Thus, when populating the canvas with icons, GA will afford the largest icon size without overlap. In a particular application, the information density provided by each method has to be weighed against its computational requirement, the two criteria being in complete conflict.

Related Work

A Self-Organising Map (SOM) [Kohonen89] is a neural network that learns a mapping from an n-dimensional space to an m-dimensional lattice, which preserves proximity relations. The network is comprised of two fully connected layers: n continuous-valued inputs, and a lattice, usually two-dimensional, of output units. The training consists of presenting successive input tuples, and updating the winner neuron, i.e. the output unit with the weight vector closest to the current input, as well as its neighbours. Once the training has been completed, the mapping can be read out by enumerating every input, and noting the lattice coordinates of the winner neuron. With a sufficient number of output units for a given input set, this algorithm can generate a one-to-one mapping between them, i.e. a proximity grid. However, this method will not work if identical or nearly identical tuples are to be represented separately. A SOM can only process a data table with quantitative attributes (see Section 1.2.1), and so is not as widely applicable as the algorithms presented in this chapter.
next up previous
Next: Case Studies Up: Proximity Visualization of Abstract Data Previous: Large Scale Visualization

© 2001 Wojciech Basalaj