Multidimensional scaling is effective at visualizing abstract data collections; however, it is computationally intensive, and becomes very time-consuming for large collections of objects. Cluster analysis can identify the most important objects, so that more time can be devoted to representing them than to less structurally significant objects. An incremental multidimensional scaling procedure is proposed that has lower time and space requirements, achieved by gradually introducing objects into a visualization based on their importance. As a result, it becomes practical to visualize data collections two orders of magnitude larger than with standard multidimensional scaling. The validity of this concept has been confirmed by means of a rigorous evaluation.
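To illustrate where the cost of multidimensional scaling comes from, the following minimal sketch computes Kruskal's stress-1 for a given layout. It assumes metric scaling, with the input dissimilarities used directly as target distances. Whether this is the quality measure used in the dissertation is an assumption, and the names `stress1`, `dissim` and `coords` are hypothetical. Every pair of objects contributes a term, so a single evaluation already takes time quadratic in the number of objects.

```python
from math import dist, sqrt

def stress1(dissim, coords):
    """Kruskal's stress-1 of a layout (metric case, an illustrative assumption).

    dissim: symmetric matrix (list of lists) of target dissimilarities.
    coords: list of coordinate tuples, one per object.
    """
    num = den = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):  # every unordered pair: O(n^2) terms
            d = dist(coords[i], coords[j])
            num += (d - dissim[i][j]) ** 2
            den += d ** 2
    # A layout with all points coincident has no meaningful stress; return 0.
    return sqrt(num / den) if den else 0.0
```

A layout that reproduces the dissimilarities exactly has stress 0, and larger values indicate a poorer fit.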
Proximity Grid is a novel visualization technique, especially suited to the design of user interfaces, as it can provide a display with high information density. Icons representing objects from a data collection are arranged in a grid, and thus can occupy their respective cells completely, without overlap. Rather than being assigned to grid cells arbitrarily, icons and the corresponding objects are positioned so that proximity relationships between objects are preserved as well as possible. Unlike multidimensional scaling, this is a combinatorial problem, and different heuristics are needed for solving it. A number of algorithms of varied complexity have been presented, and the trade-off between their effectiveness and responsiveness established.
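As a rough illustration of the combinatorial nature of the problem, the sketch below places objects into grid cells greedily: each object, taken in input order, claims the free cell whose centre is closest to its position in a continuous two-dimensional layout (for example, one produced by multidimensional scaling). This is a simple heuristic chosen for illustration, not necessarily one of the algorithms presented in the dissertation; the function name `greedy_grid` and the assumption that points are already scaled to the grid extent are hypothetical.

```python
from math import hypot

def greedy_grid(points, rows, cols):
    """Assign each 2-D point to a distinct grid cell, nearest free cell first.

    points: list of (x, y) pairs, assumed scaled to [0, cols) x [0, rows).
    Returns a list of (row, col) cells, one per point, with no cell reused.
    """
    if len(points) > rows * cols:
        raise ValueError("more objects than grid cells")
    free = {(r, c) for r in range(rows) for c in range(cols)}
    placement = []
    for x, y in points:
        # Cell centres sit at (col + 0.5, row + 0.5); ties break on (row, col).
        cell = min(
            free,
            key=lambda rc: (hypot(rc[1] + 0.5 - x, rc[0] + 0.5 - y), rc),
        )
        free.remove(cell)
        placement.append(cell)
    return placement
```

Because earlier objects take the best cells, the result depends on the processing order. This is one reason why more elaborate heuristics can trade responsiveness for a better layout.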
The evaluation framework, used in characterising algorithms for generating each type of proximity visualization, is innovative in its own right. A number of real-world data collections have been selected to serve as input for each algorithm under test. The objective quality of the resulting visualizations was recorded, so that a ranking of the algorithms could be established with statistical analysis. It would have been possible to present these series of measurements in the form of charts, and draw conclusions based on visual inspection, as is customary in such empirical studies. However, no matter how useful visualization is, it cannot be applied blindly, and in this case the use of statistical inference has yielded a more concise and objective decision.
This dissertation is concerned with visualization of abstract data, and is rich in examples as a consequence. A sample output of every algorithm has been provided, so that the extent and character of the differences between algorithms can be easily ascertained. Such a qualitative evaluation naturally complements the statistical analysis. Also, a number of case studies have been presented that illustrate the broad range of applications of proximity visualization.
The overall thesis is that proximity visualization is more generic than other information visualization techniques, as any conceivable data type can be represented. Even heterogeneous data can be accommodated by means of the general dissimilarity coefficient. For data with strong temporal or spatial semantics a specialised technique is more natural, and likely to provide a superior visual representation. However, an abstract collection of objects is best described and visualized in terms of object proximity.
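To make the idea of a dissimilarity over heterogeneous data concrete, the following minimal sketch assumes a Gower-style coefficient: numeric variables contribute their range-normalised absolute difference, categorical variables contribute 0 or 1, and variables with a missing value are skipped. The exact definition used in the dissertation may differ; the function name `general_dissimilarity` and the `ranges` parameter are hypothetical.

```python
def general_dissimilarity(a, b, ranges):
    """Gower-style dissimilarity between two mixed-type records (a sketch).

    a, b:   sequences of values; None marks a missing value.
    ranges: per-variable numeric range (max - min over the collection),
            or None if the variable is categorical.
    Returns the mean per-variable dissimilarity, a value in [0, 1] when
    numeric values lie within the stated ranges.
    """
    total, count = 0.0, 0
    for x, y, rng in zip(a, b, ranges):
        if x is None or y is None:
            continue  # missing values do not contribute
        if rng is None:
            total += 0.0 if x == y else 1.0  # categorical: simple mismatch
        elif rng:
            total += abs(x - y) / rng        # numeric: range-normalised
        # a zero range means the variable is constant, so it adds 0
        count += 1
    if count == 0:
        raise ValueError("no comparable variables")
    return total / count
```

For example, comparing `(10.0, "red", None)` with `(15.0, "blue", 3.0)` under `ranges = (20.0, None, 5.0)` averages 0.25 and 1 over the two comparable variables, giving 0.625.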
© 2001 Wojciech Basalaj