A standard method for modelling proximity relations is Multidimensional Scaling (MDS) analysis. The result is usually a two- or three-dimensional configuration of points - each representing a single element from a collection, with inter-point distances approximating the corresponding proximities. The quality of this approximation can be expressed as a loss function, and the optimal arrangement can be found by minimising it numerically - a procedure known as least-squares metric MDS. This work presents a number of algorithmic instances of this problem, using established function optimisation heuristics: Newton-Raphson, Tabu Search, Genetic Algorithm, Iterative Majorization, and Simulated Annealing. Their effectiveness at minimising the loss function is measured for a representative sample of data collections, and the relative ranking is established. The popular classical scaling method serves as a benchmark for this study.
The computational cost of traditional MDS makes it unsuitable for visualizing a large data collection. Incremental multidimensional scaling solves this problem by considering only a carefully chosen subset of all pairwise proximities. Elements that make up cluster diameters at a certain level of the single link cluster hierarchy are identified, and are subject to standard MDS, in order to establish the overall shape of the configuration. The remaining elements are positioned independently of one another with respect to this skeleton configuration. For very large collections the skeleton configuration can itself be built up incrementally. The incremental method is analysed for the compromise between solution quality and the proportion of proximities used, and compared to Principal Components Analysis on a number of large database tables.
In some applications it is convenient to represent individual objects by compact icons of fixed size, for example the use of thumbnails when visualizing a set of images. Because the MDS analysis only takes the position of icons into account, and not their size, its direct use for visualization may lead to partial or complete overlap of icons. Proximity Grid - an analogue of MDS in a discrete domain - is proposed to overcome this deficiency. Each element of an abstract data collection is represented within a single cell of the grid, and thus considerable detail can be shown without overlap. The proximity relationships are preserved by clustering similar elements in the grid, and keeping dissimilar ones apart. Algorithms for generating such an arrangement are presented, and compared in terms of output quality to one another, as well as standard MDS.
© 2001 Wojciech Basalaj