Next: Bibliography Up: Proximity Visualization of Abstract Data Previous: Partial Derivatives of Energy

Test Bed

   
Small Data Sets

Each small data set is a graph (g), an image collection (i), a dissimilarity matrix (m), or a data table (t); the type column of Table B.1 uses these letter codes.


Table B.1: Small data collections


name             size  type  source
aud                18  m     Mietta E. Lennes, Department of Phonetics, University of Helsinki
bridges           108  t     UCI Machine Learning Repository (MLR) [Blake98]
crcars            406  t     D. Donoho [Donoho83]
cars               38  t     H. V. Henderson [Henderso81]
gd98c              62  g     Graph Drawing '98 Contest
corel-244         100  i     Kerry Rodden, Computer Laboratory, University of Cambridge
corel-385         100  i     Kerry Rodden
cpu-performance   209  t     MLR
dermatology       366  t     MLR
detroit            13  t     StatLib Datasets Archive
echocardiogram    131  t     MLR
ecoli             107  g     Karen Eilbeck, Biochemistry Division, University of Manchester
FFT                18  m     Mietta E. Lennes
flags             194  t     MLR
gd99c             105  g     Graph Drawing '99 Contest
gene2sc           112  t     MLR
glass             214  t     MLR
GPA1component      27  g     Karen Eilbeck
group              16  t     Opera Group, Computer Laboratory, University of Cambridge
haberman          306  t     MLR
heart             270  t     MLR
house-votes       435  t     MLR
housing           506  t     MLR
humandevel        130  t     MLR
image             210  t     MLR
imports-85        205  t     MLR
ionosphere        351  t     MLR
iris              150  t     MLR
Kellog             23  t     T. Cox [Cox94]
letters            26  t     F. Labelle's Dimensionality Reduction page
letters-back       24  t     F. Labelle
liver-disorders   345  t     MLR
misc              206  t     MLR
network            16  g     J. B. Kruskal [Kruskal78a], Figure 3: "Input for Corn Biomass Network"
odmg-schema        22  g     The Object Database Standard version 2.0 [Cattell97]
o-ring-erosion     23  t     MLR
places            329  t     StatLib
planets             9  t     F. Labelle
post-operative     90  t     MLR
protein            25  t     Handbook of Small Data Sets [Hand94]
query              20  t     MLR
retention         270  t     U.S. Department of Agriculture
servo             167  t     MLR
shuttle            15  t     MLR
Skulls             40  t     T. Cox
solar-flare       323  t     MLR
soybean           307  t     MLR
tae               151  t     MLR
usa-sales          26  t     American Automobile Manufacturers' Association (AAMA)
usa-share          26  t     AAMA
wine              178  t     MLR
Yoghurt            12  t     T. Cox
zoo               101  t     MLR
minimum             9
maximum           506
mean              145
median            107

   
Medium Data Sets

All medium-sized data sets in the test bed are tabular and come from the UCI Machine Learning Repository [Blake98].


Table B.2: Medium-sized data tables


name                   size
australian              690
breast-cancer           699
cloud-1                1024
cloud-2                1024
cmc                    1473
credit-screening        690
german                 1000
pima-indians-diabetes   768
tic-tac-toe             958
vehicle                 846
vowel                   990
water-treatment         527
yeast                  1484
minimum                 527
maximum                1484
mean                    936
median                  958

   
Large Data Sets

All large data sets in the test bed are tabular. Table B.3 also gives the number of attributes (columns) of each data set, since this is relevant to Principal Components Analysis (see Section 4.3), which was applied to these data sets.


Table B.3: Large data tables


name        rows   cols  source
abalone     4177      8  UCI Machine Learning Repository (MLR) [Blake98]
letter     20000     16  MLR
mfeat       2000      6  MLR
pageblock   5473     10  MLR
sat         6435     36  MLR
segment     2310     18  MLR
nbody-1    15000     12  Jarrod Hurley, Institute of Astronomy, University of Cambridge
nbody-2    14898     12  Jarrod Hurley
shuttle    58000      9  MLR
spambase    4601     57  MLR
minimum     2000
maximum    58000
mean       13289
median      5954
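
The summary rows at the foot of Table B.3 can be recomputed from the row counts listed above. The following is a minimal sketch in Python, not part of the original test bed: the names LARGE_SETS and summarize are hypothetical, and rounding the mean and median to the nearest integer is an assumption about how the table's summary rows were produced.

```python
from statistics import mean, median

# Row counts transcribed from Table B.3 (data set name -> number of rows).
LARGE_SETS = {
    "abalone": 4177,
    "letter": 20000,
    "mfeat": 2000,
    "pageblock": 5473,
    "sat": 6435,
    "segment": 2310,
    "nbody-1": 15000,
    "nbody-2": 14898,
    "shuttle": 58000,
    "spambase": 4601,
}


def summarize(sizes):
    """Return minimum, maximum, mean and median of a collection of sizes.

    Mean and median are rounded to the nearest integer (an assumption).
    """
    values = list(sizes)
    return {
        "minimum": min(values),
        "maximum": max(values),
        "mean": round(mean(values)),
        "median": round(median(values)),
    }


summary = summarize(LARGE_SETS.values())
for label, value in summary.items():
    print(f"{label:<8}{value:>6}")
```

With an even number of data sets, the median is the average of the two middle row counts (5473 and 6435). The same function can be applied to the size columns of Tables B.1 and B.2.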



© 2001 Wojciech Basalaj