Campus Access Dissertation
Doctor of Philosophy (PhD)
Dan A. Simovici
Various factors influence the complexity and performance of data analysis, including the size of the data, its dimensionality, the data distribution, and the relations between attributes. This thesis studies how the ultrametricity of dissimilarity spaces affects classification and clustering accuracy. A new measure of ultrametricity is introduced, along with a weak variant that requires less computation. Furthermore, it is shown that clustering quality can be improved through dissimilarity transformations, which provably affect ultrametricity. Additionally, projection-based models, including an approximate distance-preserving embedding based on hashing and a projection onto parallel coordinates, are used to improve the performance of similarity search and anomaly detection in large, high-dimensional datasets. Finally, information-theoretic properties of the data are exploited to find patterns of interest in images.
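The abstract does not define the new ultrametricity measure. To make the underlying notion concrete, the sketch below checks the standard ultrametric inequality d(x, z) <= max(d(x, y), d(y, z)) on every triple of points and reports the fraction of triples that violate it. It is a simple hypothetical proxy, not the measure introduced in the thesis, and the function name `ultrametric_violation_fraction` and the tolerance parameter `tol` are illustrative assumptions.

```python
from itertools import combinations
from math import comb


def ultrametric_violation_fraction(d, tol=1e-12):
    """Hypothetical proxy (not the thesis's measure): fraction of point
    triples that violate the ultrametric inequality.

    d is a symmetric dissimilarity matrix (list of lists) with a zero
    diagonal. A triple {x, y, z} satisfies all three instances of
    d(x, z) <= max(d(x, y), d(y, z)) exactly when the two largest of its
    three dissimilarities are equal, so only that condition is tested.
    """
    n = len(d)
    total = comb(n, 3)
    if total == 0:
        return 0.0  # fewer than three points: nothing to violate
    violations = 0
    for x, y, z in combinations(range(n), 3):
        sides = sorted((d[x][y], d[y][z], d[x][z]))
        # A violation means the largest side strictly exceeds the second largest.
        if sides[2] - sides[1] > tol:
            violations += 1
    return violations / total
```

For example, a dissimilarity produced by a two-level hierarchy such as `[[0, 1, 3, 3], [1, 0, 3, 3], [3, 3, 0, 2], [3, 3, 2, 0]]` gives 0.0, while three equally spaced points on a line (`[[0, 1, 2], [1, 0, 1], [2, 1, 0]]`) give 1.0. The check runs over all O(n^3) triples, which illustrates why a cheaper variant of an ultrametricity measure can matter on large datasets.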
Vetro, Rosanne, "Utilizing Ultrametric Properties, Projections and Entropy in Data Mining" (2015). Graduate Doctoral Dissertations. 223.