Date of Award

6-1-2015

Document Type

Campus Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Dan A. Simovici

Second Advisor

Catalin Zara

Third Advisor

Wei Ding

Abstract

Various factors influence the complexity and performance of data analysis, including the size of the data, its dimensionality, the data distribution, and the relations between attributes. In this thesis, the effects of the ultrametricity of dissimilarity spaces on classification and clustering accuracy are studied. A new measure of ultrametricity is introduced, along with a weak variant that requires less computation. Furthermore, it is shown that clustering quality can be improved through dissimilarity transformations, which provably impact ultrametricity. Additionally, models based on projections, including an approximate distance-preserving embedding based on hashing and a projection onto parallel coordinates, are used to enhance the performance of similarity search and anomaly detection in large high-dimensional datasets. Finally, information-theoretic properties of the data are exploited to find patterns of interest in images.

Comments

Free and open access to this Campus Access Dissertation is made available to the UMass Boston community by ScholarWorks at UMass Boston. Those not on campus and those without a UMass Boston campus username and password may gain access to this dissertation through resources such as ProQuest Dissertations & Theses Global or through Interlibrary Loan. If you have a UMass Boston campus username and password and would like to download this work from off-campus, click on the "Off-Campus UMass Boston Users" link above.
