Date of Award

5-2020

Document Type

Campus Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Nurit Haspel

Second Advisor

Wei Ding

Third Advisor

Kourosh Zarringhalam

Abstract

Over the past 20 years, the volume of data in many fields has grown dramatically, and researchers face new challenges in working with it. My interest in high-performance computing led me to pursue research across several areas of computer science. In this dissertation, we introduce high-performance solutions to problems in machine learning and bioinformatics, such as crater detection in Mars images and structural variation detection in DNA/RNA genomic data. We first present several parallel heterogeneous implementations of the Berlekamp-Massey algorithm, which plays an important role in error correction for NAND flash memory, a technology found almost everywhere from flash drives to large-scale enterprise servers. Next, we introduce a semi-supervised learning technique for learning a novel weighted distance metric, WDM, which uses label information from the training set and identifies groups among the test samples to form a metric space. The effectiveness of this metric is evaluated extensively on classification tasks, on both CPU and GPU parallel platforms. The dissertation then introduces a bidirectional context-based deep learning framework that learns features from both craters and their surroundings using deep convolutional classification and segmentation models, in order to efficiently identify sub-kilometer craters in high-resolution panchromatic images. Finally, we design two computational methods to detect cancer inter-chromosomal rearrangements and gene fusions in DNA and RNA sequencing data, respectively. These methods split the candidate reads into windows, represent each window in a binary format to reduce the large dimension of the search space, and then locate the breakpoint in the reference genome using the Jaccard distance.
Taken as a whole, this dissertation offers efficient solutions to several problems in coding theory, machine learning, and bioinformatics. In empirical studies on multiple real-world datasets, our approaches achieved state-of-the-art performance.
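The abstract only outlines the breakpoint search: split reads into windows, encode each window in binary, and compare against the reference with the Jaccard distance. The following is a minimal, hypothetical sketch of that idea, not the dissertation's implementation. It assumes each window is encoded as a bit vector marking which DNA k-mers (K = 3 here) it contains, and it scans the reference with a sliding window to find the position with the smallest Jaccard distance. The names `encode_window`, `jaccard_distance`, `split_windows` and `locate_window`, and all parameter values, are illustrative assumptions.

```python
from itertools import product

# Hypothetical parameter: the abstract does not specify the encoding or k-mer size.
K = 3  # k-mer length used for the binary encoding
KMER_INDEX = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}


def encode_window(seq: str) -> int:
    """Encode a DNA window as a 4**K-bit integer: bit i is set iff k-mer i occurs."""
    bits = 0
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if kmer in KMER_INDEX:  # skip k-mers containing N or other symbols
            bits |= 1 << KMER_INDEX[kmer]
    return bits


def jaccard_distance(a: int, b: int) -> float:
    """Jaccard distance between two bit vectors: 1 - |A & B| / |A | B|."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - bin(a & b).count("1") / union


def split_windows(seq: str, width: int, step: int) -> list[tuple[int, str]]:
    """Return (start, window) pairs covering seq with the given width and step."""
    return [(s, seq[s:s + width]) for s in range(0, len(seq) - width + 1, step)]


def locate_window(window: str, reference: str, step: int = 1) -> tuple[int, float]:
    """Return (position, distance) of the first reference window closest to `window`."""
    query = encode_window(window)
    best_pos, best_dist = -1, float("inf")
    for pos, ref_win in split_windows(reference, len(window), step):
        d = jaccard_distance(query, encode_window(ref_win))
        if d < best_dist:  # strict '<' keeps the earliest position on ties
            best_pos, best_dist = pos, d
    return best_pos, best_dist


if __name__ == "__main__":
    reference = "A" * 20 + "ACGTGCTAGC" + "T" * 20
    print(locate_window("ACGTGCTAGC", reference))  # → (20, 0.0)
```

In a rearrangement or fusion setting, different windows of the same read would presumably be located separately, and windows that map to distant positions (or to different chromosomes) would point to a candidate breakpoint. The linear scan is kept only for clarity; a practical tool would index the reference windows instead.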

Comments

Free and open access to this Campus Access Dissertation is made available to the UMass Boston community by ScholarWorks at UMass Boston. Those not on campus and those without a UMass Boston campus username and password may gain access to this dissertation through resources like Proquest Dissertations & Theses Global or through Interlibrary Loan.