Date of Award


Document Type

Campus Access Dissertation

Degree Name

Doctor of Philosophy (PhD)


Computer Science

First Advisor

Nurit Haspel

Second Advisor

Marc Pomplun

Third Advisor

Wei Ding


The field of machine learning, which aims to develop computer algorithms that improve with experience, has in recent years helped scientists understand a vast and diverse array of biological phenomena. Through the analysis of large and complex datasets by efficient and intelligent algorithms, major advances have been made in understanding the biological processes taking place in the cell and the underlying causes of many diseases and abnormalities. Consequently, the development of new drugs and treatments has become possible.

This thesis presents machine learning solutions for three biological problems. The first problem concerns building models that predict the structural similarity of a docked protein complex to its native form. Using a set of physico-chemical features and evolutionary conservation, these models not only rank candidate complexes relative to each other, but also outperform the built-in scoring functions of the docking programs used to generate the complexes. The second problem studies how point mutations can affect the structure, and consequently the stability, of a protein by employing machine learning methods to predict the change in the free energy of the protein. This approach, which has the potential to provide insight into the effects of multiple amino acid mutations in addition to single mutations, does not require costly calculations of energy functions that rely on atomic-level statistical mechanics and molecular energetics. In the third part of this work, a method is proposed to identify reads from paired-end sequencing data that contain inter-chromosomal translocation or insertion breakpoints. The huge search space in this problem is explored by applying a distance-preserving embedding algorithm to solve the approximate nearest neighbor problem. Experimental validation and comparison with similar existing methods show the advantages of this approach in detecting breakpoints efficiently and accurately.

