Date of Award

12-2024

Document Type

Campus Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computational Sciences

First Advisor

Nurit Haspel

Second Advisor

David Degras

Third Advisor

Changmeng Cai, Filip Jagodzinski

Abstract

My PhD research consists of two parts. In the first part, we developed and published research on protein InDels; in the second, we developed and published work on the integration of omics datasets. The work is described in the same order.

Part 1: The effects of amino acid insertions and deletions (InDels) remain an under-explored area of structural biology, even though these variations are often the cause of numerous disease phenotypes. Research on InDels and their structural significance remains limited, primarily due to a lack of experimental information, as InDels are more difficult to model than substitutions. Given the difficulty and cost of modeling InDels in wet-lab experiments, there is a pressing need for methods that model them in silico. The first half of the dissertation attempts to fill this gap by modeling InDels computationally and then comparing the computationally generated InDels with their wet-lab counterparts. The results demonstrate the method's ability to generate InDel mutants that are structurally similar to PDB InDel mutants.

Part 2: Advancements in next-generation sequencing have led to the generation of vast volumes of data, and the challenge is often how to combine and reconcile results from different omics studies, such as epigenome and transcriptome studies. In the second half of the thesis, we combined gene expression data with DNA methylation data to elucidate the relationship between the different types of assay results. These data are integrated using a statistical technique called sparse canonical correlation analysis, and subtypes are then identified in the resulting latent space. Our approach is validated on multiple cancer datasets and achieves better performance in terms of survival analysis than subtypes identified by both single-omics and multi-omics studies. This paves the way for enhanced categorization of cancer data, which is crucial to precision medicine.

Comments

Free and open access to this Campus Access Thesis is made available to the UMass Boston community by ScholarWorks at UMass Boston. Those not on campus and those without a UMass Boston campus username and password may gain access to this thesis through Interlibrary Loan. If you have a UMass Boston campus username and password and would like to download this work from off-campus, click on the