Campus Access Dissertation
Doctor of Philosophy (PhD)
The large volume of biological data that has become available in recent years calls for advanced computational methods capable of analyzing complex, high-dimensional datasets, both to investigate biological processes and to support new discoveries. Machine learning is increasingly used in biology and proteomics to build predictive models of these underlying processes.
This dissertation provides machine learning solutions for three problems related to proteins and their structures. The first problem is to investigate how mutations in a protein sequence affect its structural stability, using machine learning methods to predict the change in free energy between the mutated and non-mutated (wild-type) proteins. In this project, we compare three machine learning models for predicting the effect of a mutation. The second problem focuses on protein dynamics and conformational changes. We employ a hybrid algorithm that combines Monte Carlo sampling with a robotics-based method called RRT* to find conformational pathways using rigidity analysis. We also use a topological data analysis algorithm called Mapper to find intermediate conformations by clustering the conformations that our algorithm generates most frequently. The third problem is the classification of protein families. Here we propose a two-step method consisting of dimensionality reduction followed by classification: a variational autoencoder for the first step and a convolutional neural network classifier for the second.
Dehghanpoor, Ramin, "Machine Learning Based Algorithms to Investigate Protein Structure" (2022). Graduate Doctoral Dissertations. 797.