Date of Award


Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy (PhD)


Computer Science

First Advisor

Nurit Haspel

Second Advisor

Kourosh Zarringhalam

Third Advisor

Dan Simovici


Every organism contains a few hundred to thousands of proteins. A protein is made of a sequence of molecular building blocks called amino acids, referred to here as residues. Every protein performs one or more functions in the cell. To do its job, a protein must bind properly to its partner proteins. Many genetic diseases, such as cancer, are caused by mutations (changes) of specific residues that disrupt the functions of those proteins.

Predicting protein binding sites is a crucial problem in computational biology. A protein usually consists of 50 to a few thousand residues. A contact site can occur within a protein or between proteins. With a robust and accurate model for identifying the residues involved in a binding site, scientists can investigate the impact of critical mutations and of residues that can cause genetic diseases.

The main focus of this thesis is a machine learning model for predicting the binding site between two proteins. Extracting structural information from a protein provides additional knowledge about its binding sites. This structural information can be converted into a penalty matrix for a graphical model that is learned from the protein sequence. The second part of this thesis focuses on motion planning algorithms for proteins and on simulating protein pathway changes using a Monte Carlo based method. The intermediate conformations are then clustered, using a novel geometry-based scoring function, into subsets that may indicate interesting intermediate states.