Date of Award


Document Type

Campus Access Thesis

Degree Name

Master of Science (MS)



First Advisor

Todd Riley

Second Advisor

Jill Macoska

Third Advisor

Linda Huang


Understanding transcription factor (TF) binding preferences is vital to understanding gene regulation. Many computational methods have been developed to model these preferences from experimental binding data, but the most common methods share a serious deficiency: they cannot model insertions and deletions (indels) in a binding site. Profile hidden Markov models (pHMMs) are probabilistic models that, unlike other methods, can include indels. pHMMs can be used on their own to model binding, or incorporated into more complex HMM topologies that can accommodate variable-length spacers. We performed computational analyses of the binding preferences of two TFs using HMMs. First, we used six different HMM topologies to model the binding motif of Gcn4 and compared their accuracies. From this analysis, we found complex dependencies between the variable-length spacer and the half-sites, which cannot be modeled by the HMM topologies currently in use. We also developed a new methodology for comparing different HMM topologies for a TF in order to choose the optimal one: a topology that is accurate, biologically sound, and contains as few parameters as possible. In our second application, we used pHMMs to analyze how mutations in p53 that affect its ability to dimerize (cooperativity) may also affect its binding preferences. We used pHMMs to characterize the binding motif of wild-type p53, as well as variants that demonstrate lower cooperativity than wild-type. We also analyzed microarray data for each variant, and observed differential gene expression associated with cooperativity. However, our models are very similar for low- and high-cooperativity p53, which suggests that differences in binding specificity are not responsible for the differential gene expression between the variants. We provide strong evidence that, instead, the differences in gene expression are caused by an overall reduction in binding affinity in those variants.
Importantly, we show that it is critical to choose a model that accommodates the unique binding characteristics of a TF when modeling its binding preferences.
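
As an illustration of how a profile HMM can score binding sites that contain indels, the following is a minimal sketch and not the thesis's implementation. The consensus `ACGTAC`, the emission probabilities, and the transition probabilities are all hypothetical toy values chosen for the example. The function runs the Viterbi recurrence over match, insert, and delete states and returns the log-score of the best path. End transitions are omitted for brevity.

```python
import math

NEG_INF = float("-inf")

# Hypothetical toy parameters (assumptions, not values from the thesis).
CONSENSUS = "ACGTAC"           # one match column per consensus base
LOG_BG = math.log(0.25)        # insert states emit uniform background
T = {                          # log transition probabilities
    "MM": math.log(0.90), "MI": math.log(0.05), "MD": math.log(0.05),
    "IM": math.log(0.70), "II": math.log(0.30),
    "DM": math.log(0.70), "DD": math.log(0.30),
}


def match_log_emission(consensus_base, base):
    """Match column: 0.85 for the consensus base, 0.05 for each other base."""
    return math.log(0.85 if base == consensus_base else 0.05)


def viterbi_log_score(seq, consensus=CONSENSUS):
    """Log-score of the best match/insert/delete path aligning seq to the profile."""
    n, k = len(seq), len(consensus)
    vm = [[NEG_INF] * (n + 1) for _ in range(k + 1)]  # match states
    vi = [[NEG_INF] * (n + 1) for _ in range(k + 1)]  # insert states
    vd = [[NEG_INF] * (n + 1) for _ in range(k + 1)]  # delete states (silent)
    vm[0][0] = 0.0  # the begin state is treated as match column 0
    for j in range(k + 1):
        for i in range(n + 1):
            if j >= 1 and i >= 1:
                # Column j emits seq[i-1].
                vm[j][i] = match_log_emission(consensus[j - 1], seq[i - 1]) + max(
                    vm[j - 1][i - 1] + T["MM"],
                    vi[j - 1][i - 1] + T["IM"],
                    vd[j - 1][i - 1] + T["DM"],
                )
            if i >= 1:
                # Extra base seq[i-1] inserted after column j.
                vi[j][i] = LOG_BG + max(vm[j][i - 1] + T["MI"], vi[j][i - 1] + T["II"])
            if j >= 1:
                # Column j skipped without consuming a base.
                vd[j][i] = max(vm[j - 1][i] + T["MD"], vd[j - 1][i] + T["DD"])
    return max(vm[k][n], vi[k][n], vd[k][n])


if __name__ == "__main__":
    for site in ("ACGTAC", "ACGTTAC", "ACGAC"):  # exact, one insertion, one deletion
        print(site, round(viterbi_log_score(site), 3))
```

Under these toy parameters, sites of length 7 or 5 still receive a finite score because the insert and delete states absorb the length difference. A fixed-width position weight matrix would have no defined score for those sites.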


Free and open access to this Campus Access Thesis is made available to the UMass Boston community by ScholarWorks at UMass Boston. Those not on campus and those without a UMass Boston campus username and password may gain access to this thesis through resources like ProQuest Dissertations & Theses Global or through Interlibrary Loan. Off-campus users with a UMass Boston campus username and password can download this work through the repository's off-campus access option.