Date of Award


Document Type

Campus Access Dissertation

Degree Name

Doctor of Philosophy (PhD)


Computer Science

First Advisor

Marc Pomplun

Second Advisor

Nurit Haspel

Third Advisor

Xiaohui Liang


Machine learning is an important multidisciplinary field of research that aims to construct models that learn from data and make predictions based on it. Such methods have been widely used to understand and analyze human behavioral and physical attributes. The first part of this thesis discusses the application of machine learning algorithms to two important real-world problems. The first problem is modeling human physical characteristics (e.g., walking) from accelerometer data measured by smartphones. We build highly accurate models that recognize daily human activities and identify users based on their gait characteristics. The second problem is modeling human eye-movement behavior in order to identify individuals during reading. The highly specific characteristics of human cognition and behavior during reading are reflected in eye-movement features, which makes these features well suited to user identification. Our approach dramatically outperforms previous methods, making it possible to build eye-movement biometric systems for user identification and personalized interfaces.

The second part of this thesis studies deep learning solutions for three visual scene perception and object recognition problems. The goal is to investigate to what extent deep convolutional neural networks resemble the human visual system in scene perception and object recognition across three problems: (1) classifying scenes based on their global properties, (2) deploying a multi-resolution technique for object recognition, and (3) evaluating the influence of the high-level context of scene grammar on object and scene recognition. For the first problem, we propose deriving global properties of a scene, as high-level scene descriptions, from the deep features of convolutional neural networks in scene classification tasks. For the second problem, we show that fine-tuning Faster R-CNN (the state-of-the-art object recognition network) on multi-resolution data, inspired by the human multi-resolution visual system, improves the network's performance and robustness over a range of spatial frequencies. Finally, the third problem studies the effects of violating high-level syntactic and semantic scene rules on human eye-movement behavior and on deep neural networks for scene and object recognition.


Free and open access to this Campus Access Dissertation is made available to the UMass Boston community by ScholarWorks at UMass Boston. Those not on campus and those without a UMass Boston campus username and password may gain access to this dissertation through resources like ProQuest Dissertations & Theses Global or through Interlibrary Loan.