Date of Award


Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy (PhD)


Computer Science

First Advisor

Wei Ding

Second Advisor

Ping Chen

Third Advisor

Nurit Haspel


In the era of massive data, one of the most promising research fields is the analysis of large-scale spatio-temporal databases to discover previously unknown but potentially useful patterns in data collected over time and space. A modeling process in this domain must take temporal and spatial correlations into account. However, as the dimensionality of the time and space measurements increases, the number of elements potentially contributing to a target grows sharply, making the target's long-term behavior complex, chaotic, highly dynamic, and hard to predict. This work therefore addresses two questions: how to identify the most relevant and meaningful features in the original spatio-temporal feature space, and how to model complex space-time dynamics with sensitive dependence on initial and boundary conditions.

First, identifying the features strongly related to a target feature, and removing the irrelevant or less important ones, in large-scale spatio-temporal data sets is a critical and challenging issue in many fields, including tracing the evolutionary history of crime hot spots, uncovering weather patterns, predicting floods, earthquakes, and hurricanes, and determining global warming trends. The optimal feature subset that contains all the valuable information about the target is called the Markov boundary. Unfortunately, existing feature selection methods often focus on identifying a single Markov boundary, whereas real-world data can have many feature subsets that are equally good boundaries. In this work, we design a new multiple-Markov-boundary-based predictive model, Galaxy, to identify the precursors to heavy precipitation event clusters and predict heavy rainfall with a long lead time. We applied Galaxy to an extremely high-dimensional meteorological data set and identified 15 Markov boundaries related to heavy rainfall events in the Des Moines River Basin in Iowa. Our model identified the cold surges along the coast of Asia as an essential precursor to the surface weather over the United States, a finding that was later corroborated by climate experts.

Second, chaotic behavior exists in many nonlinear spatio-temporal systems, such as those arising in climate dynamics, weather prediction, and the space-time dynamics of virus spread. A reliable solution for these systems must handle their complex space-time dynamics and their sensitive dependence on initial and boundary conditions. The hierarchical feature learning capabilities of deep neural networks in both the spatial and temporal domains are helpful for modeling nonlinear spatio-temporal dynamics. However, sensitive dependence on initial and boundary conditions remains a challenge for both theoretical research and many critical applications. This study proposes a new recurrent architecture, error trajectory tracing, together with an accompanying training regime, Horizon Forcing, for prediction in chaotic systems.
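Sensitive dependence on initial conditions can be illustrated with a minimal sketch. The example below uses the logistic map, a textbook chaotic system chosen here only for illustration; it is an assumption of this sketch and is not stated to be one of the systems studied in the dissertation. The function name `logistic_trajectory`, the parameter value r = 4, and the starting points are likewise illustrative choices. Two trajectories that start 1e-10 apart are iterated side by side, and the gap between them is printed at a few steps.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x) and return all visited states."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two initial conditions that differ by a tiny perturbation (illustrative values).
a = logistic_trajectory(0.2, 100)
b = logistic_trajectory(0.2 + 1e-10, 100)

# Absolute separation between the two trajectories at every step.
gaps = [abs(x - y) for x, y in zip(a, b)]

for step in (0, 10, 20, 30, 40, 50):
    print(step, gaps[step])
```

In this sketch the separation grows roughly exponentially until it is comparable to the size of the state space, which is why long-horizon prediction in such systems is difficult even when the governing rule is known exactly.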

These methods have been validated on spatio-temporal data sets, including one meteorological data set, three classic chaotic systems, and four real-world time series prediction tasks with chaotic characteristics. Experimental results show that each proposed model outperforms current baseline approaches.