Date of Award

5-31-2017

Document Type

Campus Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Duc A. Tran

Second Advisor

Dan Simovici

Third Advisor

Gabriel Ghinita

Abstract

For a storage system to keep pace with increasing amounts of data, a natural solution is to deploy more servers to expand storage capacity and mitigate server bottlenecks. Because of their large number, these servers need to be placed at geographically distributed locations, which incurs unavoidable communication costs. An important design problem, therefore, is how best to partition the data across the servers. To minimize cross-server traffic, the mainstream approach is data-centric: data with similar content are assigned to the same server. It is difficult, however, to quantify content similarity effectively when the content has many attributes or belongs to incomparable categories. In contrast, this dissertation advocates a query-centric storage approach in which the only input information is the queries, and the data partitioner aims to assign data that are often queried together to the same server. This approach does not assume that a content similarity measure exists, and is thus applicable to both similarity search and non-similarity search. Under this approach, if all queries are given in advance, an optimal partitioner can be found by solving a classic hypergraph partitioning problem. The focus of this dissertation is the online setting: as queries arrive in a streaming manner, how to revise the current partition incrementally to obtain the best partition for future queries. The contributions are (1) a formal formulation of this unexplored problem as a multi-objective optimization problem, (2) an evolutionary algorithm framework to explore Pareto-optimal partitioning solutions, and (3) an investigation of greedy online algorithms. Two case studies are considered: query-centric partitioning of an online social network and query-centric partitioning of a general distributed network. The findings are substantiated with evaluations using real-world datasets.
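
The query-centric idea in the abstract can be illustrated with a small sketch. It is not the dissertation's algorithm: the function names, the capacity limit, the tie-breaking rule, and the cost measure (for each query, the number of servers it touches minus one, a common hypergraph connectivity measure when items are vertices and queries are hyperedges) are all assumptions made for illustration. The greedy rule places each newly seen item on the non-full server that already holds the most items from the same query.

```python
from collections import Counter


def query_span_cost(queries, placement):
    """Sum over queries of (number of distinct servers touched - 1).

    Illustrative cost only; assumes every queried item has a placement.
    """
    return sum(len({placement[item] for item in q}) - 1 for q in queries if q)


def greedy_online_partition(queries, num_servers, capacity):
    """Hypothetical greedy online placement driven only by the query stream.

    Each unseen item goes to the non-full server that already stores the
    most items of the current query; ties go to the less loaded server,
    then to the lower server index.
    """
    placement = {}
    load = [0] * num_servers
    for q in queries:
        for item in q:
            if item in placement:
                continue
            # Count where the already-placed items of this query live.
            votes = Counter(placement[o] for o in q if o in placement)
            candidates = [s for s in range(num_servers) if load[s] < capacity]
            if not candidates:
                raise ValueError("all servers are full")
            best = max(candidates, key=lambda s: (votes[s], -load[s], -s))
            placement[item] = best
            load[best] += 1
    return placement


if __name__ == "__main__":
    stream = [["a", "b"], ["c", "d"], ["a", "b", "c"]]
    placement = greedy_online_partition(stream, num_servers=2, capacity=2)
    print(placement, query_span_cost(stream, placement))
```

This sketch only places items as they first appear; it does not revise earlier placements, which is the harder incremental problem the dissertation addresses.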

Comments

Free and open access to this Campus Access Dissertation is made available to the UMass Boston community by ScholarWorks at UMass Boston. Those not on campus and those without a UMass Boston campus username and password may gain access to this dissertation through resources like Proquest Dissertations & Theses Global or through Interlibrary Loan. If you have a UMass Boston campus username and password and would like to download this work from off-campus, click on the "Off-Campus UMass Boston Users" link above.
