Date of Award

5-2019

Document Type

Campus Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Bo Sheng

Second Advisor

Ming Ouyang

Third Advisor

Honggang Zhang

Abstract

Large volumes of data are now generated continuously from different sources in various formats. To analyze this data efficiently, big data processing platforms have been developed to replace traditional SQL-based systems. The new platforms are usually built on a cluster of machines, with one machine working as the master and the others working as slaves. The master node, as a central manager, arranges the data to be stored on each slave and coordinates the resources on the slaves to process the data in a distributed manner. In such a complex, distributed system, resource management is critical to system performance, and security is another major concern, especially when participating nodes may be compromised.

This dissertation focuses mainly on three challenges in resource management for big data processing platforms. First, the status of a cluster is dynamic. Cluster performance can be affected by many factors, such as hardware failure and service interference, and preset configurations cannot adjust resource allocation dynamically. Second, the data flow is complicated. Many popular systems process an application in multiple stages, and one stage may depend on the results of earlier stages. Traditional scheduling policies such as FIFO cannot allocate resources well under these dependencies. Third, data exchange is frequent and time-consuming. In an IoT cluster especially, small devices communicate over Wi-Fi, which suffers from signal interference and introduces significant network delay. In addition, most modern clusters are heterogeneous: the machines have varying processing and storage capabilities, which makes the above problems even more challenging. This thesis develops new resource management strategies to address each of these challenges. The newly developed systems can arrange data storage automatically, allocate resources to each stage intelligently, and transfer data packets efficiently in heterogeneous environments.

This dissertation also studies the security of big data processing platforms from the perspectives of resource abuse and data verification. In a cluster of thousands of machines, it is common for several of them to be attacked, and the compromised nodes can launch severe attacks without being noticed. These attacks may abuse resources to degrade system performance or falsify the original data to generate misleading results. In this thesis, we design attacks of this kind and provide solutions against them.

Comments

Free and open access to this Campus Access Dissertation is made available to the UMass Boston community by ScholarWorks at UMass Boston. Those who are not on campus, or who do not have a UMass Boston campus username and password, may gain access to this dissertation through resources such as ProQuest Dissertations & Theses Global or through Interlibrary Loan. If you have a UMass Boston campus username and password and would like to download this work from off campus, click on the "Off-Campus UMass Boston Users" link above.
