Date of Award


Document Type

Campus Access Dissertation

Degree Name

Doctor of Philosophy (PhD)


Computer Science

First Advisor

Gabriel Ghinita

Second Advisor

Dan Simovici

Third Advisor

Xiaohui Liang


High-speed research networks have developed significantly over the past decade and are becoming an essential infrastructure component for supporting large-scale, data-intensive scientific projects. With the increased use of these networks, a large amount of network measurement data is becoming available at flow granularity. Such data has many uses, including performance analysis, diagnostics, and security incident response. However, collecting and sharing such data raises important concerns about the privacy of individual users. Once network data are released, personal details such as web browsing patterns may be extracted from the network flows. Such details can be inferred even when payloads are suppressed, by inspecting only flow metadata. In this thesis, I propose a framework for collecting and sanitizing network flow data that provides various stakeholders (e.g., network researchers and engineers) with accurate aggregate query results while preserving the privacy of individual network users. Using big data tools and techniques, together with the de facto standard in data protection (differential privacy), the proposed framework is capable of processing high-rate, high-volume network flows efficiently. The framework uses state-of-the-art adaptations of differential privacy techniques customized for network flow sanitization, as well as efficient data storage and organization strategies built on prominent big data technologies such as Apache Hadoop, HBase, and MapReduce. I further enhance the capabilities of the framework by devising a series of algorithms that construct differentially private synopses of datasets in order to improve the accuracy of users' queries over that data. Constructing differentially private synopses is a well-studied problem in privacy research, with numerous applications in geospatial decomposition. I show that the proposed data-independent methods, based on the Sparse Vector Technique and an optimal analysis of error components, achieve better accuracy on this problem than existing approaches in the published literature.
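As a rough illustration of how a differentially private aggregate query over flow records can work, the following minimal sketch applies the standard Laplace mechanism to a count query. It is not taken from the dissertation: the record fields (`dst_port`, `bytes`), the function names, and the assumption that adding or removing one flow record changes the count by at most 1 (sensitivity 1) are all hypothetical choices made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(flows, predicate, epsilon, rng):
    """Return a noisy count of the flows that match `predicate`.

    Assumption: one flow record changes the true count by at most 1,
    so Laplace noise with scale 1/epsilon gives epsilon-differential
    privacy at the level of individual records.
    """
    true_count = sum(1 for flow in flows if predicate(flow))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical flow metadata; the field names are illustrative only.
flows = [
    {"dst_port": 443, "bytes": 1200},
    {"dst_port": 80, "bytes": 300},
    {"dst_port": 443, "bytes": 5000},
    {"dst_port": 22, "bytes": 800},
    {"dst_port": 443, "bytes": 150},
]

rng = random.Random(7)
noisy_https_flows = private_count(
    flows, lambda flow: flow["dst_port"] == 443, epsilon=1.0, rng=rng
)
```

A smaller `epsilon` gives stronger privacy but noisier answers. Protecting a user who contributes many flows would require a larger sensitivity bound than the one assumed here.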


Free and open access to this Campus Access Dissertation is made available to the UMass Boston community by ScholarWorks at UMass Boston. Those not on campus and those without a UMass Boston campus username and password may gain access to this dissertation through resources like ProQuest Dissertations & Theses Global or through Interlibrary Loan. If you have a UMass Boston campus username and password and would like to download this work from off-campus, click on the "Off-Campus UMass Boston Users" link above.