Campus Access Dissertation
Doctor of Philosophy (PhD)
High-speed research networks have developed significantly in the past decade and are becoming an essential infrastructure component for supporting large-scale, data-intensive scientific projects. With the increased use of these networks, a large amount of network measurement data is becoming available at flow granularity. Such data has many uses, including performance analysis, diagnostics, and security incident response. However, collecting and sharing such data raises important concerns about the privacy of individual users. Once network data are released, personal details such as website browsing patterns may be extracted from the network flows. Such details can be inferred even when payloads are suppressed, by inspecting only flow metadata. In this thesis, I propose a framework for collecting and sanitizing network flow data that provides stakeholders (e.g., network researchers and engineers) with accurate aggregate query results while preserving the privacy of individual users of the network. Using big data tools and techniques, together with differential privacy, the de facto standard in data protection, the proposed framework is capable of processing high-rate, high-volume network flows efficiently. The framework uses state-of-the-art adaptations of differential privacy techniques customized for network flow sanitization, as well as efficient data storage and organization strategies built on prominent big data technologies such as Apache Hadoop, HBase, and MapReduce. We further enhance the capabilities of the framework by devising a series of algorithms that construct differentially private synopses of datasets in order to improve the accuracy of users' queries over that data. Constructing a differentially private synopsis of a dataset is a well-studied problem in privacy research, with numerous applications in geospatial decomposition. We highlight how our data-independent methods, based on the Sparse Vector Technique and an optimal analysis of error components, achieve better accuracy on this problem than existing approaches in the published literature.
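As a purely illustrative aside, the idea of answering an aggregate query over flow records under differential privacy can be sketched with the standard Laplace mechanism. This is a minimal sketch and not the framework described in the dissertation: the record layout, the `dst_port` field, and the function names `laplace_noise` and `private_count` are all assumptions made for the example, and the sensitivity of 1 protects a single flow record rather than all flows of one user.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(flows, predicate, epsilon, rng=None):
    """Return a noisy count of the flows matching `predicate`.

    Assumption: adding or removing one flow record changes the count by
    at most 1, so the sensitivity is 1 and the noise scale is 1/epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for flow in flows if predicate(flow))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical flow records; real flow data would carry many more fields.
flows = [{"dst_port": 443}, {"dst_port": 80}, {"dst_port": 443}]
noisy = private_count(flows, lambda f: f["dst_port"] == 443, epsilon=1.0)
print(f"noisy count of HTTPS flows: {noisy:.2f}")
```

A smaller `epsilon` gives stronger privacy but noisier answers. Protecting a user who contributes many flows would additionally require bounding each user's contribution, which this sketch does not attempt.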
Niculaescu, Oana-Georgiana, "A Differentially-Private and Efficient Framework for Collecting and Processing Network Flow Data" (2019). Graduate Doctoral Dissertations. 528.