Date of Award

12-31-2021

Document Type

Campus Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Dan Simovici

Second Advisor

Marc Pomplun

Third Advisor

Nurit Haspel

Abstract

Graph data mining techniques can be applied to data structures that are modeled as graphs or networks. In this work, we explore several techniques related to centrality measures for sets of vertices in graphs, as well as techniques for bipartite graphs and recommendation systems. We also investigate some aspects of fairness in classification systems.

Vertex centrality has important applications in fields that deal with data that can be modeled as graphs: social networks, computer networks, and biological networks to name a few. In this work we focus on several centrality measures.

Betweenness centrality of a set of vertices can be seen as a measure of the control of flow of information that the set has within a network. We introduce the notion of saturated betweenness centrality set, which is defined as a set whose betweenness centrality will not increase by adding more members, but will decrease if any of its members is removed. We examine various properties of saturated betweenness centrality sets. These findings allow us to introduce an algorithm to optimally detect associations with high control of flow of information.
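To make the saturation condition concrete, here is a minimal sketch. It assumes the common group betweenness definition: for each unordered pair of vertices outside the set, take the fraction of shortest paths that pass through at least one member, then sum these fractions. The dissertation's exact definition may differ. The sketch also assumes an unweighted, undirected graph stored as an adjacency dict, and it tests saturation only against single-vertex additions and removals, which simplifies the "adding more members" condition. The names `bfs_counts`, `group_betweenness`, and `is_saturated` are hypothetical, and this brute-force check is not the detection algorithm proposed in the work.

```python
from collections import deque


def bfs_counts(adj, source, removed=frozenset()):
    """BFS from `source`, skipping vertices in `removed`.

    Returns (dist, sigma): hop distances and the number of shortest paths
    from `source` to every reachable vertex.
    """
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in removed:
                continue
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def group_betweenness(adj, group):
    """Sum over unordered pairs {s, t} outside `group` of the fraction of
    shortest s-t paths passing through at least one vertex of `group`."""
    group = frozenset(group)
    others = [v for v in adj if v not in group]
    total = 0.0
    for i, s in enumerate(others):
        dist, sigma = bfs_counts(adj, s)
        dist_avoid, sigma_avoid = bfs_counts(adj, s, removed=group)
        for t in others[i + 1:]:
            if t not in dist:
                continue  # s and t are disconnected
            # Shortest paths avoiding the group exist only if deleting the
            # group leaves the s-t distance unchanged.
            avoiding = sigma_avoid[t] if dist_avoid.get(t) == dist[t] else 0
            total += (sigma[t] - avoiding) / sigma[t]
    return total


def is_saturated(adj, group, eps=1e-12):
    """Simplified check against single-vertex changes only: no added vertex
    raises the score, and removing any member lowers it."""
    group = frozenset(group)
    base = group_betweenness(adj, group)
    no_gain = all(group_betweenness(adj, group | {x}) <= base + eps
                  for x in adj if x not in group)
    all_needed = all(group_betweenness(adj, group - {x}) < base - eps
                     for x in group)
    return no_gain and all_needed
```

On the path a-b-c-d-e, `{"c"}` lies on every shortest path between the two sides. Its score is 4.0, and the check reports it as saturated. `{"b", "d"}` scores 3.0, but it fails the check because dropping `"b"` leaves the score unchanged.

```python
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
print(group_betweenness(path, {"c"}))  # 4.0
print(is_saturated(path, {"c"}))       # True
print(is_saturated(path, {"b", "d"}))  # False
```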

We define the Laplacian centrality of a set of vertices as a measure of the aggregate influence that these vertices have on the connectivity of the graph. We characterize sets of vertices whose removal has a maximal effect on graph connectivity, and we propose an Apriori-based algorithm for finding these sets.
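One plausible reading of set Laplacian centrality, offered here as an assumption, extends the usual vertex-level Laplacian centrality. Under this reading, it is the relative drop in Laplacian energy (the sum of squared Laplacian eigenvalues) when the set is deleted. For an unweighted simple graph, that energy equals the sum of squared degrees plus twice the number of edges, so no eigen-decomposition is needed. The `best_sets` helper is hypothetical and uses exhaustive search over all k-subsets. It is not the Apriori-based algorithm described in the dissertation.

```python
from itertools import combinations


def laplacian_energy(adj, removed=frozenset()):
    """Sum of squared Laplacian eigenvalues of the unweighted simple graph
    left after deleting `removed`; equals sum(deg^2) + 2|E|."""
    energy = 0
    for v, nbrs in adj.items():
        if v in removed:
            continue
        deg = sum(1 for w in nbrs if w not in removed)
        # Summing deg over all vertices counts every edge twice (= 2|E|).
        energy += deg * deg + deg
    return energy


def laplacian_set_centrality(adj, group):
    """Relative drop in Laplacian energy when `group` is deleted
    (an assumed definition, not necessarily the dissertation's)."""
    full = laplacian_energy(adj)
    if full == 0:
        raise ValueError("graph has no edges")
    return (full - laplacian_energy(adj, frozenset(group))) / full


def best_sets(adj, k):
    """Hypothetical brute-force search for the size-k sets with the largest
    drop; exponential in k and unrelated to the Apriori pruning strategy."""
    scored = {frozenset(c): laplacian_set_centrality(adj, c)
              for c in combinations(adj, k)}
    top = max(scored.values())
    return [s for s, score in scored.items() if score == top]
```

For the star with center `c` and leaves `x`, `y`, `z`, the Laplacian eigenvalues are 0, 1, 1, 4, so the energy is 18. Deleting the center removes every edge, which gives a centrality of 1.0. Deleting a single leaf gives 8/18.

```python
star = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}
print(laplacian_energy(star))               # 18
print(laplacian_set_centrality(star, {"c"}))  # 1.0
print(best_sets(star, 1))                   # [frozenset({'c'})]
```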

Bipartite graphs can be used to model many real-world relationships, with applications in domains such as medicine and social networks. We propose several properties and measures related to bipartite graphs. We present an application of maximal bicliques of bipartite graphs to recommendation systems; it uses the notion of biclique similarity of a set of vertices to recommend items to users in order of preference.
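The building block of such a recommender is the set of maximal bicliques of a user-item graph. Below is a minimal sketch that enumerates them by closure. For every nonempty subset of users, it takes the items they share, then adds every user linked to all of those items. Each resulting (users, items) pair is a maximal biclique, and every maximal biclique with both sides nonempty arises this way. The enumeration is exponential in the number of users, so it only suits small examples. The input format (a dict from each user to a set of items) and the function name are assumptions. The dissertation's biclique similarity measure and ranking step are not reproduced here.

```python
from itertools import combinations


def maximal_bicliques(edges):
    """edges: dict mapping each user to the set of items it is linked to.

    Returns a set of (users, items) frozenset pairs, one per maximal biclique.
    """
    users = list(edges)
    found = set()
    for size in range(1, len(users) + 1):
        for subset in combinations(users, size):
            # Items shared by every user in the subset.
            common = frozenset.intersection(*(frozenset(edges[u]) for u in subset))
            if not common:
                continue
            # Close the user side: everyone linked to all shared items.
            closure = frozenset(u for u in users if common <= edges[u])
            found.add((closure, common))
    return found
```

With three users, the graph below has exactly three maximal bicliques: `({u1, u2}, {a, b})`, `({u2}, {a, b, c})`, and `({u2, u3}, {c})`.

```python
edges = {"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"c"}}
print(len(maximal_bicliques(edges)))  # 3
```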

Fairness in data mining is a matter of great importance. We discuss several notions related to fairness, and we investigate how two fairness approaches applied at training time of a binary classification algorithm affect its results.

Comments

Free and open access to this Campus Access Dissertation is made available to the UMass Boston community by ScholarWorks at UMass Boston. Those not on campus and those without a UMass Boston campus username and password may gain access to this dissertation through resources like ProQuest Dissertations & Theses Global or through Interlibrary Loan.