Date of Award
Campus Access Dissertation
Doctor of Philosophy (PhD)
Accurately discovering knowledge from the huge volume of short messages generated daily on social media platforms is a critical challenge. Conventional topic models such as Latent Dirichlet Allocation (LDA) and its variants, which are widely used to automatically extract thematic information from regular-sized documents, fail to discover essential information from short texts. Compared to regular-sized documents such as news articles, short text documents such as tweets lack word co-occurrence information, which leads to very sparse, high-dimensional vector representations. This extreme sparsity makes it difficult to apply conventional topic models to short social media texts. In this study, a novel heuristic topic model, the Hashtag-Cluster-based Aggregation model (HCA), is developed to address this sparsity problem. The model treats tweets as semi-structured texts and uses hashtag relations to aggregate related tweets into larger text documents for topic modeling. At the application level, the HCA model is used to study the topics of public conversations on Twitter about prescription opioids and of joint discussions about prescription opioids and marijuana. Monitoring the topics of these discussions at a population scale, such as among the population of Twitter users, can help policymakers and public health officials better understand public perception of prescription opioids, study the association between prescription opioids and marijuana, detect abuse behaviors, and monitor trends in related incidents. The proposed HCA model is evaluated using TF-IDF, GloVe, and FastText word embeddings in comparison to the Hashtag-based Aggregation model (HA), the most common heuristic hashtag-based Twitter aggregation model in the literature. 
The evaluation demonstrates the benefit of including hashtag information in topic modeling: the HCA model generates more coherent topics than the HA model, which aggregates tweets that share the same hashtag.
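To make the aggregation idea concrete, the following Python sketch contrasts HA-style pooling (one pseudo-document per hashtag) with a cluster-based pooling step. It is an illustration only, not the dissertation's implementation: the abstract does not specify how hashtag relations are derived, so grouping hashtags that co-occur in the same tweet is an assumption, and all function names (`aggregate_by_hashtag`, `cluster_hashtags`, `aggregate_by_cluster`) are hypothetical.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#(\w+)")


def hashtags(tweet):
    """Return the sorted, lower-cased, de-duplicated hashtags of a tweet."""
    return sorted({h.lower() for h in HASHTAG.findall(tweet)})


def aggregate_by_hashtag(tweets):
    """HA-style pooling: one pseudo-document per individual hashtag."""
    docs = defaultdict(list)
    for tweet in tweets:
        for tag in hashtags(tweet):
            docs[tag].append(tweet)
    return {tag: " ".join(pooled) for tag, pooled in docs.items()}


def cluster_hashtags(tweets):
    """Map each hashtag to a cluster label.

    ASSUMPTION: hashtags that co-occur in a tweet are treated as related,
    and clusters are the connected components of that co-occurrence graph
    (computed with a small union-find). The source may use another relation.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for tweet in tweets:
        tags = hashtags(tweet)
        for tag in tags:
            find(tag)  # register every hashtag, even if it occurs alone
        for a, b in zip(tags, tags[1:]):
            parent[find(a)] = find(b)  # merge the two clusters

    return {tag: find(tag) for tag in list(parent)}


def aggregate_by_cluster(tweets):
    """Cluster-based pooling: one pseudo-document per hashtag cluster."""
    cluster_of = cluster_hashtags(tweets)
    docs = defaultdict(list)
    for tweet in tweets:
        for label in {cluster_of[tag] for tag in hashtags(tweet)}:
            docs[label].append(tweet)
    return {label: " ".join(pooled) for label, pooled in docs.items()}
```

In this sketch, tweets without hashtags are dropped from both aggregations, and the resulting pseudo-documents would then be passed to a conventional topic model such as LDA. Because related hashtags are merged, cluster-based pooling yields fewer and longer pseudo-documents than per-hashtag pooling, which is the intuition behind reducing sparsity.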
Rahafrooz, Maryam, "Social Media Text Analytics: An Application to Prescription Opioids-Related Conversations" (2021). Graduate Doctoral Dissertations. 668.