Author ORCID Identifier
0009-0001-3231-5133
Date of Award
Summer 8-31-2025
Document Type
Open Access Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Ping Chen
Second Advisor
Wei Ding
Third Advisor
Dan Simovici
Abstract
Large Language Models have improved significantly over the past few years owing to the adoption of transformers. However, transformers still struggle to process videos because their quadratic compute cost limits the context size. We therefore studied Group Activity Recognition (GAR), a fast-growing field in machine learning that powers applications such as social scene analysis and video surveillance systems. We found that recent models achieve more than 90% accuracy on popular datasets such as the Volleyball dataset; however, even these models rely on transformers.
In this work, we developed an operation called Sequence Unfolding & Folding that exploits the absence of long-range dependencies in group activity dataset samples to address the quadratic compute cost of transformers. We applied our technique directly above all the transformer blocks in Bi-Causal, the current state of the art (SOTA) in GAR, and found that its compute cost became linear with a negligible drop in accuracy.
We also analyzed the behavior of Bi-Causal after replacing the vanilla multi-head attention layers in all of its transformer encoders with the recently introduced Lightning Attention-2 architecture, which has linear compute cost. Used directly, it repeatedly ran into issues such as high GPU memory usage and NaN values, so we also developed a custom safety layer to handle them.
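The abstract does not say how Sequence Unfolding & Folding is implemented. As a rough illustration of why restricting attention to short segments can turn a quadratic cost into a linear one, the sketch below assumes non-overlapping, fixed-size windows. The helper names `unfold`, `fold`, and `attention_pairs`, the token count, and the window size are all hypothetical and are not taken from the thesis.

```python
def unfold(seq, window):
    """Split a sequence into consecutive, non-overlapping windows (assumed scheme)."""
    return [seq[i:i + window] for i in range(0, len(seq), window)]


def fold(windows):
    """Concatenate the windows back into one sequence (inverse of unfold)."""
    return [item for w in windows for item in w]


def attention_pairs(length):
    """Count query-key score computations for full self-attention over `length` tokens."""
    return length * length


tokens = list(range(1024))  # stand-in for 1024 input tokens (hypothetical size)
window = 32                 # hypothetical window size

windows = unfold(tokens, window)

# Full attention scores every token against every other token: L * L.
full_cost = attention_pairs(len(tokens))

# Attention inside each window only: (L / w) windows * w * w = L * w.
windowed_cost = sum(attention_pairs(len(w)) for w in windows)

# Folding restores the original token order.
restored = fold(windows)
```

Under these assumptions, the windowed cost is the sequence length times the window size, so it grows linearly with sequence length once the window size is fixed. This shortcut only makes sense when tokens in different windows do not need to attend to each other, which is the property the abstract attributes to group activity samples.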
Recommended Citation
Gondkar, Yash U., "SUGAR: A Sequence Unfolding Based Transformer Model for Group Activity Recognition" (2025). Graduate Masters Theses. 919.
https://scholarworks.umb.edu/masters_theses/919
Included in
Algebra Commons, Analysis Commons, Artificial Intelligence and Robotics Commons, Other Computer Sciences Commons, Other Mathematics Commons
Comments
Free and open access to this Campus Access Thesis is made available to the UMass Boston community by ScholarWorks at UMass Boston. Those not on campus and those without a UMass Boston campus username and password may gain access to this thesis through Interlibrary Loan. If you have a UMass Boston campus username and password and would like to download this work from off-campus, click on the “Off-Campus Users” button.