Author ORCID Identifier

0009-0001-3231-5133

Date of Award

Summer 8-31-2025

Document Type

Open Access Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Ping Chen

Second Advisor

Wei Ding

Third Advisor

Dan Simovici

Abstract

Large Language Models have improved significantly in recent years owing to the adoption of transformers. However, transformers still struggle to process videos because their quadratic compute cost limits the usable context size. We therefore studied Group Activity Recognition (GAR), a rapidly growing field in machine learning that powers applications such as social scene analysis and video surveillance. Recent models achieve more than 90% accuracy on popular benchmarks such as the Volleyball dataset, yet they too rely on transformers.

In this work, we therefore developed an operation called Sequence Unfolding & Folding that exploits the absence of long-range dependencies in group activity samples to address the quadratic compute cost of transformers. Applying the technique directly around all the transformer blocks in Bi-Causal, the current state of the art in GAR, made its cost linear in sequence length with a negligible drop in accuracy.
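The abstract does not define Sequence Unfolding & Folding precisely, but a common way to exploit the absence of long-range dependencies is to unfold a sequence into fixed-size windows, run quadratic attention within each window only, and fold the results back. The sketch below illustrates that idea in numpy; the function names, the window size, and the unfold/attend/fold decomposition are assumptions for illustration, not the thesis's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention: O(len^2) in sequence length.
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    return softmax(scores) @ v

def unfold_attend_fold(x, window):
    # Unfold: (n, d) -> (n // window, window, d); assumes n % window == 0.
    n, d = x.shape
    chunks = x.reshape(n // window, window, d)
    # Attend within each window only. Total cost is
    # (n / window) * window^2 = n * window, i.e. linear in n for fixed window.
    out = attention(chunks, chunks, chunks)
    # Fold: concatenate the windows back into a single (n, d) sequence.
    return out.reshape(n, d)
```

Because each window is processed independently, the first window's output equals full attention run on that window alone, which is exactly why the trick only works when no dependencies cross window boundaries.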

We also analyzed Bi-Causal's behavior by replacing the vanilla multi-head attention layers in all of its transformer encoders with the recently introduced Lightning Attention-2 architecture, which offers linear-time performance. Used directly, it repeatedly ran into issues such as high GPU memory usage and NaN values, so we also developed a custom safety layer to address them.
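The abstract does not describe the safety layer's internals, but NaN issues in kernel-based linear attention typically come from exploding activations and near-zero normalizers. A minimal sketch of such a guard, assuming a generic elu(x)+1 feature map (as in common linear-attention formulations) with clipping and an epsilon-protected denominator, might look like this; `safe_linear_attention`, `clip`, and `eps` are hypothetical names, not the thesis's actual layer:

```python
import numpy as np

def safe_linear_attention(q, k, v, clip=5.0, eps=1e-6):
    """Hypothetical safety wrapper around kernel-based linear attention."""
    # Guard 1: clip extreme activations before the feature map so the
    # exponent-free kernel values cannot overflow.
    q = np.clip(q, -clip, clip)
    k = np.clip(k, -clip, clip)
    # Feature map phi(x) = elu(x) + 1, which is strictly positive.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    qf, kf = phi(q), phi(k)
    # Linear-attention reordering: compute (d, dv) summary first,
    # so cost is O(n * d * dv) instead of O(n^2).
    kv = kf.T @ v
    # Guard 2: epsilon keeps the per-row normalizer strictly positive,
    # preventing the divide-by-zero that produces NaNs.
    z = qf @ kf.sum(axis=0) + eps
    return (qf @ kv) / z[:, None]
```

The two guards target the two failure modes the abstract names: clipping bounds the intermediate magnitudes (and hence memory-hungry overflow recovery paths in mixed precision), while the epsilon protects the normalizer.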

Comments

Free and open access to this Campus Access Thesis is made available to the UMass Boston community by ScholarWorks at UMass Boston. Those not on campus and those without a UMass Boston campus username and password may gain access to this thesis through Interlibrary Loan.