Date of Award

5-31-2026

Document Type

Open Access Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Kenneth Fletcher

Abstract

Offline evaluation underpins model selection in recommender systems, yet historical interaction logs are shaped by prior recommendation policies. Because users only provide feedback on items they were exposed to, logged data entangle user preferences with exposure mechanisms, which causes exposure bias and can make model comparisons misleading. Counterfactual estimators such as Inverse Propensity Scoring (IPS), Self-Normalized IPS (SNIPS), Counterfactual Risk Minimization (CRM), and Doubly Robust (DR) estimation offer principled corrections, but their empirical reliability across datasets and exposure regimes remains insufficiently understood. We present a systematic, cross-scale study of counterfactual evaluation in recommender systems. Comparing IPS, SNIPS, CRM, and DR on datasets with randomized exposure (Yahoo! R3, Coat, and KuaiRec), we analyze estimator behavior from small benchmarks to large-scale logs. We introduce practical stability diagnostics, including Effective Sample Size (ESS) and clipping sensitivity analysis, to identify high-variance regimes and reveal when apparent performance gains are unreliable. Our results characterize bias–variance trade-offs under varying overlap conditions and show when commonly used estimators break down in practice. We provide a standardized evaluation protocol and actionable guidelines that strengthen the methodological foundations of offline recommender evaluation under exposure bias.
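For readers unfamiliar with these estimators, the following is a minimal sketch of their common textbook forms, together with a Kish-style effective sample size and a simple clipping sweep. It is not the thesis's implementation: the function names are hypothetical, it assumes each logged impression records the logging policy's propensity for the shown item, it assumes a reward model has already been fitted for DR, and CRM is omitted. The four-impression log is synthetic and only illustrates the arithmetic.

```python
import numpy as np

# Hypothetical helper names; textbook estimator forms, not the thesis's code.

def importance_weights(target_probs, logging_probs):
    """w_i = pi_target(a_i | x_i) / pi_log(a_i | x_i) for each logged (x_i, a_i)."""
    return np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)


def ips(rewards, weights, clip=None):
    """Inverse Propensity Scoring: mean of w_i * r_i, with w_i capped at `clip` if given."""
    w = weights if clip is None else np.minimum(weights, clip)
    return float(np.mean(w * np.asarray(rewards, dtype=float)))


def snips(rewards, weights, clip=None):
    """Self-normalized IPS: sum(w_i * r_i) / sum(w_i); accepts a small bias for lower variance."""
    w = weights if clip is None else np.minimum(weights, clip)
    return float(np.sum(w * np.asarray(rewards, dtype=float)) / np.sum(w))


def doubly_robust(rewards, weights, q_logged, q_target):
    """Doubly robust: reward-model baseline plus an importance-weighted residual correction.

    q_logged[i]: reward-model prediction for the logged action a_i (assumed precomputed).
    q_target[i]: reward-model prediction averaged over the target policy's actions at x_i.
    """
    r = np.asarray(rewards, dtype=float)
    return float(np.mean(np.asarray(q_target, dtype=float)
                         + weights * (r - np.asarray(q_logged, dtype=float))))


def effective_sample_size(weights):
    """Kish ESS: (sum w)^2 / sum(w^2); equals n for uniform weights, shrinks as weights skew."""
    return float(np.sum(weights) ** 2 / np.sum(weights ** 2))


def clipping_sensitivity(rewards, weights, clips):
    """IPS estimate at each clipping threshold; large swings suggest a high-variance regime."""
    return {c: ips(rewards, weights, clip=c) for c in clips}


# Toy, synthetic log of four impressions (not data from the thesis).
rewards = np.array([1.0, 0.0, 1.0, 0.0])
w = importance_weights([0.5, 0.25, 0.75, 0.25], [0.5, 0.5, 0.25, 0.25])  # [1, 0.5, 3, 1]

print(ips(rewards, w))                                   # 1.0
print(snips(rewards, w))                                 # ~0.727
print(doubly_robust(rewards, w, [0.5] * 4, [0.5] * 4))   # 0.8125
print(effective_sample_size(w))                          # ~2.69 of 4 samples
print(clipping_sensitivity(rewards, w, [1.0, 2.0, 10.0]))
```

In this toy log a single weight of 3 dominates: the unclipped IPS estimate is 1.0 although only half of the impressions were clicked, it drops to 0.5 when weights are capped at 1, and the effective sample size falls to about 2.7 of 4. That combination of clipping sensitivity and low ESS is the kind of signal the stability diagnostics described above are meant to surface.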

Comments

Free and open access to this work is made available to the UMass Boston community by ScholarWorks at UMass Boston. Those not on campus and those without a UMass Boston campus username and password may gain access to this work through Interlibrary Loan.
