Date of Award


Document Type

Campus Access Dissertation

Degree Name

Doctor of Philosophy (PhD)


Computational Sciences

First Advisor

Rahul V. Kulkarni

Second Advisor

Kourosh Zarringhalam

Third Advisor

Nurit Haspel


Reinforcement learning (RL) is an important field of machine learning research that is increasingly being applied to complex optimization problems in physics. In parallel, concepts from physics have contributed to important advances in RL, such as entropy regularized RL. While these developments have advanced both fields, obtaining analytical solutions for optimization in entropy regularized RL remains an open problem. In this work, we establish a mapping between entropy regularized RL and research in non-equilibrium statistical mechanics that focuses on Markovian processes conditioned on rare events. We do this through the Bayesian inference perspective on RL, also known as the control-as-inference framework. In the long-time limit, we apply approaches from large deviation theory to derive exact analytical results for the optimal policy and optimal dynamics in Markov Decision Process (MDP) models of reinforcement learning.
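
As an illustration of the long-time-limit construction described above, the following is a minimal NumPy sketch, not the dissertation's code. It assumes an average-reward objective (reward rate minus 1/beta times the KL divergence rate of the controlled trajectory relative to prior dynamics p(s'|s,a) and a prior policy pi0(a|s)) on a randomly generated toy MDP; all variable names are hypothetical. Under these assumptions, the dominant eigenvalue rho of a reward-tilted matrix on state-action pairs gives the optimal rate theta = log(rho)/beta, and a generalized Doob transform built from the corresponding right eigenvector gives the optimal dynamics, which factor into an optimal policy pi* and optimal transition probabilities p*.

```python
import numpy as np

# Hypothetical toy MDP for illustration: nS states, nA actions, inverse
# temperature beta. All entries of p are strictly positive, so the
# Perron-Frobenius theorem applies to the tilted matrix below.
rng = np.random.default_rng(0)
nS, nA, beta = 4, 3, 2.0
p = rng.random((nS, nA, nS)) + 0.1           # p[s, a, s'] before normalizing
p /= p.sum(axis=2, keepdims=True)            # transition probs p(s'|s,a)
pi0 = np.full((nS, nA), 1.0 / nA)            # uniform prior policy pi0(a|s)
r = rng.random((nS, nA))                     # reward r(s, a)

# Tilted matrix on state-action pairs x = (s, a); rows index the current pair:
# M[(s,a), (s',a')] = exp(beta * r(s,a)) * p(s'|s,a) * pi0(a'|s')
M = (np.exp(beta * r)[:, :, None, None]
     * p[:, :, :, None]
     * pi0[None, None, :, :]).reshape(nS * nA, nS * nA)

# Dominant (Perron) eigenvalue rho and positive right eigenvector v: M v = rho v
evals, evecs = np.linalg.eig(M)
i = np.argmax(evals.real)
rho = evals[i].real
v = evecs[:, i].real
v /= v.sum()                                 # fix sign and scale so that v > 0

# Optimal regularized reward rate (scaled cumulant generating function / beta)
theta = np.log(rho) / beta

# Generalized Doob transform: a stochastic matrix on state-action pairs
Q = M * v[None, :] / (rho * v[:, None])

# Q factorizes into optimal dynamics p*(s'|s,a) and optimal policy pi*(a'|s')
v_sa = v.reshape(nS, nA)
V = (pi0 * v_sa).sum(axis=1)                 # V(s') = sum_a' pi0(a'|s') v(s',a')
pi_star = pi0 * v_sa / V[:, None]
p_star = (np.exp(beta * r)[:, :, None] * p * V[None, None, :]
          / (rho * v_sa[:, :, None]))

print("theta =", theta)
print("pi* =\n", pi_star.round(3))
```

Because this sketch lets the agent reshape p into p*, it corresponds to the case in which the agent also controls the system dynamics.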

In the case of stochastic dynamics, the derived results yield optimal policies if we assume that the agent also controls the system dynamics. When the agent has no control over the system dynamics, direct application of these results can lead to risk-taking, optimistic policies, which are undesirable. Current approaches address this issue through a constrained optimization procedure that fixes the system dynamics to the original dynamics; however, this approach is not consistent with the unconstrained Bayesian inference framework. In this work, we develop an exact mapping from the constrained optimization problem in entropy regularized RL to a different optimization problem that can be solved using the unconstrained Bayesian inference approach. We show that the objective functions optimized in the two problems are equivalent; thus, our results yield the exact solution for the optimal policy in entropy regularized RL with fixed stochastic dynamics through Bayesian inference.
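
To make the contrast between optimistic and fixed-dynamics solutions concrete, the sketch below compares two average-reward soft Bellman backups on a hypothetical toy MDP. It assumes a KL-regularized average-reward setting with prior policy pi0 and inverse temperature beta; all names are illustrative. The "optimistic" backup aggregates next-state values with a log-sum-exp, which corresponds to also optimizing the dynamics, while the constrained backup takes an ordinary expectation under the fixed p(s'|s,a). Both are solved by relative value iteration. This shows a standard form of the constrained procedure, not the exact mapping developed in this work.

```python
import numpy as np

# Hypothetical toy MDP (self-contained; not data from the dissertation).
rng = np.random.default_rng(0)
nS, nA, beta = 4, 3, 2.0
p = rng.random((nS, nA, nS)) + 0.1
p /= p.sum(axis=2, keepdims=True)            # p(s'|s,a)
pi0 = np.full((nS, nA), 1.0 / nA)            # prior policy pi0(a|s)
r = rng.random((nS, nA))                     # reward r(s, a)


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def soft_value(q):
    """V(s) = (1/beta) log sum_a pi0(a|s) exp(beta q(s,a))."""
    return logsumexp(np.log(pi0) + beta * q, axis=1) / beta


def backup_fixed(h):
    """Dynamics held fixed: expectation over next states (risk-neutral)."""
    return soft_value(r + p @ h)


def backup_optimistic(h):
    """Dynamics also optimized: log-sum-exp over next states (risk-seeking)."""
    return soft_value(r + logsumexp(np.log(p) + beta * h, axis=2) / beta)


def relative_value_iteration(backup, iters=10_000, tol=1e-12):
    """Solve h(s) + theta = backup(h)(s), using h(0) = 0 as the reference."""
    h = np.zeros(nS)
    for _ in range(iters):
        th = backup(h)
        theta, h_new = th[0], th - th[0]
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    return theta, h_new


theta_fixed, h_fixed = relative_value_iteration(backup_fixed)
theta_opt, h_opt = relative_value_iteration(backup_optimistic)

# Fixed-dynamics policy: pi(a|s) = pi0(a|s) exp(beta (q(s,a) - V(s)))
q_fixed = r + p @ h_fixed
pi_fixed = pi0 * np.exp(beta * (q_fixed - soft_value(q_fixed)[:, None]))

print("theta (fixed dynamics)      =", theta_fixed)
print("theta (optimistic dynamics) =", theta_opt)
```

By Jensen's inequality, the log-sum-exp backup is never smaller than the expectation, so theta_fixed <= theta_opt; the gap indicates how much the unconstrained solution overestimates the regularized reward rate that is achievable when the agent cannot actually change the dynamics.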

The results obtained lead to a novel analytical and computational framework for entropy regularized RL, which we test through experiments and validate against ground-truth solutions. The framework enables the development of novel algorithms for entropy regularized RL that leverage these insights. The mapping established in this work connects current research in reinforcement learning and non-equilibrium statistical mechanics, opening new avenues for applying analytical and computational approaches from one field to cutting-edge problems in the other.


Free and open access to this Campus Access Dissertation is made available to the UMass Boston community by ScholarWorks at UMass Boston. Those not on campus and those without a UMass Boston campus username and password may gain access to this dissertation through resources like ProQuest Dissertations & Theses Global or through Interlibrary Loan.