🤖 AI Summary
A short proof shows the discrete-time Hamilton–Jacobi–Bellman (HJB) / Bellman optimality equation is nothing more than the dual of a simple linear program. Model the control problem with occupancy measures x_t over state-action pairs, a “sum-over-actions” marginalization matrix that gives state marginals, and transition matrices that map each state-action pair to a distribution over next states. The primal LP minimizes the total expected cost ∑_t c_t^T x_t subject to nonnegativity and flow constraints that enforce the dynamics and the initial state distribution. Introducing dual variables v_t for those flow constraints yields a dual LP whose constraints are exactly v_t(s) ≤ c_t(s,a) + (P_{t,a} v_{t+1})(s) for every state-action pair (s,a). Maximizing the dual therefore constructs the cost-to-go vectors v_t and, at the optimum, the inequalities become the familiar Bellman equalities v_t(s) = min_a [c_t(s,a) + E_{s'|s,a} v_{t+1}(s')].
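Spelled out in the notation of the summary (a sketch; the symbol μ_0 for the initial state distribution and the terminal convention v_T ≡ 0 are choices made here for concreteness, not taken from the post), the two programs read:

```latex
% Primal LP over occupancy measures x_t(s,a) and its dual over cost-to-go vectors v_t.
\begin{aligned}
\textbf{Primal:}\quad
  \min_{x_t \ge 0}\ & \sum_{t=0}^{T-1} \sum_{s,a} c_t(s,a)\, x_t(s,a) \\
  \text{s.t.}\ & \sum_a x_0(s,a) = \mu_0(s) && \forall s, \\
               & \sum_a x_{t+1}(s',a) = \sum_{s,a} P_{t,a}(s,s')\, x_t(s,a) && \forall s',\ 0 \le t < T-1. \\[6pt]
\textbf{Dual:}\quad
  \max_{v_t}\ & \sum_s \mu_0(s)\, v_0(s) \\
  \text{s.t.}\ & v_t(s) \le c_t(s,a) + \sum_{s'} P_{t,a}(s,s')\, v_{t+1}(s') && \forall (s,a),\ t, \qquad v_T \equiv 0.
\end{aligned}
```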
The result matters because it demystifies HJB as standard linear duality, clarifying why value functions exist, how greedy policies arise (choose actions attaining the minima; complementary slackness gives the occupancy measures), and why LP formulations can be used for control. The same viewpoint extends to stochastic settings and, in an appropriate continuous-time limit, recovers the HJB PDE — offering a unifying computational and theoretical bridge between dynamic programming, linear programming, and stochastic control.
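For concreteness, here is a minimal numerical sketch (mine, not from the post): it builds a small random finite-horizon MDP, solves the occupancy-measure primal LP with scipy.optimize.linprog, and checks that the LP optimum equals μ_0·v_0 from backward induction; the greedy policy and complementary slackness are read off the solution. The problem sizes, random costs, and uniform initial distribution are illustrative assumptions.

```python
# Sketch: verify strong duality between the occupancy-measure LP and the Bellman recursion
# on a small random finite-horizon MDP. Sizes, costs, and mu0 are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
S, A, T = 4, 3, 5                          # states, actions, horizon
c = rng.uniform(0.0, 1.0, size=(T, S, A))  # stage costs c_t(s, a)
P = rng.uniform(0.0, 1.0, size=(A, S, S))  # P[a, s, s'] = prob. of next state s' given (s, a)
P /= P.sum(axis=2, keepdims=True)          # normalize each row into a distribution
mu0 = np.full(S, 1.0 / S)                  # initial state distribution

# --- Backward induction: v_t(s) = min_a [ c_t(s,a) + sum_{s'} P[a,s,s'] v_{t+1}(s') ] ---
v = np.zeros((T + 1, S))                   # terminal condition v_T = 0
q = np.zeros((T, S, A))
greedy = np.zeros((T, S), dtype=int)
for t in reversed(range(T)):
    q[t] = c[t] + (P @ v[t + 1]).T         # Q_t(s, a); (P @ v[t+1]) has shape (A, S)
    v[t] = q[t].min(axis=1)
    greedy[t] = q[t].argmin(axis=1)

# --- Primal LP over occupancy measures x_t(s, a), flattened in (t, s, a) order ---
def idx(t, s, a):
    return t * S * A + s * A + a

A_eq = np.zeros((T * S, T * S * A))
b_eq = np.zeros(T * S)
for s in range(S):                         # initial marginals: sum_a x_0(s, a) = mu0(s)
    A_eq[s, [idx(0, s, a) for a in range(A)]] = 1.0
    b_eq[s] = mu0[s]
for t in range(1, T):                      # flow: sum_a x_t(s', a) = sum_{s,a} P[a,s,s'] x_{t-1}(s, a)
    for s2 in range(S):
        row = t * S + s2
        A_eq[row, [idx(t, s2, a) for a in range(A)]] = 1.0
        for s in range(S):
            for a in range(A):
                A_eq[row, idx(t - 1, s, a)] -= P[a, s, s2]

res = linprog(c.reshape(-1), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
x = res.x.reshape(T, S, A)

# Strong duality: the LP optimum equals mu0 . v_0 from the Bellman recursion.
print(f"LP optimal value : {res.fun:.6f}")
print(f"mu0 . v_0        : {mu0 @ v[0]:.6f}")
# Complementary slackness: occupancy mass sits only where the dual constraint is tight,
# i.e. on actions attaining the minimum (the greedy policy).
slack = q - v[:T, :, None]                 # dual slack c_t + P v_{t+1} - v_t >= 0
print("max slack on support of x :", np.where(x > 1e-8, slack, 0.0).max())
print("greedy actions at t = 0   :", greedy[0])
```

SciPy's HiGHS backend also exposes dual multipliers for the equality constraints (res.eqlin.marginals); up to the solver's sign convention, these are exactly the cost-to-go vectors v_t constructed by the dual LP.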