Dynamic programming (3) Discrete DPs

April 10, 2023 · 1 min · 37 words · Me

Let \(s_t\) denote the state variable, \(a_t\) the action, and \(\beta\) the discount factor. The function \(r(a_t, s_t)\) can be interpreted as the current reward, which depends on the current action and the current state.
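To make the notation concrete, here is a minimal sketch of how these pieces might fit together in a small discrete problem. It assumes the usual infinite-horizon setup, where the agent maximizes the discounted sum of rewards \(\sum_t \beta^t r(a_t, s_t)\), and it assumes deterministic transitions; neither is spelled out above. The reward table, the transition table `next_state`, and the function name `value_iteration` are all hypothetical and chosen only for illustration.

```python
# Toy discrete DP: 2 states, 2 actions. All numbers are illustrative assumptions.
beta = 0.9  # discount factor

# reward[a][s] plays the role of r(a_t, s_t): reward for action a in state s
reward = [
    [1.0, 0.0],  # action 0
    [0.0, 2.0],  # action 1
]

# next_state[a][s]: state reached after taking action a in state s
# (deterministic transitions are an assumption of this sketch)
next_state = [
    [0, 0],  # action 0 always leads to state 0
    [1, 1],  # action 1 always leads to state 1
]


def value_iteration(reward, next_state, beta, tol=1e-10, max_iter=10_000):
    """Iterate v(s) = max_a { r(a, s) + beta * v(s') } until v stops changing."""
    n_actions = len(reward)
    n_states = len(reward[0])
    v = [0.0] * n_states

    def q(a, s, v):
        # Value of taking action a in state s, then following v afterwards
        return reward[a][s] + beta * v[next_state[a][s]]

    for _ in range(max_iter):
        v_new = [max(q(a, s, v) for a in range(n_actions)) for s in range(n_states)]
        converged = max(abs(x - y) for x, y in zip(v_new, v)) < tol
        v = v_new
        if converged:
            break

    # Greedy policy: the maximizing action in each state
    policy = [max(range(n_actions), key=lambda a: q(a, s, v)) for s in range(n_states)]
    return v, policy


v, policy = value_iteration(reward, next_state, beta)
print("value function:", v)
print("policy:", policy)
```

Because \(\beta < 1\), each pass shrinks the gap between successive value functions, so the loop stops after a finite number of iterations. With stochastic transitions, the term `v[next_state[a][s]]` would be replaced by an expectation over next states.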