dynamic programming

The McCall Search Model

Set up An agent decides whether to keep searching for a job (delaying employment) or to take the current offer (settle), so her action is binary. If she takes the job, she receives a constant wage indefinitely. If she rejects the offer, she receives unemployment compensation c and reconsiders her choice next period. Wage offers are drawn from a uniform distribution: there are n possible wages, each with equal probability....
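The accept-or-reject decision described above can be sketched with value iteration. This is a minimal illustration, not the post's own code: the discount factor `beta`, the wage grid, the value of `c`, and the function name `solve_mccall` are all assumptions made for the example.

```python
def solve_mccall(wages, c, beta, tol=1e-10, max_iter=10_000):
    """Value iteration for a McCall-style search problem (illustrative sketch).

    wages : list of possible wage offers, each drawn with probability 1/n
    c     : unemployment compensation received when an offer is rejected
    beta  : discount factor (an assumption; not given in the excerpt)
    """
    n = len(wages)
    # Accepting wage w forever is worth w + beta*w + beta^2*w + ... = w / (1 - beta).
    accept = [w / (1 - beta) for w in wages]
    v = accept[:]  # initial guess for the value of holding each offer
    for _ in range(max_iter):
        # Rejecting pays c today, then a fresh uniform draw next period.
        reject = c + beta * sum(v) / n
        v_new = [max(a, reject) for a in accept]
        diff = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if diff < tol:
            break
    reject = c + beta * sum(v) / n
    # The reservation wage makes the agent indifferent between accepting and rejecting.
    reservation_wage = (1 - beta) * reject
    return v, reservation_wage


# Hypothetical parameters: wages 10, 11, ..., 60 with equal probability.
wages = [10 + i for i in range(51)]
v, w_bar = solve_mccall(wages, c=25, beta=0.95)
```

Under these assumptions the agent accepts any offer at or above `w_bar` and keeps searching otherwise.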

Dynamic programming (4) Example

Consider a simple consumption-saving model, where the action (a) is the amount saved each period, the state (s) is the current stock, and the reward is the utility of consumption (c = s - a). Suppose that the state is updated each period with an output drawn from a uniform distribution on \(\{0, \ldots, B\}\). Let the global upper bound of storage be M. State space The state space has \(n = M+B+1\) elements....
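As a rough sketch, the reward and transition data for this model could be tabulated as follows. Several things here are assumptions for illustration only: the utility function \(u(c) = c^\alpha\), the transition rule that next period's stock equals savings plus the output draw, the restriction \(a \le \min(s, M)\), and the names `build_model`, `R`, and `Q`.

```python
def build_model(B, M, alpha):
    """Tabulate rewards and transition probabilities (illustrative sketch).

    States are stocks s = 0, ..., M + B; feasible savings are a = 0, ..., min(s, M).
    Assumed transition: s' = a + U, with U uniform on {0, ..., B}.
    Assumed utility:    u(c) = c ** alpha, with c = s - a.
    """
    n = M + B + 1  # number of states
    R = {}  # R[(s, a)] -> current reward
    Q = {}  # Q[(s, a)] -> list of probabilities over next states
    for s in range(n):
        for a in range(min(s, M) + 1):
            R[(s, a)] = (s - a) ** alpha
            probs = [0.0] * n
            for u in range(B + 1):
                probs[a + u] = 1.0 / (B + 1)
            Q[(s, a)] = probs
    return n, R, Q


# Hypothetical parameter values.
n, R, Q = build_model(B=10, M=5, alpha=0.5)
```

Storing `R` and `Q` per feasible state-action pair avoids having to mark infeasible pairs (savings above the current stock) with a sentinel value.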

Dynamic programming (3) Discrete DPs

Let \(s_t\) denote the state variable, \(a_t\) the action, and \(\beta\) the discount factor. The function \(r(a_t, s_t)\) is the current reward, which depends on the current action and the current state.
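With this notation, a standard way to write the agent's problem is as the maximization of expected discounted rewards, together with the associated Bellman equation. The symbols \(v\), \(A(s)\) (the set of feasible actions in state \(s\)), and \(Q(s' \mid s, a)\) (the transition probability) are assumptions introduced here for illustration; they do not appear in the excerpt.

\[
\max_{\{a_t\}} \; \mathbb{E} \sum_{t=0}^{\infty} \beta^t \, r(a_t, s_t),
\qquad
v(s) = \max_{a \in A(s)} \Big\{ r(a, s) + \beta \sum_{s'} Q(s' \mid s, a) \, v(s') \Big\}.
\]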

Dynamic programming (2) Rewards

Rewards The goal of the agent is to maximize the cumulative sum of rewards over the long run. The rewards can be arbitrarily chosen numbers that summarize how one wants the agent to behave given a specific state, action, and subsequent state. The reward function, formally written \(R(s)\), \(R(s,a)\), or \(R(s, a, s^\prime)\), can depend on the current state, the action the agent takes in that state, and the subsequent state....
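To make the \(R(s, a, s^\prime)\) form concrete, here is a small sketch in which the reward depends only on the subsequent state. The goal state, the reward values, and the trajectory are all hypothetical, chosen only to show how rewards accumulate along a path.

```python
GOAL = 3  # hypothetical goal state


def reward(s, a, s_next):
    """Hypothetical R(s, a, s'): +10 for reaching the goal, -1 for any other step."""
    return 10.0 if s_next == GOAL else -1.0


def cumulative_reward(trajectory):
    """Sum the rewards along a list of (state, action, next_state) triples."""
    return sum(reward(s, a, s_next) for s, a, s_next in trajectory)


# A three-step path that ends in the goal state.
trajectory = [(0, "right", 1), (1, "right", 2), (2, "right", 3)]
total = cumulative_reward(trajectory)  # -1 - 1 + 10 = 8.0
```

The step penalty is one way to encode "reach the goal quickly": a longer path to the same goal yields a smaller cumulative reward.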

Dynamic programming (1) MDP

The agent and the environment In a finite Markov Decision Process (MDP), we have three sets: a set of states, a set of actions, and a set of rewards. The learner or decision maker is called the agent, and the outside system that the agent interacts with is called the environment. Every period, the agent takes an action and the environment responds by presenting a new state to the agent. Each period, the environment presents a state \(S_t\) from the set \( S\)....
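The interaction between agent and environment can be sketched as a simple loop. Everything concrete below is made up for illustration: the two states, the two actions, the transition table `P`, the reward values, and the fixed `policy`.

```python
import random

STATES = ["low", "high"]
ACTIONS = ["wait", "work"]

# Hypothetical dynamics: P[(state, action)] -> list of (next_state, probability, reward).
P = {
    ("low", "wait"): [("low", 1.0, 0.0)],
    ("low", "work"): [("high", 0.7, 1.0), ("low", 0.3, 0.0)],
    ("high", "wait"): [("high", 0.6, 2.0), ("low", 0.4, 2.0)],
    ("high", "work"): [("high", 0.9, 1.5), ("low", 0.1, 1.5)],
}


def env_step(state, action, rng):
    """Environment: draw the next state and reward given the current state and action."""
    outcomes = P[(state, action)]
    weights = [prob for _, prob, _ in outcomes]
    next_state, _, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def policy(state):
    """Agent: a fixed, hand-written rule mapping each state to an action."""
    return "work" if state == "low" else "wait"


def simulate(start, periods, seed=0):
    """Run the agent-environment loop and record (t, S_t, A_t, reward, S_{t+1})."""
    rng = random.Random(seed)
    state = start
    history = []
    for t in range(periods):
        action = policy(state)
        next_state, reward = env_step(state, action, rng)
        history.append((t, state, action, reward, next_state))
        state = next_state
    return history


history = simulate("low", periods=5)
```

Each row of `history` records one period: the state presented by the environment, the agent's action, the resulting reward, and the next state.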