Consider a simple consumption-saving model in which the action a is the amount saved each period, the state s is the current stock of goods, and the reward is the utility of consumption c = s - a. The state is updated to next period's stock, which equals savings plus new output, where output is drawn from a uniform distribution on {0, ..., B}. Let M be the global upper bound on storage.
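
Putting these pieces together, the planner solves the Bellman equation

$$v(s) = \max_{0 \leq a \leq \min(s, M)} \left\{ u(s-a) + \frac{\beta}{B+1} \sum_{s^\prime = a}^{a+B} v(s^\prime) \right\}$$

where u is the period utility function and β is the discount factor (β = 0.9 in the instance defined below).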

## State space

The state space is $$S = \{0, 1, \ldots, M+B\},$$ which contains $$n = M+B+1$$ elements.

## The set of feasible actions at s

The action space is $$A = \{0, 1, \ldots, M\},$$ which contains $$m = M+1$$ elements. At state s only actions with $$a \leq s$$ are feasible (one cannot save more than the current stock); infeasible state-action pairs are ruled out below by assigning them a reward of negative infinity.
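
As a quick sketch (using the parameter values B = 10 and M = 5 from the instance defined later), both spaces are just integer ranges:

```julia
B, M = 10, 5      # values used in the instance below
S = 0:(M + B)     # state space: 0, 1, ..., 15 (n = 16 elements)
A = 0:M           # action space: 0, 1, ..., 5  (m = 6 elements)
```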

## Reward function

$$r(s, a) = u(c) = c^{\alpha} = (s-a)^{\alpha}$$

## Transition probabilities

$$Q(s, a, s^\prime) := \begin{cases} \frac{1}{B+1} & \text{if } a \leq s^\prime \leq a + B \newline 0 & \text{otherwise} \end{cases}$$
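
A quick sanity check (a sketch, assuming B = 10 and M = 5 as in the instance below): for any fixed action a, these probabilities sum to one over the support {a, ..., a + B}:

```julia
B, M = 10, 5
a = 3
q(s′) = a ≤ s′ ≤ a + B ? 1 / (B + 1) : 0.0
sum(q(s′) for s′ in 0:(M + B))   # == 1.0
```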

## Defining instances

R is the $$n \times m$$ reward array and Q is the $$n \times m \times n$$ transition probability array. Set R[s, a] = u(s - a) if $$a \leq s$$ and negative infinity otherwise, so that infeasible actions are never selected.
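
For instance (a worked example, assuming α = 0.5 and M = 5 as in the instance below), the reward row for state s = 2 is:

```julia
α, M = 0.5, 5
s = 2
[a ≤ s ? (s - a)^α : -Inf for a in 0:M]
# 6-element Vector{Float64}: [1.4142..., 1.0, 0.0, -Inf, -Inf, -Inf]
```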

## Julia code

```julia
using BenchmarkTools, Plots, QuantEcon, Parameters

# named-tuple constructor with default parameter values
SimpleOG = @with_kw (B = 10, M = 5, α = 0.5, β = 0.9)

function transition_matrices(g)
    # unpack g into the model parameters
    (B, M, α, β) = g
    # utility of consumption
    u(c) = c^α
    # number of possible states (n) and possible actions (m)
    n = B + M + 1
    m = M + 1

    # R is the n-by-m reward matrix: rows index states, columns index actions
    R = zeros(n, m)
    # Q is the n-by-m-by-n transition probability array
    Q = zeros(n, m, n)

    # iterate over all possible actions a = 0, ..., M
    for a in 0:M
        # given action a, next period's state is uniform on {a, ..., a + B};
        # this does not depend on the current state, so fill every row at once
        Q[:, a + 1, (a:(a + B)) .+ 1] .= 1 / (B + 1)
        # fill the reward matrix: u(s - a) when the action is feasible (a ≤ s),
        # and -Inf otherwise so that infeasible actions are never chosen
        for s in 0:(B + M)
            R[s + 1, a + 1] = (a ≤ s ? u(s - a) : -Inf)
        end
    end

    # return the transition array Q and the reward matrix R
    return (Q = Q, R = R)
end

# create an instance with the default parameters
g = SimpleOG()

# Q is a 16-by-6-by-16 array and R is a 16-by-6 matrix
Q, R = transition_matrices(g);

# DiscreteDP takes the reward array R, the transition array Q, and the discount factor β
ddp = DiscreteDP(R, Q, g.β)

# solve the discrete dynamic program by policy function iteration (PFI)
results = solve(ddp, PFI)
```


Output:

```julia
julia> results.v
16-element Vector{Float64}:
 19.017402216959916
 20.017402216959912
 20.431615779333004
 20.749453024528783
 21.040780991093477
 21.308730183524606
 21.54479816102439
 21.769281810799857
 21.982703576083242
 22.18824322823849
 22.384504796519902
 22.57807736386172
 22.761091269771104
 22.943767083452705
 23.115339958706517
 23.277617618874896

julia> results.sigma .- 1
16-element Vector{Int64}:
 0
 0
 0
 0
 1
 1
 1
 2
 2
 3
 3
 4
 5
 5
 5
 5

julia> results.num_iter
3

julia> results.mc
Discrete Markov Chain
stochastic matrix of type Adjoint{Float64, Matrix{Float64}}:
[0.09090909090909091 0.09090909090909091 … 0.0 0.0; 0.09090909090909091 0.09090909090909091 … 0.0 0.0; … ; 0.0 0.0 … 0.09090909090909091 0.09090909090909091; 0.0 0.0 … 0.09090909090909091 0.09090909090909091]
```
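
QuantEcon.jl's DiscreteDP also accepts other solution methods. As a quick sketch (VFI and MPFI are the value function iteration and modified policy iteration solvers exported by QuantEcon; on this small problem all three should recover the same policy):

```julia
results_vfi = solve(ddp, VFI)     # value function iteration
results_mpfi = solve(ddp, MPFI)   # modified policy iteration
# the value functions should agree up to numerical tolerance
maximum(abs.(results_vfi.v - results.v))
```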


Note that we set B = 10 and M = 5, so the number of possible states is n = M + B + 1 = 16. The lowest possible state is 0 and the highest is M + B = 15.

- results.v : the value function, i.e., the maximized discounted value from each possible starting state
- results.num_iter : the number of iterations until convergence
- results.sigma .- 1 : the optimal policy (savings level) at each state; we subtract 1 because Julia arrays are 1-indexed while states and actions start at 0
- results.mc : the dynamics of the state when the optimal policy is followed (simulated in the sketch below)
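
As an illustration of results.mc, one can simulate the stock under the optimal policy and compute its long-run distribution. A minimal sketch using QuantEcon.jl's MarkovChain utilities (simulate and stationary_distributions):

```julia
# simulate 25 periods of the state index, starting from index 1 (a stock of 0)
path = simulate(results.mc, 25; init = 1)
states = path .- 1   # convert 1-based indices back to stock levels

# stationary distribution of the stock under the optimal policy
ψ = stationary_distributions(results.mc)[1]
```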
The value function can be plotted against the state:

```julia
# states run from 0 to M + B = 15
plot(0:15, results.v,
     marker = :circle,
     xlabel = "state",
     ylabel = "value",
     legend = false)
```

## Reference

https://julia.quantecon.org/dynamic_programming/discrete_dp.html