Consider a simple consumption-saving model in which the action a is the amount saved each period, the state s is the current stock of goods, and the reward is the utility of consumption c = s - a. The state is updated to next period's stock, which equals savings plus new output, where output is drawn from a uniform distribution on {0, ..., B}. Let M be the global upper bound on storage.
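
Putting these pieces together, the planner solves the Bellman equation

$$v(s) = \max_{0 \leq a \leq \min(s, M)} \left\{ u(s-a) + \frac{\beta}{B+1} \sum_{s^\prime = a}^{a+B} v(s^\prime) \right\}$$

where u is the period utility function and β is the discount factor (β = 0.9 in the instance defined below).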

## State space

The state space is $$S = \{0, 1, \ldots, M+B\},$$ which contains $$n = M+B+1$$ elements.

## The set of feasible actions at s

The action space is $$A = \{0, 1, \ldots, M\},$$ which contains $$m = M+1$$ elements. At state s only actions with $$a \leq s$$ are feasible (one cannot save more than the current stock); infeasible state-action pairs are ruled out below by assigning them a reward of negative infinity.
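
As a quick sketch (using the parameter values B = 10 and M = 5 from the instance defined later), both spaces are just integer ranges:

```julia
B, M = 10, 5      # values used in the instance below
S = 0:(M + B)     # state space: 0, 1, ..., 15 (n = 16 elements)
A = 0:M           # action space: 0, 1, ..., 5  (m = 6 elements)
```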

## Reward function

$$r(s, a) = u(c) = c^{\alpha} = (s-a)^{\alpha}$$

## Transition probabilities

$$Q(s, a, s^\prime) := \begin{cases} \frac{1}{B+1} & \text{if } a \leq s^\prime \leq a + B \newline 0 & \text{otherwise} \end{cases}$$
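
A quick sanity check (a sketch, assuming B = 10 and M = 5 as in the instance below): for any fixed action a, these probabilities sum to one over the support {a, ..., a + B}:

```julia
B, M = 10, 5
a = 3
q(s′) = a ≤ s′ ≤ a + B ? 1 / (B + 1) : 0.0
sum(q(s′) for s′ in 0:(M + B))   # == 1.0
```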

## Defining instances

R is the $$n \times m$$ reward array and Q is the $$n \times m \times n$$ transition probability array. Set R[s, a] = u(s - a) if $$a \leq s$$ and negative infinity otherwise, so that infeasible actions are never selected.
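
For instance (a worked example, assuming α = 0.5 and M = 5 as in the instance below), the reward row for state s = 2 is:

```julia
α, M = 0.5, 5
s = 2
[a ≤ s ? (s - a)^α : -Inf for a in 0:M]
# 6-element Vector{Float64}: [1.4142..., 1.0, 0.0, -Inf, -Inf, -Inf]
```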

## Julia code

```julia
using BenchmarkTools, Plots, QuantEcon, Parameters

# named-tuple constructor with default parameter values
SimpleOG = @with_kw (B = 10, M = 5, α = 0.5, β = 0.9)

function transition_matrices(g)
    # unpack g into the model parameters
    (B, M, α, β) = g
    # utility of consumption
    u(c) = c^α
    # number of possible states (n) and possible actions (m)
    n = B + M + 1
    m = M + 1

    # R is the n-by-m reward matrix: rows index states, columns index actions
    R = zeros(n, m)
    # Q is the n-by-m-by-n transition probability array
    Q = zeros(n, m, n)

    # iterate over all possible actions a = 0, ..., M
    for a in 0:M
        # given action a, next period's state is uniform on {a, ..., a + B};
        # this does not depend on the current state, so fill every row at once
        Q[:, a + 1, (a:(a + B)) .+ 1] .= 1 / (B + 1)
        # fill the reward matrix: u(s - a) when the action is feasible (a ≤ s),
        # and -Inf otherwise so that infeasible actions are never chosen
        for s in 0:(B + M)
            R[s + 1, a + 1] = (a ≤ s ? u(s - a) : -Inf)
        end
    end

    # return the transition array Q and the reward matrix R
    return (Q = Q, R = R)
end

# create an instance with the default parameters
g = SimpleOG()

# Q is a 16-by-6-by-16 array and R is a 16-by-6 matrix
Q, R = transition_matrices(g);

# DiscreteDP takes the reward array R, the transition array Q, and the discount factor β
ddp = DiscreteDP(R, Q, g.β)

# solve the discrete dynamic program by policy function iteration (PFI)
results = solve(ddp, PFI)
```


Output:

```julia
julia> results.v
16-element Vector{Float64}:
 19.017402216959916
 20.017402216959912
 20.431615779333004
 20.749453024528783
 21.040780991093477
 21.308730183524606
 21.54479816102439
 21.769281810799857
 21.982703576083242
 22.18824322823849
 22.384504796519902
 22.57807736386172
 22.761091269771104
 22.943767083452705
 23.115339958706517
 23.277617618874896

julia> results.sigma .- 1
16-element Vector{Int64}:
 0
 0
 0
 0
 1
 1
 1
 2
 2
 3
 3
 4
 5
 5
 5
 5

julia> results.num_iter
3

julia> results.mc
Discrete Markov Chain
stochastic matrix of type Adjoint{Float64, Matrix{Float64}}:
[0.09090909090909091 0.09090909090909091 … 0.0 0.0; 0.09090909090909091 0.09090909090909091 … 0.0 0.0; … ; 0.0 0.0 … 0.09090909090909091 0.09090909090909091; 0.0 0.0 … 0.09090909090909091 0.09090909090909091]
```
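
QuantEcon.jl's DiscreteDP also accepts other solution methods. As a quick sketch (VFI and MPFI are the value function iteration and modified policy iteration solvers exported by QuantEcon; on this small problem all three should recover the same policy):

```julia
results_vfi = solve(ddp, VFI)     # value function iteration
results_mpfi = solve(ddp, MPFI)   # modified policy iteration
# the value functions should agree up to numerical tolerance
maximum(abs.(results_vfi.v - results.v))
```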


Note that we set B = 10 and M = 5, so the number of possible states is n = M + B + 1 = 16. The lowest possible state is 0 and the highest is M + B = 15.

- results.v : the value function, i.e., the maximized discounted value from each possible starting state
- results.num_iter : the number of iterations until convergence
- results.sigma .- 1 : the optimal policy (savings level) at each state; we subtract 1 because Julia arrays are 1-indexed while states and actions start at 0
- results.mc : the dynamics of the state when the optimal policy is followed (simulated in the sketch below)
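
As an illustration of results.mc, one can simulate the stock under the optimal policy and compute its long-run distribution. A minimal sketch using QuantEcon.jl's MarkovChain utilities (simulate and stationary_distributions):

```julia
# simulate 25 periods of the state index, starting from index 1 (a stock of 0)
path = simulate(results.mc, 25; init = 1)
states = path .- 1   # convert 1-based indices back to stock levels

# stationary distribution of the stock under the optimal policy
ψ = stationary_distributions(results.mc)[1]
```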
The value function can be plotted against the state:

```julia
# states run from 0 to M + B = 15
plot(0:15, results.v,
     marker = :circle,
     xlabel = "state",
     ylabel = "value",
     legend = false)
```

## Reference

https://julia.quantecon.org/dynamic_programming/discrete_dp.html