Consider a simple consumption-saving model in which the action a is the amount saved each period, the state s is the current stock of goods, and the reward is the utility of consumption c = s - a. The state is updated to next period's stock, which equals savings plus new output, where output is drawn from a uniform distribution on {0, ..., B}. Let M be the global upper bound on storage.
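
Putting these pieces together, the planner solves the Bellman equation

$$v(s) = \max_{0 \leq a \leq \min(s, M)} \left\{ u(s-a) + \frac{\beta}{B+1} \sum_{s^\prime = a}^{a+B} v(s^\prime) \right\}$$

where u is the period utility function and β is the discount factor (β = 0.9 in the instance defined below).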

## State space

The state space is $$S = \{0, 1, \ldots, M+B\},$$ which contains $$n = M+B+1$$ elements.

## The set of feasible actions at s

The action space is $$A = \{0, 1, \ldots, M\},$$ which contains $$m = M+1$$ elements. At state s only actions with $$a \leq s$$ are feasible (one cannot save more than the current stock); infeasible state-action pairs are ruled out below by assigning them a reward of negative infinity.
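
As a quick sketch (using the parameter values B = 10 and M = 5 from the instance defined later), both spaces are just integer ranges:

```julia
B, M = 10, 5      # values used in the instance below
S = 0:(M + B)     # state space: 0, 1, ..., 15 (n = 16 elements)
A = 0:M           # action space: 0, 1, ..., 5  (m = 6 elements)
```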

## Reward function

$$r(s, a) = u(c) = c^{\alpha} = (s-a)^{\alpha}$$

## Transition probabilities

$$Q(s, a, s^\prime) := \begin{cases} \frac{1}{B+1} & \text{if } a \leq s^\prime \leq a + B \newline 0 & \text{otherwise} \end{cases}$$
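
A quick sanity check (a sketch, assuming B = 10 and M = 5 as in the instance below): for any fixed action a, these probabilities sum to one over the support {a, ..., a + B}:

```julia
B, M = 10, 5
a = 3
q(s′) = a ≤ s′ ≤ a + B ? 1 / (B + 1) : 0.0
sum(q(s′) for s′ in 0:(M + B))   # == 1.0
```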

## Defining instances

R is the $$n \times m$$ reward array and Q is the $$n \times m \times n$$ transition probability array. Set R[s, a] = u(s - a) if $$a \leq s$$ and negative infinity otherwise, so that infeasible actions are never selected.
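
For instance (a worked example, assuming α = 0.5 and M = 5 as in the instance below), the reward row for state s = 2 is:

```julia
α, M = 0.5, 5
s = 2
[a ≤ s ? (s - a)^α : -Inf for a in 0:M]
# 6-element Vector{Float64}: [1.4142..., 1.0, 0.0, -Inf, -Inf, -Inf]
```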

## Julia code

```julia
using BenchmarkTools, Plots, QuantEcon, Parameters

# named-tuple constructor with default parameter values
SimpleOG = @with_kw (B = 10, M = 5, α = 0.5, β = 0.9)

function transition_matrices(g)
    # unpack g into the model parameters
    (B, M, α, β) = g
    # utility of consumption
    u(c) = c^α
    # number of possible states (n) and possible actions (m)
    n = B + M + 1
    m = M + 1

    # R is the n-by-m reward matrix: rows index states, columns index actions
    R = zeros(n, m)
    # Q is the n-by-m-by-n transition probability array
    Q = zeros(n, m, n)

    # iterate over all possible actions a = 0, ..., M
    for a in 0:M
        # given action a, next period's state is uniform on {a, ..., a + B};
        # this does not depend on the current state, so fill every row at once
        Q[:, a + 1, (a:(a + B)) .+ 1] .= 1 / (B + 1)
        # fill the reward matrix: u(s - a) when the action is feasible (a ≤ s),
        # and -Inf otherwise so that infeasible actions are never chosen
        for s in 0:(B + M)
            R[s + 1, a + 1] = (a ≤ s ? u(s - a) : -Inf)
        end
    end

    # return the transition array Q and the reward matrix R
    return (Q = Q, R = R)
end

# create an instance with the default parameters
g = SimpleOG()

# Q is a 16-by-6-by-16 array and R is a 16-by-6 matrix
Q, R = transition_matrices(g);

# DiscreteDP takes the reward array R, the transition array Q, and the discount factor β
ddp = DiscreteDP(R, Q, g.β)

# solve the discrete dynamic program by policy function iteration (PFI)
results = solve(ddp, PFI)
```


Output:

```julia
julia> results.v
16-element Vector{Float64}:
 19.017402216959916
 20.017402216959912
 20.431615779333004
 20.749453024528783
 21.040780991093477
 21.308730183524606
 21.54479816102439
 21.769281810799857
 21.982703576083242
 22.18824322823849
 22.384504796519902
 22.57807736386172
 22.761091269771104
 22.943767083452705
 23.115339958706517
 23.277617618874896

julia> results.sigma .- 1
16-element Vector{Int64}:
 0
 0
 0
 0
 1
 1
 1
 2
 2
 3
 3
 4
 5
 5
 5
 5

julia> results.num_iter
3

julia> results.mc
Discrete Markov Chain
stochastic matrix of type Adjoint{Float64, Matrix{Float64}}:
[0.09090909090909091 0.09090909090909091 … 0.0 0.0; 0.09090909090909091 0.09090909090909091 … 0.0 0.0; … ; 0.0 0.0 … 0.09090909090909091 0.09090909090909091; 0.0 0.0 … 0.09090909090909091 0.09090909090909091]
```
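
QuantEcon.jl's DiscreteDP also accepts other solution methods. As a quick sketch (VFI and MPFI are the value function iteration and modified policy iteration solvers exported by QuantEcon; on this small problem all three should recover the same policy):

```julia
results_vfi = solve(ddp, VFI)     # value function iteration
results_mpfi = solve(ddp, MPFI)   # modified policy iteration
# the value functions should agree up to numerical tolerance
maximum(abs.(results_vfi.v - results.v))
```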


Note that we set B = 10 and M = 5, so the number of possible states is n = M + B + 1 = 16. The lowest possible state is 0 and the highest is M + B = 15.

- results.v : the value function, i.e., the maximized discounted value from each possible starting state
- results.num_iter : the number of iterations until convergence
- results.sigma .- 1 : the optimal policy (savings level) at each state; we subtract 1 because Julia arrays are 1-indexed while states and actions start at 0
- results.mc : the dynamics of the state when the optimal policy is followed (simulated in the sketch below)
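
As an illustration of results.mc, one can simulate the stock under the optimal policy and compute its long-run distribution. A minimal sketch using QuantEcon.jl's MarkovChain utilities (simulate and stationary_distributions):

```julia
# simulate 25 periods of the state index, starting from index 1 (a stock of 0)
path = simulate(results.mc, 25; init = 1)
states = path .- 1   # convert 1-based indices back to stock levels

# stationary distribution of the stock under the optimal policy
ψ = stationary_distributions(results.mc)[1]
```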
The value function can be plotted against the state:

```julia
# states run from 0 to M + B = 15
plot(0:15, results.v,
     marker = :circle,
     xlabel = "state",
     ylabel = "value",
     legend = false)
```

## Reference

https://julia.quantecon.org/dynamic_programming/discrete_dp.html