Algorithms | RDP 2025-03: Fast Posterior Sampling in Tightly Identified SVARs Using ‘Soft’ Sign Restrictions

RDP 2025-03: Fast Posterior Sampling in Tightly Identified SVARs Using ‘Soft’ Sign Restrictions 3. Algorithms

Matthew Read and Dan Zhu

May 2025

Download the Paper 1.45MB

This section describes algorithms that can be used to obtain draws of Q from the uniform distribution over $𝒬 (ϕ | S)$ . We focus on sampling from this component of the posterior (or prior) distribution rather than the joint distribution of $θ$ , since the problem of sampling the reduced-form parameters is well understood (e.g. Del Negro and Schorfheide 2011). As a benchmark, we first describe an accept-reject algorithm. We then introduce our general approach to sampling based on ‘soft’ sign restrictions, before describing a specific MCMC sampler – the slice sampler – that can be used to implement our general approach.

3.1 Accept-reject sampling

The following algorithm describes an accept-reject sampler for drawing from the conditionally uniform distribution over $𝒬 (ϕ | S)$ .

Algorithm 1 (Accept-reject sampling). For a given value of $ϕ$ :

Step 1. Draw an n×n matrix Z of independent standard normal random variables and let Z = $\tilde{Q} R$ be the QR decomposition of Z, where $\tilde{Q}$ is orthonormal and R is upper-triangular with non-negative diagonal elements.

Step 2. Normalise the signs of the columns of $\tilde{Q}$ such that $diag (A_{0}) \geq 0_{n \times 1}$ and let Q be the normalised matrix. If $\tilde{Q} = ({\tilde{q}}_{1}, ..., {\tilde{q}}_{n})$ , then

(6)

Q = (sign ({(Σ_{t r}^{- 1} e_{1, n})}^{'} {\tilde{q}}_{1}) {\tilde{q}}_{1}, ..., sign ({(Σ_{t r}^{- 1} e_{n, n})}^{'} {\tilde{q}}_{n}) {\tilde{q}}_{n})

Step 3. Keep the draw if it satisfies $S (ϕ, Q) \geq 0_{s \times 1}$ and terminate the algorithm. Otherwise, return to Step 1.

Step 1 draws $\tilde{Q}$ from a uniform distribution over $𝒪 (n)$ using an algorithm proposed in Stewart (1980) (see also the descriptions in Rubio-Ramírez et al (2010) and Arias et al (2018)). Step 2 normalises the draw so that the sign normalisation diag $(A_{0}) \geq 0_{n \times 1}$ is satisfied, which increases the efficiency of the sampler relative to a sampler that omits this step and uses the subsequent accept-reject step to impose the sign normalisation.^[10] Step 3 is the accept-reject step, which simply involves checking whether the sign restrictions are satisfied. The algorithm is repeated to obtain the desired number of draws.

Let Q(Z) be a function that returns Q in the QR decomposition of Z (so Z = QR). Algorithm 1 can be interpreted as drawing Z from the truncated normal distribution with density

(7)

f (Z | Q (Z) \in 𝒬 (ϕ | S)) = \frac{f_{Z} (Z) 𝟙 (Q (Z)) \in 𝒬 (ϕ | S)}{\int_{{\tilde{Z} : Q (\tilde{Z}) \in 𝒬 (ϕ | S)}} f_{Z} (\tilde{Z}) d \tilde{Z}}

where f_Z(Z) is the density of the standard matrix normal distribution. The interpretation of Algorithm 1 as drawing from this density will be useful in introducing our sampler.

The challenge with using accept-reject sampling in this setting is that it may take a large number of candidate draws (and thus computational time) to obtain a sufficiently large number of draws satisfying the identifying restrictions. This will occur when $𝒬 (ϕ | S)$ is assigned small measure under the uniform distribution over $𝒪 (n)$ – that is, when identification is tight.

3.2 Soft sign restrictions

The indicator function $𝟙 (Q (Z) \in 𝒬 (ϕ | S))$ appearing in Equation (7) can be decomposed into a product of indicator functions corresponding to individual sign restrictions:

(8)

𝟙 (Q (Z) \in 𝒬 (ϕ | S)) = \prod_{l = 1}^{s} 𝟙 (Q (Z) \in 𝒬 (ϕ | S^{(l)}))

where $𝒬 (ϕ | S^{(l)}) = {Q \in 𝒪 (n) : S^{(l)} (ϕ, Q) \geq 0}$ and $S^{(l)} (ϕ, Q) \geq 0$ represents the lth sign restriction with l = 1,...,s. The key feature underlying our approach is that we replace the indicator function with a smooth regularisation (or penalty) function $Λ (f, Δ) : ℝ \times ℝ_{+} \to (0, 1)$ , which satisfies the following assumption.

Assumption 1. The regularisation function $Λ (f, Δ)$ is such that

\begin{array}{l} \lim_{f \to \infty} Λ (f, Δ) = 1 \forall Δ \in ℝ_{+} \\ \lim_{f \to - \infty} Λ (f, Δ) = 0 \forall Δ \in ℝ_{+} \\ \lim_{Δ \to 0} Λ (f, Δ) = {\begin{matrix} 1 & f \geq 0 \\ 0 & f < 0 \end{matrix} \end{array}

In addition, for some finite K > 0, it satisfies

| Λ (f, Δ) - 𝟙 (f \geq 0) | \leq K

for all $Δ \in ℝ_{+}$ and $f \in ℝ$ .

$Λ (f, Δ)$ can be interpreted as a function that penalises draws of Q (equivalently, Z) that violate (or are close to violating) the sign restrictions by down-weighting their density. In the limit as $Δ \to 0$ , the regularisation function converges to the indicator function. One choice for $Λ (f, Δ)$ that satisfies Assumption 1 (and that we will make use of below) is the logistic function:

(9)

Λ (f, Δ) = \frac{1}{1 + \exp (- f / Δ)}

This function is illustrated in Figure 1 for different values of $Δ$ .

Figure 1: Example of Regularisation Function – Logistic Function - A line chart plotting the regularisation or penalty function under different choices of the penalisation parameter, Delta. There are three lines, each of which corresponds to a different value of Delta. The chart shows that as Delta becomes smaller the penalisation function becomes more similar to the indicator function. — Figure 1: Example of Regularisation Function – Logistic Function

We propose sampling from a smooth density that replaces the indicator function with the regularisation function:

(10)

f_{Δ} (Z) = \frac{f_{Z} (Z) \prod_{l = 1}^{s} Λ (S^{(l)} (ϕ, Q (Z)), Δ)}{\int f_{Z} (\tilde{Z}) \prod_{l = 1}^{s} Λ (S^{(l)} (ϕ, Q (\tilde{Z})), Δ) d \tilde{Z}}

The advantage of working with this smooth density is that alternative sampling algorithms, such as MCMC methods, can be directly applied, which obviates the need for accept-reject sampling. In the limit as $Δ \to 0$ , the probability of obtaining a draw violating the restrictions approaches zero and the draws of Q are approximately uniformly distributed over $𝒬 (ϕ | S)$ . This claim is formalised in the following proposition.

Proposition 1. Assume the conditions in Assumption 1 hold and let T : $ℝ^{d} \to ℝ$ be such that $\int_{R^{d}} | T (Z) | f_{Z} (Z) d Z < \infty$ . Then,

(11)

\lim_{Δ \to 0} 𝔼_{f} (T (Z)) - 𝔼_{Δ} (T (Z)) = 0

where $𝔼_{f} (.)$ and $𝔼_{Δ} (.)$ are expectations taken under f and $f_{Δ}$ , respectively.

For $Δ > 0$ , the obtained draws of Z will not necessarily satisfy the sign restrictions and – conditional on satisfying the sign restrictions – will not follow the desired truncated normal distribution; equivalently, the draws of Q will not be uniformly distributed over $𝒬 (ϕ | S)$ . However, an importance-sampling step can be applied to obtain draws from an approximation of the desired distribution. The importance weights are given by

(12)

\frac{f (Z | Q Z) \in 𝒬 (ϕ | S)}{f_{Δ} (Z)} \propto \frac{f_{Z} (Z) 𝟙 (Q (Z) \in 𝒬 (ϕ | S))}{f_{Z} (Z) \prod_{l = 1}^{s} Λ (S^{(l)} (ϕ, Q (Z)), Δ)}

(13)

\propto \frac{𝟙 (Q (Z) \in 𝒬 (ϕ | S))}{\prod_{l = 1}^{s} Λ (S^{(l)} (ϕ, Q (Z)), Δ)}

We can compute these importance weights up to a normalising constant simply by evaluating the regularisation function and checking whether the sign restrictions are satisfied. The normalising constant is the ratio of the probability measures assigned to the identified set under the two probability distributions, and is computationally costly to obtain. An implication of ignoring this normalising constant is that the importance sampler draws from a distribution that is not exactly equal to $f (Z | Q (Z) \in 𝒬 (ϕ | S))$ . However, a corollary of Proposition 1 is that the normalising constant converges to one as $Δ \to 0$ (almost surely under the reduced-form prior). This implies that any bias present in the importance sampler should be small for small enough choices of $Δ$ .

Theoretically, a smaller $Δ$ reduces the bias when approximating the posterior distribution of the structural parameters. However, a smaller $Δ$ also introduces sampling inefficiencies as the distribution becomes steeper (i.e. as the gradient of the log density function becomes larger). In the context of a random walk Metropolis algorithm, this steepness implies the need for a relatively smaller tuning parameter (i.e. the scale of the proposal distribution) to achieve a reasonable acceptance rate, as larger steps are more likely to be rejected in regions with high gradient changes. In the next section, we discuss an alternative method – slice sampling – that is more robust in such situations, offering improved efficiency in navigating steep target distributions.

Finally, if the draws of Q are used only to approximate the bounds of an identified set, such as when conducting prior-robust Bayesian inference, resampling the draws is unnecessary and it suffices to discard draws that violate the sign restrictions. This is because the approximated bounds depend only on the minimum and maximum values of the parameter of interest evaluated at the draws of Q, so the distribution of the draws over the identified set does not matter in this case.

3.3 Slice sampling

There are many MCMC methods that could be used to sample from $f_{Δ} (Z)$ . We make use of the slice sampler, motivated by its robust convergence properties, efficiency (relative to standard random walk Metropolis algorithms) and ease of implementation (Neal 2003).

The slice sampler is motivated by the fact that sampling from $f_{Δ} (Z)$ is equivalent to sampling uniformly from the region under the density function. The ‘simple’ slice sampler constructs a Markov chain that converges to this uniform distribution by alternating between two steps: 1) sample y uniformly from the interval $[0, f_{Δ} (Z_{k})]$ given some predetermined Z_k; and 2) sample Z_k+1 uniformly from the ‘slice’ $S (y) = {Z : f_{Δ} (Z) > y}$ .^[11] Iterating over this process generates a sequence of dependent draws from the target density. Figure 2 illustrates this idea in a univariate setting.

Figure 2: Illustration of ‘Simple’ Slice Sampler in Univariate Setting - A line chart illustrating how the slice sampler works in a univariate setting. The chart plots a bimodal density function for a random variable x. Given some initial value of x, a value is randomly drawn from a uniform distribution over an interval that ranges from zero to the density function evaluated at the initial value of x. For a particular initial value of x and random draw from the corresponding interval, the chart plots the slice, which is the set of values of x where the density function exceeds the drawn value. In this example, the slice is the union of two disconnected intervals. — Figure 2: Illustration of ‘Simple’ Slice Sampler in Univariate Setting

Mira and Tierney (2002) prove that if the target density is bounded and has support with finite Lebesgue measure, then the simple slice sampler is uniformly ergodic. More importantly, as noted by Roberts and Rosenthal (1999), the simple slice sampler is almost always geometrically ergodic, which is a property shared by very few other MCMC algorithms. These properties have led to slice sampling becoming a widely used method for sampling from non-standard distributions in low dimensions, although the applicability of the simple slice sampler is limited. In the multivariate setting, sampling uniformly from $S (y)$ is generally infeasible, making the second step of the simple slice sampling algorithm impractical. To address this, the second step is typically modified to sample a Markov chain on $S (y)$ , which maintains the uniform distribution over the slice as its invariant distribution.

In the multivariate setting, the slice sampler can be implemented by updating each variable in turn or all variables simultaneously. We build on Matlab's implementation of the slice sampler, which updates all variables simultaneously.^[12] Sampling directly from a uniform distribution over the slice is infeasible in the current setting. However, as discussed above, any update that leaves the uniform distribution over the slice invariant will yield a Markov chain that converges to the target distribution. Matlab's implementation of the slice sampler updates the chain in a way that satisfies this condition using an approach described in Neal (2003). To briefly summarise, this procedure involves: randomly positioning a hypercube with side width w around the initial point; drawing a point from a uniform distribution over the hypercube; and repeatedly shrinking the hypercube (‘shrinking in’) if the candidate draw lies outside the slice until a draw is obtained within the slice.

To give an example of the shrinking-in procedure, consider the univariate setting illustrated in Figure 2. Let [x_l, x_r] be an interval of width w randomly positioned around x₀. Consider a random draw x^(p) from the uniform distribution over [x_l, x_r]. If $x^{(p)} \in S (y)$ , we set x₁ = x^(p). If $x^{(p)} \notin S (y)$ , we shrink the interval by setting x_l = x^(p) if x^(p) < x₀ or x_r = x^(p) if x^(p) > x₀. We draw again from the uniform distribution over the updated interval, repeating this process until we obtain a draw within $S (y)$ .

The choice of w will affect the speed at which the Markov chain converges to the target distribution and the sampler's computational efficiency. Under the general class of identifying restrictions we consider, there is no guarantee that $𝒬 (ϕ | S)$ is path connected, which means the smoothed density $f_{Δ} (Z)$ may be multimodal. A small value of w may lead to difficulties moving between modes and slow convergence. On the other hand, setting w too large can make the sampler computationally inefficient, because many shrinking-in steps may be required to obtain a draw that lies within the slice. We try to balance these considerations by using a ‘contaminated’ proposal, where w = 1 with 95 per cent probability and w = 3 with 5 per cent probability. Choices of w on this scale seem reasonable given that, for small values of ∆, the target distribution resembles a truncated multivariate standard normal distribution.

Footnotes

This step could be modified to impose an alternative sign normalisation (e.g. diag

(A_{0}^{- 1}) \geq 0_{n \times 1}

). [10]

It is only necessary to evaluate

f_{Δ} (Z)

up to a constant of proportionality. [11]

As discussed in Neal (2003), extensions of the slice sampler can make use of local information about the shape of the target density, such as by using local quadratic approximations based on the derivatives of the log target density. The regularised constraints that we use allow us in principle to construct such approximations. Further work could potentially improve the efficiency of our approach by using this information. [12]