# RDP 2023-07: Identification and Inference under Narrative Restrictions 3. General Framework

This section describes the general SVAR, outlines the identifying restrictions we consider, and defines the conditional and unconditional likelihoods in this general setting.

## 3.1 SVAR

Let $y_t$ be an n × 1 vector of variables following the SVAR(p) process:

(11) $A_0 y_t = A_+ x_t + \epsilon_t, \quad t = 1, \ldots, T$

where: $A_0$ is invertible; $x_t = (y_{t-1}^{\prime}, \ldots, y_{t-p}^{\prime}, z_t^{\prime})^{\prime}$ with $z_t$ containing any exogenous variables (e.g. a constant); $A_+ = (A_1, \ldots, A_p, A_z)$; and $\epsilon_t \overset{iid}{\sim} \mathcal{N}(0_{n \times 1}, I_n)$ are structural shocks. The initial conditions $(y_{1-p}, \ldots, y_0)$ are given.

The reduced-form VAR(p) representation is

(12) $y_t = B x_t + u_t, \quad t = 1, \ldots, T$

where: $B = (B_1, \ldots, B_p, B_z)$ with $B_l = A_0^{-1} A_l$; and $u_t = A_0^{-1} \epsilon_t \overset{iid}{\sim} \mathcal{N}(0_{n \times 1}, \Sigma)$ with $\Sigma = A_0^{-1} (A_0^{-1})^{\prime}$. The reduced-form parameters are $\varphi = (\text{vec}(B)^{\prime}, \text{vech}(\Sigma)^{\prime})^{\prime} \in \Phi$.

As is standard in the literature that considers set-identified SVARs, we reparameterise the model into its orthogonal reduced form (e.g. Arias et al 2018):

(13) $y_t = B x_t + \Sigma_{tr} Q \epsilon_t, \quad t = 1, \ldots, T$

where: $\Sigma_{tr}$ is the lower-triangular Cholesky factor of $\Sigma$ (i.e. $\Sigma_{tr} \Sigma_{tr}^{\prime} = \Sigma$) with non-negative diagonal elements; and Q is an n × n orthonormal matrix with $\mathcal{O}(n)$ the set of all such matrices. The structural and orthogonal reduced-form parameterisations are related through the mapping $B = A_0^{-1} A_+$, $\Sigma = A_0^{-1} (A_0^{-1})^{\prime}$ and $Q = \Sigma_{tr}^{-1} A_0^{-1}$ with inverse mapping $A_0 = Q^{\prime} \Sigma_{tr}^{-1}$ and $A_+ = Q^{\prime} \Sigma_{tr}^{-1} B$.
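The mapping between the two parameterisations can be checked numerically. The following is a minimal sketch in Python with NumPy; the function names `structural_to_orthogonal` and `orthogonal_to_structural` are illustrative assumptions rather than part of any existing package, and matrices are stored as plain arrays with $A_+$ of dimension n × (np + number of exogenous variables).

```python
import numpy as np


def structural_to_orthogonal(A0, Aplus):
    """Map (A0, A+) to (B, Sigma, Q) using the relations stated in the text."""
    A0_inv = np.linalg.inv(A0)
    B = A0_inv @ Aplus                      # B = A0^{-1} A+
    Sigma = A0_inv @ A0_inv.T               # Sigma = A0^{-1} (A0^{-1})'
    Sigma_tr = np.linalg.cholesky(Sigma)    # lower-triangular Cholesky factor
    Q = np.linalg.solve(Sigma_tr, A0_inv)   # Q = Sigma_tr^{-1} A0^{-1}
    return B, Sigma, Q


def orthogonal_to_structural(B, Sigma, Q):
    """Inverse mapping: A0 = Q' Sigma_tr^{-1} and A+ = Q' Sigma_tr^{-1} B."""
    Sigma_tr = np.linalg.cholesky(Sigma)
    A0 = Q.T @ np.linalg.inv(Sigma_tr)
    Aplus = A0 @ B
    return A0, Aplus
```

Applying the two functions in sequence to an invertible $A_0$ should return the original structural parameters up to floating-point error, and the implied Q should satisfy $QQ^{\prime} = I_n$.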

We assume B is such that the VAR(p) can be inverted into an infinite-order vector moving average (VMA$\left(\infty \right)$) representation:[7]

(14) $y_t = \sum_{h=0}^{\infty} C_h u_{t-h} = \sum_{h=0}^{\infty} C_h \Sigma_{tr} Q \epsilon_{t-h}, \quad t = 1, \ldots, T$

where $C_h$ is the hth term in $(I_n - \sum_{l=1}^{p} B_l L^l)^{-1}$ and L is the lag operator.[8] The (i, j)th element of the matrix $C_h \Sigma_{tr} Q$, which we denote by $\eta_{i,j,h}(\varphi, Q)$, is the horizon-h impulse response of the ith variable to the jth structural shock:

(15) $\eta_{i,j,h}(\varphi, Q) = e_{i,n}^{\prime} C_h \Sigma_{tr} Q e_{j,n} = c_{i,h}^{\prime}(\varphi) q_j$

with $c_{i,h}^{\prime}(\varphi) = e_{i,n}^{\prime} C_h \Sigma_{tr}$ the ith row of $C_h \Sigma_{tr}$ and $q_j = Q e_{j,n}$ the jth column of Q, where $e_{i,n}$ is the ith column of $I_n$.
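As a rough illustration of Equation (15), the sketch below computes the VMA coefficients with the recursion $C_h = \sum_{l=1}^{\min\{h,p\}} B_l C_{h-l}$, $C_0 = I_n$, and then evaluates a single impulse response. The function names and the convention of passing the lag matrices as a Python list `B_lags = [B_1, ..., B_p]` are assumptions made for the example; indices i and j are zero-based.

```python
import numpy as np


def vma_coefficients(B_lags, H):
    """Return [C_0, ..., C_H] from C_h = sum_{l=1}^{min(h,p)} B_l C_{h-l}."""
    n = B_lags[0].shape[0]
    p = len(B_lags)
    C = [np.eye(n)]                          # C_0 = I_n
    for h in range(1, H + 1):
        C_h = np.zeros((n, n))
        for l in range(1, min(h, p) + 1):
            C_h += B_lags[l - 1] @ C[h - l]
        C.append(C_h)
    return C


def impulse_response(B_lags, Sigma, Q, i, j, h):
    """eta_{i,j,h} = c'_{i,h}(phi) q_j, with zero-based i and j."""
    C = vma_coefficients(B_lags, h)
    Sigma_tr = np.linalg.cholesky(Sigma)
    c_ih = C[h][i, :] @ Sigma_tr             # ith row of C_h Sigma_tr
    return c_ih @ Q[:, j]                    # inner product with jth column of Q
```

For a VAR(1), the recursion reduces to $C_h = B_1^h$, which gives a simple check on the implementation.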

## 3.2 Narrative restrictions

In the absence of identifying restrictions, Q – and functions of Q such as $\eta_{i,j,h}(\varphi, Q)$ – are set identified, since any $Q \in \mathcal{O}(n)$ is consistent with the joint distribution of the data, which is summarised by the reduced-form parameters. Imposing identifying restrictions is equivalent to restricting Q to lie in a subspace of $\mathcal{O}(n)$. Throughout, we impose the ‘sign normalisation’ $\text{diag}(A_0) = \text{diag}(Q^{\prime} \Sigma_{tr}^{-1}) \ge 0_{n \times 1}$, so a positive value of $\epsilon_{it}$ is a positive shock to the ith equation in the SVAR at time t.

It is common to impose sign restrictions on the impulse responses (e.g. Uhlig 2005) or on the structural parameters (e.g. Arias, Caldara and Rubio-Ramírez 2019). For example, the restriction $\eta_{i,j,h}(\varphi, Q) = c_{i,h}^{\prime}(\varphi) q_j \ge 0$ is a linear inequality restriction on a single column of Q that depends only on the reduced-form parameters $\varphi$. Restrictions on elements of $A_0$ take a similar form.

In contrast, NR constrain the values of the structural shocks in particular periods. The structural shocks are

(16) $\epsilon_t = A_0 u_t = Q^{\prime} \Sigma_{tr}^{-1} u_t$

The shock-sign restriction that the ith structural shock at time k is positive is

(17) $\epsilon_{ik}(\varphi, Q, u_k) = e_{i,n}^{\prime} Q^{\prime} \Sigma_{tr}^{-1} u_k = (\Sigma_{tr}^{-1} u_k)^{\prime} q_i \ge 0$

We can treat $u_t$ as observable given $\varphi$ and the data, so we suppress the dependence of $u_t$ on $\varphi$ and $(y_t^{\prime}, x_t^{\prime})^{\prime}$ for notational convenience. The restriction in Equation (17) is a linear inequality restriction on a single column of Q. In contrast with traditional sign restrictions, the shock-sign restriction depends directly on the data through the reduced-form VAR innovations.
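Equations (16) and (17) translate directly into a few lines of linear algebra. The sketch below is illustrative only: the function names are assumptions, the reduced-form innovations are stored as a T × n array `u` with row t equal to $u_t^{\prime}$, and the shock index i is zero-based.

```python
import numpy as np


def structural_shocks(Sigma, Q, u):
    """Recover eps_t = Q' Sigma_tr^{-1} u_t for every row of the (T, n) array u."""
    Sigma_tr = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(Sigma_tr, u.T)       # columns are Sigma_tr^{-1} u_t
    return (Q.T @ z).T


def shock_sign_ok(Sigma, Q, u_k, i):
    """Check the shock-sign restriction (Sigma_tr^{-1} u_k)' q_i >= 0."""
    Sigma_tr = np.linalg.cholesky(Sigma)
    return float(np.linalg.solve(Sigma_tr, u_k) @ Q[:, i]) >= 0.0
```

Because `u_k` enters the check, the set of admissible Q changes with the realisation of the data in period k, which is the feature that distinguishes this restriction from a traditional sign restriction.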

In addition to shock-sign restrictions, AR18 consider restrictions on the historical decomposition, which is the cumulative contribution of the jth shock to the observed unexpected change in the ith variable between periods k and k+h (i.e. the contribution to the (h+1)-step-ahead forecast error):

(18) $H_{i,j,k,k+h}(\varphi, Q, \{u_t\}_{t=k}^{k+h}) = \sum_{l=0}^{h} e_{i,n}^{\prime} C_l \Sigma_{tr} Q e_{j,n} e_{j,n}^{\prime} \epsilon_{k+h-l} = \sum_{l=0}^{h} c_{i,l}^{\prime}(\varphi) q_j q_j^{\prime} \Sigma_{tr}^{-1} u_{k+h-l}$

An example is the restriction that the jth structural shock was the ‘most important contributor’ to the change in the ith variable between periods k and k + h, which requires that $|H_{i,j,k,k+h}| \ge \max_{l \ne j} |H_{i,l,k,k+h}|$. Another is that the jth structural shock was the ‘overwhelming contributor’ to the change in the ith variable between periods k and k + h, which requires that $|H_{i,j,k,k+h}| \ge \sum_{l \ne j} |H_{i,l,k,k+h}|$. From Equation (18), it is clear that these restrictions are nonlinear inequality constraints that simultaneously constrain every column of Q and that depend on the realisations of the data in particular periods in addition to the reduced-form parameters.
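To make Equation (18) and the two contributor restrictions concrete, the following sketch computes the contribution of every shock at once. It assumes that the VMA coefficients are supplied as a list `C = [C_0, ..., C_h]`, that `u` is a T × n array of reduced-form innovations, and that i, j and k are zero-based; all function names are hypothetical.

```python
import numpy as np


def historical_decomposition(C, Sigma, Q, u, i, k, h):
    """Return the vector (H_{i,j,k,k+h})_{j=0..n-1} from Equation (18)."""
    Sigma_tr = np.linalg.cholesky(Sigma)
    H = np.zeros(Sigma.shape[0])
    for l in range(h + 1):
        c_il = C[l][i, :] @ Sigma_tr                           # c'_{i,l}(phi)
        eps = Q.T @ np.linalg.solve(Sigma_tr, u[k + h - l])    # eps_{k+h-l}
        H += (c_il @ Q) * eps       # jth entry: c'_{i,l} q_j q_j' Sigma_tr^{-1} u
    return H


def most_important(H, j):
    """|H_j| >= max_{l != j} |H_l|."""
    a = np.abs(H)
    return a[j] >= np.max(np.delete(a, j))


def overwhelming(H, j):
    """|H_j| >= sum_{l != j} |H_l|."""
    a = np.abs(H)
    return a[j] >= np.sum(np.delete(a, j))
```

Since $\sum_j q_j q_j^{\prime} = I_n$, the contributions sum across shocks to the ith element of the forecast error $\sum_{l=0}^{h} C_l u_{k+h-l}$, which provides a check on the computation. The overwhelming-contributor restriction implies the most-important-contributor restriction, because a sum of absolute values is at least as large as its largest term.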

Other restrictions also naturally fit within this framework. For instance, Ludvigson et al (2018) restrict the magnitudes of structural shocks in particular periods (e.g. ${\epsilon }_{ik}\left(\varphi ,Q,{u}_{k}\right)<\lambda$ for some specified scalar $\lambda$). One could also consider restrictions on the relative magnitudes of a particular shock in different periods (e.g. ${\epsilon }_{ik}\left(\varphi ,Q,{u}_{k}\right)\ge {\epsilon }_{ij}\left(\varphi ,Q,{u}_{j}\right)$ for $j\ne k$).[9]

A collection of NR can be represented in the general form $N(\varphi, Q, Y^T) \ge 0_{s \times 1}$, where s is the number of restrictions. As an illustration, consider the case where there is a single shock-sign restriction in period k, $\epsilon_{1k}(\varphi, Q, u_k) \ge 0$, as well as the restriction that the first structural shock was the most important contributor to the change in the first variable in period k. Then,

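(19) $N(\varphi, Q, Y^T) = \begin{bmatrix} (\Sigma_{tr}^{-1} u_k)^{\prime} q_1 \\ |e_{1,n}^{\prime} \Sigma_{tr} q_1 q_1^{\prime} \Sigma_{tr}^{-1} u_k| - \max_{j \ne 1} |e_{1,n}^{\prime} \Sigma_{tr} q_j q_j^{\prime} \Sigma_{tr}^{-1} u_k| \end{bmatrix} \ge 0_{2 \times 1}$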

Traditional sign and zero restrictions can also be imposed alongside NR. We follow AR18 by explicitly allowing for sign restrictions on impulse responses and on elements of $A_0$. We denote such sign restrictions by $S(\varphi, Q) \ge 0_{\tilde{s} \times 1}$, where $\tilde{s}$ is the number of traditional sign restrictions. It is straightforward to additionally allow for zero restrictions, so long as these are not over-identifying. These include ‘short-run’ zero restrictions (Sims 1980), ‘long-run’ zero restrictions (Blanchard and Quah 1989), or restrictions arising from external instruments (Mertens and Ravn 2013; Stock and Watson 2018; Arias, Rubio-Ramírez and Waggoner 2021).[10]

## 3.3 Conditional and unconditional likelihoods

When constructing the posterior of the SVAR's parameters, AR18 use the likelihood conditional on the NR holding. Define

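(20) $D_N = D_N(\varphi, Q, Y^T) \equiv 1\{N(\varphi, Q, Y^T) \ge 0_{s \times 1}\}$

(21) $r(\varphi, Q) \equiv \Pr(D_N(\varphi, Q, Y^T) = 1 | \varphi, Q)$

(22) $f(y^T | \varphi) \equiv \prod_{t=1}^{T} (2\pi)^{-\frac{n}{2}} |\Sigma|^{-\frac{1}{2}} \exp\left(-\frac{1}{2} (y_t - B x_t)^{\prime} \Sigma^{-1} (y_t - B x_t)\right)$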

The likelihood conditional on $D_N = 1$ can be written as

(23) $p(y^T | D_N = 1, \varphi, Q) = \frac{f(y^T | \varphi)}{r(\varphi, Q)} \cdot D_N(\varphi, Q, y^T)$

Here, $f(y^T | \varphi)$ is the joint density of the data given $\varphi$ (i.e. the likelihood of the reduced-form VAR), which depends only on $\varphi$ and the data. The indicator function $D_N(\varphi, Q, Y^T)$ equals one when the NR are satisfied and is zero otherwise. This determines the truncation points of the likelihood. $r(\varphi, Q)$ is the ex ante probability that the NR are satisfied. This is constant when there are only shock-sign restrictions; for example, if there are s shock-sign restrictions, $r(\varphi, Q) = (1/2)^s$. When there are restrictions on the historical decomposition, this probability depends on $\varphi$ and Q.
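The dependence of $r(\varphi, Q)$ on Q can be seen in a small Monte Carlo sketch for the pair of restrictions in Equation (19). With h = 0, both restrictions depend on the data only through $\epsilon_k \sim \mathcal{N}(0_{n \times 1}, I_n)$, so $r(\varphi, Q)$ can be approximated by simulating $\epsilon_k$ directly. The function name `estimate_r`, its arguments and the number of draws are assumptions chosen for illustration; this is not the procedure used in the paper.

```python
import numpy as np


def estimate_r(Sigma, Q, n_draws=200_000, seed=0, use_hd=True):
    """Monte Carlo estimate of r(phi, Q) for the restrictions in Equation (19).

    Restrictions: eps_{1k} >= 0 and, if use_hd is True, the first shock is the
    most important contributor to the first variable in period k (h = 0).
    """
    rng = np.random.default_rng(seed)
    Sigma_tr = np.linalg.cholesky(Sigma)
    eps = rng.standard_normal((n_draws, Sigma.shape[0]))   # draws of eps_k
    ok = eps[:, 0] >= 0                                    # shock-sign restriction
    if use_hd:
        w = Sigma_tr[0, :] @ Q             # w_j = e'_{1,n} Sigma_tr q_j
        contrib = np.abs(eps * w)          # |H_{1,j,k,k}| for each draw and j
        ok &= contrib[:, 0] >= contrib[:, 1:].max(axis=1)
    return ok.mean()
```

In a bivariate example with $\Sigma = I_2$, the shock-sign restriction alone gives a probability of 1/2 for any Q. Adding the historical-decomposition restriction leaves the probability at 1/2 when $Q = I_2$ (the second shock never contributes to the first variable) but lowers it to 1/4 when Q is a 45-degree rotation (the two shocks have equal loadings in absolute value), so the conditional likelihood would favour the latter value of Q.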

Consider the case where $\varphi$ is known, which will be the case asymptotically because $\varphi$ is point identified. When $r\left(\varphi ,Q\right)$ depends on Q, the conditional likelihood is maximised at the value of Q that minimises $r\left(\varphi ,Q\right)$ (within the set of values of Q satisfying the restrictions). The posterior based on this likelihood therefore places higher posterior probability on values of Q that result in a lower ex ante probability that the restrictions are satisfied. As discussed in Section 2.2, this is an artefact of conditioning on a non-ancillary event, which represents a loss of information.

We therefore advocate constructing the likelihood without conditioning on the NR holding. The unconditional likelihood (the joint distribution of the data and $D_N$) is

(24) $p(y^T, D_N = d | \varphi, Q) = [f(y^T | \varphi) D_N(\varphi, Q, y^T)]^d \cdot [f(y^T | \varphi)(1 - D_N(\varphi, Q, y^T))]^{1-d} = f(y^T | \varphi) \cdot [D_N(\varphi, Q, y^T)]^d \cdot [1 - D_N(\varphi, Q, y^T)]^{1-d}$

For any value of $\varphi$ such that $y^T$ is compatible with the NR, there is a set of values of Q that satisfy the restrictions, which depend on the data, but the value of the unconditional likelihood is the same for any Q in this set. The conditional posterior for Q given $\varphi$ is therefore proportional to the conditional prior in these regions. Given a fixed number of NR, the likelihood has flat regions even with a time series of infinite length, so posterior inference may be sensitive to the choice of conditional prior for Q given $\varphi$, even asymptotically (which is also the case for the conditional likelihood when the restrictions are ancillary). This motivates considering Bayesian procedures that are robust to the choice of conditional prior, which we explore in Section 5.2.
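The flatness of the unconditional likelihood in Q can be illustrated with a minimal sketch of Equation (24) evaluated at d = 1 for a single shock-sign restriction. The data layout (`Y` is T × n with rows $y_t^{\prime}$, `X` is T × m with rows $x_t^{\prime}$, `B` is n × m), the function names and the zero-based indices are assumptions; the sign normalisation and any other restrictions are not enforced here.

```python
import numpy as np


def reduced_form_loglik(Y, X, B, Sigma):
    """log f(y^T | phi): Gaussian log-likelihood of the reduced-form VAR."""
    U = Y - X @ B.T                                   # rows are u_t'
    T, n = Y.shape
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum("ti,ij,tj->", U, np.linalg.inv(Sigma), U)
    return -0.5 * (T * n * np.log(2 * np.pi) + T * logdet + quad)


def unconditional_loglik(Y, X, B, Sigma, Q, k, i=0):
    """log p(y^T, D_N = 1 | phi, Q) with one restriction eps_{ik} >= 0."""
    U = Y - X @ B.T
    Sigma_tr = np.linalg.cholesky(Sigma)
    eps_ik = np.linalg.solve(Sigma_tr, U[k]) @ Q[:, i]
    if eps_ik >= 0:
        return reduced_form_loglik(Y, X, B, Sigma)    # does not involve Q
    return -np.inf                                    # truncated: D_N = 0
```

Holding $\varphi$ fixed, the function returns the same value for every Q that satisfies the restriction and minus infinity otherwise, so within the admissible set any variation in the posterior of Q comes from the conditional prior.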

## 3.4 Discussion of assumptions

### 3.4.1 Distributional assumptions

Researchers may be concerned about misspecification with regard to the assumption of standard normal shocks. For instance, one could worry that the periods in which the NR are imposed are ‘unusual’ in the sense that the structural shocks in these periods were drawn from a distribution with, say, different variance or fat tails. The unconditional likelihood depends on the normality assumption only through the reduced-form VAR likelihood, $f(y^T | \varphi)$. By omitting terms in $f(y^T | \varphi)$ corresponding to the periods in which the NR are imposed, one can thus conduct inference that is robust to the distributional assumption about the shocks in these particular periods.

To illustrate, consider the case where NR are imposed in period k only and assume the likelihood for $y^T$ takes the form

(25) $\tilde{f}(y^T | \varphi) = v(\{y_t - B x_t\}_{t \ne k} | \varphi) \, w(y_k - B x_k)$

where

(26) $v(\{y_t - B x_t\}_{t \ne k} | \varphi) = \prod_{t \ne k} (2\pi)^{-\frac{n}{2}} |\Sigma|^{-\frac{1}{2}} \exp\left(-\frac{1}{2} (y_t - B x_t)^{\prime} \Sigma^{-1} (y_t - B x_t)\right)$

and $w(y_k - B x_k)$ is an unknown, potentially non-normal, density. Replacing $f(y^T | \varphi)$ in Equation (24) with $v(\{y_t - B x_t\}_{t \ne k} | \varphi)$ yields an ‘unconditional partial likelihood’ that does not depend on the distribution of $\epsilon_k$, but is still truncated by the NR. This would potentially result in a loss of information relative to a likelihood that correctly specifies the distribution of the shocks in period k. However, when NR are imposed in only a few periods, this loss is likely to be small. In contrast, when using the conditional likelihood, the distribution of the structural shocks must be specified in all periods to be able to compute $r(\varphi, Q)$.

Concerns about misspecification may also be alleviated by recognising that the distributional assumption is irrelevant asymptotically. The set of values of Q with non-zero unconditional likelihood depends only on $\varphi$ and the realisation of the data in the periods in which the NR are imposed. Under regularity assumptions, the likelihood (and thus the posterior) of $\varphi$ will converge to a point at the true value of $\varphi$ asymptotically regardless of whether the true data-generating process is a VAR with homoskedastic normal shocks.[11] The set of values of Q with non-zero likelihood will therefore converge asymptotically to the same set regardless of whether the distributional assumption is correct.

### 3.4.2 Mechanism generating NR

In line with the existing literature, we do not explicitly model the mechanism responsible for revealing the information underlying the NR (i.e. whether $D_N = 1$ or $D_N = 0$) or the mechanism determining the periods in which this information is revealed (e.g. the identity of k in examples above). If the revelation of this information depends on the data, the likelihood will be misspecified. The exact implications of this misspecification for identification or inference will depend on assumptions about the mechanism revealing the narrative information. Exploring the consequences of such misspecification may be an interesting area for further work. In the bivariate model of Section 2, if the identity of k is randomly determined independently of $\epsilon_1, \ldots, \epsilon_T$, we can interpret the current analysis conditional on k.

## Footnotes

The VAR(p) is invertible into a VMA $\left(\infty \right)$ process when the eigenvalues of the companion matrix lie inside the unit circle. See Hamilton (1994) or Kilian and Lütkepohl (2017). [7]

$C_h$ can be defined recursively by $C_h = \sum_{l=1}^{\min\{h,p\}} B_l C_{h-l}$ for $h \ge 1$ with $C_0 = I_n$. In practice $C_h$ can be computed using the companion form of the VAR. [8]

Ben Zeev (2018) imposes a restriction on the timing of the maximum three-year average of a particular shock, as well as restrictions on the sign and relative magnitudes of this three-year average in specific periods. Restrictions on averages of shocks can also be implemented in this framework. An earlier version of our paper considered the restriction that the shock in a particular period was the largest (absolute) realisation of the shock in the sample period; see also Read (2022a). [9]

GK21 explicitly allow for zero restrictions in their robust Bayesian analysis of set-identified SVARs. Giacomini et al (2022b) extend this to proxy SVARs. Read (2022b) imposes sign, narrative and zero restrictions within our robust Bayesian framework. [10]

See Plagborg-Møller (2019) for a discussion of this point in the context of structural VMA models. [11]