# New Capabilities using the TARS 
This folder contains all the source code related to *Section 4 - New Capabilities* of the paper *An AI-powered Tool for Central Bank Business Liaisons: Quantitative Indicators and On-demand Insights from Firms*.

In the paper, we show how data from a Business Liaison TARS can be used to create textual measures that are then used in real economic analysis (see the `nowcasting` folder for the empirical exercise). Specifically, the code in this folder provides the following new capabilities:
* Taking a dictionary of words to build textual measures of topic exposure and tone.
* Using the output of an LM to build textual measures of topic exposure and tone.
* Taking a dictionary of uncertainty words to build a measure of uncertainty.
* Using the extracted numerical quantities to create aggregate series that can reflect economic measures (such as Australian CPI and WPI).  
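
As a rough illustration of the first capability, the sketch below computes a simple topic exposure series in base R: the share of notes in each quarter that mention any keyword from a list. It is not the code in `Dictionary_based_indices.R`; the column names (`date`, `text`), the keyword list and the share-of-notes definition are all assumptions made for the example, and tone is not covered.

```r
# Toy liaison-like notes; the column names `date` and `text` are assumptions.
notes <- data.frame(
  date = as.Date(c("2024-01-15", "2024-02-10", "2024-04-05", "2024-05-20")),
  text = c(
    "Wages growth has picked up and hiring remains difficult.",
    "Demand was steady over the quarter.",
    "Staff turnover is high and wage pressures are building.",
    "Input costs eased."
  ),
  stringsAsFactors = FALSE
)

# Hypothetical keyword list standing in for one tab of a dictionary.
wage_keywords <- c("wage", "wages", "salary")

# Flag notes that mention any keyword (case-insensitive, whole-word match).
pattern <- paste0("\\b(", paste(wage_keywords, collapse = "|"), ")\\b")
notes$mentions_topic <- grepl(pattern, notes$text, ignore.case = TRUE, perl = TRUE)

# Assign each note to a calendar quarter, e.g. "2024 Q1".
notes$quarter <- paste(format(notes$date, "%Y"), quarters(notes$date))

# Topic exposure (assumed definition): share of notes per quarter mentioning the topic.
exposure <- aggregate(mentions_topic ~ quarter, data = notes, FUN = mean)
print(exposure)
```

The same pattern could be applied to an uncertainty word list by swapping the keyword vector.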

> **Note:** The demo data used in this repo has been generated by ChatGPT to broadly reflect the type of text extracted and enriched in the Business Liaison TARS database described in the paper. However, because this data is artificial, the output will not exactly reflect the measures shown in the paper. Textual indices and extracted numerical quantities built from this data will also not aggregate to reflect actual economic data, as the data is only intended to demonstrate how the process works in practice.

# Code Structure
```
Repo
├── Data
│   ├── rdp-2025-06-graph-data.xlsx (graph data to plot figures in paper)
│   ├── dictionary.xlsx (dictionaries for topic indices)
│   ├── LM_dictionary.xlsx (guide for LM-based topic indices)
│   ├── uncertainty_dict.xlsx (dictionary for uncertainty index)
│   └── liaison.sqlite (TARS containing liaison-like text data for code demo)
└── Capabilities
    ├── Capabilities.Rproj (R project file to ensure working in correct directory)
    ├── requirements.txt (contains package versions required for replication)
    ├── Dictionary_based_indices.R (topic exposure and tone using dictionary)
    ├── LM_based_indices.R (topic exposure and tone using LM)
    ├── Uncertainty_index.R (uncertainty using dictionary)
    ├── Numerical_extraction_indices.R (numerical quantity extraction using LMs)
    ├── Plot_measures.R (script to plot figures from paper)
    └── data_gen_utils.R (contains utility functions for generating textual indices)
```

# Getting Started
### Quick Start
1. Open `Capabilities.Rproj` to ensure you are working in the correct directory for running each of the scripts. 
2. Open the R script for the relevant measure and run the code.
3. After running each script, the output files should appear in `../Data/`.

### Requirements
For the R and package versions used in the original paper, see `requirements.txt`. Check your R package versions using `packageVersion("<package_name>")`. Also ensure that the liaison database (`liaison.sqlite`) and the dictionary or guide for the measure you wish to create are in the `../Data/` directory. **RStudio** is needed to open and use the `.Rproj` file in each of the subfolders; in our work we used version 2024.04.0. For more information, see https://www.posit.co.
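
To compare several packages against `requirements.txt` at once, a small helper along these lines may be convenient. This is only a sketch: `report_versions` is a hypothetical helper, and the package names below are placeholders to be replaced with those listed in `requirements.txt`.

```r
# Return the installed version of each package, or NA if it is not installed.
report_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

# Placeholder package names; substitute the ones from requirements.txt.
versions <- report_versions(c("stats", "utils"))
print(versions)

# The running R version, for comparison with the one recorded for the paper.
print(R.version.string)
```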

### Dictionaries and Guides
To change the topic exposure and tone measures that are extracted, edit or append to `dictionary.xlsx` for dictionary-based measures, or `LM_dictionary.xlsx` for LM-based measures. Similarly, for the uncertainty measure, edit `uncertainty_dict.xlsx` to adjust the words used to measure firm uncertainty.

> **Note:** It is important to retain the template structure of each dictionary and guide. This means not changing the keyword and qualifier column names or orders (including for the topic exposure tabs). Also ensure that the names of each tab follow the same template as *Labour* and *Wages* in the example dictionary. 
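
One way to guard against accidental template changes is to check the column layout of each tab before running a script. The sketch below uses in-memory data frames in place of the Excel tabs; the column names `keyword` and `qualifier` and the helper `has_template` are assumptions for illustration, so compare against the actual headers in the shipped dictionaries.

```r
# Assumed leading column names and order; check these against the real template.
expected_cols <- c("keyword", "qualifier")

# TRUE if the tab's leading columns match the expected names in the expected order.
has_template <- function(tab, expected = expected_cols) {
  identical(names(tab)[seq_along(expected)], expected)
}

# Stand-ins for two dictionary tabs; the second has its columns swapped.
tabs <- list(
  Labour = data.frame(keyword = c("hiring", "staff"), qualifier = c("strong", "weak")),
  Wages  = data.frame(qualifier = c("rising"), keyword = c("wage"))
)

template_ok <- vapply(tabs, has_template, logical(1))
print(template_ok)
```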

### Numerical Extraction
By default, the `Numerical_extraction_indices.R` script extracts numerical quantities for wages and prices. If you wish to extract other quantities from textual data, you will first have to run the numerical extraction code in the `backend` folder. Once you have a new column of extracted numerical quantities, copy the code for prices or wages and replace the column name with that of the new numerical quantity. The code will then automatically create an aggregate measure for this quantity.
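
The aggregation step can be pictured roughly as follows. This is a minimal base R sketch, not the script's actual logic: the column name `price_growth`, the quarterly frequency and the use of a simple mean are all assumptions for the example.

```r
# Toy extracted quantities; NA marks notes with no usable number.
extracted <- data.frame(
  date = as.Date(c("2024-01-20", "2024-03-02", "2024-03-18",
                   "2024-05-09", "2024-06-27")),
  price_growth = c(3.0, 4.0, NA, 2.5, 3.5)  # hypothetical column name
)

# Assign each observation to a calendar quarter, e.g. "2024 Q1".
extracted$quarter <- paste(format(extracted$date, "%Y"), quarters(extracted$date))

# Keep only rows where a quantity was actually extracted.
usable <- extracted[!is.na(extracted$price_growth), ]

# Aggregate series: mean extracted value per quarter.
aggregate_series <- aggregate(price_growth ~ quarter, data = usable, FUN = mean)

# Number of usable observations behind each quarterly value.
n_obs <- aggregate(price_growth ~ quarter, data = usable, FUN = length)

print(aggregate_series)
print(n_obs)
```

Reporting the observation count alongside the mean makes it easier to spot quarters where the aggregate rests on very few extracted values.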

The demo data used in this repo has been generated by ChatGPT, so the aggregate measures will not reflect real measures such as CPI and WPI. The number of usable observations for wages is also very small in this demo data, but it is enough to test the code in your own environment.