RDP 2023-02: Did Labour Market Concentration Lower Wages Growth Pre-COVID? 3. Data

For the analysis, I use the LEED discussed in Appendix A of Andrews et al (2019). This dataset links the near universe of employees and their wages to employers via annual Pay-As-You-Go (PAYG) statements. It also contains demographic information (for example, location) and information about the firm such as industry, revenue, expenses and employment. The dataset is highly representative and the number of workers covered in each industry lines up well with ABS employment releases (see Appendix A).

I focus on the period 2005–2016 due to the availability of job-level location data. I focus on the market sector, removing the public administration, health and education industries.

The literature tends to focus on concentration within ‘local’ labour markets. These are defined as the intersection of a location and an industry or occupation. Accounting for location is important given it can be hard or costly for workers to move between areas. Similarly, many jobs have occupation- or industry-specific skills that prevent workers moving easily between occupations or industries.

For locations, I use working zones constructed in BITRE (2016). These are constructed using Census data on the home and work locations of workers and are designed to capture local areas within which people tend to both work and live. This makes them an ideal location definition. The results are robust to using alternate geographies such as SA4, or SA4 and Greater Capital City regions (Appendix A and B).

For industry, I use ANZSIC 2006 3-digit industries. I use industry rather than occupation data as it is available at the job level, whereas occupation data are only reported annually at the individual level. Moreover, other analysis has shown that people may be slow to update their occupations in their tax reporting. The results of the analysis are robust to using occupation data, as well as less granular ANZSIC categories (Appendix A and B), consistent with findings in other papers (Handwerker and Dey 2022). However, given the lower quality of the occupation data, these results are not the focus of the paper.[4]

Based on these definitions, I have around 190 industries, 290 working zones, and about 25,000 local labour markets per year.

My unit of observation is a plant. This is the intersection of a firm and a locality.[5] So, if a large chain has stores in two different working zones, these will be considered two different plants. This is a key advantage of the LEED. Most firm-level datasets will only record one location for a firm. However, based on this dataset I find that around half of all plants belong to multi-plant firms, that is, firms with workers and operations in multiple locations. With the LEED, I can accurately allocate workers to these different locations.
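As a rough illustration of the plant definition, the sketch below groups job-level records into plants and computes the share of plants that belong to multi-plant firms. The record layout (`firm_id`, `working_zone`, `year`) and the function names are hypothetical and do not reflect the actual structure of the LEED.

```python
from collections import Counter, defaultdict

def plant_headcounts(jobs):
    """Count workers per plant, where a plant is a (firm, working zone, year) cell.

    `jobs` is an iterable of (firm_id, working_zone, year) tuples, one per job.
    This layout is an assumption made for illustration only.
    """
    return Counter((firm, zone, year) for firm, zone, year in jobs)

def multi_plant_share(plants):
    """Share of plants belonging to firms with more than one plant in a year."""
    plants_per_firm = defaultdict(int)
    for firm, _zone, year in plants:
        plants_per_firm[(firm, year)] += 1
    multi = sum(1 for firm, _zone, year in plants
                if plants_per_firm[(firm, year)] > 1)
    return multi / len(plants) if plants else 0.0

# Hypothetical example: firm F1 operates in two working zones, F2 in one.
jobs = [
    ("F1", "zone_A", 2010), ("F1", "zone_A", 2010),
    ("F1", "zone_B", 2010),
    ("F2", "zone_A", 2010),
]
plants = plant_headcounts(jobs)
share = multi_plant_share(plants)
```

In this toy example there are three plants, two of which belong to the multi-plant firm F1.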

My measure of concentration is the Herfindahl-Hirschman Index (HHI), defined as:

$$ HHI_{market,t} = \sum_{plant \in market} \left( \frac{\text{Plant headcount or wage bill}_{plant,t}}{\text{Market headcount or wage bill}_{t}} \right)^{2} $$

This is a standard measure of concentration used in the literature. The HHI is bounded from 0 to 1. Low levels of the HHI indicate low levels of concentration (lots of small firms). High levels of the HHI indicate a highly concentrated market, with an HHI of 1 showing that there is only one employer.
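To make the calculation concrete, here is a minimal sketch of the HHI for each market and year, computed as the sum of squared plant shares. The input layout (tuples of market, year and plant size) and the function name are assumptions for illustration; size can be either headcount or wage bill.

```python
from collections import defaultdict

def hhi_by_market(plants):
    """Return {(market, year): HHI} from (market, year, size) tuples.

    Each tuple is one plant; `size` is its headcount or wage bill.
    The input layout is hypothetical, not the LEED's actual structure.
    """
    plants = list(plants)  # allow generators, since we iterate twice
    totals = defaultdict(float)
    for market, year, size in plants:
        totals[(market, year)] += size

    hhi = defaultdict(float)
    for market, year, size in plants:
        total = totals[(market, year)]
        if total > 0:  # skip empty markets to avoid division by zero
            hhi[(market, year)] += (size / total) ** 2
    return dict(hhi)

# Hypothetical example: one market with three plants, one with a single plant.
example = [
    ("zone_A|industry_1", 2010, 50),
    ("zone_A|industry_1", 2010, 30),
    ("zone_A|industry_1", 2010, 20),
    ("zone_B|industry_1", 2010, 40),
]
result = hhi_by_market(example)
```

In the first market the shares are 0.5, 0.3 and 0.2, giving an HHI of 0.25 + 0.09 + 0.04 = 0.38. The second market has a single employer, so its HHI is 1.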

Most papers focus on an employment-based measure of concentration. However, Berger et al (2021) argue that wage-based measures are more relevant. For example, wage-based measures are likely to be less affected by part-time workers or worker turnover. As such, I analyse both employment- and wage-based concentration measures, but mainly focus on the employment-based measure for comparability with the literature (see Appendix A and B for wage-based concentration results). Results are robust to this choice.

As discussed below, in the regression analysis I control for productivity. Unfortunately, this can only be measured at the firm level, rather than the plant level. For these regressions I focus on companies and exclude 2016 due to data constraints. I measure productivity as the ratio of value-added to employees. Value-added is defined as income less expenses, where depreciation, interest expenses and wage expenses are not subtracted.
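The productivity measure can be sketched as follows. The function and argument names are hypothetical, and the sketch assumes that the reported total expenses figure includes depreciation, interest and wages, so that these are added back before subtracting from income.

```python
def value_added(income, total_expenses, depreciation, interest, wages):
    """Income less all expenses other than depreciation, interest and wages.

    Assumes `total_expenses` includes the three excluded items
    (an assumption made for this illustration).
    """
    subtracted_expenses = total_expenses - depreciation - interest - wages
    return income - subtracted_expenses

def labour_productivity(income, total_expenses, depreciation, interest,
                        wages, employees):
    """Value-added per employee; returns None when there are no employees."""
    if employees <= 0:
        return None
    va = value_added(income, total_expenses, depreciation, interest, wages)
    return va / employees

# Hypothetical firm: income 1,000; expenses 800, of which depreciation 50,
# interest 30 and wages 400; 10 employees.
va = value_added(1000, 800, 50, 30, 400)
prod = labour_productivity(1000, 800, 50, 30, 400, 10)
```

Here the subtracted expenses are 800 - 50 - 30 - 400 = 320, so value-added is 680 and productivity is 68 per employee.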

In these regressions, wages are measured as total wages in the local market, or plant, divided by the number of workers.


[4] An extension would be to use data-driven industry groupings based on flows as in Jarosch et al (2019).

[5] Where firms have complex tax reporting structures, I collapse them down to a single reporting unit.