Economic productivity loss due to breast cancer in Norway – a case control study using the human capital approach

We estimate the productivity loss attributable to breast cancer in Norway, using a human capital approach in which the value of subjects’ labor to the national economy is estimated through their income. Since the approach takes the viewpoint of society, it includes reduced labor due to mortality as well as reductions among survivors. A case-control design was used. The Norwegian Cancer Registry identified 2,010 patients aged 45-54 years, with breast cancer as their first lifetime malignancy, diagnosed in 1992-1996. Statistics Norway matched these with random controls of the same age, marital status and municipality of residence, and provided data on pension qualifying income until the year 2005, inflation adjusted to 2012 €. The effect of the cancer diagnosis on income throughout the follow-up period was estimated as the difference between the cases’ and controls’ income, corrected for systematic differences between cases and controls before the diagnosis. A quadratic curve approximation was used to estimate effects after the 13-year follow-up period, and a bootstrap resampling approach was applied to compute confidence intervals (CI). Regression modeling was used to estimate life-long productivity loss as a function of age at diagnosis. This was combined with the current Norwegian age distribution of breast cancer incidence to estimate the national productivity loss due to breast cancer. For our cohort, the 13-year productivity loss attributable to the diagnosis was estimated at 102,600 € per case, with 95% CI (88,500, 116,700). The life-long estimate was 119,200 €, CI (95,400, 155,600). The annual national productivity loss was estimated at 179,900,000 €, or 58,200 € per case. For those aged < 65 at diagnosis, the estimate was 94,300 € per case. The estimated life-long productivity loss depends heavily on the age at diagnosis. The results can be used for evaluating the societal economic benefit from breast cancer prevention programs.
JEL classification: I18


Introduction
Cancer has an impact not only on affected patients and their families, but also on the national economy, through reduced supply of labor. In the present article, we study this effect for breast cancer specifically, which affects many women in age groups with high levels of employment (Kreftregisteret, 2016). Our aim is to quantify the negative effect that breast cancer mortality and morbidity have on the total productivity of the Norwegian economy, through reduction in available labor. It is important to quantify this macro-economic aspect of the burden of the disease, because it can inform the government on the long-term indirect economic gain from preventing new cases. Mammography screening programs may be one such application, where the economic benefit from detection and treatment of pre-cancer conditions may be quantified. The welfare gain for the population, in terms of longer and healthier lives, should be the main goal of such programs, but in a cost-effectiveness framework, economic benefit for society can be interpreted as a reduction in the cost of a prevention program. The effects of breast cancer on macro-economic productivity have previously been studied in other Western European countries, including Sweden (Lidgren et al., 2007), Flanders (Broekx et al., 2010) and Ireland (Hanly et al., 2012). These studies show that productivity loss is a substantial effect of breast cancer, which makes the issue an important one. Given that the Norwegian labor market and general economy differ in relevant ways from those countries (see, e.g., statistics on unemployment (Eurostat, 2016)), the present study is well motivated.
We apply the human capital approach (HCA), which views human labor as an input into the production process (Berger, 2001). Assuming there is a well-functioning labor market, the wages that are paid reflect the market price of labor, and therefore its value. The labor that a person delivers contributes to the production in the economy, and the value of this additional production is assumed to be equal to the person's income. In the words of Berger (2001): "… the measurement of indirect costs is based on the premise that the value of an individual's work and its contribution to society is measured in terms of the person's potential income generation." Hanly et al. (2012) phrase it this way: "This approach encompasses a societal perspective and estimates an individual's contribution to society by applying labor force earnings as a measure of productivity." Hence, when a woman's employment situation is affected by breast cancer, we assume that there is a loss in production, given by her loss in income. In order to quantify the total value of the national production that is lost due to the disease, we estimate the sum of the affected women's income reduction. This reduction will typically last many years after diagnosis, including post mortem if the patient dies prematurely.
The HCA assumption, that the impact of lost labor on production can be estimated through the wages that would have been paid, is not universally accepted. The main alternative is the friction cost approach (FCA), which defines the productivity loss of a lost employee through the time and effort it takes to hire and train a replacement. As pointed out by Hanly et al. (2014), each method has its strengths and limitations. In an economy with a high level of unemployment in all parts of the labor market, the FCA is indicated, since lost labor can readily be replaced, and the HCA will overestimate the impact on productivity. However, even with a high level of unemployment, the FCA method will tend to underestimate the true effect, as a replacement employee will sometimes be recruited from a different job. This generates friction costs for the previous employer, which may in turn hire someone who was also previously employed, and so on. Even when a previously unemployed person is hired, the FCA method may underestimate the cost of the given cancer case, since the employed person might have created his or her own business within the time span of the subject's counterfactual career without cancer. These knock-on effects of lost labor are hard to estimate within an FCA framework. In principle, HCA may be more reasonable than FCA, since it values labor according to its market price, rather than as a resource with zero value in itself. For the present study, we find HCA particularly well motivated, since the unemployment rates in Norway have been low from 1992 until the present (2016), with levels ranging from 2.3% to 6.8%. This is consistently below the levels of Sweden, Ireland and Belgium for the same period (Eurostat, 2016). The Norwegian level has been approximately half that of the European Union since the beginning of that time series in year 2000 (Eurostat, 2016). The assumption of low unemployment is therefore reasonable for the present study, and more so than for the studies with which we compare our results.
Note that in this article, the term productivity is used only on the aggregate level of the nation's production per year. We do not study the impact that the disease has on the individual patients' productivity in the way of work efficiency or ability to produce per unit time.
The present study contributes productivity loss estimates for a Norwegian breast cancer population, which has not previously been studied, using novel analysis techniques and high-quality registry data.

Data
The study is based on data from two nationwide Norwegian registries, and their merging was possible due to the unique personal identity number given to all Norwegian citizens. The Cancer Registry of Norway collects data on all new cases of cancer based on statutory registration requirements for all hospitals, laboratories and general practitioners. The diagnostic data from the registry are close to complete and of high quality (Larsen et al., 2009). Income data were acquired from the Event Database of Statistics Norway.

2.1 Sampling of patients and cancer-free controls
Eligibility criteria for identification of patients in the Cancer Registry were women aged 45 to 54 years diagnosed with breast cancer as the first lifetime malignancy between 1992 and 1996. The age span 45-54 was chosen because breast cancer incidence and workforce participation are high for these ages. The five years of recruitment were chosen to get a reasonable follow-up time; the age span assured a high proportion of employed women, as the formal retirement age in Norway is 67 years. The Cancer Registry identified 2,052 women fulfilling these criteria. Cancer-free controls matched for age, marital status and municipality of residence at inclusion were drawn randomly from the Norwegian population registry by Statistics Norway, one control per patient. The controls were cancer-free, based on checks in the Cancer Registry. Throughout the article the term "year of diagnosis" is used also for controls, referring to the year of breast cancer diagnosis for their matched case. Among the 4,104 women in the sample, 14 did not have a registered income and 22 had missing income data in some years during the study period. These 36 women were excluded from the analyses together with their matched case/control. In addition, six controls died before the time of cancer diagnosis and were excluded together with their matched case. Thus, the sample consisted of 4,020 women: 2,010 breast cancer patients and 2,010 matched controls. The cohorts of 1992 to 1996 contained 342 (17.0%), 362 (18.0%), 373 (18.6%), 436 (21.7%), and 497 (24.7%) pairs, respectively.

2.2 Income
The variable pension qualifying income, in the following referred to only as income, is used in this study. It is defined as personal income excluding transfers of social security benefits, such as sick-leave benefits, unemployment benefits, work assessment allowance or disability pension. Income data were acquired for the years 1990-2005 for all cases and controls. Hence, the follow-up time varied between nine and 13 years depending on the year of diagnosis (1992-1996), and the pre-diagnosis period varied between two and six years.
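The observation windows follow directly from the data span and the diagnosis years. As a small illustrative sketch (the function and constant names are ours, not part of the registry data), the arithmetic is:

```python
# Income data span 1990-2005; diagnosis years are 1992-1996 (both from the text).
FIRST_INCOME_YEAR = 1990
LAST_INCOME_YEAR = 2005


def observation_windows(diagnosis_year):
    """Return (pre_diagnosis_years, follow_up_years) for one diagnosis cohort."""
    pre_diagnosis_years = diagnosis_year - FIRST_INCOME_YEAR
    follow_up_years = LAST_INCOME_YEAR - diagnosis_year
    return pre_diagnosis_years, follow_up_years


windows = {year: observation_windows(year) for year in range(1992, 1997)}
for year, (pre, post) in windows.items():
    print(f"diagnosed {year}: {pre} pre-diagnosis years, {post} follow-up years")
```

The 1992 cohort thus has the shortest history and the longest follow-up, and the 1996 cohort the reverse.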

2.3 Socio-demographic characteristics
The present study was based on income data and age only. The socio-demographic variables of marital status and municipality of residence entered the analysis implicitly through the matching procedure. The socio-demographic variables in Table 1 (previously published by Šaltytė Benth et al. (2013)) were not included in the present analysis.
These descriptive statistics of cases and controls show statistically significant differences between the groups for all variables, in line with epidemiologic research (Feller, 2017). Except for mortality, the magnitudes of the differences are relatively modest.

Statistical analyses
The statistical analysis consists of a sequence of steps. In Section 3.1 we adjust for income differences between cases and controls at baseline, in order to make the subsequent income development of the two groups comparable. In Section 3.2 we estimate the average difference in annual income for cases and controls, as a function of years after diagnosis. These estimates pertain to the specific cohort of patients aged 45-54 years at diagnosis, and the 13 years of follow-up time. In Section 3.3 we extend the analysis of the cohort beyond the follow-up period by extrapolating life-long income differences. In Section 3.4 we estimate income differences outside of the cohort. For this we develop a regression model, which represents the impact that age at diagnosis has on the relation between years since diagnosis and income differences. This enables us to estimate the total life-long income loss as a function of age at diagnosis, and in Section 3.5 we combine this with age-dependent incidence rates, giving an estimate of the national productivity loss attributable to breast cancer.

3.1 Controlling for case-control baseline differences
The income variable was adjusted for the Norwegian consumer price index (CPI) for 2012, and converted to €. For each case, we use the income of the control as an estimate of her counterfactual income. However, as shown by Šaltytė Benth et al. (2013), the matching procedure resulted in systematic differences, with cases having higher average income than controls prior to diagnosis. This may be due to confounding socio-economic factors that were not used in the matching procedure. Since income differences prior to the diagnosis are unlikely to be due to the disease, they must be adjusted for in the analysis. We therefore computed a factor:

$$ f = \frac{\text{Average income for cases the year before diagnosis}}{\text{Average income for controls the year before diagnosis}} $$

The average incomes for cases and controls in the year before diagnosis were 27,200 € and 24,900 €, respectively, giving $f = 1.092$. All income data for the controls, throughout the follow-up period, were rescaled by multiplication with $f$. In this way we controlled for the fact that the cases had 9.2% higher income than the controls prior to diagnosis. Note that the 9.2% figure includes any differences in employment rates, overtime work and full- versus part-time employment between cases and controls, and cannot be interpreted as a difference in hourly pay.
The key assumption is that the rescaled income of the controls gives unbiased estimates of the counterfactual income that the cases would have had after the year of diagnosis, without cancer. This assumption cannot be tested directly, since the counterfactuals by definition are unobserved. We can, however, make a relevant test backwards in time, by comparing the income of the cases to the rescaled income of the controls prior to the diagnosis. Our data set contains income data from 1990 and onwards, so we have two years of history on the entire cohort, and six years for those diagnosed in 1996. These statistical tests turn out favourably, as there are no statistically significant differences between the cases' income and the controls' rescaled income two, three, four, five or six years before the diagnosis. In other words, if we pick a reference year more than one year before the diagnosis, the rescaling procedure gives comparable income development of the two groups up to the year of diagnosis. It is therefore reasonable to assume that they would have followed the same track further, had there been no cancer diagnosis.
The rescaled incomes for the controls were used for all the subsequent analysis, so all references to the control group's income in the rest of the article are to the rescaled values.
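The rescaling step can be illustrated with a minimal sketch. The function name and the toy income series below are hypothetical; only the two reported baseline averages (27,200 € and 24,900 €) are taken from the text.

```python
from statistics import mean


def rescale_controls(case_baseline, control_baseline, control_incomes):
    """Scale every control income by f = mean(case baseline) / mean(control baseline)."""
    f = mean(case_baseline) / mean(control_baseline)
    rescaled = [[f * income for income in series] for series in control_incomes]
    return f, rescaled


# Factor implied by the reported averages for the year before diagnosis (2012 EUR).
f_reported = 27_200 / 24_900
print(round(f_reported, 3))

# Toy data (hypothetical): two pairs, each control series holds the baseline
# year followed by one later year.
case_baseline = [30_000.0, 24_000.0]
control_baseline = [26_000.0, 24_000.0]
control_incomes = [[26_000.0, 27_000.0], [24_000.0, 25_000.0]]
f, rescaled = rescale_controls(case_baseline, control_baseline, control_incomes)
print(f, rescaled)
```

By construction, the rescaled controls have the same average income as the cases in the baseline year, so any later difference is measured relative to a common starting level.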

3.2 Average income loss by follow-up year
The difference in average income between cases and controls was calculated for each year from 0 to 13 years after diagnosis. Figure 1 shows the average income development for cases (solid curve) and controls (dashed curve) as a function of follow-up years. The vertical dashed line shows the year prior to the diagnosis, where the average incomes for cases and controls are identical, since this is the baseline year for the rescaling procedure. To the left of this line, the graph shows the income developments prior to the reference year, for cases and controls. Due to the sampling procedure, the number of observations is lower for each year prior to the diagnosis, which can explain the fluctuations in the curves on the left-hand side of the diagram. The trend from the earlier period extends to the year of diagnosis, where the income difference is close to zero. Thereafter, the case group falls behind, and after about 10 years generates an annual income approximately 10,000 € lower than that of the control group. Keep in mind that the calculations include zero income for those who die, so the figures cannot be interpreted as average wages or workforce participation among survivors. Toward the end of the follow-up period the gap narrows, probably due to retirement.
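As an illustration of this calculation, the sketch below computes the mean control-minus-case difference per follow-up year. It assumes that years after death are stored as zero income and that series lengths vary with the year of diagnosis; the data layout, function name and toy values are hypothetical.

```python
from statistics import mean


def annual_loss(case_incomes, control_incomes):
    """Mean (control - case) income per follow-up year.

    Each argument is a list of per-woman income series, aligned by pair and
    indexed by follow-up year. Series may differ in length because follow-up
    time depends on the year of diagnosis. Years after death are assumed to
    be recorded as 0.0, so mortality counts as lost income.
    """
    n_years = max(len(series) for series in case_incomes)
    losses = []
    for year in range(n_years):
        diffs = [
            control[year] - case[year]
            for case, control in zip(case_incomes, control_incomes)
            if year < len(case) and year < len(control)
        ]
        losses.append(mean(diffs))
    return losses


# Toy data (hypothetical): three pairs; the second case dies after year 1,
# and the third pair has a shorter follow-up.
cases = [[30_000.0, 25_000.0, 28_000.0], [30_000.0, 20_000.0, 0.0], [30_000.0, 29_000.0]]
controls = [[30_000.0, 31_000.0, 32_000.0], [30_000.0, 30_000.0, 31_000.0], [30_000.0, 30_000.0]]
losses = annual_loss(cases, controls)
print(losses)
```

Summing the returned values over the follow-up years gives the area between the two curves.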

3.3 Estimating life-long income loss
The area between the curves in Figure 1 is interpreted as the total income loss due to breast cancer over the 13-year follow-up time, which equals 102,600 €. Statistically, it is nontrivial to estimate a confidence interval (CI) for this sum, due to the strong correlation of income over time for a given subject. Also, the fact that the sample size varies, with fewer data points toward the end of the follow-up period, makes an analytical estimation approach impractical. We therefore implemented a bootstrap method (Efron and Tibshirani, 1993), which is known to give reliable CI estimates. A set of 1,000 replicated data sets of the same size as the original one was generated by drawing case-control pairs randomly with replacement. The full analysis described above was then applied to each replicated data set, giving 1,000 estimates of the average 13-year income loss. The 2.5 and 97.5 percentiles of this distribution were used as the end-points of the 95% CI, giving (88,500 €; 116,700 €).
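The pair-level resampling can be sketched as follows. This is a simplified illustration, not the full analysis: the statistic used here is just the average summed income difference per pair, without the rescaling step or varying follow-up lengths, and the toy data, seed values and function names are hypothetical.

```python
import random


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def bootstrap_ci(pairs, statistic, n_replicates=1000, seed=0):
    """95% percentile CI for statistic(pairs), resampling whole case-control pairs."""
    rng = random.Random(seed)
    estimates = sorted(
        statistic(rng.choices(pairs, k=len(pairs))) for _ in range(n_replicates)
    )
    return percentile(estimates, 2.5), percentile(estimates, 97.5)


def mean_total_loss(pairs):
    """Average over pairs of the summed (control - case) income differences."""
    totals = [
        sum(control_y - case_y for case_y, control_y in zip(case, control))
        for case, control in pairs
    ]
    return sum(totals) / len(totals)


# Toy data (hypothetical): 200 pairs with five follow-up years each.
data_rng = random.Random(1)
pairs = []
for _ in range(200):
    control = [30_000.0] * 5
    case = [30_000.0 - data_rng.uniform(0, 10_000) for _ in range(5)]
    pairs.append((case, control))

point = mean_total_loss(pairs)
low, high = bootstrap_ci(pairs, mean_total_loss)
print(point, low, high)
```

Resampling whole pairs, rather than individual yearly observations, keeps each woman's income series intact and so preserves the within-subject correlation over time.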
Most of the income loss is likely to take place within the 13-year follow-up period, due to retirement, but we also provide an estimate of the life-long income loss. We do this by fitting a quadratic curve to the annual income loss estimates, and computing the right-tail area under this curve, as illustrated in Figure 2. The bars show the average annual income loss estimates as a function of follow-up years. Each bar is calculated as the difference between the income curves for cases and controls in Figure 1. The year before the diagnosis it is zero by definition, due to the rescaling of the controls' incomes. The bars end at 13 years, which is the end of the follow-up period, and the sum of these bars is the 102,600 € given above. The curve shows the quadratic polynomial approximation, which follows the estimated values up to 13 years reasonably well. The grey right-tail area shows the estimated income loss after the follow-up period, obtained by extending the quadratic trend. This area corresponds to 16,500 €, which gives a life-long estimated income loss of 119,200 €. A 95% CI for this estimate, based on 1,000 replicated bootstrap data sets, was (95,400 €; 155,600 €).
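The tail extrapolation can be sketched as a least-squares quadratic fit followed by integration from the end of follow-up to the point where the fitted curve reaches zero. The sketch below is dependency-free and uses noise-free toy values; the helper names and the toy curve are hypothetical, and a concave fit with a real zero crossing is assumed.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    solution = [0.0] * n
    for r in reversed(range(n)):
        known = sum(rows[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (rows[r][n] - known) / rows[r][r]
    return solution


def fit_quadratic(ts, ys):
    """Least-squares fit of y = c0 + c1*t + c2*t**2 via the normal equations."""
    design = [[1.0, t, t * t] for t in ts]
    gram = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    moment = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(3)]
    return solve_linear_system(gram, moment)


def tail_area(coeffs, t_end):
    """Area under the fitted curve from t_end to its later zero crossing.

    Assumes a concave curve (c2 < 0) that crosses zero after t_end.
    """
    c0, c1, c2 = coeffs
    root_term = math.sqrt(c1 * c1 - 4 * c2 * c0)
    t_zero = max((-c1 - root_term) / (2 * c2), (-c1 + root_term) / (2 * c2))

    def antiderivative(t):
        return c0 * t + c1 * t**2 / 2 + c2 * t**3 / 3

    return t_zero, antiderivative(t_zero) - antiderivative(t_end)


# Toy annual losses (hypothetical): zero one year before diagnosis and at year 17.
years = list(range(0, 14))
annual_losses = [-60.0 * t * t + 960.0 * t + 1020.0 for t in years]
coeffs = fit_quadratic(years, annual_losses)
t_zero, area = tail_area(coeffs, t_end=13)
print(t_zero, area)
```

Adding the tail area to the sum of the observed annual losses gives the life-long estimate for the toy data.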

3.4 Life-long income loss by age at diagnosis
The preceding analysis gives estimates of average life-long lost income for our cohort of women aged 45-54 years at diagnosis. Although these results are interesting in their own right, our final goal is to estimate the impact of breast cancer on the total national economy. Most patients contract the disease at an older age, and the life-long lost labor is likely to depend heavily on the women's age at diagnosis. We therefore need to extend our analysis to other age groups, and we do this through regression, using the following model:

$$ y_{ij} = \beta_1 t_{ij} + \beta_2 a_i t_{ij} + \beta_3 t_{ij}^2 $$

The index $i$ represents a case-control pair, and $j$ represents a calendar year. Each combination of a pair $i$ and a year $j$ that is present in the dataset represents a data point for the regression. The variable $t_{ij}$ is the number of years since diagnosis for case $i$ in year $j$. The variable $a_i$ gives the age at diagnosis for case $i$. (Note that this does not depend on the calendar year of the given data point.) The dependent variable $y_{ij}$ is the income difference for pair $i$ in year $j$. The first predictor is time since diagnosis, while the second one is an interaction term between time since, and age at, diagnosis. The last predictor is the square of the time since diagnosis. This functional form was chosen because it models the income difference as a quadratic polynomial in time since diagnosis, which fits the data overall in Figure 2.
If we suppress the indices and rearrange the terms, the regression equation looks like this:

$$ y = (\beta_1 + \beta_2 a)\, t + \beta_3 t^2 $$

The patient's age at diagnosis enters the calculation by modifying the shape of the polynomial in $t$, which is what we need in order to use our model for predicting how income differences will vary with age at diagnosis. Note that the functional form ensures that the predicted income difference in the year of diagnosis is zero ($t = 0 \Rightarrow y = 0$), which is reasonable, and confirmed by the analysis in Section 3.2. Usually, when an interaction term is included in a model, one would also include the linear terms, which in this case would mean including not only $t_{ij}$, but also $a_i$ as a separate predictor. We avoid this, since it would violate the constraint $y = 0$ when $t = 0$. The constant term is avoided for the same reason.
Figure 3 shows the resulting quadratic curves when the age at diagnosis varies from 45 (highest curve) to 55 (lowest curve). The area under these curves gives the estimated life-long income loss for the different ages at diagnosis. The time $t_0$ when the predicted income loss becomes zero for age $a$ at diagnosis is given by $t_0(a) = -(\beta_1 + \beta_2 a)/\beta_3$, and the life-long income loss is the integral of $y$ up to $t_0$:

$$ L(a) = \int_0^{t_0(a)} \left[ (\beta_1 + \beta_2 a)\, t + \beta_3 t^2 \right] dt $$

Figure 4 shows the life-long income loss estimates as a function of age at diagnosis, where each dot corresponds to the area under one of the curves in Figure 3. The figure also shows 95% confidence bands, based on 1,000 bootstrap samples. We see that the life-long income loss estimates decline with increasing age at diagnosis, which is reasonable, since older age means fewer years left in employment.
In Figure 5 we show how the regression analysis extrapolates to ages above 54. Such extrapolations must always be interpreted carefully, but in this case the results seem reasonable, as the income loss falls quickly when the age at diagnosis approaches the standard retirement age of 67. The model predicts a very small effect even after this, which fades away completely around the age of 80.
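The regression and the resulting life-long loss can be sketched as below, for a model of the form y = b1*t + b2*a*t + b3*t**2 with no intercept, where t is years since diagnosis and a is age at diagnosis. The coefficient names, the helper functions and the toy coefficient values are hypothetical, and the closed form used for the area, (b1 + b2*a)**3 / (6*b3**2), is our own evaluation of the integral of y from 0 to the zero crossing, clamped at zero for ages where the initial slope is no longer positive.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    solution = [0.0] * n
    for r in reversed(range(n)):
        known = sum(rows[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (rows[r][n] - known) / rows[r][r]
    return solution


def fit_income_model(rows):
    """Least-squares fit of y = b1*t + b2*a*t + b3*t**2 from (t, a, y) triples."""
    design = [[t, a * t, t * t] for t, a, _ in rows]
    targets = [y for _, _, y in rows]
    gram = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    moment = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(3)]
    return solve_linear_system(gram, moment)


def time_to_zero(beta, age):
    """Years since diagnosis at which the predicted loss returns to zero."""
    b1, b2, b3 = beta
    return -(b1 + b2 * age) / b3


def lifelong_loss(beta, age):
    """Area under the predicted loss curve from 0 to its zero crossing."""
    b1, b2, b3 = beta
    slope = b1 + b2 * age
    if slope <= 0:  # no predicted loss at this age (assumption: clamp at zero)
        return 0.0
    return slope**3 / (6 * b3**2)


# Toy coefficients (hypothetical), used only to generate noise-free data.
TRUE_BETA = (4980.0, -60.0, -110.0)
rows = [
    (t, a, TRUE_BETA[0] * t + TRUE_BETA[1] * a * t + TRUE_BETA[2] * t * t)
    for a in range(45, 55)
    for t in range(0, 14)
]
beta = fit_income_model(rows)
for age in (45, 50, 54, 67, 90):
    print(age, round(lifelong_loss(beta, age)))
```

Omitting the intercept and the separate age term in the design matrix is what forces the fitted curve through zero at the year of diagnosis.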

3.5 Impact on the total national economy
We combined the age-dependent income loss estimates with age-dependent incidence estimates from 2014 (Cancer Registry of Norway, 2015). This amounts to multiplying the number of incident cases for each age with the corresponding life-long income loss, and summing over all ages. Recall that $L(a)$ gives the estimated life-long income loss for a woman aged $a$ at diagnosis, and let $N(a)$ be the number of new cases per year at age $a$. Then the total (future) income that is lost due to cancer cases diagnosed in a given year is estimated as:

$$ \mathrm{LOSS} = \sum_{a=15}^{83} L(a)\, N(a) $$

The sum starts at 15, which is the age of the youngest registered breast cancer case in the incidence data, and ends at 83, at which age $L$ falls to zero. Note that we count the future losses at the year of diagnosis. This is equivalent to fixing a year and summing the loss for women diagnosed any number of years prior to that, but is computationally more convenient. The calculated LOSS was 219,200,000 €, which is interpreted as the loss in national productivity, according to the HCA assumptions. The total number of cases in 2014 was 3,098, implying an average income loss of 70,800 € per case. If we restrict this analysis to women aged < 65, the total loss estimate is 211,900,000 €, so they represent 97% of the productivity loss.
Figure 5 shows that the model predicts a steep increase in life-long income loss when we move to younger age groups. This is reasonable, since these groups have a longer expected career at risk, but there is considerable uncertainty in extending a steep trend out of sample. We have therefore experimented with applying the 45 years estimate to ages 44 and below, as a sensitivity analysis. This gave a national productivity loss of 179,900,000 €, or a 15% reduction, which may be viewed as a lower bound of our estimate.
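The national sum and the sensitivity analysis can be sketched together: the loss per case at each age is multiplied by the number of incident cases at that age, and the sensitivity variant replaces the loss for ages below 45 with the age-45 value. The loss function and incidence counts below are hypothetical placeholders, not the registry figures.

```python
def national_loss(lifelong_loss, incidence, floor_age=None):
    """Sum of lifelong_loss(age) * cases over ages.

    If floor_age is given, ages below it are assigned the loss estimated at
    floor_age, which mirrors the sensitivity analysis for ages 44 and below.
    """
    total = 0.0
    for age, n_cases in incidence.items():
        effective_age = age if floor_age is None else max(age, floor_age)
        total += lifelong_loss(effective_age) * n_cases
    return total


def toy_loss(age):
    """Hypothetical life-long loss per case, falling to zero at age 83."""
    return max(0.0, 3_000.0 * (83 - age))


# Hypothetical incidence counts by age at diagnosis.
incidence = {40: 10, 50: 20, 60: 30, 85: 5}

base = national_loss(toy_loss, incidence)
capped = national_loss(toy_loss, incidence, floor_age=45)
n_cases = sum(incidence.values())
print(base, capped, base / n_cases)
```

Because the capped variant only lowers the contribution of the youngest ages, it can never exceed the base estimate, which is why it serves as a lower bound.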

Discussion
The average life-long income loss from a breast cancer case is estimated at 119,200 € for our cohort of women. This depends heavily on the women's age at diagnosis (45 to 54 in our cohort), which is confirmed by our regression analysis. The average income loss per case is estimated at 70,800 € overall, and 115,700 € if we restrict to women aged < 65 at diagnosis. The total annual productivity loss due to breast cancer in Norway is estimated at 219,200,000 €. The age-dependent productivity loss estimates may be used for evaluating the societal economic benefit from breast cancer prevention programs, including mammography screening.
The Irish study (Hanly et al., 2012) reports an average life-long HCA-based income loss of 193,425 € per breast cancer case, which is substantially higher than our estimate. However, the Irish study was based on respondents from a postal survey, with a 54% response rate, and of these, only those employed at the time of diagnosis were included in their sample of 250 breast cancer cases. They reported that approximately 50% of the respondents were unemployed, and therefore excluded from the analysis, which likely accounts for the difference between their results and ours. The Swedish study (Lidgren et al., 2007) gives a national estimate of the annual total productivity loss due to breast cancer of 223 million € in 2002 values. Inflation adjusted to our reference year of 2012, this is 275 million €. They reported an incidence number of 6,623, which gives 41,500 € per case, which is only 58% of our 70,800 € estimate. Broekx et al. (2010) give an estimated productivity loss of 95,600 € over five years following the diagnosis in Flanders, Belgium, which is substantially higher than our within-cohort 5-year estimate of 28,900 €. These estimates are not directly comparable, however, since the Flanders study included the value of unpaid work, including housekeeping activities, which is not included in the present study.
The main strength of the present study is the use of national registries providing a large data set of high quality without subjective reporting or self-selection. Our approach captures not only mortality effects and effects of absence due to treatment and illness, but also indirect effects such as how the cancer diagnosis affects the patients' career development and salary. The quadratic polynomial approximation of the tail of the productivity loss was a pragmatic choice that appears to fit the data adequately. The 13 years of follow-up is a sufficiently long period to cover most of the life-long effects, so that any bias in this approximation is likely to be small. The bootstrap approach allows for estimation of confidence intervals, which appears to be novel in this field. The regression modeling of income loss as a function of age at diagnosis and time since diagnosis produced reasonable estimates. The same goes for the resulting life-long income loss estimates as a function of age at diagnosis, which approached zero for ages around the standard age of retirement, and faded out completely around the age of 80.
A potential weakness of our study is the assumption that the rescaled income development of the control group gives valid estimates of the income development that the case group would have experienced, had they not developed cancer. This assumption is fundamentally untestable, and it cannot be denied that the two groups differ with respect to attributes that are related to income. As pointed out by Šaltytė Benth et al. (2013), parity is one such variable, since childbirth is known to correlate with employment as well as risk of breast cancer. Other studies indicate that socio-economic status (Clegg et al., 2009) and alcohol consumption (Beral et al., 2002) also correlate with breast cancer risk, and may be relevant confounding factors. In retrospect, it might have been wise to include such socio-demographic variables in the matching of controls to cases, possibly at the expense of an exact municipality match. It would not have been useful to include such confounders in our regression model, however, since the national incidence data are not stratified by these variables. A model with these confounders could therefore not be combined with incidence data to give more accurate national productivity estimates. Our key assumption is that the confounding variables mainly affect the level of average income, and that the two groups would have had similar income development in percentage terms, were it not for the cancer. This assumption is supported by the fact that the two groups had similar percentage income development in the six years prior to the diagnosis. Since the time of diagnosis varies over a five-year period, the abrupt change in relative income development after diagnosis cannot be attributed to a period effect. Also, the cases' age at diagnosis varies from 45 to 54 years, so the abrupt change cannot be attributed to an age effect, either.
Another error source is related to repeated malignancies, as the data set is restricted to first-time malignancies, while the incidence estimates are not. This means that our national productivity loss estimates may be inflated. The effect is likely to be small, however, since repeated malignancies mostly occur at an older age, when employment rates are lower.
In 2002 the definition of pension qualifying income was modified to also include rehabilitation benefits, and in 2004 temporary disability benefits were also included. These changes are not likely to have a large impact on our analysis, since they are present only for a small part of the time series, and apply to both cases and controls (although probably to different degrees). Surviving cancer patients are likely to receive more support of this kind than controls, but the increased mortality of the cases gives the opposite effect, so the sign of this error source is not clear.
The main weakness of the study may be related to extrapolations between time periods, as the cancer diagnoses date back to 1992-1996, while the follow-up period stretches to 2005. Breast cancer incidence, morbidity and mortality have changed over recent decades (Kreftregisteret, 2016), which will have affected the present-day productivity loss to some degree. The income data were inflation adjusted through the Norwegian consumer price index to the reference year of 2012, and then converted from NOK to € using the 2012 exchange rate. The results will therefore be sensitive to inaccuracies in the inflation adjustment and to fluctuations in exchange rates.
It may also be considered a weakness that we only consider production loss related to the patient's paid work. One might also consider the societal value of unpaid work, as was done in the Flanders study. Also, relatives of the cancer patient are likely to carry some of the burdens of care, and may therefore reduce their own participation in the labor market. Such indirect effects are not covered by our study.
The age-dependent income loss estimates are not intended for use in individual treatment decisions of patients, which would be ethically problematic. However, they can be used to estimate the effect that breast cancer treatment or prevention programs may have on the national economic productivity. In particular, the productivity loss estimates may be useful in evaluations of the cost-effectiveness of mammography screening programs.

Figure 1: Income development by follow-up years, for cases and controls.

Figure 2: Estimated annual income loss per cancer case, with quadratic curve estimation of life-long estimate.

Figure 3: Income loss as a function of years since diagnosis for ages 45 (highest curve) to 55 (lowest curve) at diagnosis.

Figure 4: Estimated life-long income loss as a function of age at diagnosis.

Figure 5: Extrapolation to higher age groups.