Estimating the effect of the 2005 change in BCG policy in England: a retrospective cohort study, 2000 to 2015

Background In 2005 in England, universal Bacillus Calmette–Guérin (BCG) vaccination of school-age children was replaced by targeted BCG vaccination of high-risk neonates. Aim Estimate the impact of the 2005 change in BCG policy on tuberculosis (TB) incidence rates in England. Methods We conducted an observational study by combining notifications from the Enhanced Tuberculosis Surveillance system, with demographic data from the Labour Force Survey to construct retrospective cohorts relevant to both the universal and targeted vaccination between 1 January 2000 and 31 December 2010. We then estimated incidence rates over a 5-year follow-up period and used regression modelling to estimate the impact of the change in policy on TB. Results In the non-United Kingdom (UK) born, we found evidence for an association between a reduction in incidence rates and the change in BCG policy (school-age incidence rate ratio (IRR): 0.74; 95% credible interval (CrI): 0.61 to 0.88 and neonatal IRR: 0.62; 95%CrI: 0.44 to 0.88). We found some evidence that the change in policy was associated with an increase in incidence rates in the UK born school-age population (IRR: 1.08; 95%CrI: 0.97 to 1.19) and weaker evidence of an association with a reduction in incidence rates in UK born neonates (IRR: 0.96; 95%CrI: 0.82 to 1.14). Overall, we found that the change in policy was associated with directly preventing 385 (95%CrI: −105 to 881) cases. Conclusions Withdrawing universal vaccination at school age and targeting vaccination towards high-risk neonates was associated with reduced incidence of TB. This was largely driven by reductions in the non-UK born with cases increasing in the UK born.


Imputation of UK birth status
As we were imputing a single variable, we reformulated the imputation as a categorical prediction problem. This allowed us to use techniques from machine learning to improve the quality of our imputation, whilst also validating it using metrics supported by theory. We included year of notification, sex, age, Public Health England Centre (PHEC), occupation, ethnic group, Index of Multiple Deprivation (2010) categorised into five groups for England (IMD rank), and risk factor count (risk factors considered: drug use, homelessness, alcohol misuse/abuse and prison). However, we could not account for a possible missing not at random mechanism not captured by these covariates. To train the model, we first split the data with complete UK birth status into a training set (80%), a calibration set (5%) and a test set (15%). We then fitted a gradient boosted machine with 10,000 trees, early stopping (at a precision of 1e-5, with 10 stopping rounds), a learning rate of 0.1, and a learning rate annealing of 0.99. Gradient boosted machines are a tree-based method that can incorporate complex non-linear relationships and interactions. Much like a random forest model, they work by ensembling a group of trees, but unlike a random forest model each tree is additive, aiming to reduce the residual loss from previous trees. Once the model had been fitted to the training set, we performed Platt scaling using the calibration data set. Our fitted imputation model had a log loss of 0.28 on the test set, with an AUC of 0.93, both of which indicate robust out-of-sample performance. We found that ethnic group was the most important variable for predicting UK birth status, followed by age and PHEC.
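The split and calibration steps described above can be illustrated with a minimal sketch. It is not the fitting code used in the study: the gradient boosted machine itself is not reproduced, the raw scores are assumed to come from any already-fitted classifier, and the function names (`three_way_split`, `fit_platt`, `log_loss`) as well as the gradient-descent settings are hypothetical choices made for illustration.

```python
import math
import random


def three_way_split(rows, seed=0, train=0.80, calib=0.05):
    """Shuffle rows and split them into training, calibration and test sets.

    The remaining share (15% with the defaults) becomes the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train * len(shuffled)))
    n_calib = int(round(calib * len(shuffled)))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_calib],
            shuffled[n_train + n_calib:])


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_platt(scores, labels, lr=0.1, n_iter=2000):
    """Platt scaling: fit p = sigmoid(a * score + b) on calibration data.

    Plain gradient descent on the mean log loss (an illustrative optimiser,
    assumed here; other solvers would do equally well).
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def log_loss(probs, labels, eps=1e-15):
    """Mean negative log-likelihood of binary labels given probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Hypothetical calibration data: overconfident raw scores on the logit scale.
raw_scores = [-6, -4, -2, -1, 1, 2, 4, 6]
uk_born = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_platt(raw_scores, uk_born)
calibrated = [sigmoid(a * s + b) for s in raw_scores]
```

Keeping a separate calibration set means the scaling parameters are not fitted on the data used to train the classifier, and the test set remains untouched for the final log loss and AUC.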
Using the fitted model, we predicted the birth status for notifications where this was missing, using the F1-optimal threshold as our probability cut-off. It is common to impute missing values multiple times, to account for within- and between-imputation variability. However, we considered this unnecessary for our analysis as the amount of missing data was small, our analysis considered only aggregate counts, our model metrics indicated robust out-of-sample performance, and any unaccounted-for uncertainty would be outweighed by the uncertainty in our population denominator [10]. We found that cases with imputed birth status had a similar proportion of UK born to non-UK born cases as in the complete data (Supplementary Table S6). Inclusion of imputed values for UK birth status should reduce bias caused by any missing not at random mechanism captured by predictors included in the model. Graphical evaluation of UK birth status indicated that missingness has reduced over time, indicating a missing not at random mechanism. If only the complete case data had been used, incidence rates would have reduced over the study period due to this mechanism, which may have biased our estimate of the impact of the change in policy.
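As an illustration of the probability cut-off described above, the following sketch selects the threshold that maximises the F1 score on a set of labelled predictions. The function names and the toy data are hypothetical, and the search over observed probabilities is one simple way of finding the optimum, not necessarily the procedure used in the study.

```python
def f1_score(labels, predictions):
    """F1 score for binary labels and binary predictions (1 = UK born)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_optimal_threshold(probs, labels):
    """Return (threshold, F1) maximising F1 over the observed probabilities.

    A case is classified as positive when its probability is at or above
    the threshold. Ties keep the lowest threshold.
    """
    best_threshold, best_f1 = 0.5, -1.0
    for threshold in sorted(set(probs)):
        predictions = [1 if p >= threshold else 0 for p in probs]
        score = f1_score(labels, predictions)
        if score > best_f1:
            best_threshold, best_f1 = threshold, score
    return best_threshold, best_f1


# Hypothetical predicted probabilities of being UK born and true labels.
toy_probs = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
threshold, best = f1_optimal_threshold(toy_probs, toy_labels)
imputed = [1 if p >= threshold else 0 for p in toy_probs]
```

In practice the threshold would be chosen on data with known birth status and then applied to the predicted probabilities for notifications where birth status is missing.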

Prior choice
Default weakly informative priors were used, based on those provided by the brms package. For the population-level effects this was an improper flat prior over the reals. For both the standard deviations of group-level effects and the group-level intercepts this was a half Student-t prior with 3 degrees of freedom and a scale parameter that depended on the standard deviation of the response after applying the link function.
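Written out, the priors described above take roughly the following form. The symbols are introduced here for illustration only: \beta denotes a population-level effect, \sigma a parameter given the half Student-t prior, and s the data-dependent scale, whose exact value is not stated in the text.

```latex
% Population-level effects: improper flat prior over the reals
p(\beta) \propto 1, \qquad \beta \in \mathbb{R}

% Half Student-t prior with 3 degrees of freedom, location 0 and scale s
p(\sigma \mid s) \propto \left(1 + \frac{\sigma^{2}}{3 s^{2}}\right)^{-2}, \qquad \sigma \geq 0
```

The exponent follows from the general Student-t kernel with \nu degrees of freedom, (1 + x^2 / (\nu s^2))^{-(\nu + 1)/2}, evaluated at \nu = 3.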

Estimating the magnitude of the estimated impact of the change in BCG policy
We estimated the magnitude of the impact of the change in BCG policy by applying the IRR estimates from the best fitting model for each cohort to the observed number of notifications from 2005 until 2015 in our study population. For the cohorts relevant to the universal school-age vaccination scheme we estimated the number of prevented cases by first aggregating cases ($C_{S}$) and then using the following equation, where $C^{P}_{i}$ is the predicted number of cases prevented using the mean, 2.5% bound and 97.5% bound of the IRR estimate $I_{i}$. For the cohorts relevant to the targeted high-risk neonatal scheme we used a related equation, adjusting for the fact that the populations were exposed to the scheme and we therefore had to first estimate the number of cases that would have been observed had the scheme not been implemented. After simplification, with $C_{N}$ the aggregated cases in these cohorts, this results in the following equation,
$$C^{P}_{i} = C_{N}\left(\frac{1}{I_{i}} - 1\right),$$
where $i$ indexes the mean, the 2.5% bound and the 97.5% bound of the IRR estimate.
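The neonatal calculation can be sketched numerically. This assumes the equation is read as observed cases multiplied by (1/IRR - 1), that is, the counterfactual number of cases without the scheme (observed cases divided by the IRR) minus the observed cases. The function names and the count of 100 cases are hypothetical; only the IRR values (0.62, 95% CrI 0.44 to 0.88, for non-UK born neonates) are taken from the abstract.

```python
def cases_prevented_neonatal(observed_cases, irr):
    """Cases prevented = counterfactual cases - observed cases.

    The counterfactual (no scheme) count is observed_cases / irr, so the
    difference simplifies to observed_cases * (1 / irr - 1).
    """
    if irr <= 0:
        raise ValueError("IRR must be positive")
    counterfactual = observed_cases / irr
    return counterfactual - observed_cases


def prevented_summary(observed_cases, irr_mean, irr_lower, irr_upper):
    """Apply the calculation to the mean and both bounds of the IRR."""
    return {
        "mean": cases_prevented_neonatal(observed_cases, irr_mean),
        "from_lower_irr": cases_prevented_neonatal(observed_cases, irr_lower),
        "from_upper_irr": cases_prevented_neonatal(observed_cases, irr_upper),
    }


# Hypothetical count of 100 observed cases; IRR values from the abstract.
summary = prevented_summary(100, irr_mean=0.62, irr_lower=0.44, irr_upper=0.88)
```

Because the number prevented decreases as the IRR increases, the lower IRR bound gives the larger number of cases prevented, and an IRR above 1 gives a negative number, meaning additional cases.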

Descriptive analysis of age-specific incidence rates
From 2000 until 2012 incidence rates in the UK born remained relatively stable but have since fallen year on year. In comparison, incidence rates in the non-UK born increased from 2000 until 2005, since when they have also decreased year on year. In UK born 14-19 year olds, incidence rates remained relatively stable throughout the study period, except between 2006 and 2009, when they increased year on year. This trend was not observed in the non-UK born population aged 14-19, where incidence rates reached a peak in 2003, since when they have consistently declined. In the UK born aged 0-5, incidence rates also increased year on year after the change in BCG policy until 2008, since when they have declined. This does not match the observed trend in the non-UK born population aged 0-5, in which incidence rates declined steeply between 2005 and 2006, since when they have remained relatively stable (Supplementary Figure S1; Supplementary Table S7; Supplementary Table S8).
Supplementary Figure S1: Incidence rates per 100,000 for UK born population and non-UK born population, aged 0-5 and therefore directly affected by the targeted neonatal vaccination programme, and aged 14-19 and therefore directly affected by the universal school-age scheme.

Incidence estimates for all cases, those aged 0-5 and those aged 14-19
Supplementary