Completeness of tuberculosis (TB) notification: inventory studies and capture-recapture analyses, six European Union countries, 2014 to 2016

Background Progress towards the World Health Organization’s End TB Strategy is monitored by assessing tuberculosis (TB) incidence, often derived from TB notification, assuming complete case detection and reporting. This assumption is unlikely to hold in many settings, including European Union (EU) countries. Aim We aimed to assess observed and estimated completeness of TB notification through inventory studies and capture–recapture (CRC) methodology in six EU countries: Croatia, Denmark, Finland, the Netherlands, Portugal Slovenia. Methods We performed record linkage, case ascertainment and CRC analyses of data collected retrospectively from at least three national TB-related registers in each country between 2014 and 2016. Results Observed completeness of TB notification by inventory studies was 73.9% in Croatia, 98.7% in Denmark, 83.6% in Finland, 81.6% in the Netherlands, 85.8% in Portugal and 100% in Slovenia. Subsequent CRC analysis estimated completeness of TB notification to be 98.4% in Denmark, 76.5% in Finland and 77.0% in Portugal. In Croatia, CRC analyses produced implausible results while in the Netherlands and Slovenia, it was methodologically considered not meaningful. Conclusion Inventory studies and CRC methodology suggest a TB notification completeness between 73.9% and 100% in the six EU countries. Mandatory reporting by clinicians and laboratories, and cross-checking of registers, strongly contributes to accurate notification rates, but hospital episode registers likely contain a considerable proportion of false-positive TB records and are thus less useful. Further strengthening routine surveillance to count TB cases, i.e. incidence, accurately by employing record-linkage of high-quality TB registers should make CRC studies obsolete in EU countries.

Background: Progress towards the World Health Organization's End TB Strategy is monitored by assessing tuberculosis (TB) incidence, often derived from TB notification, assuming complete case detection and reporting. This assumption is unlikely to hold in many settings, including European Union (EU) countries.Aim: We aimed to assess observed and estimated completeness of TB notification through inventory studies and capture-recapture (CRC) methodology in six EU countries: Croatia, Denmark, Finland, the Netherlands, Portugal Slovenia. Methods: We performed record linkage, case ascertainment and CRC analyses of data collected retrospectively from at least three national TB-related registers in each country between 2014 and 2016. Results: Observed completeness of TB notification by inventory studies was 73.9% in Croatia, 98.7% in Denmark, 83.6% in Finland, 81.6% in the Netherlands, 85.8% in Portugal and 100% in Slovenia. Subsequent CRC analysis estimated completeness of TB notification to be 98.4% in Denmark, 76.5% in Finland and 77.0% in Portugal. In Croatia, CRC analyses produced implausible results while in the Netherlands and Slovenia, it was methodologically considered not meaningful. Conclusion: Inventory studies and CRC methodology suggest a TB notification completeness between 73.9% and 100% in the six EU countries. Mandatory reporting by clinicians and laboratories, and cross-checking of registers, strongly contributes to accurate notification rates, but hospital episode registers likely contain a considerable proportion of false-positive TB records and are thus less useful. Further strengthening routine surveillance to count TB cases, i.e. incidence, accurately by employing record-linkage of high-quality TB registers should make CRC studies obsolete in EU countries.
In 2014, the World Health Organization (WHO) published the Action Framework towards TB elimination in low-incidence countries, and in 2016, the WHO Regional Office for Europe (WHO/Europe) published the Roadmap to implement the tuberculosis action plan for the WHO European Region 2016-2020: Towards ending TB and multidrug-resistant tuberculosis. They outline blueprints to carry out the WHO's global End TB Strategy in Europe and to reach the sustainable development goal (SDG) target for tuberculosis (TB) [1][2][3]. Key strategic targets include reduction of global TB incidence by 80% in 2030 and 90% in 2035, compared with 2015 [4,5]. In the WHO European Region, the targets include a 25% reduction in TB incidence rate by the year 2020 compared with 2016 [3]. TB incidence can be derived from TB notification rates, assuming complete case detection and reporting. This is considered a strong assumption unlikely to hold in many settings, including European Union (EU) countries. Several studies across the EU/EEA, e.g. those from France, Spain, Italy and Romania, have revealed high rates of under-reporting [6,7].
Inventory studies (IS) are a widely accepted methodology to study the level of under-reporting of TB cases [8]. To determine completeness of TB notification, TB IS compare the number of cases meeting standard case definitions and recorded in multiple TB-related registers, such as laboratory registers or hospital episode registers, with the cases notified to local and national authorities [9,10]. For this comparison, record-linkage methodologies are used. Subsequently, through capture-recapture (CRC) analysis, the number of cases unknown to all registers can be estimated, thereby providing an estimate of the completeness of TB notification [11].
At the time of the study and writing this article, the United Kingdom (UK) was still part of the EU and only two EU countries, the Netherlands and the UK, had assessed the level of TB under-reporting using IS/ CRC studies on a national basis [11,12]. In 2016, the European Centre for Disease Prevention and Control (ECDC) commissioned an IS/CRC project to estimate TB under-notification in six to nine EU countries and expand the evidence base of the methodology.
The objective of this study was to assess observed and estimated TB notification completeness in EU countries, selected after an eligibility and feasibility appraisal, and determine whether data collected at national level reflect the real TB situation in these countries.

Study design
This study used inventory studies and capture-recapture analyses as per the methodology outlined by Van Hest and the WHO [6,8].

Country selection process
To select the six most appropriate and representative EU and European Economic Area (EEA) countries for the study, we followed a predefined selection procedure that was developed by the consortium commissioned to perform this work and agreed upon with ECDC during the project kick-off meeting in June 2016. First, eligibility and feasibility for IS and more specifically CRC studies was assessed by an 11-item tick-box questionnaire that was distributed to the TB operational contact points for epidemiology at ECDC and WHO/Europe TB surveillance network meeting in Bratislava in June 2016 (Table 1). Non-responders and countries not present at this meeting were invited by email to return the questionnaire by 1 August 2016. Countries who had not responded by 29 July received a reminder email from ECDC to fill out the form, but this did not yield any additional responses. Twenty forms were received from 31 countries (28 EU and 3 EEA): Belgium, Bulgaria, Croatia, Czech Republic, Denmark, Estonia, Finland, France, Germany, Ireland, Malta, the Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, the UK and Norway. Four countries, France, Malta, the UK and Norway, indicated they did not want to participate. Germany and Ireland indicated they did not have a third case-based register or national data which are essential requirements.
To obtain detailed information regarding TB-related registers, data availability, data access and data quality, the 14 countries identified in this rapid eligibility and feasibility assessment were contacted by 1 hr telephone or Skype calls, with a follow-up call or email for some countries. Subsequently, a simple scoring system was applied to ensure a consistent and systematic selection of countries (Table 1). A matrix of geographical location, economic status, TB incidence and proportion of foreign-born TB patients was also prepared to ensure diverse representation of countries. After the eligibility and feasibility appraisal, six countries, Croatia, Denmark, Finland, the Netherlands, Portugal and Slovenia, were selected for the IS and CRC analyses, and country-specific protocols were developed.

Case definition and inclusion criteria
This study included all cases recorded as active pulmonary or extrapulmonary TB, new as well as previously treated or diagnosed cases, in the various registers of the six EU countries in the selected study year. Double entries in each of the registers, cases caused by non-TB mycobacteria (NTM) or cases later diagnosed as not having TB were excluded. TB diagnosis was either: (i) bacteriologically confirmed by a positive culture for Mycobacterium tuberculosis complex (gold standard), (ii) bacteriologically confirmed by a positive auramine or Ziehl-Neelsen stained respiratory or non-respiratory sample and/or a positive PCR for M. tuberculosis complex, or (iii) based on clinical, radiological, histopathological or epidemiological findings with an intention to treat. Hospital episode records were included when coded with International Code of Disease (ICD)-9 code 010-018 or ICD-10 code A15-A19 [13,14].

Inclusion and period of data collection
TB cases with a date of notification, mycobacteriological confirmation, first day of hospital admission or death between 1 January and 31 December of the same study year were included. Study years varied in the countries between 2014 and 2016 ( Table 2). To allow for correction of late positive results from laboratory diagnostics or late notification, the period of data collection was from 6 months before the year of study until 3 months after (6 months after in Portugal).

Sources used to identify tuberculosis cases
The national electronic databases with case-based records of registered TB cases used in the IS in the six EU countries are listed in Table 2.

Record linkage
Two different approaches were used to link records of the same individual in different databases [8]. Deterministic record linkage, identical identifiers across datasets for a match, was performed when a unique identifier was available (Denmark, Finland, Portugal and Slovenia). Relaxed deterministic record linkage was used in the Netherlands, allowing for slight differences between the combination of identifiers in the various registers. Probabilistic record linkage, based on a likelihood score of a match [8], was used when a combination of proxy-identifiers, e.g. initials of the name, full date of birth, full postcode and sex of the patient, was available (Croatia, Portugal) ( Table 2). In Portugal probabilistic record linkage was performed after deterministic record linkage provided unrealistic results due to errors in the data sources.

Analysis
Observed TB notification completeness was calculated as the proportion of TB cases notified among the total number of TB cases observed after record linkage through the retrospective ISs. Estimated TB notification completeness were obtained through CRC analysis. The preferred CRC method entails log-linear modelling of at least three possibly incomplete, partially overlapping and preferably, but not necessarily, independent registers. Three-source log-linear modelling is less compromised by violation of the underlying modelling assumptions (e.g. a closed and homogeneous population; independence of registers; absence of false-positive cases) [6,8,[15][16][17][18][19]. CRC aims to estimate the unobserved number of TB cases from the information provided by the overlap between the linked registers. The unobserved number of TB cases were modelled based on rates of presence in each register (main effects) and pairwise dependencies (interactions) between registers; the latter were estimated as incidence rate ratios (IRRs), where estimates > 1 indicated a positive dependence between registers. Three linked registers provide eight different possible models ranging from containing no dependencies (the base model), any possible combination of the three pairwise dependencies, to the saturated model, containing all dependencies. The interaction between all three registers cannot be estimated and must be assumed absent. Selection of the preferred model (and thus the estimate of unobserved TB cases) was determined by balancing model fit with parsimony, based on Akaike Information Criterion (AIC) scores, coupled with knowledge of epidemiological plausibility, i.e. consistency with prior reports or estimates [12]. The estimated completeness of notification was the proportion of TB cases notified of the estimated total number of TB cases [6,8,[15][16][17][18][19]. Poisson log-linear modelling of three independent registers was done in Croatia, Denmark and Portugal. For Finland, modelling was extended to four sources, incorporating proportions of false-positive TB cases, based on external evidence. CRC was considered not feasible in the Netherlands and Slovenia because of complete overlap of respectively two and three registers.

Results
In Croatia, record linkage of three national registers (notifications (489 cases), reference laboratory (334 cases), and hospital episodes (476 cases)) resulted in a total of 662 registered TB cases, translating into an observed completeness of notification of 73.9% (489/662). Of the 173 TB cases not notified, 140 (81%) were only known to the hospital episode register (Figure). In CRC analysis, AIC scores favoured the saturated model, which estimated an implausibly high number of unobserved TB cases (1,705; 95% confidence interval (CI): 707-4,114).
In Denmark, record linkage of three national registers (notifications (379 cases), reference laboratory (283 cases), and hospital episodes (188 cases)) resulted in a total of 384 registered TB cases, translating into an observed completeness of notification of 98.7% (379/384). Of the five TB cases not notified, two were only known to the hospital episode register, two to the laboratory register only and one to both registers. CRC analysis selected a model with one interaction (between the laboratory and hospital episode registers), estimating one unobserved TB case, and resulting in an estimated completeness of notification of 98.4% (379/385) (95% CI: 97.9-98.7).
In Finland, complete overlap between the reference laboratory and the notification registers was expected due to automatic registration of the laboratory results. Record linkage of the notification register (260 cases) with the hospital episode (402 cases) and primary care episode (246 cases) registers resulted in a total of 569 registered TB cases, translating into an observed completeness of notification of 45.7% (260/569). After record linkage with a fourth register, the mortality register (20 cases), 573 TB cases were observed, translating into an observed completeness of notification    In the Netherlands, record linkage of three national registers (notifications (814 cases), reference laboratory (526 cases), and hospital episodes (489 cases)) resulted in a total of 998 registered TB cases in 2014, translating into an observed completeness of notification of 81.6% (814/998). All culture-confirmed laboratory TB cases were notified (observed completeness of notification of 100%). Because of this complete overlap two linked registers remained, with all TB cases not notified (n = 184; after excluding NTM (10 cases) and latent TB infections (4 cases)) only known to the hospital episode register, indicating that 49.0% (489/998) of the TB cases in the Netherlands in 2014 were (initially) hospitalised. Scrutinising the individual patient records of the cases only available in the hospital episode registry to assess whether they had signs, symptoms or diagnostic test results suggestive for TB, or whether treatment was initiated and completed, was not possible for confidentiality, logistical and financial reasons. A CRC analysis was methodologically not considered meaningful due to the complete overlap of the notification this would be a two-source CRC (see Discussion) [20] In Portugal, three TB registers were identified in 2015: the national TB programme register (2, In Slovenia, a strong positive interaction between the notification (118 cases), reference laboratory (111 cases) and hospital episode (115 TB cases) registers was expected, as all these services are mainly concentrated in one university hospital. Record linkage of these registers with the mortality (19 cases) register resulted in a case ascertainment of 119 TB cases. Excluding one person with a positive culture of M. tuberculosis starting treatment abroad and assumed notified there, 118 TB cases were registered in Slovenia in 2016, translating into a completeness of notification of 100%. Due to the almost complete overlap of the registers used, CRC analysis was methodologically considered not meaningful.

Discussion
Observed completeness of TB notification was calculated to be 73.9% in Croatia, 98.7% in Denmark, 83.6% in Finland, 81.6% in the Netherlands, 85.8% in Portugal, and 100% in Slovenia. Subsequent CRC analyses estimated completeness of TB notification to be 98.4% in Denmark, 76.5% in Finland and 77.0% in Portugal. In the other three countries, CRC analysis gave implausibly high estimates of unobserved TB cases (Croatia) or was not performed because it was methodologically not considered meaningful (the Netherlands, Slovenia).
In the Netherlands and Slovenia only two databases could be used and using two-source capture-recapture models for epidemiological data often violates the underlying capture-recapture assumptions, resulting in biased estimates and is thus not preferred [20]. According to the latest ECDC/WHO Europe TB surveillance and monitoring report, completeness of notification was estimated as 88.5%, 94.3%, 90.8%, 88.9%, 87.4%, and 84.3% for Croatia, Denmark, Finland, the Netherlands, Portugal and Slovenia, respectively [21].
In monitoring progress in TB control, the ultimate aim is to count TB cases (incidence) accurately through routine surveillance. When routine surveillance is not robust, alternative methods can be used to assess completeness of notification, as described elsewhere [22]. One such method is a prevalence survey (active case finding) [23]. However, prevalence surveys are costly and laborious and can only be justified in countries where many cases are thought to be missed and the expected smear positive TB prevalence is high enough (≥ 100 per 100,000 population) to obtain an accurate estimate with a reasonable sample size [23,24].
Surveillance by notification may result in inaccurate incidence rates because of under-ascertainment of the true number of cases or over-ascertainment due to false-positive cases [24]. Standardised means are needed to evaluate and adjust incidence rates [22]. IS and CRC analysis are such methods but can be difficult to implement, even in resource-rich countries [6]. We encountered various challenges when assessing completeness of TB notification in the six EU countries through retrospective IS and CRC methodology: (i) high interdependency between notification and laboratory registers (as a result of improved surveillance, sometimes by mandatory reporting, automatic record linkage, and routine cross-checking of registers); (ii) false-positive TB cases in hospital episode (and related) registers; (iii) privacy issues and aggregated data collection, preventing the use of prescriptions and health insurance registers; and (iv) selection of the CRC model. A comprehensive discussion of violation of underlying CRC assumptions can be found elsewhere [6,18,19].
Previous regional and national IS and CRC studies in EU countries often used the notification, reference laboratory and hospital episode registers, assuming absence of major interdependencies [6,11,12,25,26]. In recent years, many EU countries made it mandatory for the laboratory to report positive bacteriological TB test results, in addition to notification by the diagnosing clinician. This can result in a (near) complete overlap between these registers, as observed in Denmark, the Netherlands and Slovenia. Often, notification and reference laboratory registers are maintained within the same institute, with routine cross-checks between registers and sometimes with (in)direct access to other TB-related registers as well. The assumption that notification and reference laboratory registers in EU countries are relatively independent should be judged critically. In some countries, such as Denmark and Slovenia, where, by convention, almost all TB patients are (initially) hospitalised, a (near) complete overlap and interdependence between the hospital and notification registers was observed. Likewise, negative dependencies are expected when the hospital database includes clinically diagnosed cases, which by definition do not appear in the laboratory register, such as in Croatia. Violation of the assumption of homogenous capture probabilities can lead to biased estimates.
The two previous national IS and CRC studies in EU countries suggested a considerable proportion of falsepositive TB cases in hospital episode registers, possibly coding a differential diagnosis upon admission or a presumptive diagnosis upon discharge [11,12]. The diagnosis TB, a disease with a sometimes prolonged diagnostic pathway, can be withdrawn later, e.  [27]. In Croatia and the Netherlands, comparison of the hospital episode records with clinical information was not possible. Routine review of national hospital episode registers against patient records, in consultation with the clinicians, e.g. half a year after entry, could possibly increase the proportion of true-positive TB cases, possibly reducing over-estimation of the number of TB cases.
Alternative registers that could be used in some EU countries are mortality registers, when not nested in other registers. Other registers include health insurance registers and prescription registers for anti-TB medication, most specifically pyrazinamide, since other first-line drugs are also prescribed for different conditions, such as latent TB infection and treatment of NTM. Possible problems with these registers could be collection of only aggregate data, unsuitable for record linkage, or unavailability of data due to privacy regulations, such as the EU General Data Protection Regulation (GDPR) [28], which came into effect on 25 May 2018. Prescription registers could still be used for triangulation of IS and CRC results through pharmacoepidemiology by estimating the number of TB cases based on the daily dose equivalent of pyrazinamide [29]. In our study, a limited number of registers were available for linkage: mortality registers only contained few TB-related records and prescription and health insurance registers were not suitable in existing formats or under current regulations.
In some studies, the saturated CRC model, i.e. the model incorporating all pairwise interdependencies but also the highest order interdependency between all registers which cannot be controlled, was preferred based on the lowest AIC [17]. The literature generally recommends extreme caution when this model is chosen as one or more assumptions underlying the CRC analysis are likely violated, making statistical inferences unreliable [18]. Further work would be beneficial, including alternative structural models for the data-generating process that do not rely on log-linear assumptions [30], Bayesian models incorporating prior uncertainty on proportions of false-positive cases [31], data from multiple years to account for individuals observed in different years, or the incorporation of covariate data [32], as attempted in Portugal.
Given the intrinsically complex nature of TB epidemiology, variations in diagnoses (e.g. bacteriologically or clinically confirmed), socioeconomic background and disease severity, the assumption of homogenous probabilities of observations is unlikely to hold. Loglinear models may partially incorporate population heterogeneity, especially if relevant covariate information explaining variation is available, but the problem is likely persistent to some extent and a limitation of CRC [18]. The limited number of TB registers available and the absence of data to assess variations in individual detection probabilities in different countries did not allow to identify how heterogeneity affects population estimates across different settings. Hence, the accuracy of the final population estimates could only be assessed based on expert opinion, critical scrutiny of the nature of various sources and the quality of the data used in the analyses.
Another limitation of this study is that the six countries were selected after an eligibility and feasibility appraisal, with likely selection bias, limiting extrapolation of the results to other EU countries. The six countries selected were predominantly countries with a small population, a limited number of TB cases and a well-organised TB notification and control system.

Conclusion
Our analysis suggests that only in Denmark and Slovenia the assumption that TB notification reflects TB incidence is likely to be true. In these countries, the estimated number of unobserved TB cases was minimal (Denmark) or expected to be minimal because of complete overlap of TB cases between registers (Slovenia). In the other countries, observations and estimates were more difficult to interpret.
Mandatory reporting by both clinicians and laboratories, and cross-checking of registers, strongly contributes to accurate notification rates but hospital episode registers likely contain a considerable proportion of false-positive TB records and more scrutiny is needed. Further strengthening routine (computerised) infectious disease surveillance systems to count TB cases (incidence) accurately, by structurally employing electronic record linkage of high-quality TB registers, should make CRC studies obsolete in EU countries.