Mortality review as a tool to assess the contribution of healthcare-associated infections to death: results of a multicentre validity and reproducibility study, 11 European Union countries, 2017 to 2018

Introduction The contribution of healthcare-associated infections (HAI) to mortality can be estimated using statistical methods, but mortality review (MR) is better suited for routine use in clinical settings. The European Centre for Disease Prevention and Control recently introduced MR into its HAI surveillance. Aim We evaluate validity and reproducibility of three MR measures. Methods The on-site investigator, usually an infection prevention and control doctor, and the clinician in charge of the patient independently reviewed records of deceased patients with bloodstream infection (BSI), pneumonia, Clostridioides difficile infection (CDI) or surgical site infection (SSI), and assessed the contribution to death using 3CAT: definitely/possibly/no contribution to death; WHOCAT: sole cause/part of causal sequence but not sufficient on its own/contributory cause but unrelated to condition causing death/no contribution, based on the World Health Organization’s death certificate; QUANT: Likert scale: 0 (no contribution) to 10 (definitely cause of death). Inter-rater reliability was assessed with weighted kappa (wk) and intra-cluster correlation coefficient (ICC). Reviewers rated the fit of the measures. Results From 2017 to 2018, 24 hospitals (11 countries) recorded 291 cases: 87 BSI, 113 pneumonia, 71 CDI and 20 SSI. The inter-rater reliability was: 3CAT wk 0.68 (95% confidence interval (CI): 0.61–0.75); WHOCAT wk 0.65 (95% CI: 0.58–0.73); QUANT ICC 0.76 (95% CI: 0.71–0.81). Inter-rater reliability ranged from 0.72 for pneumonia to 0.52 for CDI. All three measures fitted ‘reasonably’ or ‘well’ in > 88%. Conclusion Feasibility, validity and reproducibility of these MR measures were acceptable for use in HAI surveillance.


Introduction
Healthcare-associated infections (HAI) are a major public health problem affecting more than 90,000 patients on any given day in European acute care hospitals, which results in an estimated 4.5 million cases each year [1]. HAI are associated with increased morbidity and mortality [2]. Data on attributable mortality are limited, hampering accurate estimates of the burden of HAI. The attributable mortality of HAI is difficult to assess because of various competing causes of death in severely ill patients, especially in intensive care units (ICU). In addition, death is a consequence of events that occur over a period of time, which is usually not well addressed in statistical models. Attributable mortality of HAI is usually estimated by calculating the difference in the relative risk of death between patients with and without HAI from comparative studies or by modelling approaches [3][4][5][6][7][8]. However, statistical approaches are not easily applied in individual hospitals as they require detailed data on a cohort of patients and statistical expertise. Potential sources of bias, such as heterogeneity in multicentre studies and time-dependency of the observed outcome, need to be taken into account [5,9], and the results can be difficult to assess as they depend primarily on the availability of data on risk factors. Another approach to estimate the attributable mortality of HAI is to perform mortality review studies that entail a descriptive evaluation, for each patient who died with an HAI, of the likelihood that the HAI contributed to the death of the patient according to clinical judgement.
The European Centre for Disease Prevention and Control (ECDC) coordinates the European Healthcare-Associated Infections surveillance Network (HAI-Net). In 2013, the European Commission requested that the ECDC should collect additional data on mortality from HAI. To address the request, the ECDC introduced mortality review into the HAI-Net surveillance protocols with a measure that categorises the contribution of an HAI to death in three categories: no contribution, possibly contributed and definitely contributed, based on the work of Kaoutar et al. [10]. As the validity of mortality reviews has never been established (e.g. through autopsy studies) and standardisation of the criteria and review process across hospitals and countries would be necessary, the ECDC initiated a study to evaluate the validity, feasibility and reproducibility of the review measure.

Methods

Preparation
An expert panel was established to support the project group in developing the study design. This panel consisted of 12 experts who were National Focal Points for HAI, infection prevention and control doctors, intensive care physicians, surgeons or epidemiologists, known for their clinical and research experience in HAI. The study group, including both project group and expert panel, met to discuss the three-category mortality review measure (3CAT) developed by Kaoutar et al. [10] and evaluated in 16 French hospitals. We added two alternative measures: one based on the World Health Organization (WHO) death certification methodology that is widely applied by clinicians [11] (WHOCAT) and a quantitative Likert scale from 0 to 10 (QUANT), to enable a more visual assessment [12] (Box).
Pneumonia, bloodstream infection (BSI) and Clostridioides difficile infection were selected for evaluation as these HAI are recorded within two HAI-Net modules (European surveillance of healthcare-associated infections in intensive care units (pneumonia and BSI) [13] and European surveillance of C. difficile infections [14]) and are both frequent and associated with increased mortality [2]. During the expert meeting, the panel evaluated the feasibility and validity of the three outcome measures with a number of case vignettes.

Hospital recruitment
The ECDC national focal points for HAI of all countries contributing to HAI-Net were invited by email to recruit hospitals in their country, preferably those performing HAI surveillance for ICU-acquired HAI and/or CDI, applying the ECDC surveillance protocols [13][14][15].

Review procedure
On-site investigators attended the kick-off meeting, where the review procedure and the data to be collected were explained and discussed.
Adult patients 16 years and older were included if they had BSI or pneumonia (most often, but not exclusively, ICU-acquired, defined as occurring after more than 48 h in ICU) or CDI, and subsequently died during the same hospital/ICU stay. Cases with SSI could be included but were not the focus of the study. A local team consisting of an on-site investigator (OSI; usually an infection prevention and control doctor or ICU physician) and a treating physician (TP) evaluated the patient records. The reviews were performed within ca 1 month of the death, to enable recollection of relevant details.

Box
Description of the three mortality review outcome measures

For each deceased patient with an HAI, the OSI and TP independently assessed the contribution of the HAI to the patient's death, using the three outcome measures (Box). They subsequently discussed the case aiming to reach a consensus. Agreement or disagreement was recorded both before and after the discussion.

Data collection
The OSI entered data from the patient records in a data registration form prepared in Excel. The following data were recorded for each included patient: gender, age, hospital and ICU admission data, ward type, ICU type, date of onset and type of limitation of treatment (such as withholding or withdrawal of life-sustaining treatment), type of surgery for SSI cases, type of HAI, date of HAI and date of death, microbiology results (with a maximum of two pathogens), other HAI (BSI, pneumonia, CDI or SSI) and, in case of CDI, origin (healthcare- or community-associated) and complicated course. If more than one HAI was present, the HAI considered as the most severe was selected for the review. The assessment of the contribution was performed with the help of a checklist to increase inter-rater reliability and facilitate the interpretation of the results by the project group. This checklist included both objective and subjective items (for details see the data entry form (Supplementary Text Box S1)): expected mortality on admission when not admitted to an ICU, severity scores (Simplified Acute Physiology Score (SAPS) II or Acute Physiology and Chronic Health Evaluation (APACHE) II/III scores for ICU, from which the expected mortality on admission was derived using ECDC HAI surveillance data from 2012 to 2015, and Sequential Organ Failure Assessment (SOFA) score), condition and comorbidities on hospital admission (McCabe score, Charlson's severity of illness and Charlson's comorbidities), American Society of Anesthesiologists (ASA) score for patients with a SSI, status of HAI on the day of death (HAI or complication thereof still active), severity of the HAI, plausible pathophysiological mechanism for contribution of the HAI to death, and presence of competing cause for the death.
In addition, we recorded selected antimicrobial resistance (AMR) phenotypes under surveillance, as specified in HAI-Net protocols [16], the perceived adequacy of antimicrobial treatment, and the contribution of AMR to the death of the patient, using scales similar to 3CAT and QUANT (Supplementary Table S1). Treatment was considered inadequate when the initiated empirical treatment, although conforming to the local antimicrobial policy, did not match the susceptibility of the cultured microorganisms, resulting in a delay in instituting adequate antimicrobial treatment. AMR could have contributed to death through a delay in adequate antimicrobial treatment or an adverse event (such as renal failure) induced by the antimicrobial prescribed to treat an HAI with a resistant organism.
For each case, reviewers answered the question "How well did [the measure] apply", independently assessing the fit of the measure (with the categories: applies well/reasonably/poorly/not). The fit indicated how well the assigned category for each MR measure corresponded with the perceived contribution in each particular review case.

Statistical analysis
Inter-rater reliability was measured with Cohen's kappa statistic (kappa), weighted kappa statistic (wk), which accounts for ordered categories, percentage agreement and/or the intraclass correlation coefficient (ICC), depending on the measure. We calculated both the overall averaged kappa and an average kappa that controlled for hospital by adjusting for the hospital-specific variances [17].
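As an illustration of the weighted kappa described above, the following is a minimal Python sketch for two raters and ordered categories. The function name, the linear weighting scheme and the example ratings are assumptions made for illustration; the article does not state which weights were used, and the analyses themselves were run in SAS and R.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, categories):
    """Cohen's weighted kappa for two raters, with linear weights.

    `categories` lists the ordered categories from lowest to highest.
    Linear weighting is an assumption, not taken from the article.
    """
    k = len(categories)
    n = len(ratings_a)
    if k < 2:
        raise ValueError("need at least two categories")
    if n == 0 or n != len(ratings_b):
        raise ValueError("ratings must be non-empty and of equal length")
    index = {c: i for i, c in enumerate(categories)}

    # Cross-tabulate the paired ratings (missing cells count as zero).
    cells = Counter((index[a], index[b]) for a, b in zip(ratings_a, ratings_b))
    row = [sum(cells[(i, j)] for j in range(k)) for i in range(k)]
    col = [sum(cells[(i, j)] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight grows linearly with the category distance.
        return abs(i - j) / (k - 1)

    pairs = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * cells[(i, j)] / n for i, j in pairs)
    expected = sum(weight(i, j) * row[i] * col[j] / n**2 for i, j in pairs)
    if expected == 0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1 - observed / expected


# Hypothetical 3CAT ratings by the TP and the OSI for eight cases.
CATEGORIES = ["no", "possibly", "definitely"]
tp = ["no", "possibly", "definitely", "definitely",
      "possibly", "no", "definitely", "possibly"]
osi = ["no", "possibly", "definitely", "possibly",
       "possibly", "possibly", "definitely", "definitely"]
print(round(weighted_kappa(tp, osi, CATEGORIES), 2))  # 0.52
```

With linear weights, a disagreement between adjacent categories (for example 'possibly' versus 'definitely') counts half as much as a disagreement between the two extremes.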
We calculated the percentage agreement per category with the formula (2 × a)/(2 × a + b + c + d + g), where a is the agreed number of cases for a category and b, c, d and g are the number of cases where only one reviewer assigned that category. In this article, 'agreement' refers to the initial agreement between the two reviewers, unless stated otherwise. We were interested in the ICC for absolute agreement and employed a two-way ICC, assuming that the raters' effects will contribute to the variability of the ratings as random effects [18]. To study the association between patient and HAI characteristics and the perceived contribution, we used the consensus value of the measure. When a consensus was not reached, the assessment of the TP was used.
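The per-category agreement formula and the two-way ICC for absolute agreement can be sketched in Python as follows. This is an illustrative sketch, not the code used in the study: the function names and example data are hypothetical, and the ICC shown is the single-rater form (often written ICC(2,1)), which is assumed here to match the description in the text.

```python
def category_agreement(table, k):
    """Agreement for category k from a square cross-table of two reviewers.

    Implements 2a / (2a + cases where only one reviewer chose k), where
    table[i][j] counts cases rated i by reviewer 1 and j by reviewer 2.
    """
    a = table[k][k]
    only_first = sum(table[k]) - a                  # reviewer 1 only
    only_second = sum(row[k] for row in table) - a  # reviewer 2 only
    denominator = 2 * a + only_first + only_second
    if denominator == 0:
        raise ValueError("category was never assigned")
    return 2 * a / denominator


def icc_absolute_agreement(scores):
    """Two-way random-effects, absolute-agreement, single-rater ICC.

    `scores` holds one row per case and one column per rater.
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(r) != k for r in scores):
        raise ValueError("need >= 2 cases and >= 2 raters, no missing scores")
    grand = sum(sum(r) for r in scores) / (n * k)
    case_means = [sum(r) / k for r in scores]
    rater_means = [sum(r[j] for r in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_cases = k * sum((m - grand) ** 2 for m in case_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ss_error = ss_total - ss_cases - ss_raters

    ms_cases = ss_cases / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = (ms_cases + (k - 1) * ms_error
                   + k * (ms_raters - ms_error) / n)
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_cases - ms_error) / denominator


# Hypothetical 3x3 table (rows: reviewer 1, columns: reviewer 2).
table = [[1, 1, 0],
         [0, 2, 1],
         [0, 1, 2]]
print(round(category_agreement(table, 0), 2))  # 0.67

# Hypothetical QUANT scores; the second rater is always 2 points higher.
quant = [[0, 2], [5, 7], [8, 10]]
print(round(icc_absolute_agreement(quant), 2))  # 0.89
```

In the second example the two raters rank the cases identically, but the constant 2-point offset still lowers the ICC, because absolute agreement penalises systematic differences between raters.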
To diagnose contribution to death (3CAT and WHOCAT), we used a random forest classifier approach. A set of the best predictors was selected to achieve an optimal prediction accuracy. Using this set, we switched to model construction in order to assess the association between the variables and the categorical outcome by means of multinomial logistic regression. In the overall analysis of 3CAT, we could perform a multilevel analysis, allowing for clustering at the hospital level. With HAI-specific subsets, these models usually did not converge. Variables that had a p value < 0.2 in the univariate analysis were included in the multivariate analysis. The final model was attained by manual backward selection, controlling the decrease in model fit with the −2log likelihood test. We used SAS software version 9.4 of the SAS system (SAS Institute Inc., Cary, United States) and R, version 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria).
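The −2 log likelihood comparison used in a backward selection step can be sketched as below. This is a hedged illustration only: the function names and the log-likelihood values are hypothetical, and the chi-square tail probability is computed with closed-form series for integer degrees of freedom rather than with the SAS or R routines used in the study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^i / i!.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite correction series.
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * i + 1)
    return tail


def likelihood_ratio_test(loglik_full, loglik_reduced, df):
    """Compare nested models; df is the number of parameters removed."""
    # Clamp tiny negative values that can arise from numerical noise.
    statistic = max(0.0, -2 * (loglik_reduced - loglik_full))
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods. Dropping one single-coefficient covariate
# from a multinomial model with three outcome categories removes two
# parameters (one per non-reference category).
stat, p = likelihood_ratio_test(loglik_full=-100.0, loglik_reduced=-103.0, df=2)
print(round(stat, 1), round(p, 3))  # 6.0 0.05
```

A small p value indicates that removing the variable significantly worsens the model fit, which in a backward selection would argue for keeping it.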

Ethical statement
The study protocol was submitted to the medical research ethics committee (MREC) of the University Medical Centre Utrecht. As the study was not interventional the need for further evaluation was waived.
Participating hospitals also approved the study protocol.

Assigned scores
With 3CAT, the HAI was considered to have definitely or possibly contributed to the patient's death in 83% of cases according to the TP and 87% according to the OSI (Table 2). For the types of HAI, the responses of TP and OSI were respectively 71% and 81% for pneumonia, 94% and 95% for BSI and 82% and 85% for CDI (Supplementary Table S1). When the contribution was considered definite, it was viewed as a major contribution in, respectively, 92% (118/128) and 96% (108/113), whereas when the contribution was considered possible, it was viewed as major contribution in 30% (34/112 and 42/140) for both TP and OSI. With WHOCAT, the HAI was considered part of the causal sequence in the majority of patients (56% for TP and 55% for OSI) and rarely viewed as the sole cause (9% and 7%, respectively). Table 2 summarises the ratings for 3CAT and WHOCAT and Figure 1 summarises the ratings for QUANT. As the three measures were correlated, some of the results will be presented for 3CAT only.

Inter-rater reliability and perceived fit
The wk for 3CAT was 0.68 overall, whereas the percentage of initial agreement was 76% (Table 3). Consensus agreement after discussion was reached in 93% of cases. Percentage agreement was the highest when the contribution of the HAI was considered definitely present (> 80%, except for CDI) and lowest for the category 'did not contribute'. The wk differed between hospitals, ranging from 0.26 to 1.00 (p = 0.015) and was higher in tertiary than in secondary care centres (p = 0.03 for pneumonia, p = 0.07 for BSI). The kappa on whether the HAI was a major or minor cause, when 3CAT assessments were 'possibly contributed' or 'definitely contributed', was 0.69 (95% CI: 0.60-0.79) and agreement was 86% (197/229).
The order of the categories of WHOCAT was less clear-cut than that of 3CAT and QUANT. In all except two hospitals, the inter-rater reliability was the same or higher when assuming that the categories of the variable were ordered than when the categories were considered not ordered. The inter-rater reliability for WHOCAT was comparable to that of 3CAT, both overall and for each type of HAI. Kappa differed significantly between hospitals (p < 0.0001).
As with the kappa values for 3CAT and WHOCAT, the ICC for QUANT was highest for pneumonia and lowest for CDI. The observed agreement for QUANT (Figure 1) was higher at the extreme values of the scale than at the intermediate values. All three measures were reported to fit reasonably to well for more than 88% of the reviewed cases (Supplementary Table S2). WHOCAT and QUANT measures were considered to fit better than 3CAT.

Pathogens, antimicrobial resistance and adequacy of treatment
[Figure panel A: 3CAT, combined with minor/major cause, and QUANT (n = 287); axis label: QUANTitative score]
AMR phenotypes under surveillance, the contribution of the HAI to death, using 3CAT, was classified as possible or definite in 86% of TP assessments and 95% of OSI assessments ('definitely' in 57% by TP and 47% by OSI, Supplementary

Patient and healthcare-associated infection characteristics associated with agreement and contribution to death
Both the agreement on the initial assessments (Supplementary Table S4) and the consensus were higher for the patient and HAI characteristics that were to be assessed separately, than for the contribution of HAI to death.

Table notes: WHOCAT: World Health Organization death certification based measure. a Excluding hospitals with fewer than six cases. b Zero cases with agreement, one case in denominator; cases where one of the ratings was missing or 'unknown' were excluded. Percentages are presented in italics.
The presence of a pathophysiological mechanism for the contribution of HAI to death was most strongly associated with a contribution considered definite, for all three measures (Supplementary Table S5). Severity of HAI and presence of a competing cause for death were among the top three associated factors. The type of HAI, whether the HAI or complication of the HAI was active on the day of death, ICU admission and the Charlson's severity score were, to a lesser extent, also associated with contribution to death. HAI were considered to contribute more to the death of 'moderately ill' patients ('definitely' contributed in 51% for TP and 44% for OSI) than in 'not or mildly ill' patients (20/52 for TP and 20/52 for OSI) or 'severely ill' patients (38% for TP and 33% for OSI) (Supplementary Table S7).

Discussion
Our study demonstrated that the inter-rater reliability of three mortality review measures for the contribution of HAI to death, measured with wk and percentage agreement, was moderate to strong, depending on the type of HAI. Together with the correlation between the three outcomes, 3CAT, WHOCAT and QUANT, and the perceived fit, corroborating the validity, this implies that the mortality review measures are considered acceptable for use in HAI surveillance.
Although feasibility was not evaluated in detail, MR appeared feasible in the participating centres. Meeting up with the treating physicians was sometimes challenging but this could improve when MR is embedded in standard practice.
Autopsy studies are the gold standard to assess construct validity of the contribution of HAI to death, but they are few and not recent [19][20][21][22]. Therefore, we applied three measures which had been proven valid before in a single centre study [10], were based on related concepts [11] or were perceived useful by an expert panel. These measures were discussed and tested with case vignettes by the expert panel to further ensure face, content and construct validity. The correlation between the three measures supports the assumed validity. Another feature that can corroborate the content and construct validity of the measures is the perceived fit, which was reasonable or good in more than 88% for all measures. The OSI preferred WHOCAT and QUANT over 3CAT on the grounds that it better reflected the rationale (WHOCAT) or a more neutral and better fit of the mortality review (QUANT).
The inter-rater reliability varied with the type of HAI: it was highest for pneumonia and lowest for CDI. Differences in kappa were larger than differences in percentage agreement, which can partly be explained by the prevalence of the different categories. The reviewers agreed most often when the contribution of the HAI was assessed as definite or, slightly less, when assessed as possible, whereas agreement on 'no contribution' was lowest for BSI and CDI. The majority of CDI cases originated from three centres and 45% from one of these, which may have introduced bias. It was difficult to conclude whether the lower agreement observed in two of these centres was due to the type of infection, i.e. CDI, or resulted from factors specific for these centres. A BSI was usually considered to have contributed to the death of a patient, either 'definitely' or 'possibly', and a skewed distribution resulted in lower kappa values.
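The effect of category prevalence on kappa can be illustrated with a short Python sketch. The two 2×2 tables below are hypothetical and are not taken from the study data; they have the same observed agreement (80%) but very different marginal distributions.

```python
def cohen_kappa(table):
    """Return (observed agreement, unweighted Cohen's kappa) for a square
    cross-table of two raters (rows: rater 1, columns: rater 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table is empty")
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, given each rater's marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return observed, (observed - expected) / (1 - expected)


# Hypothetical tables: 'contributed' versus 'did not contribute'.
balanced = [[40, 10],
            [10, 40]]  # both categories equally common
skewed = [[75, 10],
          [10, 5]]     # most cases rated 'contributed', as for BSI

for name, table in (("balanced", balanced), ("skewed", skewed)):
    agreement, kappa = cohen_kappa(table)
    print(name, round(agreement, 2), round(kappa, 2))
# balanced 0.8 0.6
# skewed 0.8 0.22
```

When nearly all cases fall into one category, chance agreement is high, so the same percentage agreement yields a much lower kappa.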
There are a few reports on the inter-rater reliability of HAI-associated mortality review outcomes. In a study by Kaoutar et al. [10], the review was performed by an infection prevention and control professional who also interviewed the TP. This procedure resembles the joint discussion after the independent review in our study and the agreement of 91% reported by Kaoutar [23]. The inter-observer reliability (kappa = 0.4) reported by Langelaan et al. in a Dutch study on adverse events was considerably lower [24].
AMR was present in more than half of the BSI and pneumonia cases, which is higher than the approximately 30% expected in the overall population of patients (alive and deceased) with an HAI (estimated with the country-specific AMR percentages from the ECDC point prevalence surveys and the number of cases contributed by each country, not accounting for the type of HAI) [25]. The higher AMR rate in this population of deceased patients with HAI seems to be associated with death as AMR was perceived as definitely or possibly contributing to death in 70-72% of these patients. In a German mortality review of 215 patients deceased with a multidrug-resistant hospital-acquired infection the infection was considered the cause of death in 36% [26], which is slightly higher than the 28-30% of our cases where contribution of (not necessarily multidrug) resistance was considered definite. Overall, antimicrobial treatment was considered inadequate in 15% of the cases, in the lower ranges of what has been reported elsewhere [27]. Inadequate antimicrobial treatment was associated with a higher contribution of AMR to death. Inadequate treatment is a known and confirmed risk factor for mortality of patients with infections in observational studies [28].
Our study showed that healthcare-associated BSI, pneumonia and CDI were perceived to have definitely or possibly contributed to the death of a patient in the majority of cases. The presence of a pathophysiological mechanism that explained the contribution of the HAI to the death of the patient, and the severity of the HAI, were items that were most strongly associated with the perceived contribution (Supplementary Table S5). For CDI, 'complicated course' fitted the results better than severity. In some cases, a clear pathophysiological mechanism can relate the HAI to the cause of death. However, in other cases, the perceived presence of a pathophysiological mechanism can be considered as a proxy of the assessment of the contribution and may therefore not be useful to guide a reviewer's assessment. Some but not all reviewers described the checklist as helpful for gathering the relevant information. Altogether, the variables shown to be significantly associated with death may be used as tools for facilitating and standardising the assessment.
When evaluating only pneumonia, BSI and related infections in the study by Kaoutar et al. [10], the proportions of cases with definite and possible contribution of pneumonia were 29% and 40%, respectively, which is comparable to our study. For BSI, the contributions were 36% and 38% respectively, lower than in our study (51% and 43%). Differences in the patient population (more ICU patients in our study) and improvement in the prognosis of BSI since Kaoutar's study in 2000 and 2001 may account for this difference. Decoster et al. found that death was attributable to an HAI in 33% of patients with McCabe score 1 or 2 and a bacteraemia, systemic, respiratory or catheter infection [29]. In the same patient category, the contribution was classified as definite in 47% (TP) and 42% (OSI). Branger et al. found that the death was 'most likely' associated with the HAI in only 20% of the cases but this study did not exclude infections with little impact on mortality, such as urinary tract infections (UTI) [30]. Two earlier studies included autopsy reports in the evaluation. Hospital-acquired bacteraemia/sepsis and pneumonia were perceived as the 'immediate cause of death' in 33% of BSI and 59% of pneumonia cases in the first study [22] and in 49% of pneumonia cases in the second study [21], i.e. as frequently as or more frequently than in our results for pneumonia, but less frequently for BSI. Although the attributable mortality of CDI has been frequently documented [31][32][33], mortality review data are scarce for CDI. Mlangeni et al. found that CDI contributed to death in 24% of 85 cases [34], which is less than the 82% (TP) and 85% (OSI) in our study. It is difficult to conclude what reasons might explain the differences in the perceived contribution of BSI and CDI to death. The specific hospital mix of the studies might contribute to this.
Although in our study, the perceived contribution of HAI to death was higher in tertiary care centres than in secondary care centres, this does not necessarily need to be the case [22]. The cited studies were all performed in a single country, but countries differ with regard to the availability of ICU beds [35] and consequently the average disease severity, infection prevention and control practices [36], prevalence of AMR [25] and other, including cultural, factors that may affect the contribution of HAI to death and the assessment of this contribution.
A strength of our study was the multicentre design, including hospitals from 11 countries, which increased the generalisability of its results and insight into possible differences among countries and hospitals, but also introduced new sources of variance that cannot always be controlled for. The results for CDI are less robust as 45% of all cases originated from one centre and the majority from three. A local team performed the reviews as in routine HAI surveillance. As a consequence, there were known and unknown differences among the review practices despite initial training at the kick-off meeting and use of a standardised protocol. Strongly opinionated reviewers and other subjective factors may be sources of bias in individual centres but are expected to average out when a large number of hospitals contribute to regular HAI surveillance. The contribution of specific types of HAI might have been overestimated as the most severe HAI, in cases with more than one HAI present, was selected for the mortality review. Our results may not be representative of all types of hospitals. The majority of the participating hospitals were tertiary care centres, and the inter-rater reliability appeared to be higher than in secondary care centres. This could be due to the smaller number of reviewed cases in secondary hospitals. Autopsies were not performed in the framework of our study.
A common criticism of the association between an HAI and the death of a patient is that patients die with the HAI and not because of the HAI [37]. The present study demonstrates that clinicians frequently think otherwise and that a mortality review can be performed with reasonable inter-rater reliability. Still, clinicians sometimes fear the judgement of hospital management or medico-legal consequences if they perform a mortality review with explicit outcome statements. These anticipated consequences are a major barrier for widespread adoption of an otherwise feasible mortality review. It is important that stakeholders understand that neither the death of a patient with an HAI nor the HAI itself is necessarily preventable, and support clinical staff, as improved insight into the contribution of HAIs to patients' morbidity and mortality is an important driver of quality improvement processes and interventions to prevent HAI.

Conclusion
Although the construct validity of mortality review is difficult to assess because there is no recent gold standard for the assessment of the contribution of an HAI to death, this study showed that the validity and reproducibility of the three evaluated mortality review measures were acceptable for use in European surveillance of HAI. The performance of the three measures was comparable and the perceived fit of the three outcomes was predominantly reasonable or good. Most reviewers preferred the WHO categories (WHOCAT) that better account for the different levels of causality assessment and the quantitative scale (QUANT) which was perceived as more neutral than other measures. Further standardisation of the measures for surveillance purposes through training and the use of case vignettes may increase robustness and comparability across hospitals and countries.