Bias in vaccine effectiveness studies of clinically severe outcomes that are measured with low specificity: the example of COVID-19-related hospitalisation

Many vaccine effectiveness (VE) analyses of severe disease outcomes such as hospitalisation and death include ‘false’ cases that are not actually caused by the infection or disease under study. While the inclusion of such false cases inflates outcome rates in both vaccinated and unvaccinated populations, it is less obvious how they affect estimates of VE. Illustrating the main points through simple examples, this article shows how VE is underestimated when false cases are included as outcomes. Depending on how the outcome indicator is defined, estimates of VE against severe disease outcomes, whose definition allows for the inclusion of false cases, will be biased downwards and may in certain circumstances approximate the same level as the VE against infection. The bias is particularly pronounced for vaccines that offer high levels of protection against severe disease outcomes but poor protection against infection. Analysing outcomes that are measured with low sensitivity generally does not cause bias in VE studies; defining outcome indicators that minimise the number of false cases rather than the number of missed cases is preferable in VE studies.


Introduction
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Omicron variant (Phylogenetic Assignment of Named Global Outbreak Lineages (Pangolin) designation B.1.1.529), which appeared towards the end of 2021, was generally associated with milder disease than previous SARS-CoV-2 variants but also with higher transmission rates [1][2][3][4]. At the height of the Omicron wave, a large number of hospital patients therefore happened to be SARS-CoV-2-positive despite being hospitalised for reasons unrelated to COVID-19 [5]. Analyses that attributed such hospitalisations to COVID-19, simply because of the co-occurrence in time with a SARS-CoV-2 infection, exaggerated the number of hospitalisations supposedly due to COVID-19 [6]. But how does such outcome misclassification affect estimates of vaccine effectiveness (VE)?
Vaccine effectiveness is essentially a measure of the protective immunological effect induced by a vaccine against a certain outcome of interest such as infection, disease, hospitalisation or death. It is defined as one minus the ratio of the case rate in vaccinated people to that in unvaccinated people. Valid estimates in VE studies are important as such studies help inform public health decisions. Observational VE studies are routinely used to monitor the protective effect in a population, or segments of a population, of new and established vaccines, for example those against seasonal diseases such as influenza and COVID-19, but they are also used in more unusual outbreaks as seen recently with studies of vaccine protection against mpox [7][8][9].
In this article, we investigate how outcome misclassification affects estimates in VE studies of severe outcomes. The first section presents the theoretical background. It is followed by a section illustrating the main points through six simple scenarios, using the example of VE against COVID-19-related hospitalisation, and finally by some suggestions for assessing the magnitude of the bias based on a few assumptions.

Theoretical background
Define VE against infection with a particular pathogen as $VE_{inf} = 1 - (\pi_{i1}/\pi_{i0})$, where $\pi_{i1}$ and $\pi_{i0}$ are the infection rates in the vaccinated and unvaccinated population, respectively.
If the outcome of interest is a disease outcome caused by the infection, e.g. hospitalisation, rather than just infection itself, the quantity we seek to estimate is
$$VE_{hos} = 1 - \frac{\pi_{i1} \times \pi_{h1}}{\pi_{i0} \times \pi_{h0}} \qquad \text{(Formula 1)}$$
where $\pi_{h1}$ and $\pi_{h0}$ denote the probability that an infection in a vaccinated and unvaccinated person, respectively, will lead to the outcome. For argument's sake, but without loss of generality, this outcome is taken to be hospitalisation from this point onwards.
In studies using case definitions with low specificity, however, the measured outcome rates will be inflated by the inclusion of misclassified cases. For example, if individuals hospitalised for other reasons also have an 'incidental' infection with the pathogen studied and are erroneously included as cases in the analysis, the quantity actually being estimated is
$$VE'_{hos} = 1 - \frac{\pi_{i1} \times \pi_{h1} + \pi_{i1}(1 - \pi_{h1}) \times \pi_{f}}{\pi_{i0} \times \pi_{h0} + \pi_{i0}(1 - \pi_{h0}) \times \pi_{f}} \qquad \text{(Formula 2)}$$
Here $\pi_{f}$ is the probability that a randomly selected person from the population is hospitalised for reasons unrelated to the studied infection and is assumed independent of vaccination status and infection.
In the numerator of the ratio measure in Formula 2, note that the term $\pi_{i1} \times \pi_{h1}$ is retained from Formula 1 and represents the proportion of the vaccinated population that is hospitalised due to the infection. The additional term, $\pi_{i1}(1 - \pi_{h1}) \times \pi_{f}$, is the proportion among the vaccinated population that is infected and hospitalised due to other causes. The denominator is of the same form and relates to the unvaccinated population.
We can rewrite Formulae 1 and 2 in terms of relative rates (RR),
$$RR_{hos} = 1 - VE_{hos} = \frac{\pi_{i1}}{\pi_{i0}} \times \frac{\pi_{h1}}{\pi_{h0}} \qquad \text{(Formula 3)}$$
$$RR'_{hos} = 1 - VE'_{hos} = \frac{\pi_{i1}}{\pi_{i0}} \times \frac{\pi_{h1} + (1 - \pi_{h1})\pi_{f}}{\pi_{h0} + (1 - \pi_{h0})\pi_{f}} \qquad \text{(Formula 4)}$$
Comparing the expressions in Formulae 3 and 4, and assuming that $\pi_{h1} \le \pi_{h0}$, it can be seen that $RR_{hos} \le RR'_{hos}$, or equivalently that $VE'_{hos} \le VE_{hos}$, since
$$\frac{\pi_{h1}}{\pi_{h0}} \le \frac{\pi_{h1} + (1 - \pi_{h1})\pi_{f}}{\pi_{h0} + (1 - \pi_{h0})\pi_{f}}$$
It can also be seen from Formula 4 that $VE'_{hos}$ tends towards $VE_{inf}$ as $\pi_{f}$ approaches 1. We therefore have $VE_{hos} \ge VE'_{hos} \ge VE_{inf}$. By introducing a bias correction factor c we can express $RR_{hos}$ as a function of $RR'_{hos}$,
$$RR_{hos} = c \times RR'_{hos} \qquad \text{(Formula 5)}$$
where
$$c = \frac{\pi_{h1}}{\pi_{h0}} \times \frac{\pi_{h0} + (1 - \pi_{h0})\pi_{f}}{\pi_{h1} + (1 - \pi_{h1})\pi_{f}}$$
is a value ranging from 1 to $\pi_{h1}/\pi_{h0}$ as $\pi_{f}$ ranges from 0 to 1. Note that the bias is greatest when $\pi_{h1}$ is a lot smaller than $\pi_{h0}$ and when $\pi_{f}$ is large relative to $\pi_{h0}$ and $\pi_{h1}$. Meanwhile there is no bias (c = 1) when $\pi_{h1} = \pi_{h0}$, since $VE_{hos} = VE_{inf}$ at that point. This would be the case for a vaccine which may protect against infection but does not provide any further protection against hospitalisation once infection has occurred. Note also that there is no relationship between the size of the bias, c, and the infection rates as expressed through $\pi_{i0}$ and $\pi_{i1}$.
The relationship in Formula 5 can be rewritten as
$$VE_{hos} = 1 - c \times (1 - VE'_{hos})$$
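The quantities defined in this section are easy to evaluate numerically. The following is a minimal sketch rather than code from the study: the function and variable names are our own, and the parameter values are those given for Scenario 1 in the text. It computes the true relative rate, the relative rate observed when incidental admissions are counted as cases, and the bias correction factor c, then checks that c recovers the true VE.

```python
def rr_true(pi_i1, pi_i0, pi_h1, pi_h0):
    """Relative rate of hospitalisation caused by the infection."""
    return (pi_i1 / pi_i0) * (pi_h1 / pi_h0)


def rr_observed(pi_i1, pi_i0, pi_h1, pi_h0, pi_f):
    """Relative rate when incidental admissions are counted as cases."""
    vaccinated = pi_h1 + (1 - pi_h1) * pi_f
    unvaccinated = pi_h0 + (1 - pi_h0) * pi_f
    return (pi_i1 / pi_i0) * (vaccinated / unvaccinated)


def correction_factor(pi_h1, pi_h0, pi_f):
    """Bias correction factor c, defined so that rr_true = c * rr_observed."""
    return (pi_h1 / pi_h0) * (pi_h0 + (1 - pi_h0) * pi_f) / (pi_h1 + (1 - pi_h1) * pi_f)


# Scenario 1 parameters as stated in the text.
params = dict(pi_i1=0.02, pi_i0=0.10, pi_h1=0.005, pi_h0=0.02)
pi_f = 0.02

ve_true = 1 - rr_true(**params)
ve_observed = 1 - rr_observed(**params, pi_f=pi_f)
c = correction_factor(params["pi_h1"], params["pi_h0"], pi_f)
ve_corrected = 1 - c * (1 - ve_observed)

print(f"True VE:      {ve_true:.1%}")
print(f"Observed VE:  {ve_observed:.1%}")
print(f"c:            {c:.4f}")
print(f"Corrected VE: {ve_corrected:.1%}")
```

Note that c takes only the three parameters π f, π h1 and π h0 as arguments, which reflects the point that the bias does not depend on the infection rates.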

Illustrating examples
In the following, the results derived above are demonstrated through a number of scenarios using the example of VE against COVID-19-related hospitalisation. The first three scenarios present examples of a vaccine with a relatively high level of effectiveness against infection, 80%, while the VE against infection in Scenarios 4-6 is only 20%. The scenarios are hypothetical and have been selected to illustrate the relationship between the VE as estimated in a study and the parameters introduced above.

Scenario 1: Base case scenario
In Scenario 1 (see Table), the proportion of the population who are positive for SARS-CoV-2 is 10% among unvaccinated ($\pi_{i0}$ = 0.10) and 2% among vaccinated people ($\pi_{i1}$ = 0.02), meaning that VE against infection is 80%. Once infected, vaccinated individuals are less likely than unvaccinated individuals to require hospitalisation as only 0.5% of vaccinated cases are hospitalised because of COVID-19 ($\pi_{h1}$ = 0.005) compared with 2% of unvaccinated cases ($\pi_{h0}$ = 0.02). The true RR of hospitalisation for COVID-19 in the population is therefore (0.02 × 0.005) / (0.10 × 0.02) = 0.05, resulting in a VE against COVID-19 hospitalisation of 95%. The estimated VE of 87.4% is lower, however, due to contamination from individuals with an 'incidental' SARS-CoV-2 diagnosis, i.e. individuals who happen to be infected but who are actually hospitalised for other reasons. Two per cent of the population under study, whether vaccinated or unvaccinated, is in hospital for other reasons ($\pi_{f}$ = 0.02). The observed COVID-19 hospitalisation rates are consequently inflated in both the vaccinated and unvaccinated group by an amount that is equal to 2% of the total infections observed in the two groups. This forces the numerator and denominator of the RR closer together and pushes down the VE estimate.
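The arithmetic behind the 87.4% estimate can be written out as a worked equation. This is a sketch that assumes two groups of equal size with 20,000 infected vaccinated and 100,000 infected unvaccinated people (the infection counts quoted elsewhere in the text); the unvaccinated counts of 2,000 and 1,960 are derived here from the stated parameters rather than quoted from the Table.

```latex
\begin{aligned}
\text{Vaccinated:}\quad & 20{,}000 \times 0.005 = 100, \qquad (20{,}000 - 100) \times 0.02 = 398 \\
\text{Unvaccinated:}\quad & 100{,}000 \times 0.02 = 2{,}000, \qquad (100{,}000 - 2{,}000) \times 0.02 = 1{,}960 \\
RR'_{hos} &= \frac{100 + 398}{2{,}000 + 1{,}960} = \frac{498}{3{,}960} \approx 0.126 \\
VE'_{hos} &= 1 - RR'_{hos} \approx 87.4\%
\end{aligned}
```

Without the incidental admissions, the same counts give 100/2,000 = 0.05, i.e. the true VE of 95%.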

Table
Six scenarios illustrating bias in estimated vaccine effectiveness against hospitalisation for COVID-19 in studies with low specificity of outcome
The infection rate is 10% among the unvaccinated population except in Scenario 2, where it is 1%. The risk of hospitalisation due to COVID-19 following an infection is 2% among the unvaccinated population except in Scenario 3, where it is 0.2%. The risk of hospitalisation in the population among those not hospitalised for reasons related to COVID-19 is 2%, except in Scenario 6, where it is 0.1%; e.g. (20,000 − 100) × 0.02 = 398 in Scenario 1.

Scenario 2: 10 times lower infection rates
As was established above, the incidence rate of infection in the population, reflecting the contagiousness of the pathogen, does not impact the magnitude of this bias (the bias correction factor c is independent of $\pi_{i0}$ and $\pi_{i1}$). This is illustrated in Scenario 2, where the infection rate is 10 times smaller in both the vaccinated and unvaccinated population ($\pi_{i0}$ = 0.01, $\pi_{i1}$ = 0.002) compared with Scenario 1 but the estimated VE remains unchanged.
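The invariance to the infection rates can be checked directly. The sketch below uses a helper function of our own naming and the Scenario 1 and Scenario 2 parameter values from the text; it shows that scaling both infection rates by the same factor leaves the observed VE unchanged, because only their ratio enters the expression.

```python
def observed_ve(pi_i1, pi_i0, pi_h1, pi_h0, pi_f):
    """VE estimated when incidental admissions are counted as cases."""
    vaccinated = pi_i1 * (pi_h1 + (1 - pi_h1) * pi_f)
    unvaccinated = pi_i0 * (pi_h0 + (1 - pi_h0) * pi_f)
    return 1 - vaccinated / unvaccinated


# Scenario 1: infection rates of 2% (vaccinated) and 10% (unvaccinated).
scenario_1 = observed_ve(pi_i1=0.02, pi_i0=0.10, pi_h1=0.005, pi_h0=0.02, pi_f=0.02)

# Scenario 2: both infection rates are 10 times lower, all else unchanged.
scenario_2 = observed_ve(pi_i1=0.002, pi_i0=0.01, pi_h1=0.005, pi_h0=0.02, pi_f=0.02)

print(f"Scenario 1 observed VE: {scenario_1:.1%}")
print(f"Scenario 2 observed VE: {scenario_2:.1%}")
```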

Scenario 3: 10 times less severe disease
In Scenario 3, the disease is milder as the risk that an infection will lead to hospitalisation is 10 times lower than in Scenario 1 ($\pi_{h0}$ = 0.002, $\pi_{h1}$ = 0.0005). Relative to the many misclassified cases with incidental infection, the relevant contrast between vaccinated and unvaccinated cases that are actually hospitalised for COVID-19 is diluted considerably. If the number of hospitalisations with incidental infections is large relative to those actually hospitalised due to the infection under study, the observed hospitalisation rate ratio, which in Scenario 3 is (400 + 10) / (1,996 + 200) = 410/2,196, will approach the rate ratio for infections among vaccinated vs unvaccinated (20,000/100,000). Consequently, as was established above, we see in Scenario 3 that the estimated VE against hospitalisation approaches the level of VE against infection.
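To see how the estimate drifts towards the VE against infection as the disease becomes milder, one can scale both hospitalisation risks down by a common factor. This is an illustrative sketch: the helper function and the grid of severity factors are our own choices, with the factor 0.1 corresponding to Scenario 3 and the factor 1 to Scenario 1.

```python
def observed_ve(pi_i1, pi_i0, pi_h1, pi_h0, pi_f):
    """VE estimated when incidental admissions are counted as cases."""
    vaccinated = pi_i1 * (pi_h1 + (1 - pi_h1) * pi_f)
    unvaccinated = pi_i0 * (pi_h0 + (1 - pi_h0) * pi_f)
    return 1 - vaccinated / unvaccinated


pi_i1, pi_i0, pi_f = 0.02, 0.10, 0.02
ve_infection = 1 - pi_i1 / pi_i0  # 80% in Scenarios 1-3

# Severity factors are illustrative; 1.0 is Scenario 1 and 0.1 is Scenario 3.
severity_factors = [1.0, 0.1, 0.01, 0.001]
estimates = [
    observed_ve(pi_i1, pi_i0, pi_h1=0.005 * s, pi_h0=0.02 * s, pi_f=pi_f)
    for s in severity_factors
]

for s, ve in zip(severity_factors, estimates):
    print(f"severity x{s:<6} observed VE = {ve:.1%} (VE against infection = {ve_infection:.0%})")
```

The true VE against hospitalisation is 95% in every row, since the ratio of the two hospitalisation risks is held fixed; only the estimate changes.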

Scenario 4: Poor vaccine effectiveness against infection
When VE for infection is high, as in Scenarios 1-3, the absolute scale of the bias may be limited. On the other hand, when VE for infection is low (e.g. 20% as in Scenario 4), estimates of VE against hospitalisation may tend towards similarly low levels despite good vaccine protection against hospitalisation once infected.

Scenario 5: Poor vaccine effectiveness against infection; good vaccine protection once infected
The bias is particularly pronounced for vaccines that offer high levels of protection against hospitalisation despite poor protection against infection, as was the case for many of the original (monovalent) COVID-19 vaccines during the Omicron era [10,11]. This is illustrated further in Scenario 5, where the vaccine still only protects 20% against infection but vaccinated SARS-CoV-2 cases are 10 times less likely to require hospitalisation than unvaccinated cases ($\pi_{h1}/\pi_{h0}$ = 0.002/0.02 = 0.1). Here, VE against hospitalisation is estimated at just 55.6% when in fact it is 92%.

Scenario 6: Poor vaccine effectiveness against infection; good vaccine protection once infected, low rates of hospitalisations for other reasons
Arguably most important for the magnitude of the bias is the rate of misclassified cases, $\pi_{f}$, i.e. in this example, the proportion of people in hospital for reasons other than COVID-19. This is illustrated in Scenario 6, which is a repeat of Scenario 5 except only 0.1% of the total population under study (not hospitalised for COVID-19) is in hospital for other reasons ($\pi_{f}$ = 0.001).
In this scenario, the estimated VE of 88.6% is wrong by only a few percentage points. In many real-life scenarios, a lower $\pi_{f}$ such as in Scenario 6 may be more realistic, especially if the study is conducted in a general population of relatively good health.
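The dependence on π f can be explored by sweeping it between the values used in Scenarios 5 and 6. In this sketch the helper function is our own, the intermediate π f values are arbitrary illustrative choices, and the vaccinated infection rate of 8% is inferred from the stated 20% VE against infection and the 10% infection rate among the unvaccinated.

```python
def observed_ve(pi_i1, pi_i0, pi_h1, pi_h0, pi_f):
    """VE estimated when incidental admissions are counted as cases."""
    vaccinated = pi_i1 * (pi_h1 + (1 - pi_h1) * pi_f)
    unvaccinated = pi_i0 * (pi_h0 + (1 - pi_h0) * pi_f)
    return 1 - vaccinated / unvaccinated


# Scenario 5/6 parameters; pi_i1 = 0.08 is inferred from VE against infection = 20%.
params = dict(pi_i1=0.08, pi_i0=0.10, pi_h1=0.002, pi_h0=0.02)
ve_true = 1 - (params["pi_i1"] * params["pi_h1"]) / (params["pi_i0"] * params["pi_h0"])

# 0.02 is Scenario 5 and 0.001 is Scenario 6; the values in between are illustrative.
pi_f_values = [0.02, 0.01, 0.005, 0.001]
sweep = {pi_f: observed_ve(**params, pi_f=pi_f) for pi_f in pi_f_values}

for pi_f, ve in sweep.items():
    print(f"pi_f = {pi_f:<6} observed VE = {ve:.1%} (true VE = {ve_true:.0%})")
```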

Bias correction
As explained above, the true (unbiased) VE against a severe disease outcome, such as hospitalisation, can be expressed as $VE_{hos} = 1 - c \times (1 - VE'_{hos})$, where $VE'_{hos}$ is the estimated VE and c is the bias correction factor, which is a function of the three parameters $\pi_{f}$, $\pi_{h1}$ and $\pi_{h0}$. To illustrate, having observed $VE'_{hos}$ = 81.3% as in Scenario 3, and assuming $\pi_{f}$ = 0.02, $\pi_{h1}$ = 0.0005 and $\pi_{h0}$ = 0.002, we can evaluate c as 0.2679 and the true VE as 1 − 0.2679 × (1 − 0.813) = 95%.
In practice, $\pi_{f}$, $\pi_{h1}$ and $\pi_{h0}$ will generally not be known but might be gauged from electronic health records or external studies. It may therefore still be possible to gain a sense of the level of underestimation that can be expected in particular scenarios and, by varying the parameters, to suggest a range of plausible values within which the true (unbiased) VE is likely to lie.
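A correction of this kind, together with a simple sensitivity analysis over plausible parameter values, might be sketched as follows. The function names and the grid of candidate π f values are hypothetical choices made for illustration; the first call reproduces the Scenario 3 example from the text.

```python
def correction_factor(pi_h1, pi_h0, pi_f):
    """Bias correction factor c as a function of pi_f, pi_h1 and pi_h0."""
    return (pi_h1 / pi_h0) * (pi_h0 + (1 - pi_h0) * pi_f) / (pi_h1 + (1 - pi_h1) * pi_f)


def corrected_ve(ve_observed, pi_h1, pi_h0, pi_f):
    """Corrected VE = 1 - c * (1 - observed VE)."""
    return 1 - correction_factor(pi_h1, pi_h0, pi_f) * (1 - ve_observed)


# Scenario 3 example from the text: observed VE of 81.3%.
ve_scenario_3 = corrected_ve(0.813, pi_h1=0.0005, pi_h0=0.002, pi_f=0.02)
print(f"Corrected VE (Scenario 3 parameters): {ve_scenario_3:.1%}")

# Sensitivity analysis: the candidate pi_f values below are illustrative assumptions.
candidate_pi_f = [0.005, 0.01, 0.02, 0.05]
plausible = [corrected_ve(0.813, 0.0005, 0.002, pi_f) for pi_f in candidate_pi_f]
print(f"Plausible range for the true VE: {min(plausible):.1%} to {max(plausible):.1%}")
```

The same loop could be extended to vary π h1 and π h0 as well, at the cost of a larger grid.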

Discussion
Generally, a low specificity of the outcome measure in clinical studies results in rates being overestimated in both treatment and control groups. Consequently, it is well established that low specificity attenuates the ratio of the measured rates, causing the ratio to be closer to 1 than it truly is [12,13]. However, in studies of vaccine protection against severe outcomes, VE is typically derived from the product of two ratios: first, the ratio of infection rates and second, among those infected, the ratio of the rates of severe outcome. In many studies, such as those using PCR methods to detect infection, the problem of low specificity affects only the second ratio. Consequently, as illustrated in this article, the biased estimate is bounded by the level of VE against infection.
We have seen that the proportion of individuals admitted to hospital for unrelated reasons, $\pi_{f}$, is an important factor for the size of the bias. In all the scenarios shown, and in the expression for c, it is assumed that $\pi_{f}$ is independent of vaccination status. A fundamental principle of VE studies is that the vaccinated and unvaccinated groups being compared do not differ systematically with respect to other risk factors (at least not after adjustment). This is necessary to ensure that the VE measure captures only the immunological effect of vaccination and not the effects of other exposure and disease predictors. It would probably be a sign that the health profiles are not comparable if $\pi_{f}$ differed between the two groups, and the resultant VE estimate would therefore also be biased due to confounding. Supplementary analyses with negative control exposures or outcomes are recommended in observational VE studies as a way to assess such bias [14,15]. Nonetheless, it is possible to adapt the expression for c to accommodate different $\pi_{f}$ rates, say $\pi_{f1}$ and $\pi_{f0}$ respectively, in the vaccinated and unvaccinated group; it can then be shown that the underestimation is even more pronounced if $\pi_{f1} > \pi_{f0}$, which would be the situation if, for example, people of poorer health were more likely to be vaccinated.
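One way the expression for c might be adapted to group-specific rates is sketched below. The adapted form is our own assumption, obtained by replacing π f with π f1 in the vaccinated term and with π f0 in the unvaccinated term; it is not taken from the article. The value π f1 = 0.03 is hypothetical and the remaining parameters are those of Scenario 1.

```python
def observed_ve(pi_i1, pi_i0, pi_h1, pi_h0, pi_f1, pi_f0):
    """Observed VE when the rate of unrelated admissions differs by group.

    Assumed form: pi_f1 applies to the vaccinated, pi_f0 to the unvaccinated.
    """
    vaccinated = pi_i1 * (pi_h1 + (1 - pi_h1) * pi_f1)
    unvaccinated = pi_i0 * (pi_h0 + (1 - pi_h0) * pi_f0)
    return 1 - vaccinated / unvaccinated


def correction_factor(pi_h1, pi_h0, pi_f1, pi_f0):
    """Assumed generalisation of c for group-specific pi_f."""
    return (pi_h1 / pi_h0) * (pi_h0 + (1 - pi_h0) * pi_f0) / (pi_h1 + (1 - pi_h1) * pi_f1)


base = dict(pi_i1=0.02, pi_i0=0.10, pi_h1=0.005, pi_h0=0.02)  # Scenario 1

ve_equal = observed_ve(**base, pi_f1=0.02, pi_f0=0.02)
ve_unequal = observed_ve(**base, pi_f1=0.03, pi_f0=0.02)  # hypothetical pi_f1

c_unequal = correction_factor(base["pi_h1"], base["pi_h0"], pi_f1=0.03, pi_f0=0.02)
ve_recovered = 1 - c_unequal * (1 - ve_unequal)

print(f"Observed VE, equal pi_f:       {ve_equal:.1%}")
print(f"Observed VE, pi_f1 > pi_f0:    {ve_unequal:.1%}")
print(f"Recovered VE with adapted c:   {ve_recovered:.1%}")
```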
Observational studies to estimate VE may be designed in various ways. Study designs include retrospective cohort studies through analysis of routinely collected electronic health records, cross-sectional designs, prospective cohort studies, test-negative designs and other case-control studies. The biasing effects presented in this article, caused by outcome misclassification, apply equally across all these designs. Even randomised controlled trials (RCTs) are vulnerable to this type of outcome misclassification bias, although due to their budgets and rigour, RCTs often include more accurate outcome assessment methods than observational studies.
Unlike the problems caused by low specificity, low sensitivity of an outcome measure generally does not bias the ratio of the measured rates in the two groups [13]. It is therefore preferable for studies of VE to use outcomes that minimise numbers of misclassified cases (false cases) rather than numbers of missed cases.
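The contrast with low sensitivity can be illustrated with a small sketch. It assumes, as a simplification of our own, that the outcome measure has perfect specificity and that the same fraction of true cases is detected in both groups (non-differential sensitivity); the sensitivity values are arbitrary and the case rates are those of Scenario 1.

```python
def ve_with_sensitivity(rate_vaccinated, rate_unvaccinated, sensitivity):
    """VE when only a fraction `sensitivity` of true cases is detected in each group."""
    detected_vaccinated = sensitivity * rate_vaccinated
    detected_unvaccinated = sensitivity * rate_unvaccinated
    return 1 - detected_vaccinated / detected_unvaccinated


# True hospitalisation rates due to the infection, using Scenario 1 parameters.
rate_vaccinated = 0.02 * 0.005
rate_unvaccinated = 0.10 * 0.02

# Sensitivity values are illustrative assumptions.
sensitivities = [1.0, 0.8, 0.5, 0.2]
ve_by_sensitivity = [
    ve_with_sensitivity(rate_vaccinated, rate_unvaccinated, s) for s in sensitivities
]

for s, ve in zip(sensitivities, ve_by_sensitivity):
    print(f"sensitivity = {s:.0%}: VE = {ve:.1%}")
```

Because the sensitivity multiplies both the numerator and the denominator, it cancels from the ratio; missed cases reduce the number of events and hence precision, but under these assumptions not the point estimate.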
Depending on data availability, minimising the number of misclassified cases may be achieved by defining more specific outcome measures based for example on primary diagnosis codes upon hospital admission, hospital procedures or death certificate information. Death as a specific disease outcome may also be defined with higher specificity, but possibly at the cost of lower sensitivity, by requiring certain accompanying hospital diagnoses or procedures. A number of studies have explored alternative severe outcome definitions in the context of VE [6,16,17].
Whether in the context of COVID-19, influenza or some other type of infection, studies of VE against severe disease outcomes such as hospitalisation and death should aim to use outcome measures that minimise inclusion of false cases to avoid underestimation of the effects of interest. Where this is not possible, consideration should be given to the magnitude of the resulting bias, for example by investigating likely scenarios for the three parameters that enter the expression for c above.

Conclusions
Vaccine effectiveness against severe disease outcomes will generally be underestimated when incidental cases are included in the analysis. The potential error is greatest for vaccines that offer high levels of protection against severe disease but poor protection against infection. Outcomes with high specificity rather than high sensitivity are preferable in VE studies.