Unexposed populations and potential COVID-19 hospitalisations and deaths in European countries as per data up to 21 November 2021

We estimate the potential remaining COVID-19 hospitalisation and death burdens in 19 European countries by estimating the proportion of each country’s population that has acquired immunity to severe disease through infection or vaccination. Our results suggest many European countries could still face high burdens of hospitalisations and deaths, particularly those with lower vaccination coverage, less historical transmission and/or older populations. Continued non-pharmaceutical interventions and efforts to achieve high vaccination coverage are required in these countries to limit severe COVID-19 outcomes.

short periods of missing values. Daily numbers of deaths are then calculated by differencing the cumulative death time series.

Vaccinations
We use age-stratified data on weekly numbers of vaccinations with different vaccine types in EU countries from 14th December 2020 to 21st November 2021 from the European Centre for Disease Prevention and Control (ECDC) (5) for European Union countries and aggregated age-stratified data on daily numbers of vaccinations with different vaccine types from 8th December to 21st November 2021 for England from the UK government's COVID-19 dashboard (6). For the ECDC data, we convert weekly numbers of vaccinations into daily numbers of vaccinations by assuming equal numbers of vaccinations occurred on each day of a given week. We categorise the different vaccines into two groups -(A) viral vector and inactivated virus vaccines (AstraZeneca, Janssen, Sputnik and Beijing CNBG), and (B) mRNA vaccines (Pfizer and Moderna) -based on higher effectiveness for mRNA vaccines than viral vector and inactivated virus vaccines but similar effectiveness within these groups, and these being the vaccine types that have been used in European countries. First and second doses of unknown vaccine type are assigned a vaccine type according to the average proportion of each type used for first and second doses for the given age group during that week in other countries. Vaccinations with an unknown age group constitute only 0.008% of all vaccinations (and less than 0.3% of total vaccinations in any one country) so are disregarded. The breakdown of vaccinations by age is missing from the ECDC data for the Netherlands, so we scale the total number of vaccinations by the mean proportion of vaccinations in each age group in each rollout week across countries with age-stratified data (weighted by the relative population proportions of the age groups in each country) to estimate the vaccinations in each age group. Vaccination data for Germany is only available in broader age groups (<18, 19-59, 60+yrs) than we use in the analysis (7). We therefore use data from the Robert Koch Institute's COVIMO phone survey of vaccine uptake in German citizens (8) (Supplementary Table S2) to estimate the relative coverages in the finer age groups within each of broader age groups, and scale the total numbers of vaccinations over time in the broad age groups by these relative coverages. Whilst this does not fully account for the initial age prioritisation of the vaccine rollout in Germany (Supplementary Figure S1A), it gives more accurate estimates of the current vaccine coverage in the eldest age groups than the method used for the Netherlands, which underestimates coverage in the eldest age groups in Germany. We assume fixed delays of 28 days and 14 days for development of protection following first and second doses, corresponding to the time points at which efficacy was measured in clinical trials (Supplementary Table S1).

Variants
We use data on frequencies of SARS-CoV-2 variants among sequenced infections in each country from the Global Initiative on Sharing Avian Influenza Data (GISAID) EpiCoV database (9) and The European Surveillance System (TESSy) (10) (collated by the ECDC (11)), and from the COVID-19 Genomics UK (COG-UK) consortium Lineages database (12). We classify variants into three groups -"Other" for pre-existing (pre-Alpha) and other variants, "Alpha" (B.1.1.7), and "Delta" (all B.1.617.2 va(11)riants) -as at the time of writing these are the main variants that have emerged and circulated in European countries.

Vaccine effectiveness
We use estimates of overall vaccine effectiveness against different outcomes -infection ( ), disease ( ), hospitalisation ( ), and death ( ) -for different doses of different vaccines and ℎ different variants from various effectiveness studies to calculate the effectiveness estimates required for our model: those against infection, against disease given infection, against hospitalisation given disease, and against death given disease. We assume that the effectiveness against each outcome of the different vaccines is the same for "Other" variants and the Alpha variant. If is the vaccine effectiveness against infection and is the overall effectiveness against outcome , then we calculate the effectiveness against disease ∈ { , ℎ, } given infection, against hospitalisation given disease, and against death given disease, , , ℎ and respectively, for each dose and variant as: Table S1 shows the vaccine effectiveness estimates we use in the model and their sources.

Infection fatality risk
For the age-specific SARS-CoV-2 infection fatality risk (IFR) in unvaccinated individuals used in the analysis, we use the ensemble IFR estimate from (13), which combined data on age-stratified COVID-19 deaths up to 1st September 2020 for 45 countries with data from 22 national seroprevalence surveys to estimate the age-specific IFR. COVID-19 deaths were split by long-term care (LTC) residency status in (13) to account for differences in nursing home burdens across countries, so the estimated IFR represents the IFR in non-LTC residents. We incorporate uncertainty in the IFR into our estimates by assuming that the posterior distribution for the IFR for each age group , , is approximately truncated normal, with mean equal to , the median estimate, , standard deviation determined from the lower bound of the 95% , σ credible interval reported in (13), and truncation at 0: .

Population age distributions
We use the UN World Population Prospects 2020 population estimates by single-year age from 0-100yrs (14) for the population age distributions of each country. We make the simplifying assumption that population sizes and age distributions have remained constant over the course of the pandemic.
Backcalculation of infections from age-stratified death data The number of deaths in a particular age group on a particular day , , is related to the , number of infections on preceding days, , by the convolution of the infection , − (0 ≤ < ) counts with the distribution of the infection-to-death delay, , and scaling with the IFR for ℎ age group on those days, , i.e.: where is the probability of a delay of days between infection and death. = ( ℎ = ) Note that here, unlike in other studies (15)(16)(17), the IFR depends on time of infection, because the IFR changes as vaccine coverage increases.
We use the convolution of the latent period distribution (the time between infection and start of infectiousness), , and the time from start of infectiousness to death, , for the  (18), where denotes the gamma distribution with shape ( , θ) parameter and scale parameter . Since the data is in discrete time with time steps of a day, θ we discretise these distributions by integrating the probability density functions over the half day either side of the start of each day, and use the discrete convolution to find the infection-to-death delay distribution. This gives a distribution with a mean of 17.4 days and a standard deviation of 9.9 days.

Calculation of time-varying infection fatality risk
A number of factors are likely to have affected the SARS-CoV-2 IFR over the course of the pandemic, including surges in demands for hospital beds, improvements in quality of care for hospitalised COVID-19 patients, differences in variant severity, and vaccination. For the sake of simplicity and consistency of comparisons, and because it is likely to have had the largest influence, we consider only the impact of vaccination on the IFR in this analysis. We model the time-varying IFR as a weighted average of the IFR for unvaccinated individuals and that for vaccinated individuals. Specifically, we want the IFR to reflect the expected split in deaths Vaccine-induced immunity against hospitalisation and death wanes relatively slowly (19) and reinfections appear to be relatively rare and generally associated with milder symptoms (20), so the proportion of deaths that are from reinfections is likely to be small. This, coupled with the fact that vaccine rollout occurred sufficiently early in most European countries that the number of vaccinated individuals dominates the number of previously infected individuals, and vaccine effectiveness against infection is relatively high, means that the proportion of infections in vaccinated individuals at time , , can be approximated as , , / , , , , where is the effectiveness of the vaccine against infection at time , and is the vaccine , , coverage in age group at time (21). The IFR for vaccinated individuals is given by: where is the vaccine effectiveness against disease given infection and is the vaccine , , effectiveness against death given disease. Thus, the time-varying average IFR can be approximated as: (1 − , )(1 − , )) , The above concise expressions are possible because of the simplifying assumption that immunity does not wane. Including a waning assumption within this IFR calculation would require a less tractable model that tracks the full transmission process, which given available data would likely lead to identifiability issues for many countries. We

Estimation of variant proportions over time
We estimate the relative proportions of the "Other", Alpha, and Delta variants over time in each country by fitting multinomial regression models to the frequencies of sequenced variants of each type in GISAID data and COG-UK data, in line with several other studies (18,22,23). We test two fixed-effect multinomial models: one with just sample date as the explanatory variable and one with a two-degrees-of-freedom natural cubic spline function of sample date as the explanatory variable (18). The models are fitted using the multinom function in the nnet R package and their fit is compared using Akaike Information Criterion (AIC). The model with the lowest AIC, the cubic spline model, is selected and used to calculate the variant proportions over time for each country. The input data and estimated variant proportions over time are shown in Supplementary Figure S7.

Deconvolution of deaths to infections
Having calculated the time-varying IFR, we estimate the time series of infections for each age group for each country using the Robust Incidence Deconvolution Estimator (RIDE) provided in the incidental R package (24,25), an empirical Bayes deconvolution algorithm developed and validated for inferring SARS-CoV-2 infection time series from case/hospitalisation/death time series (18,24). RIDE uses a cubic spline model for underlying infections and a Poisson likelihood to describe observed cases/deaths. The degrees of freedom of the spline basis are chosen based on lowest AIC over a grid of pre-specified values, and a regularisation penalty is imposed on the second difference of the spline parameters to encourage smoothness, and the regularisation parameter chosen by maximising held-out log-likelihood from splitting of the data into training and validation data sets over a grid of pre-set values (see (24,25) for further details). We use a grid of the even numbers between 6 and 30 for the degrees of freedom of the cubic spline -a wider range than the default setting (6 to 20) to allow sufficient flexibility to fit the observed death time series. RIDE handles right censoring of the death time series by making multiple extrapolations of the observed death curve with a simple autoregressive random walk model that has the same autocorrelation as the observed data. The mean of the autoregressive model is determined by fitting a linear model to the tail of the observed death curve over a certain time window. We use a longer time window (28 days vs 14 days) and higher prior precision (100 vs 10) for the slope of the extrapolation than the default values in order that the inferred infection curve reflects the considerable uncertainty in the future epidemic trajectory.
We first aggregate COVID-19 deaths into the following age groups: 0-39, 40-49, 50-59, 60-69, 70-79, 80+yrs. The age grouping is chosen to ensure that there are sufficient deaths in the youngest age group to reliably infer the underlying infection time series even for countries with low numbers of deaths, since the IFR increases approximately exponentially with age (13,26) and low numbers of deaths lead to unstable estimates of infections (24,27). We then split deaths by LTC residency status as there is substantial evidence that IFRs among LTC residents have been higher than in the same age groups outside of LTC facilities (13,(28)(29)(30). We assume that all deaths in LTC facilities occurred in individuals aged 60yrs and older, and use data on the proportion of all deaths that have occurred in care homes, , for each country from the LTCcovid platform (13,28) to estimate the proportion of deaths amongst those aged 60yrs and over that have occurred in care homes. Namely, if is the proportion of 60+ all deaths amongst individuals 60+yrs, we calculate the number of deaths amongst those 60+yrs that occur among LTC residents in each age group , , as: For France the reported death data is for hospital deaths only, so we estimate the proportion of deaths among those 60+yrs as: and scale up the hospital deaths among those 60+yrs to obtain overall deaths in those 60+yrs: For countries for which information on the proportion of deaths among LTC residents is not available, we use the mean proportion across countries for which LTC death data is available.
With deaths split in this way, we run RIDE on the time series of deaths for each age group under 60yrs and the separate time series for non-LTC and LTC residents for each age group over 60yrs to estimate the time series of infections that led to these deaths. We then calculate the overall number of infections by dividing by a value of the IFR for the corresponding age group drawn from the distribution in Equation (1). Following (13), we take the relative frailty factor for death from COVID-19 for LTC residents to be , i.e. use a 3.8 times higher IFR for LTC γ = 3. 8 residents than non-LTC residents. where the minimum function is to ensure the number of susceptible individuals cannot become negative. Solving the difference equations allows us to estimate how many susceptible (unexposed and unvaccinated) individuals , uninfected vaccinees , and previously the same as it is now (as estimated from the deconvolution), and assume that the relative frailty factor for hospitalisation from COVID-19 for LTC residents is the same as that for death from COVID-19. So, using estimates of the infection hospitalisation risk (33) and infection  We calculate 95% CIs for the remaining hospitalisations and deaths by repeating the above calculations for each of the 1000 posterior samples of the infection time series generated by the deconvolution and IFR scaling (each time with a separate draw for the IHR and the draw for the IFR used in the backcalculation), and calculating the 2.5th-97.5th percentile of the resulting distribution.

Comparison of inferred infections with reported cases
For comparing the inferred infection time series with time series of reported cases, we convolve the inferred infection time series with the incubation period (time from infection to symptom onset), which we take to be the convolution of the latent period ( ) and the ∼ (2. 5, 1) preclinical infectious period ( ), which gives a distribution with mean 5 ∼ (4, 0. 625) days and standard deviation 2 days (18). Supplementary Figure S8 shows that the temporal patterns of inferred infections and reported cases match closely across virtually all countries and age groups. Inferred infections are higher than reported cases for all but a few age groups in a few countries, with larger differences for younger age groups. This is as expected given under-reporting of symptomatic infections, the high proportion of asymptomatic infections, and the decrease in the asymptomatic proportion with increasing age (31).

Limitations
The main limitations of our analysis are that it only provides a theoretical upper bound on the potential number of hospitalisations and deaths that could occur in the different countries considered, and that it does not account for waning of immunity, the rollout of boosters or the emergence of new variants. Our approach is also reliant on a number of relatively strong assumptions. Firstly we assume that COVID-19 deaths are completely and sufficiently consistently reported across different countries such that they can be used to estimate and compare cumulative infection burden between countries. However, even among European countries, where reporting of COVID-19 deaths is likely to be more complete than elsewhere, it is possible that there is some under-reporting of COVID-19 deaths, and definitions for COVID-19 deaths differ to some extent between countries and in some countries have changed over time (34). Potential differences in reporting of COVID-19 deaths could be further explored by examining the ratio of excess deaths to COVID-19 deaths reported in each country over time (35).
We also assume that the delay between infection and reported date of death and the age-dependent IFR in unvaccinated individuals are similar enough across countries that we can use the same values for all countries when inferring underlying infections. Different countries use different types of date (occurrence, registration, report) to report deaths so there is likely some variation in the delay between infection and reported death date (34) and there may also be some variation in the delay between infection and occurrence of death between countries due to differences in availability and quality of ICU care. There is also evidence of additional variation in the IFR between countries over that expected just from differences in population age structure (13). However, we believe that the variation in these quantities for the countries considered is likely to be sufficiently small (see Figure 2C in (13) for variation in the population-weighted IFR for the countries we analyse) that it would not significantly change our estimates.
We only consider variation in the IFR due to vaccination, but there are several other potential sources of variation, including differences in severity between variants. Several studies have reported progressively increased severity (corresponding to a higher IFR and/or IHR) for Alpha and Delta in unvaccinated individuals over the original virus strain (36)(37)(38)(39)(40), but have not found statistically significant differences in relative risks of hospitalisation and death between variants for vaccinated and unvaccinated individuals (38,39). The severity of different variants may therefore differ less for vaccinated individuals than unvaccinated individuals. Furthermore, comparison of our estimates for the cumulative proportion infected or vaccinated in England with age-stratified seroprevalence estimates suggests that our approach captures the overall change in the IFR that has occurred over time relatively accurately (Supplementary Figure S3).
The approximation of the split of infections between unexposed & unvaccinated individuals and vaccinated individuals that we use to construct the time-varying IFR (Equation (3)) assumes that vaccination rollout happened early and fast enough in most countries that the vaccinated proportion of the population dominates the proportion previously infected. It also assumes that vaccine effectiveness against infection is relatively high so the number of breakthrough infections is small relative to the number of infections in unvaccinated individuals, and the current number of vaccinated individuals can be roughly approximated by the vaccination coverage. Whilst these assumptions are reasonable for most of the European countries we consider and for the time period we analyse, they are less valid for other countries where vaccine rollout has been slower, which limits the potential applicability of the analysis beyond the countries considered. The deconvolution method also does not account for reinfections, but to date the proportion of infections that are reinfections appears to be relatively small and protection against severe outcomes from previous infection (20) likely means that the contribution of reinfections to deaths is small.
Another limitation of our analysis is that the deconvolution method relies on sufficient numbers of deaths in each age group to produce stable and temporally accurate estimates of infections. Combined with the approximately exponential increase in the IFR with age, this means that we have to use a large lowest age group (0-39yrs) to obtain reliable infection estimates for countries with low numbers of COVID-19 deaths. This results in a loss of information about differences in infection incidence between age groups <40yrs, especially between adults and children, given many children are still unvaccinated and therefore at higher risk of infection (41). As we assume the same infection risk for all individuals within each age group, the large lowest age group also means that we do not account for variation in infection risk for individuals <40yrs, despite the fact that infection risk varies with age in this group due to heterogeneity in contact patterns (31,42). The estimation issue created by low numbers of deaths in some countries could potentially be resolved by using age-stratified hospitalisation data in addition to, or instead of, death data. However, while this could provide more accurate estimates within each country, such data is not publicly available for all countries and differences in admission criteria between countries would limit comparability.
We assume that immunity acquired through infection provides approximately the same protection against infection, hospitalisation and death as two doses of a viral vector vaccine or mRNA vaccine. This is based on findings of similar levels of protection against symptomatic infection with the Delta variant from prior natural infection and two doses of the AstraZeneca/Pfizer vaccines in a large infection study in the UK (32), and estimates from some studies suggesting higher protection against hospitalisation and death from vaccination than prior infection (at least in individuals aged ≥65) (43) and estimates from others the opposite (44,45). However, some studies have observed higher rates of reinfection among previously infected individuals than vaccinees (46)(47)(48), suggesting that we may be underestimating the potential remaining burden among previously infected individuals. On the other hand, we do not account for stronger protection against reinfection among vaccinees who have been previously infected (44,45), which would have the opposite effect on the remaining burden estimates.

Code
The code and data underlying this analysis are available online at https://github.com/LloydChapman/covid_remaining_burden. The potential remaining burden estimates can be found in the output folder. *These coverages are much higher than the official vaccination coverages for broader age groups (12-17, 18-59, 60+yrs) reported by RKI (7), likely due to selection bias in the survey participants, but can be used to calculate relative vaccination coverage within the broader age groups.  Genomics UK consortium (England) (12).