Estimating number of cases and spread of coronavirus disease (COVID-19) using critical care admissions, United Kingdom, February to March 2020

An exponential growth model was fitted to critical care admissions from two surveillance databases to determine likely coronavirus disease (COVID-19) case numbers, critical care admissions and epidemic growth in the United Kingdom before the national lockdown. We estimate, on 23 March, a median of 114,000 (95% credible interval (CrI): 78,000–173,000) new cases and 258 (95% CrI: 220–319) new critical care reports, with 527,000 (95% CrI: 362,000–797,000) cumulative cases since 16 February.


Estimating the age-dependent proportion of infected people who are admitted to critical care
To estimate the proportion of people infected with COVID-19 who are admitted to critical care (CC), three data sources were used: (a) estimates by Verity et al. [1], based on case data from China and on estimates of infection prevalence from exported cases, of the age-dependent proportion of infected cases that were hospitalised; (b) estimates by the Centers for Disease Control and Prevention (CDC) (Bialek et al. [2]), based on case data from the USA, of the age-dependent proportion of reported cases that were hospitalised or admitted to ICU; and (c) estimates by Riccardo et al. [3], based on case data from Italy, of the age-dependent proportion of reported cases that were hospitalised or admitted to ICU.
The CDC analysis contained two scenarios: a low scenario with all reported cases as the denominator, and a high scenario where only those with known hospitalisation status were used in the denominator. The CDC low scenario was used for our main analysis, but sensitivity analyses were conducted using the CDC high scenario and the Italian data.
Since these data were only available by age group, linear ($r = M + Na$), exponential ($r = M e^{Na}$) or logistic ($r = M / (1 + \exp(-N(a - P)))$) models, where $r$ is the risk of hospitalisation or ICU admission, $a$ is age in years and $M$, $N$ and $P$ are inferred parameters, were fitted to these data in order to determine risks by single year of age. To infer $M$, $N$ and $P$, Bayesian updating with a Binomial likelihood function was used, taking into account the actual number of cases and denominators corresponding to each age group. $M$, $N$ and $P$ were sampled from their posterior distributions using importance sampling: 10,000 parameter sets were drawn from uniform distributions and then resampled with replacement, with each set's probability of selection weighted by its likelihood. To determine the range of the uniform distributions, sampling was first conducted over very broad priors, constructed from the distributions in Table S1 below. After resampling, the ranges of the parameters in the resampled sets were determined; these ranges were more likely to contain parameter sets with a high likelihood of fitting the data. We hence sampled a second time from uniform distributions over these ranges to obtain our final posterior distributions. These data were then combined to estimate the risk of ICU admission in infected patients using the following formula:
$P(\text{admitted to ICU} \mid \text{infected}) = P(\text{hospitalised} \mid \text{infected}) \times P(\text{admitted to ICU} \mid \text{reported case}) \,/\, P(\text{hospitalised} \mid \text{reported case})$
We assumed that this was the same as the risk of CC admission.
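The two-pass importance sampling described above can be sketched as follows. This is a minimal illustration, not the authors' code: the age-group counts, the use of age-group midpoints, the broad prior ranges and all function names are assumptions made for the example, and only the logistic model is shown.

```python
import math
import random

# Hypothetical age-group data (NOT from the paper):
# (age-group midpoint, number of cases, number hospitalised)
DATA = [(25, 400, 8), (45, 500, 40), (65, 450, 90), (85, 200, 80)]


def logistic_risk(age, M, N, P):
    """Logistic model r = M / (1 + exp(-N (a - P)))."""
    return M / (1.0 + math.exp(-N * (age - P)))


def log_likelihood(params, data=DATA):
    """Binomial log-likelihood (constant binomial coefficient omitted)."""
    M, N, P = params
    total = 0.0
    for age, n_cases, n_hosp in data:
        r = logistic_risk(age, M, N, P)
        if not 0.0 < r < 1.0:
            return -math.inf
        total += n_hosp * math.log(r) + (n_cases - n_hosp) * math.log(1.0 - r)
    return total


def sample_and_resample(ranges, n_draws, rng):
    """Draw from uniform ranges, then resample with likelihood weights."""
    draws = [tuple(rng.uniform(lo, hi) for lo, hi in ranges)
             for _ in range(n_draws)]
    log_w = [log_likelihood(d) for d in draws]
    top = max(log_w)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(lw - top) for lw in log_w]
    return rng.choices(draws, weights=weights, k=n_draws)


def parameter_ranges(samples):
    """Minimum and maximum of each parameter across the resampled sets."""
    return [(min(s[i] for s in samples), max(s[i] for s in samples))
            for i in range(3)]


rng = random.Random(1)
broad = [(0.0, 1.0), (0.0, 0.5), (0.0, 120.0)]   # assumed broad ranges for M, N, P
first_pass = sample_and_resample(broad, 10_000, rng)
narrowed = parameter_ranges(first_pass)           # ranges kept after resampling
posterior = sample_and_resample(narrowed, 10_000, rng)
```

The second pass concentrates the uniform draws in the region where the first pass found non-negligible likelihood, so fewer of the 10,000 draws are wasted on parameter sets that cannot fit the data.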
These "multipliers" were then used to calculate the estimated number of infected people in the UK given an estimate of the number of COVID-19 CC admissions, according to the equations below. Let $A_{FF100}(a)$ and $A_{CHESS}(a)$ be the number of CC cases aged $a$ years in the FF100 and CHESS datasets respectively, and $P(a)$ be the probability of a person infected with COVID-19 at age $a$ being admitted to CC, based on our analysis of the Chinese, US and Italian datasets. Then $m(a) = P(a)^{-1}$ is the corresponding multiplier, giving the number of infected people of age $a$ per CC admission of age $a$.
Let $m_{FF100}$ and $m_{CHESS}$ be the age-adjusted multipliers used to estimate the number of infected people given the number of CC admissions for the FF100 and CHESS datasets respectively. Then:
Number of infected people on a given day using FF100 data = Number of CC admissions with symptom onset on that day × $m_{FF100}$
Number of infected people on a given day using CHESS data = Number of CC admissions with symptom onset on that day × $m_{CHESS}$
Children below 15 years were removed from the UK datasets when calculating multipliers, since there were no ICU admissions in these age groups in the other datasets. We believe that the over-representation of younger people in the CHESS dataset in particular reflects biases in reporting (see Discussion of main paper).
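The arithmetic of the multipliers can be illustrated with a short sketch. The text does not state the formula for the age-adjusted multiplier, so the version below assumes it is the average of m(a) weighted by the number of CC admissions at each age; that weighting, the function names and all numerical values are assumptions for illustration only.

```python
def icu_risk_given_infection(p_hosp_given_inf, p_icu_given_case, p_hosp_given_case):
    """P(ICU | infected) = P(hosp | infected) * P(ICU | case) / P(hosp | case)."""
    return p_hosp_given_inf * p_icu_given_case / p_hosp_given_case


def age_adjusted_multiplier(cc_counts, p_cc):
    """Assumed form: mean of m(a) = 1 / P(a), weighted by CC admissions at age a."""
    total = sum(cc_counts.values())
    return sum(n / p_cc[age] for age, n in cc_counts.items()) / total


# Hypothetical inputs, not taken from the paper.
p_icu_inf = icu_risk_given_infection(0.10, 0.05, 0.25)
P_CC = {40: 0.01, 70: 0.05}      # probability of CC admission given infection, by age
CC_COUNTS = {40: 2, 70: 8}       # CC admissions by age in a dataset
m = age_adjusted_multiplier(CC_COUNTS, P_CC)

# Number of infected people implied by 5 CC admissions with onset on one day.
infected = 5 * m
```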
Figures S1.1 to S1.4 below show the posterior distributions for the best fitting models at each stage of the process.

Extraction of data
To estimate the delay between onset of symptoms and being reported as a CC admission, we extracted the observed time difference for 70 cases in the FF100 dataset (extracted on 31 March 2020) who were admitted to CC (i.e. all cases in the FF100 that were labelled as sporadic, and have both a symptom onset and a reporting date that was later than the date of symptom onset). The reporting date was assumed to be the date they were admitted to CC, since in all cases it was later than the date of hospital admission.
The distribution of time differences is shown in Figure S2 below; it has a median of 6.5 days (interquartile range 4.2–10, 95% interval 1–20).

Fit to data
The data were fitted using a discretised gamma distribution; the fit is shown in Figure S2 below.
Figure S2. Observed delay between onset of symptoms and being reported as a CC admission for patients in FF100 with both dates, and 95% credible interval for a gamma distribution using the posterior distribution of parameter sets.
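One way to discretise a gamma distribution onto whole days is sketched below. The discretisation used in the paper is not recoverable from the text, so this version (probability mass for day d taken as the density integrated over [d, d+1), then renormalised over a finite range) is an assumption, as are the shape and scale values and the function names.

```python
import math


def gamma_pdf(x, shape, scale):
    """Density of a gamma distribution with the given shape and scale."""
    if x <= 0.0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def discretised_gamma(shape, scale, max_delay, steps=200):
    """Probability of a delay of d days, for d = 0..max_delay.

    Each mass is the density integrated over [d, d + 1) with the midpoint
    rule; the masses are renormalised so that they sum to one.
    """
    h = 1.0 / steps
    masses = []
    for d in range(max_delay + 1):
        area = sum(gamma_pdf(d + (i + 0.5) * h, shape, scale)
                   for i in range(steps)) * h
        masses.append(area)
    total = sum(masses)
    return [m / total for m in masses]


# Hypothetical parameters (mean delay of 8 days), not the fitted values.
delay_pmf = discretised_gamma(shape=2.0, scale=4.0, max_delay=40)
```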

Use in the model
The delay distribution was applied to the distribution of infections over time to generate a distribution of reported CC cases over time.
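Applying a delay distribution to a time series amounts to a discrete convolution. The sketch below assumes daily counts of cases by symptom onset date and a delay distribution given as probabilities by whole day; the function name and the input values are hypothetical.

```python
def expected_reports(onsets, delay_pmf):
    """Expected reported CC cases per day, given onsets per day and a delay pmf.

    A case with onset on day t and delay d is reported on day t + d.
    Reports that would fall after the last day of the series are dropped.
    """
    reports = [0.0] * len(onsets)
    for t, n_onsets in enumerate(onsets):
        for d, p in enumerate(delay_pmf):
            if t + d < len(reports):
                reports[t + d] += n_onsets * p
    return reports


# Hypothetical example: 10 onsets on day 0, delays of 0, 1 or 2 days.
reports = expected_reports([10.0, 0.0, 0.0], [0.5, 0.3, 0.2])
```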

Issues around truncation bias
Note that the observed distribution of delays is subject to right truncation, i.e. in the middle of an epidemic, some patients will not yet be recorded as CC admissions, due to the delay between onset of disease and reporting. This reporting delay means that, on any particular day, the data available on onset cases will exclude those cases which have onset but have not yet been reported. We therefore see only a portion of the true epidemic curve. It is possible to adjust for this bias by nowcasting, i.e. using the delay we have observed for already reported cases to estimate how many additional, onset-not-yet-reported cases are ongoing at any point in time. Particularly early on in the epidemic, when incidence is increasing exponentially, the observed delay distribution will be biased downwards, since only patients with shorter delays will already have been reported to the system. As time goes on and we observe enough of the epidemic to capture the longest reasonable delay, this bias will diminish and the delays can be estimated directly, but they will likely have changed from the early delays.
However, with these data sources such a correction is difficult to do, not only because of the small sample size, but also because there are in fact two sources of right-truncation: (i) the delay from onset of symptoms to being reported as a CC case as just described (a "patient" delay) and (ii) the delay from being reported as a CC case to actually being available in the dataset to national researchers (a "system" delay, for example varying by the reporting trust). Decreasing effort in maintaining the dataset also likely contributed to the tailing-off of reported cases in the FF100; whether due to delay or poor completion, some cases reported on those days will never be added to this dataset. This would bias counts of CC admissions, which is one reason why only the counts up to 6 March in the FF100 were used for model fitting; however, for the delay distribution we used all patients with a date of reporting for CC admission up to the last complete dataset (31 March).
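The downward bias in observed delays during exponential growth can be shown with a small deterministic calculation. This is an illustrative sketch only: the delay distribution, growth rate and observation window are hypothetical, and only the "patient" delay is modelled.

```python
import math


def observed_delay_pmf(delay_pmf, growth_rate, days_observed):
    """Delay distribution among cases already reported by the last observed day.

    Onsets on day t are proportional to exp(growth_rate * t). A case with
    onset on day t and delay d is visible only if t + d <= last observed day.
    """
    last_day = days_observed - 1
    weights = []
    for d, p in enumerate(delay_pmf):
        visible_onsets = sum(math.exp(growth_rate * t)
                             for t in range(0, last_day - d + 1))
        weights.append(p * visible_onsets)
    total = sum(weights)
    return [w / total for w in weights]


def mean_delay(pmf):
    """Mean of a delay distribution given as probabilities by whole day."""
    return sum(d * p for d, p in enumerate(pmf))


# Hypothetical inputs: delays uniform on 0..9 days, growth rate 0.2 per day.
true_pmf = [0.1] * 10
seen_pmf = observed_delay_pmf(true_pmf, growth_rate=0.2, days_observed=15)
true_mean = mean_delay(true_pmf)
seen_mean = mean_delay(seen_pmf)
```

Because recent onsets dominate during growth and only their short delays have been observed so far, `seen_mean` falls below `true_mean`; the gap shrinks as the observation window lengthens relative to the longest delay.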

Sensitivity analyses
We performed sensitivity analyses by considering the following alternative scenarios:
• Dataset used. Restricting the analysis to the CHESS dataset only (i.e. not using the FF100 dataset).
• Altering the sensitivity of COVID-19 detection in CC to 75%; this could reflect either testing sensitivity (e.g. poor sample collection) or incomplete reporting of test positives.
• Changing the period of validity of the FF100 database to the full period instead of just to 12 March.
• Using the CDC "high" scenario for severity of COVID-19.
• Using the Italian dataset instead of the CDC dataset.
• Including individuals under 15 years old in the CHESS data set when calculating the proportion of infected cases who need CC. The