Case fatality risk of the SARS-CoV-2 variant of concern B.1.1.7 in England, 16 November to 5 February

The SARS-CoV-2 B.1.1.7 variant of concern (VOC) is increasing in prevalence across Europe. Accurate estimation of disease severity associated with this VOC is critical for pandemic planning. We found an increased risk of death for VOC compared with non-VOC cases in England (hazard ratio: 1.67; 95% confidence interval: 1.34–2.09; p < 0.0001). The absolute risk of death by 28 days increased with age and comorbidities. This VOC has the potential to spread faster, and with higher mortality, than the pandemic to date.


Further information on OpenSAFELY
All data were linked, stored and analysed securely within the OpenSAFELY platform (https://opensafely.org/). The dataset analysed within OpenSAFELY is based on 24 million people currently registered with GP surgeries using TPP SystmOne software. Data include pseudonymised data such as coded diagnoses, medications and physiological parameters. No free-text data are included. All code is shared openly for review and re-use under the MIT open licence (https://github.com/opensafely/SGTF-CFR-research). Detailed pseudonymised patient data are potentially re-identifiable and are therefore not shared. We rapidly delivered the OpenSAFELY data analysis platform without prior funding to deliver timely analyses on urgent research questions in the context of the global COVID-19 health emergency. Now that the platform is established, we are developing a formal process for external users to request access in collaboration with NHS England; details of this process will be published shortly on OpenSAFELY.org.

Information governance and ethics
NHS England is the data controller; TPP is the data processor; and the key researchers on OpenSAFELY are acting on behalf of NHS England. This implementation of OpenSAFELY is hosted within the TPP environment which is accredited to the ISO 27001 information security standard and is NHS IG Toolkit compliant; 1,2 patient data has been pseudonymised for analysis and linkage using industry standard cryptographic hashing techniques; all pseudonymised datasets transmitted for linkage onto OpenSAFELY are encrypted; access to the platform is via a virtual private network (VPN) connection, restricted to a small group of researchers; the researchers hold contracts with NHS England and only access the platform to initiate database queries and statistical models; all database activity is logged; only aggregate statistical outputs leave the platform environment following best practice for anonymisation of results such as statistical disclosure control for low cell counts. 3 The OpenSAFELY research platform adheres to the obligations of the UK General Data Protection Regulation (GDPR) and the Data Protection Act 2018. In March 2020, the Secretary of State for Health and Social Care used powers under the UK Health Service (Control of Patient Information) Regulations 2002 (COPI) to require organisations to process confidential patient information for the purposes of protecting public health, providing healthcare services to the public and monitoring and managing the COVID-19 outbreak and incidents of exposure; this sets aside the requirement for patient consent. 4 Taken together, these provide the legal bases to link patient datasets on the OpenSAFELY platform. GP practices, from which the primary care data are obtained, are required to share relevant health information to support the public health response to the pandemic, and have been informed of the OpenSAFELY analytics platform.
This study was approved by the Health Research Authority (REC reference 20/LO/0651) and by the LSHTM Ethics Board (reference 21863).

Details on data sources
We used several linked datasets in this analysis (Table S1). OpenSAFELY also contains linked hospitalisation and emergency department data, but these are not used in this analysis because, unlike GP, testing, mortality and vaccination data, which are available after a short lag of 10–16 days, neither is available in near real time.

SARS-CoV-2 tests
Positive and negative test results for people tested under the UK's Pillar 1 and Pillar 2 testing schemes are reported. 5 Pillar 1 mainly covers tests in hospitals (of patients and healthcare workers), and Pillar 2 mainly covers community tests. Only some of the laboratories testing in England use the three-target PCR assay in which failure to detect the spike (S) gene target is indicative of the VOC; therefore, not all positive tests have a known SGTF status. The data come from Public Health England's (PHE) Second Generation Surveillance System (SGSS).

General Practitioner (GP) data
GP data are drawn from patients registered at a practice that uses TPP SystmOne software (https://www.tpp-uk.com/products/systmone), which covers approximately 40% of GP practices in England. Each patient encounter with a GP is coded using CTV3 codes, which are fully aligned with SNOMED CT, to describe the reason for the encounter; these codes are used to define the health history of each individual. 6 Prescribed medications are also stored in the health record. Demographic data such as age and ethnicity are collected by GPs, as are some behavioural data, such as whether an individual smokes.

Mortality date
The date of death plus codes for the cause of death are from the Office for National Statistics. We only use date of death in this study.

Vaccination date
The date, dose number, vaccine manufacturer and batch are entered into the patient's GP health record. We only use the date of administration of the first dose in this study.

Index of multiple deprivation (IMD)
We use the English Index of Multiple Deprivation (IMD), which is matched to individuals at the postcode level.

Urban/Rural classification
We use 5 categories of Urban/Rural classification, which are matched to individuals at the postcode level.

Table S1. Data sources used in this analysis.

We are grateful for the hard work processing these data at each of the three lighthouse laboratories mentioned above.

SGTF positivity in TPP compared to PHE
The proportion of positive tests with SGTF in PHE data 7 and in OpenSAFELY is very similar in each epidemiological week (Figure S1). Discrepancies arise from geographical variation in which GP practices use TPP software: there are relatively fewer TPP SGTF positives in the NHS regions South East (especially northern Kent) and North East (especially the area around Cumbria), both of which experienced early increases in SGTF.

Figure S1. Percentage of positive tests with SGTF in PHE reports 7 compared with the proportion in OpenSAFELY, by NHS region. 8

Study Protocol
Case fatality risk of the SARS-CoV-2 variant of concern B.1.

Background
The SARS-CoV-2 variant of concern B.1.1.7 (VOC) was first identified in Kent, UK, in autumn 2020. Early analyses suggest that the VOC is more transmissible, and it has since become the dominant strain throughout the UK. Only a small number of VOC cases are identified by whole-genome sequencing. Spike gene target failure (SGTF) has been adopted as a proxy for identifying the VOC and has been shown to identify it in more than 95% of cases during the period 16 November to 11 January. 1 Studies using Public Health England (PHE) line-listing data, hospital admissions and ONS death data have assessed the fatality of the VOC relative to the originally circulating viral strain (non-VOC) and have consistently demonstrated an increase in mortality associated with the VOC. 2,3 Although these studies were able to account for age, sex, ethnicity, deprivation index, time period and geographical area, they were unable to account for comorbidities, which have been shown to be strongly associated with death among those diagnosed with COVID-19. 4

Objectives
To estimate the risk of death following confirmation of SARS-CoV-2 infection, comparing the risk among those infected with the VOC with that among those infected with non-VOC strains, accounting for both demographic factors and comorbidities. The risk of death will be quantified using the following methods:
i. A relative risk model, estimating relative and absolute 28-day all-cause mortality risk
ii. A Cox proportional hazards regression model, estimating a hazard ratio

Study Design and Population
We will use a cohort study design nested within the OpenSAFELY platform. Using test result data from the PHE Second Generation Surveillance System (SGSS), we will select all people who: (1) tested positive for SARS-CoV-2 on a PCR swab test in the time window 16 November to 11 January; and (2) have data on SGTF. The study will focus on the comparison between those with SGTF and those without. Inconclusive SGTF results are considered in the sensitivity and additional analyses section.
The primary analysis will focus on a 28-day all-cause mortality outcome. All-cause mortality will be determined from ONS death data, which are expected to have full ascertainment of deaths after a 2-week delay. Therefore, the 28-day all-cause mortality analysis will include all individuals with at least 42 days of follow-up from the date of COVID-19 diagnosis to the date of the last ONS death data upload (28 days plus 14 days to account for the delay in the ONS death data).
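The 42-day eligibility rule can be sketched as a small date check. This is a minimal illustration rather than the study's code: the function name and example upload date are hypothetical, while the 28-day window and 14-day ONS lag come from the text above. Making the window a parameter also covers the 40-day risk period used in the sensitivity analyses.

```python
from datetime import date, timedelta

# Values taken from the protocol text: 28-day risk window, 14-day ONS reporting lag.
RISK_WINDOW_DAYS = 28
ONS_LAG_DAYS = 14


def eligible_for_28_day_analysis(diagnosis_date: date, last_ons_upload: date,
                                 window_days: int = RISK_WINDOW_DAYS,
                                 lag_days: int = ONS_LAG_DAYS) -> bool:
    """True if there are at least window + lag days between diagnosis and the
    most recent ONS death-data upload, so a death inside the risk window
    would be expected to have been recorded."""
    return last_ons_upload - diagnosis_date >= timedelta(days=window_days + lag_days)


upload = date(2021, 2, 26)  # hypothetical upload date
print(eligible_for_28_day_analysis(date(2021, 1, 15), upload))  # 42 days available
print(eligible_for_28_day_analysis(date(2021, 1, 16), upload))  # 41 days available
```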
For the Cox proportional hazards analysis, all individuals will be included. Follow-up will be censored two weeks before the date of the ONS death data upload for those without a documented date of death.

Inclusion criteria
• A positive SARS-CoV-2 PCR swab test result in SGSS within the window 16 November to 11 January.
• Data available on SGTF in SGSS.
• Registered with a primary care practice using The Phoenix Partnership (TPP) software on the date of COVID-19 diagnosis, with at least one year of continuous GP registration.

Exclusion criteria
• Missing age, sex, or index of multiple deprivation, as these are indicators of poor data quality.
• COVID-19 diagnoses prior to the diagnosis in the study time window (based on either a positive test for SARS-CoV-2 in SGSS data or a diagnosis of COVID-19 in primary care).
• Receipt of vaccination against COVID-19 prior to diagnosis in the study time window.
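Taken together, the inclusion and exclusion criteria act as a single filter on each patient. The sketch below assumes a simplified, hypothetical per-patient record (`Candidate`) whose flags have already been derived, and assumes the window runs from 16 November 2020 to 11 January 2021; it does not reproduce the OpenSAFELY study definition.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional

# Assumed calendar years for the protocol's study window.
STUDY_START = date(2020, 11, 16)
STUDY_END = date(2021, 1, 11)


@dataclass
class Candidate:
    """Hypothetical, simplified patient record; field names are illustrative."""
    positive_test_date: date
    sgtf_available: bool
    tpp_registered_at_test: bool
    years_registered: float
    age: Optional[int]
    sex: Optional[str]
    imd: Optional[int]
    prior_covid_diagnosis: bool
    vaccinated_before_test: bool


def meets_criteria(c: Candidate) -> bool:
    included = (
        STUDY_START <= c.positive_test_date <= STUDY_END
        and c.sgtf_available
        and c.tpp_registered_at_test
        and c.years_registered >= 1
    )
    excluded = (
        c.age is None or c.sex is None or c.imd is None  # data-quality exclusions
        or c.prior_covid_diagnosis
        or c.vaccinated_before_test
    )
    return included and not excluded


example = Candidate(date(2020, 12, 1), True, True, 2.0, 67, "F", 3, False, False)
print(meets_criteria(example), meets_criteria(replace(example, imd=None)))
```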

Causal framework
Demographics and comorbidities are adjusted for not because they influence which variant causes the infection per se, but because they are likely to be associated with the upstream process of getting a test (e.g. test-seeking behaviour, ability to access testing facilities). Adjustment therefore attempts to correct for imbalances between the VOC and non-VOC exposure groups with respect to factors associated with getting a test. With the study population defined by a positive SARS-CoV-2 test and available SGTF data, the minimum sufficient adjustment set implied by Figure 1 is: age, care home status, comorbidities, deprivation index, smoking status.

Study Measures
Exposure
SGTF on a SARS-CoV-2 PCR swab test in SGSS data, referred to as the VOC exposure group. The comparator group comprises SARS-CoV-2 diagnoses without SGTF in SGSS data, referred to as the non-VOC group.

Outcomes
Death from any cause.

Covariates
Region, defined by middle layer super output area (MSOA) from patient postcode, or NHS England region.
Rural and urban location classification, and care home status.
Epidemiological week of the positive test.

Baseline characteristics
Participant characteristics, including all covariates listed above, will be described at baseline (the date of the positive SARS-CoV-2 test), comparing the two exposure groups (VOC and non-VOC). Continuous variables will be summarised by the mean and standard deviation and compared with a t-test, or by the median and interquartile range and compared with a Wilcoxon rank-sum test, as appropriate. Categorical variables will be summarised by the number and proportion in each group (n (%)) and compared with a chi-square test.
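For a categorical characteristic, the n (%) cells and the Pearson chi-square statistic follow directly from a category-by-exposure table. The stdlib sketch below assumes a hypothetical input layout (`{category: (n_voc, n_non_voc)}`) and returns the statistic with its degrees of freedom; in practice the p-value would come from a chi-square distribution, for example via `scipy.stats.chi2.sf` or `scipy.stats.chi2_contingency` with `correction=False`.

```python
def describe_and_chi2(counts):
    """Summarise one categorical baseline characteristic by exposure group.

    counts maps each category to (n_voc, n_non_voc). Returns n (%) cells per
    category (column percentages within each exposure group), the Pearson
    chi-square statistic and its degrees of freedom.
    """
    categories = list(counts)
    col_totals = [sum(counts[c][j] for c in categories) for j in (0, 1)]
    grand_total = sum(col_totals)
    cells, stat = {}, 0.0
    for c in categories:
        row_total = sum(counts[c])
        formatted = []
        for j in (0, 1):
            observed = counts[c][j]
            expected = row_total * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
            formatted.append(f"{observed} ({100 * observed / col_totals[j]:.1f}%)")
        cells[c] = formatted
    # A k x 2 table has (k - 1) * (2 - 1) degrees of freedom.
    return cells, stat, len(categories) - 1


cells, stat, df = describe_and_chi2({"yes": (10, 20), "no": (30, 40)})
print(cells, round(stat, 3), df)
```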
The median time-to-death and interquartile range of those who die will be presented by exposure.
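The median and IQR of time to death by exposure can be computed with the standard library. Quartile definitions differ between packages; this sketch uses `statistics.quantiles` with `method="inclusive"`, which may not match Stata's defaults exactly. The record layout and function name are assumptions, and a group with a single death may raise `StatisticsError` on older Python versions.

```python
import statistics


def time_to_death_summary(records):
    """records: iterable of (exposure, days_to_death), with None for survivors.

    Returns {exposure: (median, q1, q3)} computed over those who died.
    """
    by_group = {}
    for exposure, days in records:
        if days is not None:  # only people who died contribute
            by_group.setdefault(exposure, []).append(days)
    summary = {}
    for exposure, days in by_group.items():
        q1, median, q3 = statistics.quantiles(days, n=4, method="inclusive")
        summary[exposure] = (median, q1, q3)
    return summary


print(time_to_death_summary([("VOC", 2), ("VOC", 4), ("VOC", 6), ("VOC", 8),
                             ("non-VOC", None), ("non-VOC", 10), ("non-VOC", 12)]))
```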
The proportion of SARS-CoV-2 positive tests with SGTF, identifying the VOC, will be plotted over the study period by NHS England region and descriptively compared to PHE data for the whole population.
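The weekly SGTF proportion per region is a grouped count. In the sketch below, ISO calendar weeks stand in for epidemiological weeks (an assumption; PHE week conventions may differ), and the input tuples and function name are hypothetical.

```python
from collections import Counter
from datetime import date


def sgtf_proportion_by_region_week(tests):
    """tests: iterable of (test_date, region, is_sgtf) for positive tests with
    known SGTF status. Returns {(region, iso_year, iso_week): proportion}."""
    totals, sgtf = Counter(), Counter()
    for test_date, region, is_sgtf in tests:
        iso_year, iso_week, _ = test_date.isocalendar()
        key = (region, iso_year, iso_week)
        totals[key] += 1
        sgtf[key] += int(bool(is_sgtf))
    return {key: sgtf[key] / totals[key] for key in sorted(totals)}


example = [
    (date(2020, 11, 16), "London", True),
    (date(2020, 11, 17), "London", False),
    (date(2020, 11, 23), "London", True),
]
print(sgtf_proportion_by_region_week(example))
```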

28-day all-cause mortality
Case fatality risk will be calculated at 28 days after the positive SARS-CoV-2 test result. Therefore, only those with 28 days of follow-up, or a date of death within this window, will be included in this analysis.
The relative risk for VOC vs. non-VOC cases will be calculated from a generalised linear model with a binomial distribution and log link function. Absolute risk will be estimated as the predicted risk from this model.
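For the unadjusted model, a log-binomial GLM with only the exposure term reproduces the crude risk ratio, and its Wald standard error equals the familiar delta-method formula, so the arithmetic can be shown without a model-fitting library. The sketch below (hypothetical function name and counts) returns both absolute risks, their ratio and a 95% interval on the log scale; the adjusted models add covariates and need a GLM fitter, such as Stata's `glm` with `family(binomial) link(log)`.

```python
import math


def crude_risk_ratio(deaths_voc, n_voc, deaths_non, n_non, z=1.959964):
    """Crude 28-day risks, risk ratio and 95% Wald CI on the log scale."""
    risk_voc = deaths_voc / n_voc
    risk_non = deaths_non / n_non
    rr = risk_voc / risk_non
    # SE of log(RR) for two independent binomial proportions
    se_log_rr = math.sqrt(1 / deaths_voc - 1 / n_voc + 1 / deaths_non - 1 / n_non)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return risk_voc, risk_non, rr, (lower, upper)


print(crude_risk_ratio(20, 1000, 10, 1000))  # illustrative counts only
```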

POST-ANALYSIS NOTE:
There were model convergence issues when working on the risk scale. As anticipated in the limitations section, 28-day risk was therefore calculated from a logistic regression model.
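Converting a logistic model back to the risk scale means turning each person's predicted log-odds into a probability. One way of doing this, not necessarily the authors' exact method, is marginal standardisation: average the predicted risks with everyone set to VOC, then to non-VOC. The coefficients and covariate rows passed in are hypothetical placeholders for a fitted model.

```python
import math


def inv_logit(eta):
    """Numerically stable inverse logit: converts log-odds to a probability."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def standardised_risks(intercept, beta_voc, other_coefs, covariate_rows):
    """Average predicted 28-day risk with everyone set to VOC, then to non-VOC."""
    def mean_risk(voc):
        total = 0.0
        for row in covariate_rows:
            eta = intercept + beta_voc * voc + sum(b * x for b, x in zip(other_coefs, row))
            total += inv_logit(eta)
        return total / len(covariate_rows)

    risk_voc, risk_non_voc = mean_risk(1), mean_risk(0)
    return risk_voc, risk_non_voc, risk_voc / risk_non_voc


# Hypothetical model with no other covariates: baseline odds 1/9, odds ratio 3
print(standardised_risks(math.log(1 / 9), math.log(3), [], [[]]))
```

In this toy example an odds ratio of 3 corresponds to risks of 0.25 and 0.10, a risk ratio of 2.5, which shows why odds ratios are converted back to the risk scale before being reported as relative risks.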

Cox proportional hazards regression
The relative hazard of death for SGTF vs. non-SGTF cases will be calculated from a Cox proportional hazards regression model, with no requirement for 28 days of follow-up. Follow-up will be censored at the earlier of two weeks before the date of the ONS death data upload or 7 days before receipt of a COVID-19 vaccination.
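The censoring rule can be written as a function returning follow-up time and event status for each person. This is a sketch with a hypothetical name; the 14-day and 7-day offsets are taken directly from the text above.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def follow_up(positive_test: date, last_ons_upload: date,
              death: Optional[date] = None,
              first_vaccination: Optional[date] = None) -> Tuple[int, bool]:
    """Return (days of follow-up, died) for the Cox analysis."""
    # Administrative censoring two weeks before the ONS upload.
    candidates = [last_ons_upload - timedelta(days=14)]
    # Censoring 7 days before the first COVID-19 vaccination, if any.
    if first_vaccination is not None:
        candidates.append(first_vaccination - timedelta(days=7))
    censor = min(candidates)
    if death is not None and death <= censor:
        return (death - positive_test).days, True
    return (censor - positive_test).days, False


print(follow_up(date(2021, 1, 1), date(2021, 2, 26), death=date(2021, 1, 20)))
```

A vaccination within 7 days after the positive test would place the censoring date before the test date and give negative follow-up, so such records would need a rule of their own.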
The hazard of death following a positive SARS-CoV-2 test result is expected to vary considerably between regions in England over time. Consequently, adjustment for region is unlikely to satisfy the proportional hazards assumption of a Cox model. To account for this variability, we will stratify the analysis on region, allowing a separate baseline hazard to be estimated for each region while covariate effects are estimated over the full population: a stratified Cox PH model. The definition of regions is discussed below.
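To make the stratification concrete, the sketch below fits a stratified Cox model with a single binary exposure (VOC vs. non-VOC) by Newton-Raphson on the partial likelihood, forming risk sets only within each region and handling ties with the Breslow approximation. It is an illustration only: it omits the adjustment covariates, and the `Subject` record and region labels are hypothetical. In practice the same model is available as, for example, Stata's `stcox` with the `strata()` option or `CoxPHFitter` in the Python lifelines package with its `strata` argument.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Subject:
    time: float   # days from positive test to death or censoring
    event: bool   # True if death was observed
    voc: int      # 1 = SGTF (VOC), 0 = non-SGTF
    region: str   # stratum label, e.g. a UTLA code (illustrative)


def _score_and_information(subjects, beta):
    """Score and information of the stratified partial log-likelihood
    for one binary covariate (Breslow ties)."""
    strata = defaultdict(list)
    for s in subjects:
        strata[s.region].append(s)
    score = information = 0.0
    for members in strata.values():  # risk sets never cross regions
        for s in members:
            if not s.event:
                continue
            risk_set = [r for r in members if r.time >= s.time]
            weights = [math.exp(beta * r.voc) for r in risk_set]
            mean_voc = sum(w * r.voc for w, r in zip(weights, risk_set)) / sum(weights)
            score += s.voc - mean_voc
            # voc is 0/1, so the weighted variance is mean - mean**2
            information += mean_voc - mean_voc ** 2
    return score, information


def fit_stratified_cox(subjects, tol=1e-10, max_iter=50):
    """Newton-Raphson estimate of the log hazard ratio for VOC and its SE."""
    beta = 0.0
    for _ in range(max_iter):
        score, information = _score_and_information(subjects, beta)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    _, information = _score_and_information(subjects, beta)
    return beta, 1.0 / math.sqrt(information)


example = [Subject(3, True, 1, "A"), Subject(10, False, 0, "A"),
           Subject(5, True, 0, "B"), Subject(8, True, 1, "B"), Subject(12, False, 0, "B")]
beta, se = fit_stratified_cox(example)
print(math.exp(beta), se)
```

Because each region contributes only its own risk sets, regional differences in baseline hazard drop out of the comparison, while the single coefficient `beta` is shared across regions.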

Covariate adjustment
Unadjusted, demographically adjusted, and fully adjusted estimates will be presented for each analysis.
Demographically adjusted models will include adjustment for the following covariates: Age will be included as a cubic spline term. Ethnicity will be grouped into five categories. The primary analysis will exclude patients with missing ethnicity. Sex, deprivation index, household size, and type of residence will be included as categorical terms.
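The protocol does not specify the spline form or knot positions. As one common choice, the sketch below builds a restricted (natural) cubic spline basis for age using Harrell's parameterisation, which is linear below the first knot and beyond the last; the knot values in the example are hypothetical. The returned columns would enter the regression in place of a single linear age term.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for a single value x.

    Returns [x, s_1, ..., s_{k-2}] for k knots, using Harrell's
    parameterisation scaled by (t_k - t_1)**2 to keep values moderate.
    """
    t = sorted(knots)
    if len(t) < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def cube_pos(u):
        # truncated power function (u)_+^3
        return max(u, 0.0) ** 3

    last, penult = t[-1], t[-2]
    scale = (last - t[0]) ** 2
    basis = [float(x)]
    for tj in t[:-2]:
        term = (cube_pos(x - tj)
                - cube_pos(x - penult) * (last - tj) / (last - penult)
                + cube_pos(x - last) * (penult - tj) / (last - penult))
        basis.append(term / scale)
    return basis


# Hypothetical knots at ages 30, 50, 70 and 85
print(rcs_basis(60, [30, 50, 70, 85]))
```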
Epidemiological week of the baseline SARS-CoV-2 positive test will be included as a categorical variable.
Region will be defined by UTLA unless data sparsity prevents this level of granularity, in which case region will be defined by STP, or by aggregated geographical areas defined by NHS England region. Rural or urban location classification will be included as a categorical variable with 5 levels, in line with other work.
Fully adjusted models will additionally adjust for patient comorbidities, smoking status, and obesity status. In line with previous work on the risk of death from COVID-19 on the OpenSAFELY platform, comorbidities will be aggregated into a categorical term taking the values none, 1, and 2 or more, and missing values for smoking and obesity will be categorised as never smoked and no evidence of obesity, respectively. 4,5 The causal framework indicates that both of these adjustment sets result in a causal estimate of the effect of the VOC on mortality. For comparison, we will also fit a model using the minimum sufficient adjustment set implied by the causal DAG (age, comorbidities, deprivation index, smoking status).
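The categorisation rules for comorbidities, smoking and obesity can be written as small recoding functions. This is a sketch under stated assumptions: the function names are hypothetical, and the BMI threshold of 30 used to flag obesity is the conventional cut-off rather than a value taken from the protocol.

```python
from typing import Optional

OBESITY_BMI_THRESHOLD = 30.0  # assumption: conventional BMI cut-off for obesity


def comorbidity_category(n_comorbidities: int) -> str:
    """Collapse a comorbidity count into the categories none / 1 / 2+."""
    if n_comorbidities <= 0:
        return "none"
    return "1" if n_comorbidities == 1 else "2+"


def smoking_category(recorded: Optional[str]) -> str:
    """Missing smoking status is treated as never smoked."""
    return "never" if recorded is None else recorded


def obesity_category(bmi: Optional[float]) -> str:
    """Missing BMI is treated as no evidence of obesity."""
    if bmi is None or bmi < OBESITY_BMI_THRESHOLD:
        return "no evidence of obesity"
    return "obese"


print(comorbidity_category(3), smoking_category(None), obesity_category(None))
```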

Defining regions
Regional stratification will be a key consideration owing to variability in the incidence of COVID-19 outcomes over time. Regions will be defined using patient middle layer super output area (MSOA) codes derived from patient postcodes. MSOA data will be aggregated into upper tier local authority (UTLA) areas, which will be the primary definition of region for analysis.
A sensitivity analysis will define region by sustainability and transformation partnership (STP) areas, also derived from patient postcodes. This analysis will assess the impact of the regional definition on the estimated risk and hazard of death for SGTF vs. non-SGTF cases.
Should data sparsity preclude regional adjustment at the UTLA and STP level, aggregated geographical areas defined by NHS England region will be used instead.
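The fallback from UTLA to STP to NHS England region can be sketched as choosing the finest level at which no stratum is too sparse. The protocol does not define "data sparsity"; the minimum-deaths-per-stratum threshold and the record keys below are hypothetical.

```python
from collections import Counter

# Region definitions from finest to coarsest, following the protocol.
LEVELS = ("utla", "stp", "nhs_region")


def choose_region_level(records, min_deaths=5):
    """Return the finest level at which every stratum has at least min_deaths
    deaths, falling back to NHS England region otherwise.

    records: iterable of dicts with keys "utla", "stp", "nhs_region", "died".
    min_deaths is a hypothetical sparsity threshold.
    """
    records = list(records)
    for level in LEVELS[:-1]:
        deaths = Counter()
        strata = set()
        for r in records:
            strata.add(r[level])
            deaths[r[level]] += int(bool(r["died"]))
        if all(deaths[s] >= min_deaths for s in strata):
            return level
    return LEVELS[-1]
```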

A priori subgroup analyses
Case fatality relative and absolute risk will be estimated in subgroups of a priori interest, after adjustment for confounding. Differences in risk in these subgroups will be formally tested with a likelihood ratio test for an interaction with SGTF exposure status.
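The likelihood ratio test compares the maximised log-likelihoods of the models with and without the SGTF-by-subgroup interaction terms: twice the difference is referred to a chi-square distribution with as many degrees of freedom as there are interaction terms. The stdlib sketch below uses the closed-form chi-square tail probability for integer degrees of freedom; the log-likelihood values in the example are made up for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite series in x^(i + 1/2) / (1*3*...*(2i+1))
    p = math.erfc(math.sqrt(x / 2))
    if df > 1:
        term = total = math.sqrt(x)
        for i in range(1, (df - 1) // 2):
            term *= x / (2 * i + 1)
            total += term
        p += math.sqrt(2 / math.pi) * math.exp(-x / 2) * total
    return p


def lr_test(loglik_reduced, loglik_full, n_interaction_terms):
    """Likelihood ratio statistic and p-value for the added interaction terms."""
    stat = 2 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, n_interaction_terms)


print(lr_test(-1234.5, -1230.1, 2))  # hypothetical log-likelihoods
```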
The subgroups of interest to be assessed are:

Differential time from positive SARS-CoV-2 test to death by SGTF exposure status has the potential to bias the analysis of risk. An additional analysis will extend the risk period to 40 days to assess the sensitivity of the findings to the definition of the risk period.

Inconclusive SGTF results
SGTF flags will be inconclusive in some cases. SGTF data are expected to take the values yes, no, maybe, and unknown. The primary analysis will focus on the comparison of the yes group (VOC) with the no group (non-VOC). In an additional analysis, the risk of death in the maybe and unknown groups will also be quantified and compared with that of the VOC and non-VOC exposure groups.

Multiple imputation of missing ethnicity
Previous work in OpenSAFELY has identified that ethnicity data are missing for up to one quarter of all patients. The primary analysis will use the complete case set with regards to ethnicity. This sensitivity analysis will assess the impact of excluding records with missing ethnicity by imputing missing ethnicity using multiple imputation based on all variables included in the full adjustment set.
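The protocol does not state how estimates from the imputed datasets will be combined; the standard approach is Rubin's rules, sketched below under that assumption. Estimates should be pooled on the scale on which they are approximately normal (e.g. log hazard ratios). Stata's `mi estimate` prefix applies these rules automatically.

```python
import math
import statistics


def rubins_rules(estimates, variances):
    """Pool m point estimates and their squared standard errors."""
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputed datasets")
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)       # mean within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


print(rubins_rules([0.5, 0.6, 0.7], [0.01, 0.01, 0.01]))  # illustrative log-HRs
```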
Software and reproducibility
Data management will be performed using Python and Google BigQuery, with analysis carried out using Stata 16.1 / Python. Code for data management and analysis, as well as codelists, is archived online at https://github.com/opensafely/sgtf-cfr-research.

Feasibility and power calculations
To be assessed when SGTF data are available.

Strengths and Limitations
Risk models fail to converge
Although our study population is likely to be large, with a considerable number of deaths, the risk models may fail to converge because of data sparsity in some covariate subsets. If this is the case, we will revert to a logistic regression approach, estimating the log odds ratio. Inference on the risk of death will then be performed by converting predicted odds to estimates of absolute risk.

Non-random availability of SGTF data
Although the fact that we adjust for factors associated with getting tested should help account for possible non-random availability of SGTF data, we will also compare the characteristics of people included in the study (who all have SGTF data) with those not included in the study due to lack of SGTF data. This will help us assess whether those with SGTF are representative of all those tested during the time period of the study, and allow us to discuss the implications of this in our write-up as necessary.