Increased transmissibility and global spread of SARS-CoV-2 variants of concern as at June 2021

We present a global analysis of the spread of recently emerged SARS-CoV-2 variants and estimate changes in effective reproduction numbers at country-specific level using sequence data from GISAID. Nearly all investigated countries demonstrated rapid replacement of previously circulating lineages by the World Health Organization-designated variants of concern, with estimated transmissibility increases of 29% (95% CI: 24–33), 25% (95% CI: 20–30), 38% (95% CI: 29–48) and 97% (95% CI: 76–117), respectively, for B.1.1.7, B.1.351, P.1 and B.1.617.2.


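The number of infectious individuals $I_v(t)$ infected with variant $v$ at time $t$ is assumed to grow exponentially:

$$I_v(t) = p_v e^{r_v t}$$

where $r_v$ is the growth rate and $p_v$ the number of infectious individuals on day $t = 0$ for variant $v$. We denote by $r_{wt}$ the growth rate of non-VOC/VOI variants, referred to as "wild-type". The transmissibility of a variant $v$ can be modelled using an additive term $\Delta_v$ so that its growth rate is defined as $r_{wt} + \Delta_v$ and we have:

$$I_v(t) = p_v e^{(r_{wt} + \Delta_v) t}$$

The proportion among cases of a variant $v$ at time $t$, and therefore the probability of a sequence $S_i$ randomly sampled at time $T_i$ being variant $v$, is given by:

$$\Pr(S_i = v) = \frac{p_v e^{(r_{wt} + \Delta_v) T_i}}{\sum_{k=1}^{N_v} p_k e^{(r_{wt} + \Delta_k) T_i}}$$

where $N_v$ is the total number of variants circulating in the population. Because the factor $e^{r_{wt} T_i}$ is common to the numerator and denominator, this can be simplified as follows:

$$\Pr(S_i = v) = \frac{p_v e^{\Delta_v T_i}}{\sum_{k=1}^{N_v} p_k e^{\Delta_k T_i}}$$

This model can be fitted using multinomial logistic regression, where the observed categorical outcome is the variant $v$ of a given sequence $S_i$ and the single explanatory variable is the time of sampling $T_i$.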
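As a minimal sketch of the proportions implied by this growth model, the Python snippet below assumes each variant grows exponentially at rate r_wt + Δ_v from an initial size p_v and normalises across variants. The function name `variant_proportions`, the variant labels and all numerical values are illustrative assumptions, not estimates from the analysis.

```python
import math

def variant_proportions(p0, delta, t, r_wt=0.0):
    """Proportion of each variant at time t under exponential growth.

    p0    : dict of initial infectious counts p_v (hypothetical values)
    delta : dict of growth-rate differences relative to wild-type (per day)
    r_wt  : wild-type growth rate (per day)
    """
    sizes = {v: p0[v] * math.exp((r_wt + delta[v]) * t) for v in p0}
    total = sum(sizes.values())
    return {v: n / total for v, n in sizes.items()}

# Illustrative inputs only
p0 = {"wt": 1000.0, "B.1.1.7": 10.0}
delta = {"wt": 0.0, "B.1.1.7": 0.05}

day0 = variant_proportions(p0, delta, t=0)
# Two different wild-type growth rates give the same proportions on day 60
day60_a = variant_proportions(p0, delta, t=60, r_wt=-0.02)
day60_b = variant_proportions(p0, delta, t=60, r_wt=0.03)
```

The two day-60 results agree because the wild-type growth rate cancels in the ratio, which is why only the differences Δ_v need to be estimated from sequence data.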
For multinomial logistic regression with a single explanatory variable $x_{1,i}$, the linear predictor function of the probability that observation $i$ has outcome $v$ is given by:

$$f(v, i) = \beta_{0,v} + \beta_{1,v} x_{1,i}$$

which, given a logit link function, defines the absolute probability that the observation (i.e. the sequence) $S_i$ has outcome $v$, if the outcome $v = 1 = wt$ is chosen as the "pivot" (so that $\beta_{0,1} = \beta_{1,1} = 0$), as:

$$\Pr(S_i = v) = \frac{e^{\beta_{0,v} + \beta_{1,v} x_{1,i}}}{1 + \sum_{k=2}^{N_v} e^{\beta_{0,k} + \beta_{1,k} x_{1,i}}}$$

By using the time of sampling $T_i$ as the explanatory variable $x_{1,i}$, it can be seen that the coefficients of the multinomial logistic regression relate to the logistic growth model as follows:

$$\beta_{0,v} = \ln\left(\frac{p_v}{p_{wt}}\right), \qquad \beta_{1,v} = \Delta_v$$

This regression model was implemented and fitted using the multinom function from the nnet package in the R statistical software (R Project for Statistical Computing, r-project.org).
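The correspondence between regression coefficients and growth-model parameters can be checked numerically. The sketch below is written in Python for illustration only (the analysis itself used R); it assumes the intercept equals the log ratio of initial sizes relative to wild-type and the slope equals the growth-rate difference. The function names, variant labels "A" and "B", and parameter values are hypothetical.

```python
import math

def pivot_softmax(beta0, beta1, x, pivot="wt"):
    """Multinomial-logit probabilities with the pivot's coefficients fixed at 0."""
    scores = {v: math.exp(beta0[v] + beta1[v] * x) for v in beta0}
    scores[pivot] = 1.0  # exp(0): the pivot has no free coefficients
    denom = sum(scores.values())
    return {v: s / denom for v, s in scores.items()}

def growth_to_coefficients(p0, delta, pivot="wt"):
    """Map growth-model parameters (p_v, delta_v) to (intercept, slope)."""
    beta0 = {v: math.log(p0[v] / p0[pivot]) for v in p0 if v != pivot}
    beta1 = {v: delta[v] - delta[pivot] for v in p0 if v != pivot}
    return beta0, beta1

# Illustrative parameters only
p0 = {"wt": 500.0, "A": 20.0, "B": 5.0}
delta = {"wt": 0.0, "A": 0.04, "B": 0.09}

beta0, beta1 = growth_to_coefficients(p0, delta)
t = 45
probs = pivot_softmax(beta0, beta1, t)

# Direct evaluation of the growth-model proportions for comparison
sizes = {v: p0[v] * math.exp(delta[v] * t) for v in p0}
direct = {v: n / sum(sizes.values()) for v, n in sizes.items()}
```

Here `probs` and `direct` agree to floating-point precision, so a slope fitted against sampling time can be read directly as the growth-rate difference of that variant.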

Supplementary Text 2: Estimating reproduction numbers
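To relate estimated differences in growth rates to changes in the reproduction number $R$ and generation time distribution, we used the formulation described by Park et al. [1], which assumes a Gamma-distributed generation time parameterized by the mean $\bar{G}$ and squared coefficient of variation $\kappa$:

$$R = \left(1 + \kappa r \bar{G}\right)^{1/\kappa}$$

The reproduction number $R_v$ of a variant $v$ can be defined relative to the growth rate of the wild-type strain $r_{wt}$ and the difference in growth rates $\Delta_v$:

$$R_v = \left(1 + \kappa_v (r_{wt} + \Delta_v) \bar{G}_v\right)^{1/\kappa_v}$$

This equation can be re-defined using the equivalence:

$$r_{wt} = \frac{R_{wt}^{\kappa_{wt}} - 1}{\kappa_{wt} \bar{G}_{wt}}$$

to generate:

$$R_v = \left(1 + \kappa_v \bar{G}_v \left(\frac{R_{wt}^{\kappa_{wt}} - 1}{\kappa_{wt} \bar{G}_{wt}} + \Delta_v\right)\right)^{1/\kappa_v}$$

which simplifies to:

$$R_v = \left(1 + \frac{\kappa_v \bar{G}_v}{\kappa_{wt} \bar{G}_{wt}} \left(R_{wt}^{\kappa_{wt}} - 1\right) + \kappa_v \bar{G}_v \Delta_v\right)^{1/\kappa_v}$$

The increase in the reproduction number of a variant $v$ relative to the wild-type strain is therefore:

$$\frac{R_v}{R_{wt}} = \frac{1}{R_{wt}} \left(1 + \frac{\kappa_v \bar{G}_v}{\kappa_{wt} \bar{G}_{wt}} \left(R_{wt}^{\kappa_{wt}} - 1\right) + \kappa_v \bar{G}_v \Delta_v\right)^{1/\kappa_v}$$

Under the assumption that the coefficient of variation of the generation time distribution is the same between variants, in that $\kappa_v = \kappa_{wt} = \kappa$:

$$\frac{R_v}{R_{wt}} = \frac{1}{R_{wt}} \left(1 + \frac{\bar{G}_v}{\bar{G}_{wt}} \left(R_{wt}^{\kappa} - 1\right) + \kappa \bar{G}_v \Delta_v\right)^{1/\kappa}$$

This formulation shows that the ratio of reproduction numbers is dependent on the value of $R_{wt}$. However, the value of $R_{wt}$ cannot be directly estimated during the period of variant co-circulation as only the composite reproduction number $R_t$, defined as the average reproduction number across the circulating variants weighted by variant prevalence, is observed.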
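The dependence of the ratio on the wild-type reproduction number can be illustrated with a short Python sketch. It assumes the Gamma-based relationship R = (1 + κ r G)^(1/κ) between growth rate r and reproduction number R, with a common κ across variants. The function names and the input values (growth-rate difference of 0.05 per day, wild-type reproduction numbers of 0.8 and 1.2) are illustrative assumptions rather than study estimates.

```python
def reproduction_number(r, mean_gt, kappa):
    """R for a Gamma-distributed generation time: (1 + kappa * r * G)^(1 / kappa)."""
    return (1.0 + kappa * r * mean_gt) ** (1.0 / kappa)

def growth_rate(R, mean_gt, kappa):
    """Inverse relationship: r = (R^kappa - 1) / (kappa * G)."""
    return (R ** kappa - 1.0) / (kappa * mean_gt)

def relative_R(R_wt, delta_v, gt_wt, gt_v, kappa):
    """Ratio R_v / R_wt for a variant growing delta_v per day faster than wild-type."""
    r_wt = growth_rate(R_wt, gt_wt, kappa)
    return reproduction_number(r_wt + delta_v, gt_v, kappa) / R_wt

kappa = 0.14       # squared coefficient of variation
gt = 5.0           # mean generation time in days, same for both strains here
delta_v = 0.05     # hypothetical growth-rate difference per day

# Same growth-rate difference, two different wild-type reproduction numbers
ratio_low = relative_R(0.8, delta_v, gt, gt, kappa)
ratio_high = relative_R(1.2, delta_v, gt, gt, kappa)
```

With these inputs the two ratios differ, which is the reason the wild-type reproduction number has to be recovered before a transmissibility increase can be reported.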
$R_{wt}$ can be calculated from $R_t$ with knowledge of the variant proportions over time, which are directly estimated in the multinomial regression model and given by $\Pr(S_i = v)$, the differences in growth rates $\Delta_v$ also estimated above, the wild-type generation time distribution (i.e. the mean $\bar{G}_{wt}$ and the squared coefficient of variation $\kappa$), and the mean generation time of the variant $\bar{G}_v$, which is unknown but can be explored in sensitivity analyses:

$$R_t = \sum_{v=1}^{N_v} \Pr(S_i = v) \left(1 + \frac{\bar{G}_v}{\bar{G}_{wt}} \left(R_{wt}^{\kappa} - 1\right) + \kappa \bar{G}_v \Delta_v\right)^{1/\kappa}$$

This equivalence was solved numerically for $R_{wt}$ using daily, country-specific estimates of $R_t$ taken from publicly available sources [2]. Daily estimates of $R_{wt}$ were used to calculate the daily relative increase in the reproduction number for all variants, which was averaged across the period for which $0.01 < \Pr(S_i = v) < 0.99$.
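One way to carry out the numerical solution is a simple bisection, sketched below in Python. This is not necessarily the solver used in the analysis. It assumes the composite reproduction number is the prevalence-weighted mean of the variant reproduction numbers, each given by (1 + κ (r_wt + Δ_v) G_v)^(1/κ). The function names, search bounds and all input values are hypothetical.

```python
def composite_R(R_wt, props, delta, gt, kappa, wt="wt"):
    """Prevalence-weighted mean reproduction number for a given wild-type R."""
    r_wt = (R_wt ** kappa - 1.0) / (kappa * gt[wt])
    total = 0.0
    for v, share in props.items():
        base = 1.0 + kappa * (r_wt + delta[v]) * gt[v]
        total += share * base ** (1.0 / kappa)
    return total

def solve_R_wt(R_t, props, delta, gt, kappa, lo=0.05, hi=10.0, tol=1e-10):
    """Bisection on R_wt; composite_R increases monotonically with R_wt."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if composite_R(mid, props, delta, gt, kappa) < R_t:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Illustrative single-day inputs (not study estimates)
props = {"wt": 0.6, "B.1.1.7": 0.4}      # modelled variant proportions
delta = {"wt": 0.0, "B.1.1.7": 0.06}     # growth-rate differences per day
gt = {"wt": 5.0, "B.1.1.7": 5.0}         # mean generation times in days
kappa = 0.14

true_R_wt = 0.9
R_t = composite_R(true_R_wt, props, delta, gt, kappa)   # "observed" composite value
recovered = solve_R_wt(R_t, props, delta, gt, kappa)
```

Repeating this for each day and country, and then dividing each variant's reproduction number by the recovered wild-type value, would give the daily relative increases that are averaged in the text.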
The wild-type generation time distribution was assumed to have a mean $\bar{G}_{wt}$ of 5.0 days and a standard deviation of 1.9 days [3], giving a squared coefficient of variation $\kappa = 0.14$. Sensitivity of estimates to changes in the mean generation time $\bar{G}_v$ of the variants was explored using a range from -30% to +30% relative to wild-type (i.e. 3.5 days to 6.5 days).
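The arithmetic behind these values can be reproduced in a few lines of Python. The intermediate steps of the sensitivity grid (-15% and +15%) are an assumption added for illustration; the text only specifies the end points of the range.

```python
mean_wt = 5.0   # wild-type mean generation time, days
sd_wt = 1.9     # wild-type standard deviation, days

# Squared coefficient of variation: (sd / mean)^2
kappa = (sd_wt / mean_wt) ** 2

# Sensitivity grid for the variant mean generation time (intermediate steps assumed)
relative_changes = [-0.30, -0.15, 0.0, 0.15, 0.30]
variant_means = [mean_wt * (1.0 + change) for change in relative_changes]
```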

Supplementary Text 3: Model fitting
The multinomial growth model was fitted to countries with at least 250 sequences submitted since 1 November 2020. Variants were included if at least 25 sequences were submitted by a given country and were designated as non-VOC/VOI otherwise. Variant sublineages were grouped into the parent lineage (e.g. B.1.525.1 was classified as B.1.525). The country of exposure was used instead of the country of reporting to exclude targeted sequencing of incoming travelers.
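The inclusion rules above can be sketched as a small Python filter. This is an illustration under stated assumptions: the input is taken to be a list of (country of exposure, lineage) pairs already restricted to sequences submitted since 1 November 2020, sublineages are matched to a parent by lineage-name prefix, and the function names, the `designated` set and the example records are all hypothetical.

```python
from collections import Counter

MIN_COUNTRY_SEQUENCES = 250
MIN_VARIANT_SEQUENCES = 25
OTHER = "non-VOC/VOI"

def parent_lineage(lineage, designated):
    """Collapse a sublineage onto a designated parent, e.g. B.1.525.1 -> B.1.525."""
    for parent in sorted(designated, key=len, reverse=True):
        if lineage == parent or lineage.startswith(parent + "."):
            return parent
    return OTHER

def label_sequences(records, designated):
    """Apply the country and variant thresholds to (country, lineage) pairs."""
    grouped = [(c, parent_lineage(l, designated)) for c, l in records]
    per_country = Counter(c for c, _ in grouped)
    per_variant = Counter(grouped)
    labelled = []
    for country, variant in grouped:
        if per_country[country] < MIN_COUNTRY_SEQUENCES:
            continue  # too few sequences: country not fitted
        if per_variant[(country, variant)] < MIN_VARIANT_SEQUENCES:
            variant = OTHER  # rare variant folded into the reference group
        labelled.append((country, variant))
    return labelled

# Made-up example: country "X" passes the threshold, country "Y" does not
records = (
    [("X", "B.1.1.7")] * 200
    + [("X", "B.1.525.1")] * 30
    + [("X", "P.1")] * 10
    + [("X", "B.1.177")] * 20
    + [("Y", "B.1.1.7")] * 100
)
designated = {"B.1.1.7", "B.1.351", "P.1", "B.1.525"}
counts = Counter(label_sequences(records, designated))
```

In this example the 10 P.1 sequences fall below the variant threshold and are counted with the non-VOC/VOI group, while all sequences from country "Y" are dropped.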

Supplementary Figures
Supplementary Figure S1. Empirical and modelled variant proportions over time for selected countries. Points and lines represent stacked empirical variant proportions, coloured areas between lines represent absolute empirical proportions, error bars represent multinomial 95% confidence intervals.
Smoothed coloured areas represent stacked modelled variant proportions, white areas represent 95% confidence intervals of the modelled estimates. Darker shaded areas therefore represent overlap between empirical and modelled estimates. VOCs are indicated in bold text, VOIs in plain text. Countries, territories and areas as reported in GISAID do not imply the expression of any opinion whatsoever on the part of WHO concerning the legal status of any country, territory or area or of its authorities.