The SARS-CoV-2 B.1.351 lineage (VOC β) is outgrowing the B.1.1.7 lineage (VOC α) in some French regions in April 2021

To assess the spread of SARS-CoV-2 variants, we analysed 36,590 variant-specific reverse-transcription-PCR tests performed on samples from 12 April–7 May 2021 in France. In this period, in contrast to January–March 2021, variants of concern (VOC) β (B.1.351 lineage) and/or γ (P.1 lineage) had a significant transmission advantage over VOC α (B.1.1.7 lineage) in the Île-de-France (15.8%; 95% confidence interval (CI): 15.5–16.2) and Hauts-de-France (17.3%; 95% CI: 15.9–18.7) regions. This is consistent with VOC β’s immune evasion abilities and the high proportions of persons with prior SARS-CoV-2 infection in these regions.


Multinomial log-linear model
To fit the multinomial log-linear model, we used the multinom function from the nnet R package, which fits such models via neural networks. Model selection was performed in a stepwise manner, starting from the null model (i.e. without any predictor).
The model formula was the following: variant ~ age + location_sampling + date:region, where age is the age of the individual (which is treated as an integer and centered and scaled), location_sampling is a binary variable indicating whether the sample was collected in a hospital or not, date is the sampling date (which is treated as an integer and centered and scaled), and region is the French administrative region of sampling.
The best model was identified using the Akaike information criterion (AIC). The multinom function returns the estimated multinomial logistic regression coefficients as well as their standard errors (SE).
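A minimal sketch of such a fit is given here, assuming simulated data: the data frame d, its values, the outcome labels V1 to V3, and the region labels are all made up for illustration and are not the study data. It only shows how the formula above can be passed to multinom and where the coefficients, SE, and AIC can be read.

```r
library(nnet)  # provides multinom()

set.seed(1)
n <- 600

# Simulated stand-in for the test results; every value here is hypothetical.
d <- data.frame(
  variant = factor(sample(c("V1", "V2", "V3"), n, replace = TRUE,
                          prob = c(0.7, 0.2, 0.1))),
  age = as.numeric(scale(sample(1:90, n, replace = TRUE))),   # centered and scaled
  location_sampling = factor(sample(c("hospital", "other"), n, replace = TRUE)),
  date = as.numeric(scale(sample(0:25, n, replace = TRUE))),  # centered and scaled
  region = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)

# The first factor level (V1) is used as the reference outcome.
fit <- multinom(variant ~ age + location_sampling + date:region,
                data = d, trace = FALSE)

fit_summary <- summary(fit)
fit_summary$coefficients     # one row per non-reference outcome (V2, V3)
fit_summary$standard.errors  # same layout as the coefficients
AIC(fit)                     # criterion that can be compared across candidate models
```

Because the simulated outcome is independent of the predictors, the estimated coefficients here should be close to zero; the point is only the shape of the call and of its output.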
These can be used to calculate a z test statistic, which is simply the ratio of the coefficient value to its SE. From there, we can compute a two-sided p-value, P>|z|, which is the probability of observing a z statistic at least as extreme as the one obtained under the null hypothesis (i.e. that the coefficient is zero), assuming that z follows a standard normal distribution. Here, we use a classical significance threshold of α = 5%. When the p-value is smaller than α, the null hypothesis can be rejected and the parameter is considered to be significant.
Note that an alternative approach could be to calculate the 95% confidence interval for the coefficient value of the multinomial model using the SE and a critical value on the standard normal distribution.
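The two approaches can be sketched as follows. The helper wald_summary and the coefficient and SE values passed to it are hypothetical, chosen only to make the arithmetic easy to follow.

```r
# Wald z statistics, two-sided p-values, and confidence intervals
# from coefficient estimates and their standard errors.
wald_summary <- function(coef, se, alpha = 0.05) {
  z <- coef / se                  # ratio of the coefficient to its SE
  p <- 2 * pnorm(-abs(z))         # two-sided p-value under a standard normal
  crit <- qnorm(1 - alpha / 2)    # critical value (about 1.96 for alpha = 5%)
  list(z = z, p = p,
       lower = coef - crit * se,  # lower bound of the (1 - alpha) CI
       upper = coef + crit * se)  # upper bound of the (1 - alpha) CI
}

# Made-up estimates for two predictors.
w <- wald_summary(coef = c(date = 0.30, age = -0.02),
                  se   = c(date = 0.10, age = 0.05))
w$p < 0.05                 # which coefficients are significant at the 5% level
w$lower > 0 | w$upper < 0  # equivalently, which 95% CIs exclude zero
```

Both criteria give the same decision, since a 95% confidence interval excludes zero exactly when the two-sided p-value is below 5%.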
To give a more intuitive interpretation of the results, we compute the relative risk ratios (RRR) by taking the exponential of the model's coefficient values. The RRR reflects, for a given variable, how the risk of belonging to one of the outcomes (here, detection of a given variant) varies compared to the reference category.
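As an illustration, the sketch below exponentiates a coefficient matrix laid out like multinom output (one row per non-reference outcome). The matrices coefs and ses contain made-up values, and the interval is obtained by exponentiating the bounds of the coefficient's 95% confidence interval.

```r
# Hypothetical coefficient and SE matrices:
# rows are non-reference outcomes, columns are predictors.
coefs <- matrix(c(log(2), 0.10,
                  -0.20,  0.05),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("V2", "V3"), c("date", "age")))
ses <- matrix(c(0.10, 0.05,
                0.15, 0.05),
              nrow = 2, byrow = TRUE,
              dimnames = dimnames(coefs))

rrr <- exp(coefs)                     # relative risk ratios
crit <- qnorm(0.975)                  # critical value for a 95% interval
rrr_lower <- exp(coefs - crit * ses)  # lower bound on the RRR scale
rrr_upper <- exp(coefs + crit * ses)  # upper bound on the RRR scale

round(rrr, 2)
```

In this toy example the coefficient of date for V2 is log(2), so the corresponding RRR is 2: the risk of V2 relative to the reference outcome doubles for each unit increase in the (scaled) date. An RRR below 1, as for date and V3 here, indicates a decreasing relative risk.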
Further details about multinomial log-linear models and their interpretations can be found at [1].

Selection advantage estimation
Following methods developed in population genetics to estimate the selection coefficient of a mutant allele compared to a wild type allele [2], and following earlier studies in epidemiology [3][4][5], we calculate the selection coefficient s by fitting a logistic growth model to the time series of variant frequency.
Indeed, provided that the selection coefficient s does not vary over time, and denoting by p(t) the frequency of an allele (here variants V2 or V3) in the population (here variants V1, V2, and V3), we have the following relationship:
$$p(t) = \frac{e^{st}}{e^{st} + \frac{1 - p_0}{p_0}},$$
where $p_0$ is the frequency at t = 0. Note that s needs to be scaled with respect to the generation time T, which is here obtained from the serial interval calculated by [6]. Overall, the transmission advantage $s_T$ of variants V2 and V3 over variant V1 is given by the formula $s_T = s\,T$.
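As a numerical illustration, the sketch below assumes the standard logistic growth solution p(t) = e^{st} / (e^{st} + (1 - p0)/p0), with p0 the frequency at t = 0. The function logistic_freq and the values of s, p0, and T_gen are hypothetical and are not the estimates of this study.

```r
# Frequency of the variant at time t under a constant selection coefficient s,
# starting from frequency p0 at t = 0 (standard logistic growth).
logistic_freq <- function(t, s, p0) {
  exp(s * t) / (exp(s * t) + (1 - p0) / p0)
}

s     <- 0.03  # hypothetical selection coefficient, per day
p0    <- 0.05  # hypothetical initial frequency
T_gen <- 5     # hypothetical generation time, in days

t <- 0:30
p <- logistic_freq(t, s, p0)

# On the logit scale the trajectory is a straight line with slope s,
# which is why s can be estimated with a logistic regression on time.
slope <- coef(lm(qlogis(p) ~ t))[["t"]]

# Transmission advantage per generation.
s_T <- s * T_gen
```

With these made-up values, the slope recovered on the logit scale equals s, and the transmission advantage is s_T = 0.03 × 5 = 0.15, i.e. 15% per generation.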
In order to estimate s, for each region of interest separately, we first fit a generalised linear model (GLM) with a binomial error distribution (i.e. a logistic regression), where the response variable is the variant type (V2/V3 or V1) and the explanatory variables are the age of the individual (which is treated as an integer and centered and scaled), the sampling date (which is treated as an integer and centered and scaled), and the sampling department, which is the French administrative level below the region. We then fit the logistic growth function to the fitted values from the GLM.
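The two steps can be sketched as follows for a single region, assuming simulated data. The variable names, the department labels D1 to D3, the value true_s, and the use of nls for the second step are assumptions made for illustration; the Supplementary R script remains the reference for the actual implementation.

```r
set.seed(42)
n <- 2000

# Simulated tests for one region; all values are hypothetical.
day        <- sample(0:25, n, replace = TRUE)  # days since start of the period
age        <- as.numeric(scale(sample(1:90, n, replace = TRUE)))
department <- factor(sample(c("D1", "D2", "D3"), n, replace = TRUE))
true_s     <- 0.08                             # daily selection coefficient used to simulate
dept_shift <- c(D1 = 0, D2 = 0.2, D3 = -0.2)[as.character(department)]
eta        <- qlogis(0.10) + true_s * day + 0.1 * age + dept_shift
is_v2v3    <- rbinom(n, size = 1, prob = plogis(eta))  # 1 = V2/V3, 0 = V1

d <- data.frame(is_v2v3, age, date = as.numeric(scale(day)), day, department)

# Step 1: logistic regression (binomial GLM) on age, scaled date, and department.
glm_fit <- glm(is_v2v3 ~ age + date + department, family = binomial, data = d)
d$p_hat <- fitted(glm_fit)

# Step 2: fit a logistic growth curve to the GLM fitted values;
# a is the logit of the initial frequency and s the daily selection coefficient.
growth_fit <- nls(p_hat ~ plogis(a + s * day), data = d,
                  start = list(a = -2, s = 0.05))
s_hat <- coef(growth_fit)[["s"]]
s_hat
```

The estimate s_hat is expressed per day because the growth curve is fitted against the unscaled day; multiplying it by the generation time gives the transmission advantage per generation.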
Further details about the implementation of the inference can be found in the Supplementary R script with the Supplementary data.