Estimating the false-negative test probability of SARS-CoV-2 by RT-PCR

Background: Reverse-transcription PCR (RT-PCR) assays are used to test for infection with the SARS-CoV-2 virus. RT-PCR tests are highly specific and the probability of false positives is low, but false negatives are possible depending on swab type and time since symptom onset.
Aim: To determine how the probability of obtaining a false-negative test in infected patients is affected by time since symptom onset and swab type.
Methods: We used generalised additive mixed models to analyse publicly available data from patients who received multiple RT-PCR tests and were identified as SARS-CoV-2 positive at least once.
Results: The probability of a positive test decreased with time since symptom onset, with oropharyngeal (OP) samples less likely to yield a positive result than nasopharyngeal (NP) samples. The probability of incorrectly classifying an infected individual as uninfected due to a false-negative test was considerably reduced if negative tests were repeated 24 hours later. For a small false-positive test probability (<0.5%), the true number of infected individuals was larger than the number of positive tests. For a higher false-positive test probability, the true number of infected individuals was smaller than the number of positive tests.
Conclusion: NP samples are more sensitive than OP samples. The later an infected individual is tested after symptom onset, the less likely they are to test positive. This has implications for identifying infected patients, contact tracing and discharging convalescing patients who are potentially still infectious.


Summary model output
This is a representation of the fitted values in the final model for test sensitivity as returned by mgcv::summary in R:

Estimating the false-negative error rate in cohorts of tested individuals
Using the GAMM model, we estimated the aggregate false negative rate for hypothetical cohorts of tested patients. To do this, we considered a range of Gamma distributions as parameterised by the mode and standard deviation. These distributions were used to describe the time between the onset of symptoms and patients being tested. The shape (S) and rate (R) parameters were written as functions of the mode (M) and standard deviation ($\sigma$) [23]:

$$R = \frac{M + \sqrt{M^2 + 4\sigma^2}}{2\sigma^2}, \qquad S = 1 + MR$$

We explored arrival time distributions with modes ranging from 0.1 to 5 days and standard deviations ranging from 0.5 to 5. We discretised the arrival time distribution $\Gamma(x)$ to give the proportion of patients in a cohort being tested on a given day. These fractions were then multiplied by the estimated probability of a false negative predicted by the GAMM function ($f(x)$) for a single nasal swab on that day; summing these together gave the aggregate false negative rate (P(Neg|Inf)) for cohorts tested according to this particular arrival time distribution. To get the probability of 2 false-negatives 1 day apart, we simply took the product $f(x) \cdot f(x+1)$ and used this in place of $f(x)$.
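The cohort calculation described above can be sketched in Python (the authors worked in R; this is not their code). The sketch assumes the usual mode and standard deviation parameterisation of the Gamma distribution, the same day-binning convention used for the time-to-test distribution ([0, 0.5) for day 0, [0.5, 1.5) for day 1, and so on), a cut-off at 31 days, and renormalisation of the truncated fractions. The function false_negative_prob is a hypothetical stand-in for the GAMM prediction f(x), and all helper names are illustrative.

```python
import math


def gamma_shape_rate(mode, sd):
    """Shape S and rate R of a Gamma distribution with the given mode (> 0) and sd."""
    rate = (mode + math.sqrt(mode ** 2 + 4 * sd ** 2)) / (2 * sd ** 2)
    shape = 1 + mode * rate
    return shape, rate


def gamma_cdf(x, shape, rate):
    """Gamma CDF via the power series of the regularised lower incomplete gamma
    function; adequate for the moderate arguments used in this sketch."""
    z = rate * x
    if z <= 0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


def day_fractions(shape, rate, max_day=31):
    """Fraction of the cohort tested on days 0..max_day (ASSUMED binning and cut-off)."""
    edges = [0.0] + [d + 0.5 for d in range(max_day + 1)]
    cdf = [gamma_cdf(e, shape, rate) for e in edges]
    fractions = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
    covered = sum(fractions)
    return [f / covered for f in fractions]  # renormalise after truncation (assumption)


def false_negative_prob(day):
    """HYPOTHETICAL stand-in for f(x); the real values come from the fitted GAMM."""
    return min(0.95, 0.10 + 0.02 * day)


def aggregate_false_negative(fractions, f, repeat_next_day=False):
    """Sum over days of (fraction tested that day) x (false-negative probability)."""
    total = 0.0
    for day, fraction in enumerate(fractions):
        p = f(day) * f(day + 1) if repeat_next_day else f(day)
        total += fraction * p
    return total


shape, rate = gamma_shape_rate(mode=2.0, sd=1.5)
fractions = day_fractions(shape, rate)
single = aggregate_false_negative(fractions, false_negative_prob)
double = aggregate_false_negative(fractions, false_negative_prob, repeat_next_day=True)
print(f"one test: {single:.3f}; two tests one day apart: {double:.3f}")
```

Because the stand-in f(x) is always below 1, the two-test figure is necessarily smaller than the single-test figure; the actual magnitudes depend entirely on the fitted model, which is not reproduced here.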

Estimating the time to test
We assumed the test has perfect specificity, such that all individuals with positive tests must be infected, and so we estimated the probability of being tested on a given day, among those testing positive, using the distribution of time to positive test results for symptomatic individuals from Bi et al. [13] (a gamma distribution with shape 2.12 and rate 0.39). We discretised this distribution (such that [0, 0.5) corresponds to 0 days from symptom onset, [0.5, 1.5) corresponds to 1 day after symptom onset, etc.) and truncated it to 31 days, which is the maximum number of days from symptom onset present in the data we analysed. This truncation has no practical impact because > 99.99% of the density of this particular gamma distribution is accounted for at this point.
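As a rough check of the discretisation and truncation described above, the following Python sketch (not the authors' R code) bins the gamma distribution with shape 2.12 and rate 0.39 into days 0 to 31 and reports how much probability mass the truncated range covers. The series-based gamma_cdf helper, the choice of 31.5 days as the upper edge of the final bin, and the renormalisation step are assumptions made for illustration.

```python
import math

SHAPE, RATE, MAX_DAY = 2.12, 0.39, 31  # values quoted in the text


def gamma_cdf(x, shape, rate):
    """Gamma CDF via the power series of the regularised lower incomplete gamma function."""
    z = rate * x
    if z <= 0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


# Bin edges: [0, 0.5) is day 0, [0.5, 1.5) is day 1, ..., [30.5, 31.5) is day 31.
edges = [0.0] + [d + 0.5 for d in range(MAX_DAY + 1)]
cdf = [gamma_cdf(e, SHAPE, RATE) for e in edges]
raw_probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]

covered = sum(raw_probs)                       # mass retained by the truncation
day_probs = [p / covered for p in raw_probs]   # renormalised (assumption)
print(f"probability mass covered up to day {MAX_DAY}: {covered:.6f}")
```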

Meanwhile, $P(\psi \mid \tau_i \cap \eta)$ is the probability of a positive test result for infected individuals given the day of the test, which is exactly what we estimated in this study. Of course, $P(\psi)$ is unknown. This gives us an expression that depends on $P(\psi)$, but as we assumed that individuals are tested only once, the probabilities of being tested on each day must sum to one, which means that we can easily retrieve them: the unknown $P(\psi)$ appears in every term on the RHS and so vanishes.
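One possible reading of the argument above, offered as an assumption rather than the authors' exact derivation: by Bayes' theorem, the probability that an infected individual is tested on day i is proportional to the probability that a positive test was taken on day i, divided by the probability of a positive test on that day, and normalising over days removes the shared unknown factor. The Python sketch below illustrates that normalisation with made-up inputs; the function and variable names are hypothetical.

```python
def infected_test_day_probs(positive_test_day_probs, sensitivity_by_day):
    """P(tested on day i | infected), under the reading described in the lead-in.

    positive_test_day_probs[i]: P(tested on day i | positive test, infected)
    sensitivity_by_day[i]:      P(positive test | tested on day i, infected)
    """
    weights = [p / s for p, s in zip(positive_test_day_probs, sensitivity_by_day)]
    total = sum(weights)  # dividing by the total removes the shared unknown factor
    return [w / total for w in weights]


# Made-up illustrative inputs, NOT values from the study.
positive_test_day_probs = [0.5, 0.3, 0.2]
sensitivity_by_day = [0.9, 0.8, 0.5]

day_probs_infected = infected_test_day_probs(positive_test_day_probs, sensitivity_by_day)

# Average false-negative probability across all tests, weighted by test day.
mean_false_negative = sum(
    p * (1 - s) for p, s in zip(day_probs_infected, sensitivity_by_day)
)
print([round(p, 3) for p in day_probs_infected], round(mean_false_negative, 3))
```

Days with lower sensitivity are weighted up relative to the distribution of positive tests, since infected individuals tested on those days are less likely to appear among the positives.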

Estimating the true prevalence in a cohort of tested individuals
Supposing that all tests were performed the same number of days after symptom onset, we defined:

• α as the (unknown) true prevalence among those tested

• γ as the false-negative rate for tests done on that day, i.e. P(negative test | infected)

• T as the total number of tests done on that day, of which a fraction q are positive

Then the number of infected individuals among those tested (αT) is equal to the sum of (a) P(infected | positive test) multiplied by the number of positive tests and (b) P(infected | negative test) multiplied by the number of negative tests (i.e. the sum of the true positives and false negatives). These conditional probabilities can be separately rearranged via Bayes' Theorem and then added together to give an equation in α. Rearranging this as a quadratic in α, we discover that it has 2 roots. The first root allows us to estimate the true prevalence among the test cohort, while accounting for the false-negative test probability for those tested on that day.
In reality, however, individuals are tested on different days, on which the false-negative test probability depends, which makes it much harder to estimate α in this way. One way it can be done is to use the distribution for time to test to calculate the average false-negative test probability across all tests conducted, again assuming that all tests are done by nasal swab; here this gives a false-negative test probability of 16.71%. If we do this, then we can still apply the same equations as above and explore how accounting for the false-negative and false-positive test probabilities affects the consequent estimates of the true prevalence among those tested, which we illustrate for some different scenarios in the main text.
Importantly, this only tells us about prevalence in the test cohort and not in the wider population i.e. this does nothing to correct for not finding and not testing mild/asymptomatic cases (as discussed in the main text).
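The single-day estimate described in this section can be sketched in Python under one assumption that the text here does not spell out: that P(positive test) is expanded as (1 − γ)α + β(1 − α), where β is a hypothetical symbol for the false-positive probability (β = 0 corresponds to a perfectly specific test). Under that assumption the Bayes' Theorem sum has the roots α = (q − β)/(1 − γ − β) and α = 1; which of these the authors call the first root is not recoverable from the text. Function and parameter names are illustrative.

```python
def infected_fraction_identity(alpha, q, fn, fp=0.0):
    """Right-hand side of the Bayes sum, per test:
    q * P(infected | positive) + (1 - q) * P(infected | negative)."""
    p_pos = (1 - fn) * alpha + fp * (1 - alpha)  # ASSUMED expansion of P(positive)
    p_neg = 1 - p_pos
    return q * (1 - fn) * alpha / p_pos + (1 - q) * fn * alpha / p_neg


def true_prevalence(q, fn, fp=0.0):
    """Non-trivial root alpha = (q - fp) / (1 - fn - fp) of the identity above."""
    denom = 1 - fn - fp
    if denom <= 0:
        raise ValueError("requires fn + fp < 1")
    return (q - fp) / denom


fn = 0.1671  # average false-negative probability quoted in the text
q = 0.20     # made-up fraction of positive tests, for illustration only
for fp in (0.0, 0.005, 0.05):
    print(fp, round(true_prevalence(q, fn, fp), 4))
```

With these made-up inputs the estimate exceeds q when the false-positive probability is small and falls below q when it is larger, in line with the pattern reported in the abstract.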

Sensitivity of Zou et al estimates
We utilise data from Zou et al. (2020), who use a combination of mid-turbinate and nasopharyngeal swabs to constitute nasal samples. To determine whether using this combination of different swab types affects the results, we coded the "swab type" variable to have a separate level corresponding to the nasal samples for Zou et al., then compared it to the best-fitting model with only two levels in the swab type variable (AIC = 805.31). The inclusion of a Zou-specific correction was not supported (AIC = 805.81, ΔAIC = 0.50).