Meta-analysis of diagnostic performance of serological tests for SARS-CoV-2 antibodies and public health implications

Serology-based tests have become a key public health element in the COVID-19 pandemic to assess the degree of herd immunity that has been achieved in the population. These tests differ from one another in several ways. Here, we conducted a systematic review and meta-analysis of the diagnostic accuracy of currently available SARS-CoV-2 serological tests, and assessed their real-world performance under scenarios of varying proportion of infected individuals. We included independent studies that specified the antigen used for antibody detection and used quantitative methods. We identified nine independent studies, of which six were based on commercial ELISA or CMIA/CLIA assays, and three on in-house tests. Test sensitivity ranged from 68% to 93% for IgM, from 65% to 100% for IgG, and from 83% to 98% for total antibodies. Random-effects models yielded a summary sensitivity of 82% (95% CI 75-88%) for IgM, and 85% for both IgG (95% CI 73-93%) and total antibodies (95% CI 74-94%). Specificity was very high for most tests, and its pooled estimate was 98% (95% CI 92-100%) for IgM and 99% (95% CI 98-100%) for both IgG and total antibodies. The heterogeneity of sensitivity and specificity across tests was generally high (I² ≥50%). In populations with a low prevalence (≤5%) of seroconverted individuals, the positive predictive value would be ≤88% for most assays, except those reporting perfect specificity. Our data suggest that the use of serological tests for large-scale prevalence surveys (or to grant "immunity passports") is currently justified only in hard-hit regions, while they should be used with caution elsewhere.


Introduction
Testing of patients for ongoing infection with SARS-CoV-2 is mostly conducted by detecting viral RNA in airway specimens using RT-PCR-based tests. These tests may prove less helpful in quantifying the actual number of COVID-19 cases in the population, as a large proportion of infected individuals are thought to be asymptomatic [1][2] or may not seek medical care because of mild symptoms, thus going unnoticed by surveillance systems and public health entities. Moreover, once the infection has resolved, RT-PCR tests provide no information about the previous infection. To overcome these shortcomings, serology-based tests are increasingly being used to gain a more detailed picture of the true prevalence of COVID-19 and to assess the degree of herd immunity that has been achieved in the population. Serology-based tests have thus become a key public health element in the COVID-19 pandemic, and the number of available SARS-CoV-2 serological tests has grown rapidly over recent months. These tests differ from one another in several ways, including the antigens used for antibody detection, the type of antibodies identified, and the laboratory method. Here, we conducted a systematic review and meta-analysis of the diagnostic accuracy of currently available SARS-CoV-2 serological tests, and assessed their real-world performance under scenarios of varying proportion of infected individuals.

Methods
We carried out a systematic literature search (updated to April 19th) to review scientific articles and technical manuals (referred to as "studies" henceforth) on immunological tests for detection of SARS-CoV-2 antibodies. We considered independent studies that specified the antigen used for antibody detection, used quantitative methods, and reported the number of true positives, true negatives, false positives, and false negatives. This information was extracted from each study, along with the laboratory method used as reference. From studies reporting results for two different kits, we entered data for the "Beijing Wantai" kit (instead of the "Xiamen InnoDx Biotech" kit), for consistency with other studies. When two different antigens were tested, we entered data for the nucleocapsid (N) protein instead of the Spike protein, because N-based tests generally showed better sensitivity. Sensitivity analyses were conducted to assess the robustness of pooled results against these choices.
medRxiv preprint doi: https://doi.org/10.1101/2020.05.03.20084160; this version posted May 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Based on the 2x2 contingency table, we calculated the test sensitivity and specificity (with 95% confidence intervals [CI]) and the diagnostic odds ratio (DOR), to provide an overall measure of the test performance [3]. We then calculated the positive (PPV) and negative (NPV) predictive values assuming a true prevalence of 5%, 10% and 20%. Pooled estimates of sensitivity and specificity were obtained through random-effects models after Freeman-Tukey double arcsine transformation. DORs were pooled by fitting a bivariate model, which takes into account the correlation between sensitivity and specificity and uses their log-transformed values as normally distributed variables. Between-study heterogeneity was assessed using the I² statistic, which quantifies the percentage of variation attributable to heterogeneity rather than chance. An I² below 50% was considered an indicator of acceptable heterogeneity.
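The per-study quantities described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the function names, the 0.5 continuity correction for zero cells, and the example counts are assumptions introduced here. It computes I² from Cochran's Q with simple inverse-variance weights, and it does not reproduce the random-effects pooling or the bivariate DOR model.

```python
import math

def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and diagnostic odds ratio from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # Assumption: add 0.5 to every cell when any cell is zero,
    # so that the DOR stays finite.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    dor = (tp * tn) / (fp * fn)
    return sens, spec, dor

def freeman_tukey(x, n):
    """Double arcsine transform of x events out of n, with approximate variance."""
    t = math.asin(math.sqrt(x / (n + 1))) + math.asin(math.sqrt((x + 1) / (n + 1)))
    return t, 1.0 / (n + 0.5)

def i_squared(effects, variances):
    """I^2 (in percent) from Cochran's Q with inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# Hypothetical (TP, FP, FN, TN) counts for three studies; not data from the review.
studies = [(90, 2, 10, 98), (70, 1, 30, 99), (85, 0, 15, 100)]

for tp, fp, fn, tn in studies:
    sens, spec, dor = diagnostic_metrics(tp, fp, fn, tn)
    print(f"sensitivity {sens:.2f}, specificity {spec:.2f}, DOR {dor:.0f}")

# Heterogeneity of sensitivity on the transformed scale.
transformed = [freeman_tukey(tp, tp + fn) for tp, fp, fn, tn in studies]
effects = [t for t, _ in transformed]
variances = [v for _, v in transformed]
print(f"I^2 for sensitivity: {i_squared(effects, variances):.0f}%")
```

Under the threshold stated above, an I² value of 50% or more from such a calculation would be read as high heterogeneity.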
NPV fell in the range 96-100% for all IgG and IgM kits when the prevalence was assumed to be 10% (the lower limit of the range became 98% and 92% for the 5% and 20% true-prevalence scenarios, respectively).
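The dependence of predictive values on prevalence can be illustrated with a short Python sketch. The function name is hypothetical, and the inputs (85% sensitivity, 99% specificity) are the pooled IgG estimates used as an illustrative case; individual kits would give different values.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given true prevalence."""
    tp = sensitivity * prevalence              # true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)

# The three true-prevalence scenarios considered in the analysis.
for prev in (0.05, 0.10, 0.20):
    ppv, npv = predictive_values(0.85, 0.99, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With fixed sensitivity and specificity, PPV rises and NPV falls as prevalence increases, which is why the lower limit of the NPV range is smallest in the 20% scenario.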

Discussion
While some SARS-CoV-2 serological tests reported an excellent ability to discriminate between seroconverted and non-seroconverted individuals, others showed diagnostic accuracy far from optimal. In particular, the pooled sensitivity was unsatisfactory (82-85%), as a substantial fraction (one sixth on average) of seroconverted individuals would be incorrectly classified as non-seroconverted. Specificity was generally very high (≥98%), yet this may not suffice to guarantee satisfactory real-world performance in areas with a very low prevalence of infected individuals. A specificity just less than perfect (99%) would in fact produce a PPV ranging between 76% and 88% when combined with a true prevalence equal to 5%, meaning that around one fifth of those labelled as seroconverted would in reality be false positives.
According to the WHO, 2-3% of the global population may have been infected by the end of the first epidemic wave [13]; the PPV in most areas could therefore be much lower than in our simulations.
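As a worked illustration of this argument, take the pooled sensitivity of about 85% and a specificity of 99%. The symbols Se, Sp, and p (for sensitivity, specificity, and true prevalence) are introduced here for convenience, and the 2.5% prevalence is an assumed midpoint of the 2-3% range. Bayes' theorem then gives:

```latex
\[
\mathrm{PPV} = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)}
\]
\[
p = 0.05:\quad
\mathrm{PPV} = \frac{0.85 \times 0.05}{0.85 \times 0.05 + 0.01 \times 0.95}
= \frac{0.0425}{0.0520} \approx 0.82
\]
\[
p = 0.025:\quad
\mathrm{PPV} = \frac{0.85 \times 0.025}{0.85 \times 0.025 + 0.01 \times 0.975}
= \frac{0.02125}{0.03100} \approx 0.69
\]
```

Under these assumptions, roughly 18% of positive results would be false positives at a prevalence of 5%, rising to about 31% at a prevalence of 2.5%.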
Further reasons for concern lie in the low number of subjects on which some estimates are based, the fact that some of the included studies have not yet been peer-reviewed, the variability in the gold standard used to define sensitivity and specificity, the possible heterogeneity of testing procedures (which should be harmonized internationally to ensure comparability), and, above all, the uncertainty as to whether a positive test result means that effective protection against re-infection has been established [14][15]. Moreover, issues of cost, speed, and availability should also be taken into account when planning large seroprevalence surveys, as well as the medical and non-medical costs of diagnostic errors. While the currently available serological tests can be used for research purposes, our data suggest that their use for large-scale prevalence surveys (or to grant "immunity passports") is currently justified only in hard-hit regions (and only for tests showing very high diagnostic accuracy), while they should be used with caution elsewhere. Finally, SARS-CoV-2 serological tests are being developed at a fast pace, and these conclusions may need revision in the coming months, also depending on the further spread of the pandemic.

Acknowledgments
There was no funding for this manuscript.