Self-sampling for analysis of respiratory viruses in a large-scale epidemiological study in Sweden

A Plymoth (amelie.plymoth@ki.se) 1, M Rotzén-Östlund 2,3, B Zweygberg-Wirgart 2,3, C G Sundin 4, A Ploner 1, O Nyrén 1, A Linde 5
1. Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
2. Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
3. Department of Clinical Microbiology, Karolinska University Hospital, Solna, Sweden
4. Department of Communicable Disease Control, Södermanland County, Mälarsjukhuset, Eskilstuna, Sweden
5. Public Health Agency of Sweden, Solna, Sweden


Introduction
Epidemiological studies of respiratory infections require laboratory confirmation of causative agents. Even a syndrome such as influenza-like illness (ILI), which is regarded as a marker for influenza in routine surveillance, needs viral diagnosis in a subset of patients [1][2][3].
Until recently, viral sampling from the respiratory tract demanded professional involvement. This made large-scale sampling in epidemiological studies exceedingly expensive. Now, self-sampling by lay study participants and shipment of nasal swabs via regular mail should be feasible [4][5][6], since the sensitivity of polymerase chain reaction (PCR) enables detection of infectious agents at low concentrations. Further, viral genetic material remains stable under many conditions despite loss of infectivity, and multiplex PCR assays capable of simultaneously examining many viruses can enable a comprehensive overview of circulating respiratory viruses [7].
Knowledge about the spread of specific viruses in the community is fundamental for successful prevention of epidemics. In many countries, the burden and spread of infections in society may differ substantially from what is seen among patients seeking healthcare. Even for influenza, only a minority of cases may show up in healthcare. The majority stay at home, and these cases probably account for the most substantial burden of disease and the largest cost due to loss of productivity, and likely form the main basis for spread of disease.
As part of the study of work environment and disease epidemiology-infections (SWEDE-I), we developed a scheme for self-initiated respiratory self-sampling with nasal swabs in a cohort that constituted a representative sample of the workforce in Eskilstuna, a medium-sized industrial town in central Sweden. The objective was to demonstrate the feasibility of nasal self-sampling as part of the population-based surveillance of respiratory virus infections in the adult, working population. Here, we describe the logistics and results of PCR-based analyses of 14 viruses. The virology results from this self-sampling were contrasted with those obtained in routine clinical specimens received during the same time period from the same age group in the adjacent Stockholm area.

Study of work environment and disease epidemiology-infections
The study of work environment and disease epidemiology-infections (SWEDE-I) was designed to: (i) identify work-related factors associated with the risk of common acute respiratory infections and viral gastroenteritis, both overall and by causative viral agent, in order to pave the way for preventive measures; (ii) provide empirical data on factors that affect the probability of transmission of common viral infections in various work-related settings, in order to improve the epidemic models needed for predictions and planning when major outbreaks are anticipated. SWEDE-I used, for the first time, a newly developed and extensively tested population-based system for infectious disease surveillance with an analytical epidemiological approach. As part of this study, a scheme for self-initiated respiratory self-sampling with nasal swabs was developed.

Setting
We sought to conduct the self-sampling study within a fairly small, circumscribed and stable population. By restricting the study to such a population, the participants would make up a sufficiently high proportion of the source population to allow estimation of the period-specific activity of each virus of interest, as an indirect indicator of the probability of becoming exposed, based on the observations in the studied sample. In a small community with high participation density, it is also easier to keep participants alerted to their reporting commitment, since information about the study tends to propagate by word of mouth. Eskilstuna, a town with 97,373 inhabitants 110 km west of Stockholm, Sweden, corresponded well with our specifications and was chosen as the study site. Trade and industry are varied and include several traditional manufacturing industries. Most gainfully employed people both live and work in the municipality.

Study population
Gainfully employed people, aged 25-63 years, residing in Eskilstuna, constituted the source population. The sampling frame was provided by Statistics Sweden through cross-linkage of the continuously updated population register with the Employment Register. To achieve a predetermined sample size of 2,200, postal invitations were sent to an age- and sex-stratified random sample of 14,008 individuals. The expected under-representation of men and of the age stratum 25-44 years was compensated for in the recruitment by over-sampling based on observed participation rates in earlier, similar population-based infectious disease surveillance studies [8].
The cohort filled in a series of web- or paper-based questionnaires which comprehensively probed into the physical and psychological work environment, work tasks, contact patterns, and commuting behaviour, as well as into potential confounding factors such as diet, family structure, living conditions, medical history, personal characteristics, and physical activity.

Participant-initiated disease reporting and self-sampling of nasal secretion
Participants were instructed to self-report all onsets of fever (>38°C), upper respiratory tract infection, and gastroenteritis, alone or in combination, immediately as they occurred during the entire study period from 1 September 2011 to 31 May 2012. Reporting could be done via the Internet or via telephone, using interactive voice response. When reporting an infection, the participants answered questions about symptoms in an automated, tree-structured interview. Based on predefined algorithms, the diseases could be classified as common cold [9], gastroenteritis, ILI [10], or other/unclassifiable. Frequent reminders by email and post, together with monthly newsletters, reminded the participants of their commitment. Additionally, participants were requested to sample nose secretions concurrently with every symptom report. Two kits with nylon flocked dry swabs in plastic tubes (Copan Diagnostics, Inc., Murrieta, CA, US) and an instruction leaflet had been distributed to each participant shortly after entry into the cohort. Each kit was uniquely linked to the participant by a barcode label on the tube. The participant sent the sampled material to the Virology department, Clinical Microbiology Laboratory at Karolinska University Hospital, Solna, via regular, pre-paid mail. When a participant's last kit had been returned, a new one was supplied. The samples were stored at -70 °C until tested for enterovirus, human coronavirus (hCOV) 229E, HKU1, NL63 and OC43, influenza A, A(H1N1)pdm09 and B, metapneumovirus (MPV), parainfluenza 1, 2 and 3, respiratory syncytial virus (RSV), and rhinovirus, using in-house real-time PCR in 96-well plates [7]. Remaining material was stored in a biobank for five years.
To get confidential feedback on test results, each participant received an individually unique six-character code, which, combined with the unique national registration number, gave access to a secure webpage listing the participant's results by arrival date of the specimen. Each test result was accompanied by a text describing the virus and its associated disease.

Comparison with viral diagnoses in routine healthcare
The virological laboratory used for SWEDE-I also provides diagnostic services to the entire Stockholm county (population 2.1 million). For reference, we extracted aggregated weekly test results from all samples collected during the same time period (September 2011 to May 2012) from persons in the same age group (25-63 years) as the SWEDE-I cohort who had been tested for some or all of the same viruses. These samples (n=1,516) represent a mixture of in- and out-patients and were analysed for clinical reasons. Testing was administered in the form of two standardised test packages: 73% of subjects (n=1,113) were tested only for influenza A (including A(H1N1)pdm09), influenza B and RSV, whereas 27% (n=403), typically in-patients, were tested for the full range of viruses analysed in SWEDE-I, though without distinguishing between different picornaviruses.

Statistical methods
For each virus category, the total number and the percentage of positive swabs among all samples tested are reported separately for the SWEDE-I material and the clinical samples. Percentages are given with exact 95% confidence intervals (CI) [11]. Hypothesis testing of equal percentages of positive tests for specific viruses in both materials was performed with Fisher's exact test (F-test).
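As a minimal sketch of how an exact binomial interval can be computed, the block below implements the Clopper-Pearson construction with only the Python standard library. The choice of Clopper-Pearson is an assumption: the text says only that the intervals are "exact" and cites [11]. The function names (`binom_cdf`, `bisect_root`, `exact_ci`) are illustrative and not taken from the study.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)

def bisect_root(f, lo, hi, iters=60):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes out of n trials."""
    eps = 1e-12  # keep p strictly inside (0, 1) so the logarithms are defined
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2, eps, 1 - eps)
    upper = 1.0 if x == n else bisect_root(
        lambda p: binom_cdf(x, n, p) - alpha / 2, eps, 1 - eps)
    return lower, upper

# Counts from this study: 876 positive swabs out of 1,843 tested.
low, high = exact_ci(876, 1843)
print(f"{876 / 1843:.1%} positive (95% CI: {low:.1%}-{high:.1%})")
```

The result should land close to the interval reported for the proportion of positive swabs; a difference in the last digit is possible if the original analysis used a different exact method.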
To explore temporal trends in incidence, the proportion of new cases each week (as percentage of all cases during the nine-month study period) was computed for the four most frequent virus diagnoses (corona-, influenza A-, metapneumo- and picornaviruses). Week-wise hypothesis testing of equal proportions in the SWEDE-I and clinical materials was done with Fisher's exact test.
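The week-wise comparison can be sketched as a two-sided Fisher's exact test on a 2x2 table: one row per material, with columns for positives in the week of interest and positives in all other weeks. The layout of the table is an assumption about how the comparison was set up, the function name is illustrative, and the weekly counts in the usage lines are hypothetical.

```python
import math

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def log_prob(k):
        return log_comb(row1, k) + log_comb(row2, col1 - k) - log_comb(n, col1)

    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    log_obs = log_prob(a)
    p_value = 0.0
    for k in range(k_min, k_max + 1):
        lp = log_prob(k)
        if lp <= log_obs + 1e-7:  # small tolerance for floating-point ties
            p_value += math.exp(lp)
    return min(p_value, 1.0)

# Hypothetical counts for one week (NOT from the study):
# self-sampled: 12 positives this week, 73 in other weeks
# clinical:     30 positives this week, 120 in other weeks
p = fisher_exact_two_sided(12, 73, 30, 120)
print(f"p = {p:.3f}")
```

Working in log space keeps the computation stable for the sample sizes in this study, where binomial coefficients would overflow ordinary floating-point numbers.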
The distribution of sex, age, country of birth, and immigrant status was examined for SWEDE-I participants found to have a positive test for coronavirus, influenza A and picornavirus. Testing of the hypothesis that these distributions were the same for the virus-affected groups as for all participants who returned at least one nasal swab was done using chi-squared tests.
The weekly averages of number of days between reported onset of a disease episode and receipt of the corresponding nasal swab at the laboratory were plotted against study week, together with a loess smoothing curve for the mean, inversely weighted by weekly standard errors [12]. The smoothing curve was then tested against a null model of constant average delay using an approximate F-test [13]. Differences in this delay between individual positive and negative tests over the entire study period were tested using a Wilcoxon test. For all tests, a p-value of less than 0.05 was considered statistically significant.
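To make the weighted smoothing step concrete, here is a simplified sketch of a loess-type smoother: a local linear fit with a tricube kernel, where each weekly mean also carries a prior weight. Several details are assumptions rather than the authors' settings: the local fit is linear, the span is 0.75, and the prior weight is taken as 1/SE². The data in the usage lines are hypothetical.

```python
import math

def weighted_loess(x, y, weights, span=0.75):
    """Local linear smoother with tricube kernel and prior weights (sketch)."""
    n = len(x)
    k = min(n, max(3, math.ceil(span * n)))  # points considered in each local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest point.
        h = sorted(abs(xi - x0) for xi in x)[k - 1]
        sw = swx = swxx = swy = swxy = 0.0
        for xi, yi, wi in zip(x, y, weights):
            u = abs(xi - x0) / h if h > 0 else 0.0
            if u >= 1.0:
                continue  # outside the local window
            w = wi * (1.0 - u ** 3) ** 3  # tricube kernel times prior weight
            dx = xi - x0                  # centre on x0: the intercept is the fit
            sw += w
            swx += w * dx
            swxx += w * dx * dx
            swy += w * yi
            swxy += w * dx * yi
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)       # degenerate window: weighted mean
        else:
            fitted.append((swxx * swy - swx * swxy) / denom)
    return fitted

# Hypothetical weekly mean delays (days) and standard errors, for illustration only.
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
mean_delay = [5.1, 4.6, 4.9, 5.3, 4.4, 5.0, 5.6, 5.8]
std_err = [0.3, 0.4, 0.5, 0.4, 0.6, 0.5, 0.7, 0.9]
prior_weights = [1.0 / se ** 2 for se in std_err]
smooth = weighted_loess(weeks, mean_delay, prior_weights)
print([round(v, 2) for v in smooth])
```

Weeks with few returned swabs have larger standard errors and therefore pull the curve less, which is the purpose of the inverse weighting described above.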

Role of the funding source
This research was funded in full by AFA Insurance, Stockholm, Sweden. The sponsor had no role in the conception, design, planning, execution, analysis, interpretation or publication of the study. The corresponding author had full access to all the data in the study.

Results
After two reminders, 2,237 of 14,008 invitees agreed to participate in the SWEDE-I cohort (participation rate 16%). Some key characteristics of the cohort are exhibited in Table 1. The participants sent in 1,843 nasal swabs and made 2,119 disease reports, giving a sampling rate of 87%. Of the nasal swabs, 876 (47.5%; 95% CI: 45.3-49.8%) were shown to contain at least one virus (henceforth referred to as 'positive tests'). Since 21 of the samples each contained two or more different viruses (two viral diagnoses in 20 samples, three in one), the total number of virus diagnoses was 898.
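The headline figures of the study can be checked with simple arithmetic. The sketch below uses only counts reported in this paper; the variable names are illustrative.

```python
# Counts as reported in the paper.
invited, enrolled = 14_008, 2_237
swabs, disease_reports = 1_843, 2_119
positive_swabs = 876
swabs_with_two_viruses, swabs_with_three_viruses = 20, 1

participation_rate = enrolled / invited   # share of invitees who joined
sampling_rate = swabs / disease_reports   # swabs returned per disease report
positive_share = positive_swabs / swabs   # swabs with at least one virus

# Each positive swab contributes one diagnosis, plus one extra for every
# double infection and two extra for the triple infection.
single_virus_swabs = positive_swabs - swabs_with_two_viruses - swabs_with_three_viruses
virus_diagnoses = (single_virus_swabs
                   + 2 * swabs_with_two_viruses
                   + 3 * swabs_with_three_viruses)

print(f"participation rate: {participation_rate:.0%}")
print(f"sampling rate:      {sampling_rate:.0%}")
print(f"positive swabs:     {positive_share:.1%}")
print(f"virus diagnoses:    {virus_diagnoses}")
```

The totals agree with the reported values: 16% participation, 87% sampling rate, 47.5% positive swabs and 898 virus diagnoses.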
The number of returned swabs peaked in the last week of September 2011 (week 39) and fell until the last week of November (week 48), when a phase with variable inflow ensued (Figure 1A). After the last week of February 2012 (week 9), the numbers decreased until the end of the study. The crude number of positive tests showed a similar pattern, but there was a more distinct upward tendency from late November, a climax in mid-February, and a gradual decrease until late April. In the beginning of the self-sampling study, the proportion of positive tests remained at ca 40%, but from mid-November the proportion increased, until it exceeded 60% in early April (week 14) (Figure 1B). Then it abruptly fell back to around 45%.
With the exception of the first and last two weeks of the study, the week-wise average delay between onset of disease episodes and arrival of the specimens at the laboratory varied between four and six days, and the corresponding median delay was between 3.5 and six days. Figure 2 suggests that this average increased slightly towards the end of the study period, although the trend indicated by the smoothing curve failed to reach formal significance (p=0.06). Overall, the delay between disease onset and sample arrival appeared fairly stable. Negative tests were slightly but significantly skewed towards longer delays (p=0.004).
Of 1,212 episodes with reported nasal discharge, 679 (56.0%) showed positive tests, in stark contrast to 29 (12.1%) of 239 episodes without nasal discharge. Test-negative episodes without nasal discharge were evenly distributed across the entire study period (data not shown).

Figure 1, panel B
Plot (black line) of the proportion of positive tests among all samples received by calendar week from the SWEDE-I cohort. The green line is the weighted loess smooth and the 95% confidence envelope is in grey.

Figure 2
Delay between specimen arrival at the laboratory and reported disease episode onset in the cohort performing self-sampling, study of work-related risk factors for transmission of viral infections (SWEDE-I), Sweden, September 2011-May 2012 (n=876 swabs)

A.
Distribution of number of days between reported disease onset and arrival of the specimen at the laboratory for positive and negative tests; the modified boxplots show quartiles and median (box) as well as 5% and 95% quantiles (whiskers).

B.
Loess smoothing curve (blue) weighted by standard errors of weekly averages of number of days elapsed between specimen arrival and reported disease onset. The grey area indicates 95% confidence limits for the loess curve.

Pattern of virus-specific diagnoses
Percentages of virus-specific diagnoses among all samples tested in the SWEDE-I cohort are listed in Table 2 (column 2). In the SWEDE-I material, rhinoviruses were the most common of all tested viruses (20.8% of all samples) and dominated the picornavirus group.
Coronaviruses, dominated by HKU1, were found in 16.2% of the samples, followed by seasonal influenza A in 4.6% of the samples and MPV in 2.9%. Among test-positive samples from patients without nasal discharge, the distribution of virus types was essentially the same as that in the entire SWEDE-I cohort (data not shown).
Column 4 of Table 2 displays the diagnostic yield from the clinical material. As in the SWEDE-I samples, corona-, influenza A and picornaviruses dominated in the clinically-isolated samples. However, the rank order differed substantially, and the proportions of influenza- and RSV-positive samples were significantly higher among the clinical samples (p≤6e-05).

Seasonality of virus-specific diagnoses
The seasonal distributions of virus-specific diagnoses across the study period are shown in Figure 3 as weekly proportions of all specific diagnoses in the study period. From the SWEDE-I samples we found that picornaviruses, which were dominated by rhinoviruses, occurred during the entire study period, but with a distinct peak in the last week of September (week 39). The season for coronaviruses lasted from early November until early May (week 18), with a climax in the second week of February (week 6). Seasonal influenza A peaked during the first three weeks of March (weeks 9-11). MPV occurred with three distinct peaks four to seven weeks apart between late December (week 52) and late March (week 13). Again, the seasonal pattern among test-positive samples from patients without nasal discharge was very similar to that of the whole SWEDE-I cohort as just described (data not shown).
Figure 3 also shows the corresponding proportions of virus-specific diagnoses in the clinically-isolated samples. Despite the difference in absolute frequencies seen in Table 2, the proportions obtained within the self-sampling and clinical-sampling schemes indicated seasonal occurrences that were similar overall. Influenza A tracked extremely well between the two schemes, the only difference (p=0.01) being a second dominant peak in the SWEDE-I cohort, two weeks after the common peak. For the corona-, metapneumo- and picornaviruses, significant differences in weekly proportions of positive swabs between the two sampling schemes were mostly observed towards the end of the study period, with the exception of one obvious peak in week 11 for coronavirus in the clinical material, which was completely absent in the SWEDE-I material (p=0.0004). The counts for the other viruses considered in this study were too few to allow for seasonal analyses.

Common infections and demographics
The distributions of age, sex and foreign background were remarkably similar among SWEDE-I participants with positive tests for, respectively, coronavirus, influenza A and picornavirus (Table 3). None of these virus-positive groups differed significantly from the total group with tested nasal swabs. People born abroad or with an immigrant background were similarly

Discussion
This study demonstrates the feasibility of nasal self-sampling as part of population-based surveillance of respiratory virus infections. During one season 1,843 samples, corresponding to 1.1 per person-year, could be evaluated. Previous reports have indicated that self-sampling earlier in the disease likely compensates for possible losses in sample quality [14,15]. The week-wise average delay between onset of disease episodes and arrival of the specimens at the laboratory varied between four and six days, and the corresponding median delay was between 3.5 and six days. This delay is considered acceptable in terms of sample quality.
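The rate of 1.1 samples per person-year can be reproduced approximately from the cohort size and the length of follow-up. The sketch assumes that all 2,237 participants contributed the full nine months from 1 September 2011 to 31 May 2012; the paper does not state how person-time was actually computed, so this is a rough check rather than the authors' calculation.

```python
participants = 2_237
follow_up_years = 9 / 12   # assumption: full nine-month follow-up for everyone
swabs = 1_843
disease_reports = 2_119

person_years = participants * follow_up_years
swabs_per_person_year = swabs / person_years
reports_per_person_year = disease_reports / person_years

print(f"person-years:            {person_years:.2f}")
print(f"swabs per person-year:   {swabs_per_person_year:.2f}")
print(f"reports per person-year: {reports_per_person_year:.2f}")
```

Under this assumption the swab rate rounds to the reported 1.1 per person-year.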
While we lack a formal validation against a gold-standard method, we argue for the validity of our results based on a number of separate lines of evidence.
With regard to self-reporting of ILI/acute respiratory infection (ARI), the framework for self-initiated, event-driven infectious disease reporting that we employed had already been developed for a Swedish population-based cohort [8] and used for population-based surveillance in Stockholm County since 2007 [16]. A separate validation study concluded that while there was significant under-reporting of disease (estimated at 60%), this level of under-reporting was remarkably constant over time and across seasons [16], so that a simple constant correction factor can potentially restore validity of incidence rates, at least in terms of reported disease incidence.
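A constant correction factor of this kind can be sketched as follows. The sketch reads "under-reporting of 60%" as meaning that 60% of true episodes go unreported, which is an assumption about the validation study's definition. Applying the factor to the 2,119 reports from the present cohort is purely illustrative, since the 60% estimate comes from earlier cohorts.

```python
def corrected_count(observed, under_reporting):
    """Scale an observed count by a constant under-reporting fraction.

    under_reporting is the assumed fraction of true episodes never reported,
    so observed = true * (1 - under_reporting).
    """
    if not 0.0 <= under_reporting < 1.0:
        raise ValueError("under_reporting must be in [0, 1)")
    return observed / (1.0 - under_reporting)

correction_factor = corrected_count(1, 0.60)   # 1 / (1 - 0.6) = 2.5
estimated_episodes = corrected_count(2_119, 0.60)

print(f"correction factor:  {correction_factor:.2f}")
print(f"estimated episodes: {estimated_episodes:.0f}")
```

Because the factor is constant, it rescales incidence levels without changing the shape of the seasonal curves.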
It is possible that the additional requirement of collecting and mailing a nasal sample may have led to increased study fatigue and correspondingly increased under-reporting in the SWEDE-I study, compared with the previous studies: in Figure 1A, we see indeed that the number of tests performed decreases from more than 100 samples/week at the very beginning to ca 20 samples/week at the end. However, Figure 1B indicates that the proportion of positive samples was at about the same level of ca 40% at both times, with a peak of ca 60% positive samples in between, coinciding with the peak of the influenza A season seen in Figure 3. Also, within the study period, the numbers of self-sampled specimens submitted to the laboratory decreased more after week 9. This is in agreement with syndromic surveillance in adults for the same period in the whole of Sweden, based on calls for fever and cough to a medical advice line as an indicator for respiratory infections, which shows a sharp decline of contacts from week 9, the last week in February [17].
Taken together, it appears that the fluctuation in positive samples is driven more by seasonal and disease-related factors than by varying levels of participation in the study.
With regard to the distribution and burden of viruses in SWEDE-I, we found rhinovirus, coronavirus and influenza to be the three most common diagnoses (at 23%/16%/5%, respectively), a pattern similar to that seen in community-based studies in England in the 1990s (34%/14%/9% for a population aged 0-60+ years [18] and 52%/26%/10% for a population aged 60-90 years [19]). These community-based studies used active follow-up, with sampling by health professionals, and diagnosis through a combination of virus isolation and serology. This is strikingly different from the results of the concurrent clinical sampling scheme, where influenza-positive samples (25%) dominated over all other infections, with seasonal influenza A, A(H1N1)pdm09 and B being significantly more frequent than in SWEDE-I. At the same time, RSV and MPV were also significantly more common in the clinical material.
Although the lower frequency of viruses causing severe infections in SWEDE-I could be partly due to the self-sampling method, it seems most likely that patients infected with these viruses are overrepresented in healthcare.
Even though we have not been able to demonstrate conclusively in this study that self-sampling has the same sensitivity as healthcare-based sampling, our results strongly support the use of the SWEDE-I methodology for influenza surveillance.
With regard to the timing of the circulation of different viruses, we found that the seasonality patterns obtained were rather similar between the SWEDE-I and clinical schemes. This confirms that clinical identification parallels societal spread, as seen in similar self-sampling studies previously [15], but is by no means a measure of societal spread intensity. The higher proportion of positive tests in the SWEDE-I cohort was explained by the abundance of picorna- and coronavirus infections. Self-sampling earlier in the disease may have contributed to their frequent detection, but the most apparent explanation for their scarcity in the clinical material is that they are rarely direct causes of severe disease among adults [20,21]. Interestingly, when influenza A peaked in the SWEDE-I cohort, only ca 30% of the samples were positive for influenza, and another 30% were positive for other tested viruses. This underlines the importance of virological testing to verify that acute respiratory disease is caused by influenza, even during the epidemic period.
In the SWEDE-I cohort, the proportion of positives for picornavirus, coronavirus and influenza viruses in adults from various demographic groups was very similar to the proportion among all samples obtained (Table 3). The similarity in this adult population, not selected through healthcare, is evident for age, sex and ethnicity alike. It is difficult to make any other interpretation than that spread of these viruses, with accompanying respiratory symptoms, is rather homogeneous among adults of similar age in society. The low rate of positives for the other viruses prevented a similar analysis.
Notable limitations of the study include the absence of formal validation against gold standard testing, uncertain external validity due to low participation in the invited representative sample, and probable under-reporting among participants. Men, young age groups, and people with a low level of education were somewhat under-represented in a similar cohort [8]. The number of disease reports per person-year in this study is very similar to that in a previous validation study in similar cohorts [16]. In the previous study, a relatively constant under-reporting of 60% was identified, based on random control questionnaires on health status the previous week. Assuming a constant overall incidence of virus infections from year to year, the under-reporting was likely similar in the present study. Clearly, more research is needed to improve the completeness of disease reporting. Additional reminders and other incentives may be required.
The rate of positivity was also considerably higher among individuals whose disease was associated with nasal discharge than among those without. We found no indications that this disfavoured any specific virus, but further research is needed to verify whether patients without rhinitis are virally infected, and if so, to improve sampling.
This is a large-scale epidemiological study where self-reporting, self-sampling and modern PCR-based diagnosis were combined for investigation of virus-specific respiratory infection incidence on the population level. The logistics around reporting and self-sampling functioned exceptionally well. Of major importance was the sensitivity of the virological assays used. The methodology has been evaluated [7] and the sensitivity appears to be optimal. The participants received written instructions on how to perform the self-sampling and the instructions were also available on the study web page. In addition, the participants could call the staff at the study centre to ask questions. Shortly after having sent in a sample, the participant could log in with their unique code at the study web page, for access to a secure website with their viral test results.
The major cost of a virological study is the laboratory analyses. To contain costs, samples can be stored at -70 °C and analysed in batches during periods of low workload, since analysis is not needed for clinical purposes. While the present cost of virological analyses makes routine sampling for analyses of respiratory viruses in the population unjustifiable, the feasibility of large-scale self-sampling in epidemiological studies may importantly advance the understanding of burden of disease and factors affecting spread.
The discrepancies and similarities with findings in clinical specimens seem logical, and calculations for influenza result in a very relevant incidence for the included population. For some of the viruses, a laboratory comparison of sensitivity for nasal swabs vs nasopharyngeal aspirates is desirable, but the fact that self-sampling is performed very early during the disease may compensate for a higher sensitivity of clinical nasopharyngeal sampling.
This successful deployment of self-sampling is applicable everywhere and it can be extended to groups other than working adults, and to various geographical areas, so long as the mail transport is reasonably efficient. We believe it may be an important tool in further research on spread of viruses in the population and the effect of interventions such as vaccination. Self-sampling for vaginal and rectal material has already been introduced for diagnosis of venereal diseases [22]. This sampling method can certainly support clinical and syndromic surveillance, as previously suggested [15].

Figure 1
Weekly number of swabs received from the cohort of the study of work-related risk factors for transmission of viral infections (SWEDE-I) and proportion of positive tests among all swabs received, Sweden, September 2011-May 2012 (n=1,843 swabs)

Figure 3
Figure 3 also shows the corresponding proportions of virus-specific diagnoses in the clinically-isolated

Figure 3
Weekly proportions of samples positive for (A) picornavirus, (B) coronavirus, (C) influenza A and (D) metapneumovirus relative to the total positive respective samples during the whole study obtained in the self-sampled and clinically-sampled materials, Sweden, September 2011-May 2012

The participants sent in 1,843 nasal swabs and made 2,119 disease reports, giving a sampling rate of 87%. Of the nasal swabs, 876 (47.5%; 95% CI: 45.3-49.8%) were shown to contain at least one

Table 1 footnotes
a A total of 2,237 participants were in the cohort but not all responded to all questions asked in the questionnaire.
b Unless otherwise specified.
c Including the index participant.

Table 2
Numbers, and percentages among all samples tested, of positive diagnoses for respiratory viruses found respectively for self-sampled and clinically-sampled swabs, Sweden, September 2011-May 2012

Table 3
Comparison according to selected demographic variables of SWEDE-I participants testing positive for coronavirus, influenza A, or picornavirus with all participants who returned a nasal swab, Sweden, September 2011-May 2012 (n=1,843)