Respiratory syncytial virus in young children: community cohort study integrating serological surveys, questionnaire and electronic health records, Born in Bradford cohort, England, 2008 to 2013

Background: Bronchiolitis caused by respiratory syncytial virus (RSV) is a major cause of mortality and morbidity in infants.
Aim: To describe RSV epidemiology in children in the community in a high-income setting.
Methods: We used stored blood samples from the United Kingdom Born in Bradford cohort study, collected at birth and at ages 1 and 2 years. The samples were tested for IgG antibodies against the RSV postfusion F protein and linked to questionnaire data and to primary and hospital care records. We used finite mixture models to classify children as RSV infected or not infected according to their antibody concentrations at ages 1 and 2 years. We assessed risk factors for primary RSV infection at each age using Poisson regression models.
Results: The study cohort included 700 children with cord blood samples; 490 of them had additional blood samples taken at both ages 1 and 2 years. Of these 490 children, 258 (53%; 95% confidence interval (CI): 48–57%) were first infected with RSV by age 1, and 99 of these (38%; 95% CI: 33–43%) had been in contact with healthcare during peak RSV season (November–January). Having older siblings, being born in October–June and attending formal childcare were associated with the risk of RSV infection in infancy. By age 2, a further 164 of the 490 children (33%; 95% CI: 29–38%) had been infected.
Conclusion: Over half of children experienced RSV infection in infancy, a further third had evidence of primary RSV infection by age 2, and one in seven remained seronegative at their second birthday. These findings will inform future analyses of the cost-effectiveness of RSV vaccination programmes in high-income settings.


Sample size calculations
We tested leftover blood samples collected at ages 1 and 2 years in the Allergy and Infection (ALL-IN) study from 490 children, i.e. all children who had sufficient serial samples remaining from the ALL-IN study at both ages. 1 To detect a change in RSV antibody concentrations between ages 1 and 2 years, we estimated a required sample size of 276 children, based on detecting an increase in respiratory syncytial virus (RSV) seroprevalence from 65% to 80% between ages 1 and 2 with 80% power. 2 Assuming a difference of 15% or greater in log2 levels of immunoglobulin G (IgG) antibody against the RSV postfusion F protein between children born preterm (<37 weeks' gestation) and term-born children, 3 we required 700 cord blood samples to detect this difference with 80% power. To examine maternal RSV antibody concentrations, we therefore tested cord samples from an additional 210 children in the original Born in Bradford (BiB) cohort, bringing the total to 700, and oversampled children born preterm among these 210.
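As an arithmetic check on the figure of 276, the standard normal-approximation formula for comparing two independent proportions (65% vs 80%, two-sided α = 0.05, 80% power) gives 138 per group, i.e. 276 in total. The text does not state which formula was used, so the sketch below is an assumption that happens to reproduce the number; it treats the two ages as independent groups rather than modelling the paired serial samples. The function name `n_per_group_two_proportions` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_two_proportions(p1: float, p2: float,
                                alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size to detect p1 vs p2 with a two-sided test.

    Assumed method: normal approximation for two independent proportions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2                          # pooled proportion under H0
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)   # round up to whole children


n = n_per_group_two_proportions(0.65, 0.80)
print(n, 2 * n)  # 138 276
```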

Deriving indicator of contact with healthcare
We used electronic health records (hospital admission, primary care and primary care prescribing records) to derive indicators of contact with healthcare due to respiratory tract infection (RTI) during peak RSV season, as a proxy for likely symptomatic infection. 5 Children were classified as having RTI-related contact with healthcare if any of the criteria listed below were met during peak RSV season (defined as 1 November–31 January).
We used the same method to identify children aged <6 months with RTI-related contact with healthcare during the RSV season (defined as 1 October–28/29 February) for sensitivity analysis 1, and during peak RSV season (1 November–31 January) for sensitivity analysis 2.
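The season windows above can be expressed as simple date rules. The sketch below is illustrative only: it assumes each child's contact dates have already been filtered to RTI-related events, and it interprets "aged <6 months" as fewer than six completed calendar months since birth. All function names are hypothetical.

```python
from datetime import date
from typing import Callable, Iterable


def in_peak_rsv_season(d: date) -> bool:
    """Peak RSV season: 1 November to 31 January (the window spans the new year)."""
    return d.month in (11, 12, 1)


def in_rsv_season(d: date) -> bool:
    """Wider RSV season: 1 October to 28/29 February.

    Testing the month instead of a fixed end day handles leap years automatically.
    """
    return d.month in (10, 11, 12, 1, 2)


def age_in_completed_months(birth: date, on: date) -> int:
    """Whole calendar months completed between birth and a later date."""
    months = (on.year - birth.year) * 12 + (on.month - birth.month)
    if on.day < birth.day:
        months -= 1  # the current month has not been completed yet
    return months


def rti_contact_flag(birth: date,
                     contact_dates: Iterable[date],
                     window: Callable[[date], bool] = in_peak_rsv_season,
                     under_6_months: bool = False) -> bool:
    """True if any RTI-related contact falls inside the season window,
    optionally counting only contacts made before age 6 months."""
    return any(
        window(d) and (not under_6_months or age_in_completed_months(birth, d) < 6)
        for d in contact_dates
    )


birth = date(2009, 8, 15)
contacts = [date(2009, 10, 20), date(2010, 2, 3)]  # assumed already RTI-related
print(rti_contact_flag(birth, contacts))  # False: neither date is in 1 Nov to 31 Jan
print(rti_contact_flag(birth, contacts, window=in_rsv_season, under_6_months=True))  # True
```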

Hospital records
We derived an indicator of hospital contact via linkage to electronic hospital records from Bradford Royal Infirmary (the main hospital serving the city of Bradford). Records were deterministically linked to children in the cohort using their National Health Service (NHS) number. Diagnostic information was recorded using International Classification of Diseases, 10th revision (ICD-10) codes. 6 To identify RSV-related hospital admissions, we flagged admissions in which any diagnosis recorded during the admission matched any of the following ICD-10 codes: 7
- J21: Bronchiolitis
- J12.1: Respiratory syncytial virus pneumonia
- J20.5: Acute bronchitis due to respiratory syncytial virus
- B97.4: Respiratory syncytial virus as the cause of diseases classified to other chapters

In primary care records, clinical information was recorded using Read codes version 3 (Clinical Terms Version 3, CTV3). 9 To identify RSV-related contact with primary care, we included 77 CTV3 Term ID codes, which we translated to 129 CTV3 concept IDs (listed in appendix table 1), covering symptoms (e.g. fever, rigor, cough) and diagnoses likely to indicate RSV infection (RSV, respiratory tract infection, bronchitis, bronchiolitis, chest infection, pneumonia).
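The admission-level flag amounts to matching each recorded diagnosis against the code list. A minimal sketch follows, assuming diagnoses are available as strings that may be stored without the dot (e.g. "J121"); the real format of the hospital extract is not described, and the helper names are hypothetical. J21 is a three-character category, so it is matched as a prefix to cover its subcodes.

```python
from typing import Iterable

# Code list from the text; J21 matched as a prefix (J21.0, J21.1, J21.8, J21.9).
RSV_ICD10_PREFIXES = ("J21",)
RSV_ICD10_EXACT = {"J12.1", "J20.5", "B97.4"}


def normalise_icd10(code: str) -> str:
    """Upper-case, remove spaces and insert the dot if missing ('j121' -> 'J12.1')."""
    code = code.strip().upper().replace(" ", "")
    if "." not in code and len(code) > 3:
        code = code[:3] + "." + code[3:]
    return code


def is_rsv_related_admission(diagnoses: Iterable[str]) -> bool:
    """True if any diagnosis recorded during the admission matches the code list."""
    for raw in diagnoses:
        code = normalise_icd10(raw)
        if code in RSV_ICD10_EXACT or code.startswith(RSV_ICD10_PREFIXES):
            return True
    return False


print(is_rsv_related_admission(["R50.9", "J21.9"]))  # True
print(is_rsv_related_admission(["J12.9", "J20.9"]))  # False
```

The primary care flag would follow the same pattern, testing membership of each consultation's CTV3 concept ID in the 129-code set from appendix table 1.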
Prescriptions issued in primary care are coded using sections of the British National Formulary (BNF). 10 We extracted prescriptions from BNF section 5.1 (antibacterial drugs) and flagged prescriptions for amoxicillin where the drug name included "amoxicillin". We focused on amoxicillin because it is indicated for community-acquired pneumonia and is the most commonly prescribed antibiotic for respiratory tract infections. 11 Our previous research has shown that the peak in amoxicillin prescribing coincides with the peak in RSV circulation in the UK. 12
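Putting the three criteria together, a child is flagged if any qualifying hospital admission, primary care consultation or amoxicillin prescription falls in peak RSV season. The sketch below assumes a hypothetical `Prescription` record with a BNF section string and a free-text drug name, and assumes the hospital and primary care dates have already been filtered by the code lists; the real extract layout is not described. A plain substring test on the drug name will also match any combination product whose name contains "amoxicillin".

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Prescription:
    # Hypothetical record layout, not the actual extract format.
    issue_date: date
    bnf_section: str  # e.g. "5.1"
    drug_name: str


def is_amoxicillin(rx: Prescription) -> bool:
    """BNF section 5.1 prescription whose drug name contains 'amoxicillin'."""
    return rx.bnf_section == "5.1" and "amoxicillin" in rx.drug_name.lower()


def in_peak_rsv_season(d: date) -> bool:
    """Peak RSV season: 1 November to 31 January."""
    return d.month in (11, 12, 1)


def rti_contact_in_peak_season(hospital_dates, primary_care_dates, prescriptions) -> bool:
    """Flag the child if ANY criterion is met during peak RSV season.

    hospital_dates, primary_care_dates: dates of admissions or consultations
    that already matched the ICD-10 or CTV3 code lists (assumed pre-filtered).
    """
    amoxicillin_dates = [rx.issue_date for rx in prescriptions if is_amoxicillin(rx)]
    return any(in_peak_rsv_season(d)
               for d in [*hospital_dates, *primary_care_dates, *amoxicillin_dates])


rxs = [Prescription(date(2010, 12, 2), "5.1", "Amoxicillin 125mg/5ml oral suspension"),
       Prescription(date(2010, 12, 9), "5.1", "Clarithromycin 125mg/5ml oral suspension")]
print(rti_contact_in_peak_season([], [], rxs))                      # True
print(rti_contact_in_peak_season([], [date(2010, 6, 1)], rxs[1:]))  # False
```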

Deriving RSV infection status using finite mixture models
We applied finite mixture models (FMM) to the log RSV IgG post-F levels at ages 1 and 2 years to classify children as RSV infected at age <1 year and at age 1–2 years, respectively, according to their antibody concentrations. A priori, we decided to fit a model with two classes (infected/not infected). With $y = \ln(\text{RSV IgG post-F})$, the model was:

$$f(y) = \pi_1 f_1(y) + \pi_2 f_2(y)$$

where $\pi_1$ and $\pi_2$ are the probabilities of an observation belonging to each class ($\pi_1 + \pi_2 = 1$), and $f_1$ and $f_2$ are the conditional probability density functions of the observed antibody concentrations in each class.
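The text does not state the form of the class densities $f_1$ and $f_2$. Assuming normal densities on the log scale, a common choice, the two-class model can be fitted by expectation-maximisation (EM). The pure-Python sketch below is an illustration under that assumption, not the software used in the study; it labels the lower-mean class "not infected" and the higher-mean class "infected", and the demonstration uses simulated values, not study data.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def fit_two_class_mixture(y, n_iter=500, tol=1e-8):
    """Fit f(y) = pi1*f1(y) + pi2*f2(y) with normal f1, f2 by EM.

    Returns (pi, mu, sd); index 0 is the lower-mean ('not infected') class.
    """
    n = len(y)
    ys = sorted(y)
    lower, upper = ys[: n // 2], ys[n // 2:]  # initialise by splitting at the median
    pi = [0.5, 0.5]
    mu = [fmean(lower), fmean(upper)]
    sd = [max(pstdev(lower), 1e-3), max(pstdev(upper), 1e-3)]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior class probabilities for each observation
        dists = [NormalDist(mu[k], sd[k]) for k in range(2)]
        resp, ll = [], 0.0
        for yi in y:
            w = [pi[k] * dists[k].pdf(yi) for k in range(2)]
            total = max(sum(w), 1e-300)  # guard against underflow
            ll += math.log(total)
            resp.append([wk / total for wk in w])
        # M-step: update class probabilities, means and standard deviations
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / n
            mu[k] = sum(r[k] * yi for r, yi in zip(resp, y)) / nk
            var = sum(r[k] * (yi - mu[k]) ** 2 for r, yi in zip(resp, y)) / nk
            sd[k] = max(math.sqrt(var), 1e-3)
        if ll - prev_ll < tol:  # log-likelihood has stopped increasing
            break
        prev_ll = ll
    order = sorted(range(2), key=lambda k: mu[k])  # label classes by their mean
    return [pi[k] for k in order], [mu[k] for k in order], [sd[k] for k in order]


def posterior_infected(yi, pi, mu, sd):
    """Posterior probability that one observation belongs to the higher-mean class."""
    w = [pi[k] * NormalDist(mu[k], sd[k]).pdf(yi) for k in range(2)]
    return w[1] / max(sum(w), 1e-300)


rng = random.Random(1)  # simulated log antibody levels, not study data
y = [rng.gauss(2.0, 0.5) for _ in range(300)] + [rng.gauss(5.0, 0.5) for _ in range(200)]
pi, mu, sd = fit_two_class_mixture(y)
print([round(m, 1) for m in mu])  # means should be close to 2 and 5
```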
For antibody levels at age 2 years, we considered a model with no covariates for the class probabilities $\pi_1$, $\pi_2$ (model 1), and a model allowing the class probabilities $\pi_1$, $\pi_2$ to depend on the observed antibody concentration at age 1 (model 2). We included this covariate because IgG post-F concentrations following infection at age 1–2 years were likely to remain at a higher level in children who had first been infected in infancy. We compared latent class marginal mean IgG post-F levels, marginal posterior probabilities and Akaike's Information Criterion (AIC) between the two models, and used model 2 in the final analyses as it had the lower AIC (supplementary table 2).
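The text does not specify how the class probabilities depend on the age-1 level in model 2. A logit link is the usual choice in concomitant-variable mixture models, so the sketch below assumes $\pi_2(x) = 1/(1 + e^{-(\gamma_0 + \gamma_1 x)})$, with $x$ the log age-1 IgG post-F level and $\gamma_0$, $\gamma_1$ hypothetical coefficient names. It also shows the AIC comparison, $\text{AIC} = 2k - 2\ln L$. The parameter counts assume normal class densities: model 1 has two means, two standard deviations and one mixing probability (five parameters), while model 2 replaces the mixing probability with an intercept and slope (six parameters).

```python
import math
from typing import Dict, Tuple


def class_probabilities(x: float, gamma0: float, gamma1: float) -> Tuple[float, float]:
    """(pi1, pi2) for a child with log age-1 IgG post-F level x, assuming a logit link.

    pi2 is the 'infected' class; gamma0 and gamma1 are hypothetical coefficients.
    """
    pi2 = 1.0 / (1.0 + math.exp(-(gamma0 + gamma1 * x)))
    return 1.0 - pi2, pi2


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Free parameters under normal class densities (an assumption):
N_PARAMS = {"model 1": 5,   # two means, two SDs, one mixing probability
            "model 2": 6}   # two means, two SDs, logit intercept and slope


def preferred_model(log_lik: Dict[str, float]) -> str:
    """Model with the lowest AIC, given each model's maximised log-likelihood."""
    return min(log_lik, key=lambda m: aic(log_lik[m], N_PARAMS[m]))
```

Because the extra slope parameter costs 2 AIC units, model 2 is preferred only if it raises the maximised log-likelihood by more than 1 relative to model 1.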
The two latent classes (RSV infected vs not infected) were well defined: 475 (97%) children at age 1 and 474 (97%) children at age 2 had a posterior probability of infection either <10% or >90% (supplementary figure 1). Five children were classified as infected at age 1 but not infected at age 2.
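The two summaries above (share of children with a clear posterior probability, and children classified infected at age 1 but not at age 2) can be computed directly from per-child posterior probabilities. This is a hypothetical helper; in particular, it assumes children are assigned to the class with the higher posterior probability (threshold 0.5), which the text does not state. The toy inputs are illustrative, not study data.

```python
def summarise_classification(post_age1, post_age2, low=0.10, high=0.90):
    """Summarise per-child posterior probabilities of infection at ages 1 and 2.

    Returns the share of children with a clear posterior (< low or > high) at
    each age, and the indices of children classified infected at age 1 but
    not infected at age 2 (assumed classification threshold: 0.5).
    """
    def clear_share(post):
        return sum(p < low or p > high for p in post) / len(post)

    discordant = [i for i, (p1, p2) in enumerate(zip(post_age1, post_age2))
                  if p1 >= 0.5 and p2 < 0.5]
    return clear_share(post_age1), clear_share(post_age2), discordant


post1 = [0.02, 0.97, 0.95, 0.60]  # toy posteriors at age 1
post2 = [0.99, 0.98, 0.03, 0.94]  # toy posteriors at age 2
print(summarise_classification(post1, post2))  # (0.75, 1.0, [2])
```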

Supplementary Figure S1 - Posterior probabilities of RSV infection at age 1 and 2 years old according to IgG post-F antibodies
IgG post-F = immunoglobulin G antibody against postfusion protein F, RSV = respiratory syncytial virus

Descriptive analyses
CI = confidence interval, IgG post-F = immunoglobulin G antibody against RSV postfusion protein F, log = natural logarithm (base e), RSV = respiratory syncytial virus. Column 2 presents mean maternal log RSV IgG post-F antibody levels. Column 3 presents p-values from one-way analysis of variance (ANOVA) comparing mean maternal log RSV IgG post-F levels across risk factor categories, excluding the missing data category. *Hypertension included any mention of a history of hypertension, pregnancy-induced hypertension or pre-eclampsia.
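The one-way ANOVA in column 3 compares between-category and within-category variation in maternal log IgG post-F. A minimal sketch of the F statistic follows; the function name and toy values are illustrative, and each inner list is assumed to hold one risk factor category with the missing data category already excluded.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    groups: one list of maternal log IgG post-F values per risk factor category.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    means = [fmean(g) for g in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # (13.5, 1, 4)
```

The p-value is the upper tail of the F distribution with (df_between, df_within) degrees of freedom, for example `scipy.stats.f.sf(f_stat, df_between, df_within)` where SciPy is available.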

Supplementary Table S6 - Distribution of risk factors in children with primary RSV infection vs never infected at age 1 and 2 years old
CI = confidence interval, IgG Ga and Gb = immunoglobulin G (IgG) antibody against attachment protein G of RSV subtypes A and B, RSV = respiratory syncytial virus. Columns 2 and 3 present exponentiated results from the log-linear model, reflecting the proportional change in maternal RSV IgG Ga and Gb levels, respectively, for each risk factor category vs the baseline.

The classification of RSV infection agreed across all three antibody types for 58% of children, and for a further 33% IgG post-F agreed with either IgG Ga or Gb (supplementary table 10). Since the G protein is less immunogenic than the post-F protein, IgG Ga and Gb antibodies have lower sensitivity and specificity for detecting infection; 4 we therefore did not recalculate risk ratios based on the infection indicator from these models. At age 2 years, the distribution of IgG Ga and Gb antibody levels was approximately normal (supplementary figure 2), so we did not fit FMM to samples at that age.
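The exponentiated log-linear results in the table legend can be read as ratios of geometric means. With a single binary risk factor, the least-squares slope from regressing ln(antibody level) on a 0/1 indicator equals the difference in mean log levels between the two categories, so exp(β) is exactly the ratio of the categories' geometric mean concentrations. The sketch below illustrates this with made-up values, not study data; the function name is hypothetical.

```python
import math
from statistics import fmean, geometric_mean


def fold_change_binary_factor(levels_baseline, levels_exposed):
    """exp(beta) from regressing ln(level) on a 0/1 risk factor indicator.

    With one binary covariate the OLS slope is the difference in mean log
    levels, so exp(beta) equals the ratio of geometric means.
    """
    beta = (fmean(math.log(v) for v in levels_exposed)
            - fmean(math.log(v) for v in levels_baseline))
    return math.exp(beta)


baseline = [100.0, 200.0, 400.0]  # illustrative concentrations, not study data
exposed = [150.0, 300.0, 600.0]
ratio = fold_change_binary_factor(baseline, exposed)
print(round(ratio, 6), round(geometric_mean(exposed) / geometric_mean(baseline), 6))  # 1.5 1.5
```

A value of 1.5 would mean 50% higher antibody levels in the exposed category than in the baseline category.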