Concordance of interim and final estimates of influenza vaccine effectiveness: a systematic review

VK Leung 1, BJ Cowling 2, S Feng 2, SG Sullivan 1,3
1. World Health Organization Collaborating Centre for Reference and Research on Influenza, Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
2. World Health Organization Collaborating Centre for Infectious Disease Epidemiology and Control, School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong Special Administrative Region, China
3. Fielding School of Public Health, University of California, Los Angeles, United States


Introduction
Influenza vaccination is currently the main strategy for reducing the burden of influenza morbidity and mortality. Influenza viruses continuously evolve by undergoing antigenic drift and the composition of influenza vaccines therefore varies each year to account for antigenic changes in circulating viruses. The inability to use randomised trials to measure the efficacy of the influenza vaccine each year has resulted in the use of observational studies to determine annual vaccine effectiveness. However, observational studies such as cohort or case-control studies can be subject to a number of biases.
The test-negative design (TND) is increasingly being used to measure influenza vaccine effectiveness (VE). The theory and methodology behind the TND have been discussed in detail previously [1-3]. Briefly, patients presenting for medical attention with a respiratory infection are swabbed and tested for influenza. Those testing positive are the cases and those testing negative are the comparison group [3]. Laboratory endpoints such as PCR-confirmed influenza are preferred in the TND, rather than low-specificity endpoints, which could lead to underestimation of the effect of vaccination [4]. This design is favoured for the reporting of mid-season estimates, which provide a preliminary indication of vaccine performance during the season [5-21]. Early VE estimates may be useful to public health authorities in the event of a pandemic or in a season where VE appears to be low, to guide resource allocation or initiate additional preventive measures. Belongia et al. have shown that interim estimates can be reliable to within 10 percentage points of the final estimate [22], while Sullivan et al. demonstrated that estimates made in seasons with an early start showed greatest reliability to within 10 percentage points [19]. Jimenez-Jorge et al. also found agreement between mid- and end-of-season estimates in their comparison over four seasons in Spain [23], supporting the use of interim estimates. However, studies of interim influenza VE estimates might be expected to ignore desired exclusion criteria due to small sample sizes and incomplete data. The objective of this review is to examine differences in reported interim and final influenza vaccine effectiveness estimates derived by the test-negative design, with particular reference to changes in the analytical approach used between interim and final estimation.
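As a minimal illustration of the test-negative comparison, the sketch below computes a crude VE as (1 − odds ratio) × 100 from a 2×2 table of vaccination status by test result, with a Wald interval on the log odds ratio. The function name and the counts are hypothetical, and the studies reviewed here report estimates adjusted by logistic regression rather than crude estimates.

```python
import math


def crude_ve(vacc_cases, unvacc_cases, vacc_noncases, unvacc_noncases, z=1.96):
    """Crude test-negative VE (%) with a Wald 95% CI; inputs are counts.

    Cases are test-positive patients, non-cases are test-negative patients.
    """
    odds_ratio = (vacc_cases * unvacc_noncases) / (unvacc_cases * vacc_noncases)
    se_log_or = math.sqrt(
        1 / vacc_cases + 1 / unvacc_cases + 1 / vacc_noncases + 1 / unvacc_noncases
    )
    or_low = math.exp(math.log(odds_ratio) - z * se_log_or)
    or_high = math.exp(math.log(odds_ratio) + z * se_log_or)
    ve = (1 - odds_ratio) * 100
    # The upper bound of the odds ratio gives the lower bound of VE.
    return ve, (1 - or_high) * 100, (1 - or_low) * 100


# Hypothetical counts, for illustration only.
ve, ve_low, ve_high = crude_ve(
    vacc_cases=20, unvacc_cases=80, vacc_noncases=60, unvacc_noncases=120
)
print(f"VE = {ve:.0f}% (95% CI: {ve_low:.0f} to {ve_high:.0f})")
```

Because the interval is symmetric on the log odds ratio scale, it is asymmetric on the VE scale, which is why published intervals often extend further below the point estimate than above it.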

Search strategy
Studies reporting influenza VE estimates were initially retrieved from PubMed on 8 November 2013 as part of a review of test-negative studies which focused solely on final estimates, excluding interim estimates [24]. At that time, articles were searched using combinations of the following terms: (i) 'influenza' OR 'flu', (ii) 'vaccine effectiveness' OR 'VE', (iii) 'test-negative' OR 'test negative' OR 'case-control' OR 'case control'. We used the list of excluded papers to identify interim estimates for this review. In addition, a further search of PubMed, Medline, Web of Science and Embase was conducted on 19 December 2014 and updated on 5 December 2015 using the above search terms as well as the following: (iv) 'interim' OR 'mid-season' OR 'mid season' OR 'early estimates'.
Complementary to the online search, the reference lists of retrieved articles were reviewed to identify additional studies. Articles were also identified, between May 2012 and December 2015, from influenza email alerts from the Centre for Infectious Disease Research and Policy (CIDRAP, http://www.cidrap.umn.edu/). We excluded articles that did not use the test-negative design or that were a re-analysis of data, end-of-season analyses without corresponding interim analyses, and interim analyses without corresponding final analyses. Searches were limited to articles in English only.
The titles of all papers identified were independently screened by two authors (VKL and SGS). Abstracts of potentially relevant papers were reviewed for eligibility, and the full text of eligible articles was reviewed. Studies reporting interim effectiveness estimates for any type of influenza vaccine (trivalent inactivated, live-attenuated, monovalent, adjuvanted/non-adjuvanted or unspecified) were considered.
Once all interim papers were identified, their corresponding end-of-season reports were located. This was a specific search using the author names, location and season of the interim paper to identify the paper reporting final estimates.

Data retrieval
Study design and analysis features were reviewed for each article using a standardised data collection form. Specific features reviewed included the study setting, source population, case definition (including whether acute respiratory illness or influenza-like illness was used and any restrictions on time since symptom onset), exposure definition (including any restrictions on the period between vaccination and symptom onset), study period or season, timing of interim estimates in relation to the peak (determined by reviewing the epidemic curve provided in final analyses), any other exclusions (e.g. patients with missing information, children younger than a certain age), variables included in the model to estimate VE and their specification, and reported interim and final VE estimates. If the methods referred to a previous paper, the methods in the previous paper were recorded. If the specification of a variable was not mentioned, it was assumed that it had not been taken into consideration in the analysis. In some instances where information was not available, the authors were contacted to provide this information.

Comparison of interim and final estimates
The VE estimates reported by each interim/final study pair were plotted using forest plots and compared visually. Changes between interim and final estimates of 10 or more percentage points were considered meaningful differences [19,22]. The difference in VE estimates (ΔVE) between final and interim analyses was calculated. Confidence intervals were estimated using bootstrapping and were based on each study's standard error estimated from reported confidence intervals. We attempted to evaluate whether any design features were associated with ΔVE. This was done in two ways: (i) univariate linear regression, modelling each design feature explored on the absolute value of ΔVE, and (ii) logistic regression, where the outcome was a change in ΔVE of 10 or more percentage points. Multivariate models were explored using stepwise regression to identify which variables were most influential on the value of ΔVE or a change in ΔVE of 10 or more percentage points. We used stepwise regression to limit the size of the final model; given the small number of data points, a full model would have been overparameterised. The Akaike information criterion (AIC) was used to choose variables for the final model using the stepAIC function (MASS package) in R. Design features were specified as the absolute difference between interim and final estimate for sample size, proportion positive, proportion of vaccinated non-cases, number of weeks studied and number of covariates in the model. For other design features, the change in variable specification was used as a predictor; this included a change in specification of calendar time, vaccination definition, exclusion criteria related to time since onset, and statistical model. We also examined whether there was a change in the dominant strain during the season and whether the interim estimate was made before or after the peak. All analyses were performed using R version 3.1.3.

Figure 1 (flow diagram boxes)
Interim studies identified from previous review [18]: n = 18. Updated search: n = 25. Titles reviewed: n = 43. Excluded: n = 11. Interim studies identified: n = 32. Excluded: n = 15.
PRISMA: preferred reporting items for systematic reviews and meta-analyses; TND: test-negative design.

Figure 2
Comparison of overall interim and final influenza vaccine effectiveness estimates

Figure 3
Comparison of interim and final vaccine effectiveness estimates for influenza A(H1N1)pdm09
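The ΔVE calculation described under 'Comparison of interim and final estimates' can be sketched as a parametric bootstrap. The exact procedure used in the review is not spelled out, so this sketch makes the following assumptions: VE = (1 − OR) × 100, reported 95% confidence intervals are symmetric on the log odds ratio scale, and interim and final estimates are drawn independently (a simplification, since final datasets usually contain the interim data). The function names, number of draws and seed are hypothetical; the example values are the European 2011/12 influenza A(H3N2) estimates quoted in the text.

```python
import math
import random
import statistics

Z = 1.96  # normal quantile for a 95% confidence interval


def log_or_and_se(ve, ci_low, ci_high):
    """Recover log(OR) and its standard error from a VE (%) and its 95% CI."""
    log_or = math.log(1 - ve / 100)
    # The lower VE bound corresponds to the upper OR bound, and vice versa.
    log_or_high = math.log(1 - ci_low / 100)
    log_or_low = math.log(1 - ci_high / 100)
    return log_or, (log_or_high - log_or_low) / (2 * Z)


def delta_ve_bootstrap(interim, final, n_draws=10000, seed=1):
    """Median and 95% percentile interval of final VE minus interim VE."""
    rng = random.Random(seed)
    mu_i, se_i = log_or_and_se(*interim)
    mu_f, se_f = log_or_and_se(*final)
    draws = []
    for _ in range(n_draws):
        ve_i = (1 - math.exp(rng.gauss(mu_i, se_i))) * 100
        ve_f = (1 - math.exp(rng.gauss(mu_f, se_f))) * 100
        draws.append(ve_f - ve_i)
    draws.sort()
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return statistics.median(draws), lower, upper


# (VE, lower, upper): interim 43% (0 to 68), final 25% (-6 to 47).
point_delta = 25 - 43
median, lower, upper = delta_ve_bootstrap((43, 0, 68), (25, -6, 47))
print(f"ΔVE = {point_delta} points; bootstrap median {median:.0f} ({lower:.0f} to {upper:.0f})")
```

Working on the log odds ratio scale keeps every simulated VE below 100%, which a normal approximation applied directly to VE would not guarantee.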
The characteristics of the paired interim and final analyses are summarised in Table 1. Studies were reported from North America, Europe and Australasia, with a total of 17 countries represented. The 2013/14 final published estimate for Spain was included as part of analyses comparing interim and final estimates over a number of seasons [23]. Two interim reports for the 2012/13 northern hemisphere season in the United States (US) were published one month apart. The first interim estimate [41] was excluded from the comparison as the number of cases was substantially smaller than that used in the second interim estimate for the season [7]. Three interim studies reported age-specific estimates. No studies reported sex-specific estimates and only one interim study reported VE by risk group [16]. Eight northern hemisphere interim studies [5,6,13-15,17,18,21] and one southern hemisphere study [10] were published before or during the World Health Organization's (WHO) vaccine strain selection meeting.

Comparison of interim vs final vaccine effectiveness analyses
Interim and final study pairs were reviewed to identify differences within and between pairs in the methods used to make estimates. A summary of these changes is shown in Table 2.

Setting and source population
In none of the study pairs were there changes to the study setting between interim and final estimates. One pair of studies from New Zealand reported estimates for both community and hospital settings [20,37]. The source population differed in the final analyses of three studies where data were pooled from multiple surveillance networks or sites [31,33,36]. Pooled final estimates commonly included data from additional surveillance sites which may not have had any cases at the time the interim estimate was made. For example, during the European 2011/12 season some countries were unable to provide data for the interim estimate [12]. In general, sample sizes in final analyses of VE increased compared with the interim analyses. One interim study reported a larger sample size (n = 285 [19]) than the corresponding final estimate study (n = 262 [26]), which was associated with the application of stricter criteria for the definition of the study period used and subsequent exclusion of many non-cases.

Influenza-like illness definition
The clinical case definition used to identify patients was generally termed influenza-like illness (ILI); however, in the US studies, acute respiratory illness (ARI) was used as the clinical case definition. The list of symptoms included in each definition remained the same between the interim study and final study in all but one pair [27]. The interim analysis for the 2010/11 season in Spain based the ILI definition on the International Classification of Primary Care (ICPC) code for fever, whereas the final analysis provided a more specific definition for ILI. This did not appear to alter the point estimates for influenza A(H1N1)pdm09 (interim VE: 58%, 95% confidence interval (CI): 11-80; final VE: 59%, 95% CI: 29-72) [5,27]. All studies included fever in the case definition for ILI, while only one study specified a temperature-based definition [13].

Influenza case definition
Cases of influenza were defined differently in two pairs of interim and final analyses. The case definition used in the interim analysis for the 2010/11 season in the United Kingdom (UK) [14] included individuals with ILI who were swab-positive for any influenza, regardless of type or subtype. The definition used in the final analysis [36] only included individuals who were swab-positive for influenza A(H1N1)pdm09 or influenza B. Conversely, Kissling et al. [12] included only patients who were positive for influenza A(H3N2) in their interim analysis, while the case definition for the final analysis included all patients who were swab-positive for any influenza [33]. However, the final analysis was later restricted to influenza A(H3N2) as this was the predominant circulating subtype during the season. Their end-of-season point estimate for influenza A(H3N2) decreased by 18 percentage points from the interim estimate (interim VE: 43%, 95% CI: 0-68; final VE: 25%, 95% CI: −6 to 47).

Exposure
The classification of patients as vaccinated generally did not differ within study pairs. The definition for vaccination was not reported in the interim analysis for the Australian 2009 season [10]. In the final analysis [30], the vaccinated population was restricted to those presenting 14 days or more after vaccination.

Study periods
The criteria used to define the start of the study period for interim analyses varied among studies. Two studies started with the commencement of surveillance [10,19], and six started when there was evidence of circulation based on laboratory-confirmed cases [5-8,16,20]. Five studies used only the weeks with cases or a certain period after the vaccination campaign [11,12,17,18,21,42], while four studies did not clearly define their study period [9,13-15].

Table 2b
Changes in vaccine effectiveness estimates by type/subtype and differences between interim and final studies in model specification (n = 34)
Reported start date is either the date reported in the paper or was inferred if only the week was reported. Note that it refers to the date surveillance started; VE estimates may have been made for a different period.

Table 2c
Changes in vaccine effectiveness estimates by type/subtype and differences between interim and final studies in model specification (n = 34)

In general, the study period was defined in the same manner for final estimates, and the majority (n = 15) of studies commenced their study period on the same date for both interim and final analyses. In Spain in 2010/11, the interim analysis commenced in October, while the final analysis used data only from early December; the interim and final VE estimates made for influenza A(H1N1)pdm09 against trivalent influenza vaccines (TIV) and monovalent influenza vaccines (MIV) were within 10 percentage points of each other [5,27]. Conversely, the study period reported for the European 2011/12 final analysis commenced earlier than the study period of the interim analysis, and larger variation between the estimates for influenza A(H3N2) was observed (VE: 43%, 95% CI: 0-68% [12] vs VE: 25%, 95% CI: −6 to 47% [33], respectively). In Australia in 2013, while the interim and final studies listed the same commencement date, the interim estimate was based on all available data for the surveillance period, while the final estimate was based on the weeks with cases and non-cases; thus the effective start date differed. The final estimate for all influenza (55%, 95% CI: −11 to 82) in that study pair [26] increased by 12 percentage points compared with the interim estimate (43%, 95% CI: −30 to 75) [19].

Outcome
Among interim studies, patients were restricted to those presenting within four [10], seven [ or 29 days [13,14], while in one study, no such restrictions were mentioned [5]. These same restrictions applied in the final analyses in all but two studies. The interim estimate for the 2010/11 season in Spain restricted analyses to patients swabbed within eight days of symptom onset [16], whereas the final analysis was further restricted to within four days of symptom onset [8]. Similarly, the UK study for the 2012/13 season applied a restriction of less than 29 days for the interim analysis [13] and altered the cut-off to less than seven days for the final analysis [25]. In both the Spanish and UK studies, final VE estimates were lower than the interim estimates.

Variables included in the model to estimate vaccine effectiveness
Interim and final estimates for all influenza (n = 12 studies) and for influenza A(H1N1)pdm09 (n = 10 studies) were most commonly reported, while seven studies reported estimates for influenza A(H3N2) and four studies reported estimates for influenza B. All studies used logistic regression to estimate VE. Interim analyses used between one and nine variables, whereas end-of-season VE models used between two and 10 variables. Differences in the variables included in regression models were noted in 12 of the paired studies.
All estimates were adjusted for age, specified as a categorical variable. The specification of age changed between interim and final analysis for six study pairs, either by the use of different categories [22,26,27], re-specification as 10-year bands [32] or using cubic splines [31,34].
Calendar time was included in the model for 15 interim and corresponding final analyses. This variable was described in final analyses as a phase or period [27,30,34], week of swabbing, enrolment or symptom onset [22,23,28,29,31-33,38,39], month of sample collection or symptom onset [25,35,36], or time relative to peak [26,37]. It was not included for two interim studies [7,10] but subsequently included in the model to estimate end-of-season VE [30,34]. The definition of calendar time varied in three pairs of interim and final analyses. In the model used to estimate interim VE for the 2012/13 European season, month of symptom onset was included as the calendar time variable [21], while week of symptom onset was used in the final model instead [31]. In both the Australian 2013 and New Zealand 2014 studies, week of presentation was used in interim analyses [19,20], while time relative to peak was used in the final analyses [26,37].
Seven study pairs included some adjustment for the presence of chronic medical conditions in both interim and final analyses, while five included this adjustment only in the final analysis [25-27,34,37].
Hospitalisation in the previous year, outpatient visits in the previous year and previous receipt of pneumococcal vaccine were included in the model to estimate end-of-season VE of one study, but were not included for adjustment in the interim analysis [5]. Another study adjusted for days from illness onset to enrolment, self-rated health and race/ethnicity [7] in the interim analysis, but did not adjust for these variables in their final analyses. Other variables included in both interim and final analyses were location or study site [5,7,11,13-15,17,18,25,27,32,34-36,38,39], history of smoking [8,11,28,32], receipt of previous influenza vaccine [11,16,29,32] and children in the household [5,27].

Comparison of interim and final vaccine effectiveness estimates
Interim and final VE estimates by type and subtype are shown in Figures 2-5.
In general, mid-season estimates were higher than end-of-season estimates. An absolute difference of less than 10 percentage points between interim and final estimates was found for 18 of 33 reported pairs of estimates, including five of 12 pairs reporting VE against any influenza, six of 10 for influenza A(H1N1)pdm09, four of seven for influenza A(H3N2) and two of four for influenza B. The largest difference between interim and final estimates was observed in the 2008/09 season in the US (interim VE: −35%, 95% CI: −172 to 33 [6]; final VE: 31%, 95% CI: 3-51 [22]). In contrast, there were no changes to the point estimates for influenza A(H1N1)pdm09 in the 2009 Australian season [10,30] and for influenza A(H3N2) in the 2012/13 European season [21,31]. However, all interim and final estimates compared displayed overlapping confidence intervals.
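The two checks applied to each pair, a difference of 10 or more percentage points and overlap of the confidence intervals, can be sketched as follows. The class and function names are hypothetical; the four pairs are those whose point estimates and intervals are quoted in the text of this review.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    ve: float    # point estimate, %
    low: float   # lower 95% confidence limit, %
    high: float  # upper 95% confidence limit, %


def compare(interim, final, threshold=10):
    """Return ΔVE (final minus interim) and the two concordance checks."""
    delta = final.ve - interim.ve
    return {
        "delta_ve": delta,
        "meaningful": abs(delta) >= threshold,
        "ci_overlap": interim.low <= final.high and final.low <= interim.high,
    }


pairs = {
    "US 2008/09": (Estimate(-35, -172, 33), Estimate(31, 3, 51)),
    "Europe 2011/12, A(H3N2)": (Estimate(43, 0, 68), Estimate(25, -6, 47)),
    "Australia 2013, any influenza": (Estimate(43, -30, 75), Estimate(55, -11, 82)),
    "Spain 2010/11, A(H1N1)pdm09": (Estimate(58, 11, 80), Estimate(59, 29, 72)),
}

results = {label: compare(interim, final) for label, (interim, final) in pairs.items()}
for label, result in results.items():
    print(label, result)
```

Three of these four pairs differ by 10 or more percentage points, yet all four have overlapping intervals, which illustrates why interval overlap alone is a weak test of concordance when interim intervals are wide.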
Univariate linear regression models suggested that only the proportion of vaccinated non-cases had a significant effect on the value of ΔVE (Table 3). The multivariate model identified that the proportion of vaccinated non-cases, change in how calendar time was specified and whether the interim estimate was made before the peak were the most influential variables; these were retained in the stepwise model. Using logistic regression, no design feature was identified as being statistically associated with a change in ΔVE of at least 10 percentage points in the univariate models. The stepwise model identified sample size, the proportion positive, the number of weeks studied, the proportion of vaccinated non-cases and whether the interim estimate was made before the peak as the most influential factors.
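To show how AIC-guided stepwise selection limits model size, here is a minimal sketch of forward selection for a linear model, written with the standard library only. It is not the stepAIC routine used in the review: the search direction used by the authors is not stated, forward selection is an assumption, and the data are simulated rather than taken from the reviewed studies. All function and variable names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def aic_linear(y, columns):
    """AIC (up to an additive constant) of an OLS fit with an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * y[i] for i, row in enumerate(design)) for j in range(p)]
    beta = solve(xtx, xty)
    rss = sum(
        (y[i] - sum(b * v for b, v in zip(beta, row))) ** 2
        for i, row in enumerate(design)
    )
    return n * math.log(rss / n) + 2 * (p + 1)  # +1 for the residual variance


def forward_stepwise(y, predictors):
    """Add, one at a time, the predictor that lowers AIC most; stop when none does."""
    selected, best = [], aic_linear(y, [])
    while len(selected) < len(predictors):
        candidates = {
            name: aic_linear(y, [predictors[s] for s in selected + [name]])
            for name in predictors
            if name not in selected
        }
        name = min(candidates, key=candidates.get)
        if candidates[name] >= best:
            break
        selected.append(name)
        best = candidates[name]
    return selected, best


# Simulated data: 33 pairs, with |ΔVE| driven by one predictor plus noise.
rng = random.Random(0)
n = 33
predictors = {
    "prop_vacc_noncases": [rng.uniform(0.05, 0.5) for _ in range(n)],
    "sample_size_hundreds": [rng.uniform(1, 20) for _ in range(n)],
    "before_peak": [float(rng.random() < 0.5) for _ in range(n)],
}
abs_delta_ve = [5 + 60 * p + rng.gauss(0, 2) for p in predictors["prop_vacc_noncases"]]
selected, best_aic = forward_stepwise(abs_delta_ve, predictors)
print(selected, round(best_aic, 1))
```

AIC penalises each added parameter by 2, so a predictor is retained only if it improves the fit by more than that penalty. With few observations, noise variables can still clear this bar by chance, which is one reason two stepwise models fitted to the same small dataset can retain different predictors.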

Discussion
We reviewed 17 pairs of published interim and final influenza VE studies that used the test-negative design to evaluate whether interim estimates can reliably predict final estimates. In general, interim estimates closely approximated final estimates, with 18 of 33 final estimates for all types and subtypes reported within 10 percentage points of their corresponding interim estimate. We attempted to explain discordance between pairs by examining their methodological differences and identified some inconsistencies between interim and final estimation. Within many of the study pairs, definitions for ILI, fever, study population, vaccination status, and the cut-off applied to the duration between patient presentation and symptom onset remained the same. The major differences were related to the change in study period and the concomitant changes in sample size, proportion vaccinated and proportion positive. In the two stepwise models we attempted, the variables identified as important predictors differed, with the exception of whether the interim estimate was made before or after the peak of the season. A previous study comparing interim and final estimates in Victoria, Australia, suggested that interim estimates may be most reliable when made after the peak of the influenza season, which was attributed to the gain in sample size when estimates are made later in the season. However, such a clear trend was not identified in a similar analysis performed in Spain [23].
Differences between interim and final estimates were most noticeable for estimates made against any influenza and influenza B. That concordance was better within subtypes possibly reflects how the summary estimate is influenced by individual specific type/subtype estimates as their prevalence changes throughout the season. Although we did not find a change in dominant strain to be an important predictor of ΔVE, we were unable to capture the more subtle influence of changes in the proportionate mix of types/subtypes as the seasons progressed. We also noted that final estimates were generally lower than interim estimates, which raises questions about waning vaccine effectiveness as the season progresses.
The largest methodological differences within study pairs were in the specification of the statistical model. When we examined whether a change to the regression model was associated with a change in the VE estimate, we found no statistical difference. This is consistent with findings from Victoria, Australia, where it was noted that estimates varied only slightly when the model used for final estimates was modified [19], and raises the question of whether it is necessary to adjust for additional variables just because they are available. In studies of VE, we are trying to estimate a causal effect [24]. Thus, it could be argued that in principle, the model used for calculating VE should be decided a priori and should not change between interim and final estimation. We acknowledge that important information on known confounders may be incomplete when calculating interim estimates. In such cases, one must be mindful of statistical biases, such as biases associated with complete-case analysis, where missing data may not be missing at random, or sparse data, both of which can result in a loss of precision and inflated estimates. However, the use of identical methods provides an assurance that heterogeneity between interim and final estimates is not due to methodological differences and permits focus on other possible causes, such as the change in virus circulation and waning VE. As a minimum, reports should include in their sensitivity analyses a comparison of interim and final estimates using an identical analytical approach.
The results of our regression should be interpreted with caution. Firstly, the number of pairs available was probably insufficient to detect important associations, and certainly a multivariate model containing all predictors would have been overparameterised. With only 33 observations in the model, a change in value of any one predictor could substantially change the size and importance of the association estimated. We were also unable to explore any interactions and it is likely that the effect of any of the predictors explored would vary across levels of other predictors. Secondly, although a study may have reported a certain study period, this did not necessarily correspond to the date range of the observations used in the VE estimation. This was noted in the 2013 studies in Australia, but could also happen as a consequence of covariate specification. For example, specification of week as a categorical variable can lead to perfect prediction [43] and loss of observations from weeks without both a case and a non-case. Truncation of the data by the regression programme will result in the loss of observations and reported sample sizes may therefore be misleading. Thus, it is possible that some of the predictors specified in our regression models were incorrectly calculated. Finally, we calculated ΔVE based on each study's point estimate only. Although ΔVE was calculated with a confidence interval, our regression models focussed on the median only. We did not exclude studies with large confidence intervals because their width is tied to sample size, which was one of the factors we were interested in exploring.
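The perfect-prediction issue can be made concrete with a small sketch that counts how many observations carry no information once week is entered as a categorical covariate, that is, observations from weeks containing only cases or only non-cases. The function name and the weekly counts are hypothetical, and whether such observations are dropped outright depends on the regression programme.

```python
from collections import defaultdict


def informative_weeks(records):
    """Return weeks with both a case and a non-case, and the count of other rows.

    records: iterable of (week, is_case) pairs, one per swabbed patient.
    """
    counts = defaultdict(lambda: [0, 0])  # week -> [non-cases, cases]
    for week, is_case in records:
        counts[week][int(is_case)] += 1
    kept = sorted(w for w, (neg, pos) in counts.items() if neg > 0 and pos > 0)
    dropped = sum(neg + pos for w, (neg, pos) in counts.items() if w not in kept)
    return kept, dropped


# Hypothetical early-season data: no cases are detected until week 20.
records = (
    [(18, False)] * 6
    + [(19, False)] * 9
    + [(20, False)] * 7 + [(20, True)] * 2
    + [(21, False)] * 5 + [(21, True)] * 4
)
kept, dropped = informative_weeks(records)
effective_n = len(records) - dropped
print(f"reported n = {len(records)}, effective n = {effective_n}, weeks used = {kept}")
```

In this example the reported sample size would be 33 but only 18 observations contribute to the estimate, and the effective start of the study period moves from week 18 to week 20.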
Interim estimates provide an early snapshot of the influenza vaccine's effectiveness during a season, but their validity and reliability need to be assured. End-of-season estimates have advantages over interim estimates in terms of gains in sample size and the longer time available to undertake the analysis. However, they typically take more than six months to publish, which is well beyond their usefulness for policy. Interim estimates are also more useful than final estimates for decision making around vaccine composition. The WHO's Global Influenza Surveillance and Response System meets twice a year to generate a recommendation for the composition of the seasonal vaccine. Since February 2013, interim and final VE estimates generated from surveillance data have been presented at this meeting [44]. The utility of VE estimates in strain composition is limited to scenarios where the virological and serological data are inconclusive, there are suitable alternative candidate vaccine viruses, and VE suggests poor performance of the current component. However, because of their timeliness, it is the interim, not the final, VE estimates that are informative in such a scenario.
Given the potential utility of interim VE estimates and the variability between methods used to estimate interim and final VE, it would be worthwhile implementing the use of a standard model for estimating interim VE. Such a model might include a minimum set of known confounders in the statistical model, use of standardised inclusion criteria, and minimum sample size and/or standard error requirements. In conducting this review, we identified inconsistencies in the way data are reported, particularly case and vaccination status, highlighting the need for a standardised reporting template. The similarities observed between interim and final estimates support the feasibility of generating and disseminating preliminary estimates of VE while virus circulation is ongoing.

Figure 1
PRISMA flow diagram showing search strategy

Table 2a
Changes in vaccine effectiveness estimates by type/subtype and differences between interim and final studies in model specification (n = 34)
ILI: influenza-like illness.
a A/H1 refers to A(H1N1)pdm09.
b Vaccination definition: threshold used to classify a patient as vaccinated; figures refer to the number of days since vaccination.

Table 3
Summary of changes in study characteristics that influenced differences in vaccine effectiveness estimates
β: regression coefficient; CI: confidence interval; ΔVE: difference in vaccine effectiveness estimates; inest: inestimable; NA: not applicable; NR: not retained; OR: odds ratio; se: standard error for the coefficient.
a In linear models, p was measured by t-test.
b In logistic models, p was measured by chi-square test.