Performance of influenza case definitions for influenza community surveillance: based on the French influenza surveillance network GROG, 2009-2014

International case definitions recommended by the Centers for Disease Control and Prevention (CDC), the European Centre for Disease Prevention and Control (ECDC), and the World Health Organization (WHO) are commonly used for influenza surveillance. We evaluated clinical factors associated with the laboratory-confirmed diagnosis of influenza and the performance of these influenza case definitions by using a complete dataset of 14,994 patients with acute respiratory infection (ARI) from whom a specimen was collected between August 2009 and April 2014 by the Groupes Régionaux d’Observation de la Grippe (GROG), a French national influenza surveillance network. Cough and fever ≥ 39 °C most accurately predicted an influenza infection in all age groups. Several other symptoms were associated with an increased risk of influenza (headache, weakness, myalgia, coryza) or a decreased risk (adenopathy, pharyngitis, shortness of breath, otitis/otalgia, bronchitis/bronchiolitis), but not throughout all age groups. The WHO case definition for influenza-like illness (ILI) had the highest specificity (21.4%), while the ECDC ILI case definition had the highest sensitivity (96.1%). The diagnosis among children younger than 5 years remains challenging. The study compared the performance of clinical influenza definitions based on outpatient surveillance and will contribute to improving the comparability of data shared at the international level.


Introduction
According to the 2011 World Health Organization (WHO) guidelines, an influenza surveillance system aims to reliably detect the start and duration of the influenza season in order to monitor changes in the antigenicity of influenza viruses and provide guidance for influenza vaccine policies [1]. The system should provide continuous and robust data in order to monitor trends of clinically diagnosed influenza-like illness (ILI) and assess its disease burden in the general and high-risk population. The ability of the surveillance system to fulfil these epidemiological objectives depends on the accuracy of the clinical ILI case definition used. The search for the optimal case definition remains a public health challenge because of the lack of specificity of influenza symptoms, co-circulation of other respiratory viruses and low proportion of laboratory confirmation. Consequently, a variety of national case definitions are applied in surveillance networks worldwide, in addition to international ILI case definitions used by the United States (US) Centers for Disease Control and Prevention (CDC), the European Centre for Disease Prevention and Control (ECDC), and the WHO, which complicates data aggregation and comparison [2]. In addition to the established ILI case definitions, some surveillance systems use acute respiratory illness (ARI), a more sensitive but in exchange less specific case definition [2]. French influenza surveillance networks each have their own ILI definitions, which differ in the combination of clinical symptoms [2]. There are conflicting needs for a case definition: sensitive enough to ensure timely detection of the onset of an epidemic and specific enough to provide a small proportion of negative specimens among those tested and a robust impact estimate. The most accurate definition regarding sensitivity and specificity will provide the most accurate estimation of the number of influenza cases.
Evaluation and comparison of these case definitions are complicated by a variety of factors, such as differences in medical practice, prevalence during and outside the influenza seasons, respiratory co-infections in certain age groups, annual changes of influenza virus (sub)types and heterogeneity of laboratory procedures for influenza testing. The optimal case definition should be applicable every year, internationally and in all medical settings (i.e. community, outpatient and inpatient departments), regardless of the patients' age or co-infections with co-circulating respiratory viruses such as respiratory syncytial virus (RSV) or rhinovirus [1]. Several previous studies have attempted to evaluate and compare the performance of the current ILI definitions, but were restricted either to a single hospital setting [3][4][5][6][7][8] or to cohort studies [9,10]. Only a few studies have evaluated the performance of the current ILI/ARI definitions in the context of a national influenza sentinel network over several years [11,12] and none included a paediatric population. Based on the data collected between 2009 and 2014 by the Groupes Régionaux d'Observation de la Grippe (GROG), a French national influenza surveillance network, this study aimed to analyse clinical and non-clinical factors associated with the diagnosis of influenza and to compare the performance of international clinical case definitions.

GROG network
In France (population: 64.6 million), the surveillance of influenza is coordinated by the national public health agency, Santé publique France (formerly Institut de Veille Sanitaire (InVS)) and combines virological, clinical as well as community and hospital data [13]. The GROG was founded in 1984 according to WHO guidelines to detect the emergence of annual influenza virus outbreaks, to monitor changes in the antigenicity of influenza viruses, to guide the selection of strains for the annual influenza vaccine, and to provide virus samples for use in vaccine production [14]. This network comprises 548 volunteer practitioners, 112 paediatricians and nine laboratories (two reference laboratories and seven hospital virology laboratories) distributed in all 22 regions of metropolitan France.
The sentinel physicians participating in the GROG network reported the weekly number of patients with acute respiratory infection (ARI), as defined by the GROG, presenting at their practice during the active influenza surveillance period (week 40 to week 15). They collected information and provided, on a random sampling basis, nasal/pharyngeal swabs from a subset of ARI patients presenting within 48 hours of symptom onset. The definition of ARI adopted by the GROG was as follows: sudden onset of at least one respiratory sign (e.g. cough, sore throat, shortness of breath, coryza) AND at least one general symptom suggestive of an acute infectious disease (e.g. fever, fatigue, headache, malaise) (Table 1).
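The GROG ARI definition above is a simple conjunction, which the following minimal Python sketch illustrates. The symptom sets contain only the examples listed in the text (introduced there by "e.g.", so they are probably not exhaustive), and the function name and input format are hypothetical rather than part of the GROG reporting form.

```python
# Illustrative only: the symptom lists are the examples quoted in the text,
# not the complete GROG lists, and all names here are hypothetical.
RESPIRATORY_SIGNS = {"cough", "sore throat", "shortness of breath", "coryza"}
GENERAL_SYMPTOMS = {"fever", "fatigue", "headache", "malaise"}


def meets_grog_ari(symptoms, sudden_onset):
    """Return True if the patient fulfils the ARI definition as sketched here.

    symptoms: iterable of symptom names (strings)
    sudden_onset: True if the onset of illness was sudden
    """
    present = {s.strip().lower() for s in symptoms}
    has_respiratory = bool(present & RESPIRATORY_SIGNS)
    has_general = bool(present & GENERAL_SYMPTOMS)
    # Sudden onset AND at least one respiratory sign AND at least one general symptom
    return bool(sudden_onset) and has_respiratory and has_general
```

For instance, a patient with sudden onset of cough and fever would fulfil this sketch of the definition, whereas a patient with cough alone would not.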
Fever was defined as a body temperature greater than or equal to 38 °C. For each patient sampled, a standardised case reporting form was completed and sent along with the specimen to the corresponding reference

Study database
All cases between 2009 and 2014 were extracted from the GROG database. Patients were excluded from the study database if their specimens were positive for two influenza virus (sub)types or for influenza C virus, if they were sampled more than 48 hours after the onset of symptoms, or if at least one variable required for the analysis was incomplete. To avoid any inclusion bias in the patient selection, patients were excluded if their symptoms did not meet the GROG ARI definition. The start and the end of the influenza pandemic, the seasonal influenza epidemics and the bronchiolitis epidemics were defined by Santé publique France (formerly InVS) on the basis of the national surveillance network. A confirmed case of influenza was defined as a patient with a positive laboratory result for influenza A or B viruses.
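The exclusion rules and the confirmed-case definition above can be sketched as a record filter. This is only an illustration: the record layout, the field names and the virus labels ("A(H3N2)", "B", "C") are assumptions and do not reflect the actual structure of the GROG database.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Specimen:
    # Hypothetical record layout; None marks an incomplete variable.
    detected: Tuple[str, ...]            # e.g. ("A(H3N2)",) or () if negative
    hours_since_onset: Optional[float]   # delay between symptom onset and swab
    meets_ari: Optional[bool]            # fulfils the GROG ARI definition
    age_years: Optional[float]


def is_eligible(s: Specimen) -> bool:
    """Apply the exclusion criteria described in the text."""
    if None in (s.hours_since_onset, s.meets_ari, s.age_years):
        return False                     # incomplete required variable
    if len(s.detected) >= 2:
        return False                     # two influenza virus (sub)types
    if any(v.startswith("C") for v in s.detected):
        return False                     # influenza C virus
    if s.hours_since_onset > 48:
        return False                     # sampled too late after onset
    return bool(s.meets_ari)             # must meet the ARI definition


def is_confirmed_case(s: Specimen) -> bool:
    """Confirmed influenza: positive laboratory result for influenza A or B."""
    return len(s.detected) == 1 and s.detected[0][:1] in ("A", "B")
```

Keeping eligibility and case confirmation as separate functions mirrors the text, where exclusion is decided first and the outcome is defined on the remaining patients.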

Database analysis
All patients included in the study database were described by sex and age. Continuous variables were summarised as means with standard deviation (median with interquartile range (IQR) for non-normally distributed variables), and dichotomous or categorical variables were summarised as percentages. Influenza positivity rates were calculated by age group and month of the year.
A generalised estimating equation model was used to take account of the potential clustering of observations by practitioner. We fitted a one-level, hierarchical, logistic regression model that incorporated the practitioner identity variables (level 1) using the GENLIN function of SPSS v19 (IBM, Chicago, US). Firstly, univariate associations describing the relationship of each potential predictive factor (sex, temperature, clinical symptoms, clinical case definition) with the outcome of laboratory-confirmed influenza were examined with univariate logistic regression analysis. Secondly, multivariable logistic regression models were used to investigate the combined influence of the clinical variables tested in the univariate analysis (sex, temperature, clinical symptoms) as potential independent predictive factors for laboratory-confirmed influenza. In the non-stratified multivariate analysis, interaction terms with age group were also introduced in the models to adjust for potential bias.
Both univariate and multivariate analyses were stratified according to age group. In the stratified and non-stratified multivariate analyses, influenza epidemic, influenza pandemic and bronchiolitis period were introduced as variables to adjust for potential bias.
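For orientation, the marginal logistic model described above can be written schematically as follows. The notation is ours and not taken from the original analysis: i indexes practitioners, j indexes patients of practitioner i, Y is laboratory-confirmed influenza and the x terms are the predictors (sex, temperature, clinical symptoms, period indicators). The exchangeable working correlation shown here is an assumption, since the text does not state which correlation structure was used.

```latex
\operatorname{logit}\,\Pr\left(Y_{ij} = 1 \mid \mathbf{x}_{ij}\right)
  = \beta_0 + \sum_{k=1}^{p} \beta_k \, x_{ijk},
\qquad
\operatorname{Corr}\left(Y_{ij}, Y_{ij'}\right) = \alpha \quad (j \neq j')
```

Under this formulation, exp(β_k) is the odds ratio for laboratory-confirmed influenza associated with predictor k, adjusted for the other predictors in the model.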
Sensitivity, specificity and area under the curve (AUC) were calculated to assess the performance of case definitions by age group (0-4, 5-14, 15-64, ≥ 65 years) and influenza (sub)type (influenza A(H1N1)pdm09, A(H3N2) and influenza B). Sensitivity was defined as the proportion of laboratory-confirmed influenza patients who fulfilled the clinical case definition. The specificity was defined as the proportion of influenza-negative patients who did not fulfil the clinical case definition. The average predictive performance was quantified using the area under the receiver operating characteristic curve to determine the AUC.
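As an illustration of the definitions above, the following Python sketch computes sensitivity and specificity for a binary case definition from paired observations. The AUC formula used here, (sensitivity + specificity) / 2, is the area under a ROC curve with a single operating point; whether the original analysis computed the AUC this way is not stated, so this is an assumption. Function and variable names are hypothetical, and the data in the usage note are invented.

```python
def case_definition_performance(fulfils, confirmed):
    """Sensitivity, specificity and single-point AUC of a binary case definition.

    fulfils:   iterable of bools, True if the patient meets the case definition
    confirmed: iterable of bools, True if influenza was laboratory-confirmed
    """
    tp = fn = tn = fp = 0
    for meets, positive in zip(fulfils, confirmed):
        if positive and meets:
            tp += 1      # confirmed influenza, definition fulfilled
        elif positive:
            fn += 1      # confirmed influenza, definition not fulfilled
        elif meets:
            fp += 1      # influenza-negative, definition fulfilled
        else:
            tn += 1      # influenza-negative, definition not fulfilled
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative patient")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Area under a ROC curve through (0,0), (1 - specificity, sensitivity), (1,1)
    auc = (sensitivity + specificity) / 2
    return sensitivity, specificity, auc
```

With invented data in which 9 of 10 confirmed patients and 8 of 10 influenza-negative patients fulfil the definition, the function yields a high sensitivity and a low specificity, the same qualitative pattern as reported for the ILI definitions in this study.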
A p value below 0.05 was considered significant. The statistical analysis for the GROG database was performed with SPSS v19 (IBM, Chicago, US) software.

Case definitions tested
We selected the three most commonly used international ILI definitions [1,2]: the ECDC ILI definition, the WHO ILI definition updated in 2011 and the CDC ILI definition (Table 1). All definitions include the presence of general (e.g. fever) and respiratory symptoms with or without a sudden onset. The number of included criteria varies from three (WHO) to nine (ECDC).

Ethics
Oral informed consent was obtained from patients at the moment of swab taking in accordance with national regulations. All swab results and forms were anonymised by the laboratories before they were sent to the GROG network coordination. In accordance with applicable laws and regulations, no clearance by an Ethics Committee is required in France for the retrospective analysis of anonymised data collected within routine influenza surveillance schemes.

Database description
The work was conducted on a complete dataset of 14,994 patients with ARI.

All factors were entered into the multiple regression model performed on the whole database and stratified by age group (Table 4). Multivariate analysis was performed using a generalised estimating equation model to account for the potential clustering of observations by general practitioner. All the variables tested in the univariate analysis were included in the multivariate analysis. Only results from the variables that were significant (p < 0.05) in the multivariate analysis are shown in the table.

Clinical and demographic predictors of laboratory-confirmed influenza detection
In the non-stratified and stratified multivariate analyses, influenza epidemic, influenza pandemic and bronchiolitis period were introduced as variables to adjust for potential bias. In the non-stratified multivariate analysis, the interaction terms concerning the age group were also introduced in the models to adjust for the potential bias.

Impact of age group, influenza (sub)type and epidemic period on performance of current ILI and ARI definitions
All ILI case definitions had the lowest sensitivity in the 0-4 years age group (Table 6) and the highest sensitivity in the ≥ 65 years age group. The WHO definition showed the largest sensitivity difference (11.8%) between the oldest and the youngest age groups and had the poorest sensitivity in the 0-4 years age group (84.2%). There was no noticeable difference in sensitivities between the three definitions in the ≥ 65 years age group. Stratified by influenza (sub)type, the ECDC and CDC definitions performed similarly with sensitivities above 94%, while the WHO ILI definition had a higher sensitivity for influenza A(H1N1)pdm09 (91.8%) and A(H3N2) (89.2%) than for influenza B (86.6%). Stratified by influenza period, all ILI case definitions showed the highest sensitivities during the pandemic period compared with the epidemic periods.
Accordingly, all ILI case definitions showed the highest specificity among the 5-14 year-olds, and the WHO definition had the highest specificity in all age groups. Stratified by influenza period, the ECDC and CDC definitions had similar specificity, while the WHO ILI definition had a higher specificity during the seasonal influenza epidemic periods compared with the pandemic period.
All definitions showed the highest AUC values among the 5-14 year-olds and for the A(H1N1)pdm09 viruses. The WHO definition had the highest AUC values in all age groups, all influenza (sub)types and all tested periods (influenza pandemic, influenza epidemics and bronchiolitis period). There was no significant difference in the AUC values for each definition among the three tested periods.

Discussion
This study evaluated both the clinical factors associated with the diagnosis of influenza and the performance of influenza case definitions, based on a national influenza surveillance database. The database had distinct features: (i) influenza was confirmed by the gold standard RT-PCR technique, (ii) the database included a large paediatric population and (iii) the database covered one pandemic and four seasonal influenza epidemics, with information on influenza A and B viruses.
This study identified cough and fever ≥ 39 °C as the symptoms which most accurately predicted an influenza infection in all age groups. Similar findings have been reported previously [9,12]. Several other symptoms (headache, weakness, myalgia, coryza) were associated with an increased risk of influenza infection but not throughout all age groups. On the other hand, pharyngitis appeared to be associated with a decreased risk of influenza infection in all age groups except those 65 years and older. Assuming an overlap between the variables pharyngitis and sore throat, these two symptoms might not improve, but rather weaken, an ILI definition. This result supports the updated WHO definition from 2011, which removed sore throat from its definition [1]. Based on this study and the current literature, we believe that there is evidence to exclude 'sore throat' from ILI definitions (such as the ECDC and CDC ILI definitions). Several other symptoms (adenopathy, pharyngitis, shortness of breath, otitis/otalgia, bronchitis/bronchiolitis, rash) were associated with a decreased risk of influenza infection, but not in all age groups. Surprisingly, shortness of breath also appeared to be associated with a decreased risk of influenza in the younger patients (younger than 14 years). This result suggests that this symptom may weaken the performance of the ECDC and CDC ILI definitions in the younger age groups and might also be better excluded from ILI definitions.
Negative associations at age 0-4 years could be due to other respiratory tract pathogens circulating in this age group [16]. The variety of other potential co-infecting pathogens may have caused the lower performance of all case definitions in the 0-4 years age group [10]. One way to improve the specificity of the ILI definition in this particular age group would be a higher temperature cut-off because the multivariate model showed that in the youngest age group, only high body temperatures above 38.5 °C were strongly associated with influenza.
These strong age-dependent differences are likely to have contributed substantially to the variable performance of case definitions reported in different studies, in particular when the age groups 0-4 years and ≥ 65 years are underrepresented in the tested population [17]. In addition, it remains difficult to measure the impact of influenza types or subtypes as they are tightly associated with the age group. For example, stratification by influenza virus (sub)type showed that the WHO ILI definition had lower sensitivity for influenza B. However, to our knowledge, no differences in clinical symptoms have been reported so far for outpatients infected with influenza A compared with influenza B viruses [18]. Hence, age is probably the main confounding factor, as most of the patients with influenza B in our study were 5-14 years old, whereas patients with influenza A were predominantly 0-4 years old. Therefore, the evolving epidemiology of influenza may indirectly impact the performance of surveillance networks. These results strongly suggest that interpretation of syndromic surveillance data without information on age may be misleading [17]. It is very unlikely that a 'one size fits all' approach reaches optimal performance for all age groups or influenza (sub)types. To achieve this, it may be necessary to develop age- or (sub)type-specific case definitions for influenza. The temperature cut-off may be adjusted, notably in the older and younger age groups, as it greatly impacted the sensitivity and specificity [19].
Our study had some limitations. It should be noted that the case definitions were tested with those variables which were collected by the surveillance network. In the present study, the clinical diagnosis pharyngitis was used instead of the case definition variable sore throat, which might have resulted in some discrepancies, and the results must be interpreted cautiously. Fever was defined as a body temperature ≥ 38 °C for all case definitions, although the ECDC ILI definition does not define any exact temperature cut-off and the CDC ILI definition defines fever as a temperature ≥ 100 °F (37.8 °C). This slight alteration of the CDC definition should be taken into account when interpreting the study results. However, the impact of such an alteration should be minimal compared with other known factors that affect the measurement of body temperature, such as individual variability, daily variation, site of measurement and the natural tendency of physicians to round temperatures up or down to .5 or .0 digits (i.e. 100 °F in the case of American doctors and 38 °C in the case of European doctors) [20]. Due to the predefined temperature cut-off of the GROG database, sub- or afebrile patients in our database who did not also present headache, weakness, myalgia or chills were not included. Therefore we cannot exclude that sensitivity may have been overestimated and specificity underestimated. These results are in accordance with data obtained by Thursky et al. in a similar setting (the Australian influenza surveillance programme) and in the absence of a defined temperature cut-off for fever [21]. Indeed, Thursky et al. reported, over two influenza seasons, a high sensitivity (98.4-100.0%) and a very low specificity (7.1-12.9%) for the CDC definition.
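The fever cut-off alteration discussed in this paragraph (100 °F in the CDC definition versus the 38 °C applied in this study) can be made concrete with a short Python sketch. The function and constant names are hypothetical; only the two thresholds come from the text.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# CDC ILI fever threshold of 100 °F, expressed in °C (about 37.8 °C)
CDC_FEVER_CUTOFF_C = fahrenheit_to_celsius(100.0)
# Threshold applied to all case definitions in this study
STUDY_FEVER_CUTOFF_C = 38.0


def fever_flags(temp_c):
    """Classify a measured temperature (°C) under both fever thresholds."""
    return {
        "cdc_original": temp_c >= CDC_FEVER_CUTOFF_C,
        "study": temp_c >= STUDY_FEVER_CUTOFF_C,
    }
```

Only temperatures from roughly 37.8 °C up to but not including 38.0 °C are classified differently by the two thresholds, which is consistent with the argument that the alteration has a limited impact when recorded temperatures tend to be rounded.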
However, the performance results in our study differed from those of other studies, which relied either on hospitalised patients [7,8] or on a cohort of self-reporting adults [22] and observed higher specificity and lower sensitivity values for similar clinical definitions.
It is still questionable to what extent the results could be applied to other surveillance systems. Indeed, the patients were sampled according to the GROG case definition, which may have influenced the results. In general, it is challenging to fully investigate the relation between clinical features and healthcare-seeking behaviour, which strongly determine the characteristics of the study population (demographic and clinical) and most probably impact the performance of a case definition, as already suggested by Jiang et al. [22]. Another open question is how these surveillance definitions will perform in the context of an influenza epidemic caused by an emerging influenza virus with more atypical clinical symptoms, for example conjunctivitis in the context of infection with an avian influenza virus.

Conclusions
The study compared the performance of clinical influenza definitions in the setting of a national network based on outpatient surveillance. The revised WHO ILI definition could be chosen for surveillance purposes for its higher specificity and better performance in all age groups, which would allow a more accurate estimation of influenza case numbers and an increase in the proportion of influenza-positive samples. In any case, the diagnosis among children younger than 5 years remains challenging, as only fever was highly predictive of influenza infection, suggesting that the temperature cut-off in the case definition is critical to accurately predict influenza among the large number of differential diagnoses in that age group.