Event-based biosurveillance of respiratory disease in Mexico, 2007–2009: connection to the 2009 influenza

The emergence of the 2009 pandemic influenza A(H1N1) virus in North America and its subsequent global spread highlights the public health need for early warning of infectious disease outbreaks. Event-based biosurveillance, based on local- and regional-level Internet media reports, is one approach to early warning as well as to situational awareness. This study analyses media reports in Mexico collected by the Argus biosurveillance system between 1 October 2007 and 31 May 2009. Results from Mexico are compared with United States and Canadian media reports obtained from the HealthMap system. A significant increase in the reporting frequency of respiratory disease in Mexico during the 2008-9 influenza season relative to that of 2007-8 was observed (p<0.0001). The timing of events, based on media reports, suggests that respiratory disease was prevalent in parts of Mexico, and was reported as unusual, much earlier than the microbiological identification of the pandemic virus. Such observations suggest that respiratory disease of abnormal frequency and severity was occurring in Mexico throughout the winter of 2008-9, though its connection to the emergence of the 2009 pandemic influenza A(H1N1) virus remains unclear.


Introduction
The emergence in North America and global spread of the novel 2009 pandemic influenza A(H1N1) virus of swine origin was unanticipated in the spring of 2009 by governments and health agencies around the world. Many nations have limited ability to detect outbreaks or maintain situational awareness of them within their borders, making reporting and early warning of emerging influenza viruses with pandemic potential problematic. Event-based biosurveillance, based on local- and regional-level Internet media reports, is an internationally recognised approach to early warning and situational awareness [1,2]. In this study we present the observations of the Argus biosurveillance system on respiratory disease in Mexico between 1 October 2007 and 31 May 2009. These results are compared with observations of the HealthMap system [3,4] on respiratory disease in the United States and Canada on 21-23 April 2009, just before widespread media coverage of the 2009 pandemic influenza A(H1N1), then called 'swine flu'. In the United States and Canada, media reporting related to swine flu was not observed before that time.

Argus biosurveillance system
The Argus system, a web-based global biosurveillance system hosted at the Georgetown University Medical Center (Washington, DC, United States) and funded by the United States Government, is designed to report and track the evolution of biological events threatening human, plant and animal health globally, excluding the United States [5]. It collects, in an automated process, local, native-language Internet media reports, including blogs and official sources, e.g. World Health Organization (WHO) and World Organisation for Animal Health (OIE), and interprets their relevance according to a specific set of concepts and keywords relevant to infectious disease surveillance (i.e. a taxonomy of media reporting of infectious disease). Argus does not use scientific journals as a primary source for identifying emerging events. Elements of the taxonomy define direct indicators (i.e. reports of disease) and six categories of indirect indicators of disease (Table 1).
Project analysts (about 40 regional specialists who collectively are fluent in approximately 40 languages) monitor several thousand Internet sources daily. They use Boolean keyword searching and Bayesian model tools [6] to select reports from a dynamic database of media reports collected from Internet sources six times daily. A complete archive is maintained for retrospective analyses and refinement of biosurveillance methodology [7]. The project analysts write event reports, which are based on relevant media reports, and a stage is assigned to each report based on observed event progression according to a previously described heuristic model [8], ranging from preparatory (e.g. prevention activities and conditions conducive to disease emergence and transmission) to degree of disease spread to degrees of social disruption to recovery (Table 2) [9,10]. The reports are posted on a secure Internet portal for the diverse set of Argus users [5,11] to view.

HealthMap system
HealthMap is an automated multilingual real-time disease outbreak detection, tracking and visualization system, which, like Argus, relies on publicly available information from the Internet, including social media and official sources, for its data [2]. It provides global media coverage, which, unlike Argus, includes the United States. HealthMap data on 2009 pandemic influenza A(H1N1) are therefore used to describe the emergence and evolution of swine-origin influenza in the United States and Canada [3,4].

Aims of this study
Event-based surveillance, as conducted by the Argus and HealthMap systems, has been shown to be able to identify emerging outbreaks [1,5] from information in publicly available media sources. This information can be used by public health professionals to investigate an emerging or changing pathogen earlier than they would otherwise. This study was conducted to demonstrate quantitatively how event-based surveillance is a useful tool complementary to traditional public health surveillance methods for providing early warning and tracking of an emerging outbreak.

Selection of Argus reports
We retrieved from the Argus archive reports written by project analysts based on Spanish- and English-language Internet media reports of respiratory disease, including the 2009 pandemic influenza A(H1N1), in Mexico between 1 October 2007 and 31 May 2009 (thus covering the 2007-8 and 2008-9 respiratory disease seasons). We reviewed the Argus reports and identified the geographical locations of the events described in them. The number of sources and media reports archived did not vary substantially between the 2007-8 and 2008-9 respiratory disease seasons.

Determining frequency of reporting of respiratory disease in Mexico
As Argus does not categorise articles by topic as they are archived, the total number of reports written on respiratory disease by the project analysts was used as the numerator. Each Argus report is based on one or more media reports. The rate of reporting was defined as the ratio of the number of written reports meeting the inclusion criteria to the total number of media reports in the archive from Mexican sources, and was computed as a function of time (in days) for the study period.
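The rate defined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `daily_reporting_rate`, the input layout (a list of report dates and a per-day count of archived media reports) and the toy numbers are all hypothetical.

```python
from collections import Counter
from datetime import date


def daily_reporting_rate(report_dates, media_counts):
    """Return {day: Argus reports / archived media reports} for each day.

    report_dates: one date per Argus report meeting the inclusion criteria.
    media_counts: {day: number of archived media reports from Mexican sources}.
    """
    reports_per_day = Counter(report_dates)
    rates = {}
    for day, n_media in media_counts.items():
        if n_media > 0:  # a rate is undefined on days with no archived media
            rates[day] = reports_per_day.get(day, 0) / n_media
    return rates


# Hypothetical toy data, not values from the study.
report_dates = [date(2009, 4, 1), date(2009, 4, 1), date(2009, 4, 2)]
media_counts = {
    date(2009, 4, 1): 4000,
    date(2009, 4, 2): 5000,
    date(2009, 4, 3): 0,
}
rates = daily_reporting_rate(report_dates, media_counts)
```

Days without archived media reports are skipped rather than assigned a rate of zero, so that gaps in collection are not mistaken for quiet days.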
Descriptive statistics, including reporting frequency and mean reporting rate, were also computed. Argus heuristic report staging [8] was also analysed for the study period and descriptive statistics computed, including the frequency of each stage and mean stage.
The Shapiro-Wilk test was used to assess the normality of the data. The reporting rate and stage data for
A sample of Argus reports (n=133) classified as stage 2 or greater from 1 January 2009 to 23 April 2009, before widespread reporting of the pandemic influenza in the international media, was randomly selected using a random number generator in R and assembled into a table of events. The sample was reviewed and compared with reports from the same period that were not selected by randomisation, and was determined to be representative of the larger dataset.
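The random selection step can be illustrated as follows. The study used a random number generator in R; the sketch below is a Python analogue only, and the record layout (`stage`, `date`), the function name `sample_reports` and the fixed seed are assumptions made for the example.

```python
import random
from datetime import date, timedelta


def sample_reports(reports, n, start, end, min_stage=2, seed=0):
    """Randomly select up to n reports at or above min_stage within [start, end]."""
    eligible = [
        r for r in reports
        if r["stage"] >= min_stage and start <= r["date"] <= end
    ]
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return rng.sample(eligible, min(n, len(eligible)))


# Hypothetical toy archive: one report per day with cycling stages 1, 2, 3.
archive = [
    {"id": i, "date": date(2009, 1, 1) + timedelta(days=i), "stage": 1 + i % 3}
    for i in range(30)
]
sample = sample_reports(archive, n=5, start=date(2009, 1, 1), end=date(2009, 1, 20))
```

Sampling without replacement from the filtered list keeps every selected report inside the inclusion window, and the seed lets the same table of events be regenerated later.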

Selection of HealthMap reports
A

Frequency of Argus reports on respiratory disease in Mexico
From an archive of 2.1 million Internet media reports collected in 2007-2008 and 2.0 million articles in 2008-2009, 722 Argus reports were identified, 684 of which met the inclusion criteria. Figure 1 shows an increase in reporting frequency during the 2008-9 respiratory disease season relative to that of 2007-8. The increase in the number of Argus reports in the 2008-9 season (n=491) compared with those in the

Outbreak severity
An assessment of outbreak severity via media reports provides context for interpreting the events. As described above, Argus does this by staging reports according to a heuristic model of societal disruption (Table 2) [8]. As depicted in Figure 3, reports on respiratory disease outbreaks in Mexico were predominantly stage 2 and occasionally stage 3 (mean=2.35) throughout the 2008-9 season, whereas in the corresponding period the previous year, reports were consistently staged at 2 or less (mean=1.80). Thus, using this staging system, the mean stage of the reports in the 2008-9 season was significantly higher than that of the reports in the 2007-8 season (W=30,184.5, p<0.0001), indicating higher social disruption in the later season.
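A comparison of ordinal stage data between two seasons can be sketched with a rank-sum test. The surviving text does not name the test behind the W statistic, so the choice of a Wilcoxon rank-sum (Mann-Whitney) test here is an assumption, as are the function names and the toy stage lists. The sketch uses midranks for ties and a tie-corrected normal approximation without continuity correction, so its p-values will differ slightly from those of packages that apply one.

```python
import math
from collections import Counter


def midranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (W, two-sided p) where W is the rank-sum statistic for x."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = midranks(combined)
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Variance of W under the null hypothesis, corrected for ties.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all values identical: no evidence of a difference
        return w, 1.0
    z = (w - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w, p


# Hypothetical toy stages, not data from the study.
later_season = [2, 2, 3, 2, 3]
earlier_season = [1, 2, 1, 2, 1]
w, p = rank_sum_test(later_season, earlier_season)
```

A rank-based test suits staged reports because the stages are ordered categories with many ties, for which a comparison of means under a normality assumption would be hard to justify.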

Unusual timing and extent of respiratory disease reporting in Mexico
We reviewed the Argus reports to estimate when respiratory illness reporting frequency became prominent and anomalous in the Mexican media in the 2008-9 respiratory disease season. Table 3 illustrates the timing of events, based on a random selection of reports classed as stage 2 and greater from 1 January to 23 April 2009, before widespread reporting of 2009 pandemic influenza A(H1N1) in the international media. It illustrates that respiratory disease was prevalent in parts of Mexico, and reported as unusual, much earlier than the microbiological identification of the pandemic virus in late April 2009 [14].

Emergence of pandemic influenza A(H1N1) in the United States and Canada
In the United States, two cases developed symptoms of swine influenza A(H1N1) in late March 2009 in California and were reported in mid-April [15]. The timing of the emergence and evolution of the pandemic influenza in the United States and Canada, based on data collected by the HealthMap system from 21 to 23 April 2009, before recognition of the novel virus and reporting by the international media, is depicted in Table 4. As can be seen, the United States media began reporting on the two cases on 21 April [2 and sources cited therein, 15]. This was followed by the reporting of three

Discussion and conclusion
Increased Argus event reporting frequency, longer duration of the respiratory disease season and significantly increased stage (social disruption) of Argus event reports together provide evidence of an anomalous respiratory disease season in 2009. The timing of events, based on media reports, suggests that respiratory disease was prevalent in parts of Mexico, and was reported as unusual, much earlier than the microbiological identification of the pandemic virus. While it is impossible to estimate the earliest date of emergence of the 2009 pandemic influenza A(H1N1) virus in the absence of historical serological collections and microbiological test results, Figure 1 illustrates the significantly higher frequency of respiratory disease reporting in January to April in the 2008-9 influenza season than in the 2007-8 season. Likewise, Figure 2 shows the significantly higher mean stage in the 2008-9 season than in the 2007-8 season. These observations suggest a connection between the anomalous respiratory disease season in Mexico in 2008-9, detected through event-based biosurveillance, and the 2009 pandemic influenza A(H1N1). The connection remains unclear, however, without laboratory confirmation of the cases described in the media reports and without more historical laboratory data on the time frame of the pandemic emergence. Nonetheless, event-based biosurveillance provides a tool for early detection of emerging outbreaks complementary to traditional public health approaches, though quantitative analysis of such data is in the early stages. To the best of our knowledge, this is the first study to attempt to quantitatively analyse event-based biosurveillance data over time.
Event-based biosurveillance systems function as cueing and alerting systems to government officials and public health professionals by disseminating reports of potentially emerging biological events. Public health professionals can act on these surveillance reports mainly when they are considered within the context of societal, epidemiological and immunological factors.
Biosurveillance reports, such as those analysed in this study, provide information on these factors. Such event-based surveillance data can cue public health professionals to investigate an emerging or changing pathogen earlier than they otherwise would, as well as provide a means for monitoring the spread of disease and its severity in a population or region. Examples of early public health response include directed sample collection for laboratory confirmation, deployment of diagnostic test kits to affected regions, and initiation of preventive measures, such as border closings, wearing of face masks and limiting public events.
Event-based biosurveillance, such as that described in this study, is a tool complementary to traditional public health surveillance methods used to identify an outbreak and track its progression. Media reports can sometimes be difficult to confirm, highlighting the need for clinic-based syndromic surveillance in conjunction with microbiological surveillance for verification and diagnosis of disease(s) present in suspected outbreaks. In nations where traditional public health surveillance is not possible, event-based biosurveillance data may be helpful in situational awareness until other methods are deployed.

Table 4
Emergence and evolution of swine-origin influenza in the United States and Canada, 21-23 April 2009

The value added by event-based biosurveillance is that media reports encompass both direct indicators of disease occurrence and indirect indicators based on societal response to an emerging event, and they are produced in real time. Indirect indicators may allow identification of media signals of anomalous disease activity, and tracking such anomalous events in the media may provide clues to emerging events. In July 2009, WHO recommended that countries monitor unexpected, unusual or notable changes in patterns of influenza transmission or severity of the disease, including spikes in rates of absenteeism from schools or workplaces, or increases in the number of emergency department visits [16]. This type of monitoring became more crucial as the 2009 influenza pandemic evolved, particularly as global diagnostic capabilities became overwhelmed and monitoring of case counts became increasingly problematic. Regional, cultural and linguistic expertise is key to tracking anomalous events through recognising local signatures of social disruption, identifying indirect indicators and assigning the appropriate event staging. This highlights the need for improved baseline data in order to distinguish more precisely anomalous signal from background reporting. Such baseline data, which can be generated from quantitative studies evaluating archival multiyear surveillance data from Argus, HealthMap and other event-based surveillance systems, would provide a means for better understanding of media signatures, enabling more specific signal generation and the establishment of signal thresholds. Consequently, a more robust cueing and alerting capability would be available for officials responsible for determining when to trigger an investigation of a pertinent emerging event.
Spatial and temporal modelling have been investigated as methods of distinguishing abnormal from normal patterns in syndromic surveillance data [17], and may be applicable to event-based biosurveillance. Such algorithms will become more applicable as additional biosurveillance records are collected from continuing global media coverage.
This retrospective study highlights a number of factors that are important for developing an actionable event-based biosurveillance prospective methodology. Clearly defined inclusion criteria, validated on past outbreaks such as the 2009 influenza pandemic, are a basic component of such an approach. An established multiyear geospatial baseline of disease reporting will form the basis for quantifying anomaly. Lastly, continual quantification and assessment of the impact of different types of biosurveillance data and data sources upon system sensitivity, specificity and timeliness must be undertaken [18]. An important goal is to make it possible to identify needed improvements in the operation of event-based biosurveillance systems, enabling desired performance targets to be achieved. The results of retrospective studies of event-based biosurveillance systems, in conjunction with lessons learnt from event detection and response, can be used to establish thresholds for early alerting of future pandemics, facilitating more timely official intervention and public health response.
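As a rough illustration of how a baseline might be turned into an alerting threshold, the sketch below flags periods whose report count exceeds the baseline mean by a chosen number of standard deviations. The rule itself, the multiplier `k`, the function name `flag_anomalies` and the toy counts are all assumptions for the example; the study does not define a threshold.

```python
from statistics import mean, stdev


def flag_anomalies(baseline_counts, current_counts, k=2.0):
    """Return indices of current periods whose count exceeds mean + k * SD.

    baseline_counts: report counts per period from earlier seasons.
    current_counts: report counts per period for the season under review.
    """
    threshold = mean(baseline_counts) + k * stdev(baseline_counts)
    return [i for i, count in enumerate(current_counts) if count > threshold]


# Hypothetical weekly report counts, not data from the study.
baseline = [4, 5, 6, 5, 4, 6]
current = [5, 7, 12, 6]
flagged = flag_anomalies(baseline, current)
```

A multiyear, region-specific baseline would replace the single toy list here, and the multiplier would need tuning against past outbreaks to balance sensitivity, specificity and timeliness.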