29 July 2010
Event-based biosurveillance of respiratory disease in Mexico, 2007–2009: connection to the 2009 influenza A(H1N1) pandemic?
The emergence of the 2009 pandemic influenza A(H1N1) virus in North America and its subsequent global spread highlights the public health need for early warning of infectious disease outbreaks. Event-based biosurveillance, based on local- and regional-level Internet media reports, is one approach to early warning as well as to situational awareness. This study analyses media reports in Mexico collected by the Argus biosurveillance system between 1 October 2007 and 31 May 2009. Results from Mexico are compared with the United States and Canadian media reports obtained from the HealthMap system. A significant increase in reporting frequency of respiratory disease in Mexico during the 2008–9 influenza season relative to that of 2007–8 was observed (p<0.0001). The timing of events, based on media reports, suggests that respiratory disease was prevalent in parts of Mexico, and was reported as unusual, much earlier than the microbiological identification of the pandemic virus. Such observations suggest that abnormal respiratory disease frequency and severity were occurring in Mexico throughout the winter of 2008–2009, though their connection to the emergence of the 2009 pandemic influenza A(H1N1) virus remains unclear.
The emergence in North America and global spread of the novel 2009 pandemic influenza A(H1N1) virus of swine origin was unanticipated in the spring of 2009 by governments and health agencies around the world. Many nations have limited ability to detect outbreaks or maintain situational awareness of them within their borders, making reporting and early warning of emerging influenza viruses with pandemic potential problematic. Event-based biosurveillance, based on local- and regional-level Internet media reports, is an internationally recognised approach to early warning and situational awareness [1,2]. In this study we present the observations of the Argus biosurveillance system on respiratory disease in Mexico between 1 October 2007 and 31 May 2009. These results are compared with observations of the HealthMap system [3,4] on respiratory disease in the United States and Canada, just before widespread media coverage of the 2009 pandemic influenza A(H1N1), then called ‘swine flu’, on 21–23 April 2009. In the United States and Canada, media reporting related to swine flu was not observed before that time.
Argus biosurveillance system
The Argus system, a web-based global biosurveillance system hosted at the Georgetown University Medical Center (Washington, DC, United States) and funded by the United States Government, is designed to report and track the evolution of biological events threatening human, plant and animal health globally, excluding the United States. It collects, in an automated process, local, native-language Internet media reports, including blogs and official sources, e.g. World Health Organization (WHO) and World Organisation for Animal Health (OIE), and interprets their relevance according to a specific set of concepts and keywords relevant to infectious disease surveillance (i.e. a taxonomy of media reporting of infectious disease). Argus does not use scientific journals as a primary source for identifying emerging events. Elements of the taxonomy define direct indicators (i.e. reports of disease) and six categories of indirect indicators of disease (Table 1).
Table 1. Argus taxonomy elements of media reporting of infectious disease
Project analysts – about 40 regional specialists who collectively are fluent in approximately 40 languages – monitor several thousand Internet sources daily. They use Boolean keyword searching and Bayesian model tools to select reports from a dynamic database of media reports collected from Internet sources six times daily. A complete archive is maintained for retrospective analyses and refinement of biosurveillance methodology. The project analysts write event reports, which are based on relevant media reports, and a stage is assigned to the report based on observed event progression according to a previously described heuristic model, ranging from preparatory (e.g. prevention activities and conditions conducive to disease emergence and transmission) to degree of disease spread to degrees of social disruption to recovery (Table 2) [9,10]. The reports are posted on a secure Internet portal for the diverse set of Argus users [5,11] to view.
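The Boolean keyword step described above can be illustrated with a minimal sketch. The text does not give the actual Argus queries or taxonomy terms, so the function names (`tokenize`, `matches`), the keywords and the example headlines below are all hypothetical, and the Bayesian classification step is not shown.

```python
import re

def tokenize(text):
    """Lower-case a media report and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def matches(text, all_of=(), any_of=(), none_of=()):
    """Boolean keyword match: AND over all_of, OR over any_of, NOT over none_of."""
    tokens = tokenize(text)
    if not all(term in tokens for term in all_of):
        return False
    if any_of and not any(term in tokens for term in any_of):
        return False
    return not any(term in tokens for term in none_of)

# Hypothetical headlines, for illustration only.
reports = [
    "Hospital reports surge in pneumonia cases among adults",
    "Influenza vaccination campaign begins in schools",
    "Football club signs new striker",
]

# Keep reports mentioning pneumonia OR influenza, but NOT vaccination.
selected = [
    r for r in reports
    if matches(r, any_of=("pneumonia", "influenza"), none_of=("vaccination",))
]
print(selected)
```

In practice such a filter would only pre-select candidates; as described above, analysts with regional and linguistic expertise decide which media reports warrant an event report.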
Table 2. Argus staging system
HealthMap is an automated multilingual real-time disease outbreak detection, tracking and visualization system, which, like Argus, relies on publicly available information, including social media and official sources, from the Internet for its data. It provides global media coverage, which, unlike Argus, includes the United States. HealthMap data on 2009 pandemic influenza A(H1N1) are therefore used to describe the emergence and evolution of swine-origin influenza in the United States and Canada [3,4].
Aims of this study
Event-based surveillance, as conducted by the Argus and HealthMap systems, has been shown to be able to identify emerging outbreaks [1,5] from information in publicly available media sources. This information can be used by public health professionals to investigate an emerging or changing pathogen earlier than they would otherwise. This study was conducted to demonstrate quantitatively how event-based surveillance is a useful tool complementary to traditional public health surveillance methods for providing early warning and tracking of an emerging outbreak.
Selection of Argus reports
We retrieved from the Argus archive reports written by project analysts based on Spanish- and English-language Internet media reports of respiratory disease, including the 2009 pandemic influenza A(H1N1), in Mexico between 1 October 2007 and 31 May 2009 (thus covering the 2007–8 and 2008–9 respiratory disease seasons). We reviewed the Argus reports and identified the geographical locations of the events described in them. The number of sources and media reports archived did not vary substantially between the 2007–8 and 2008–9 respiratory disease seasons.
Determining frequency of reporting of respiratory disease in Mexico
As Argus does not categorise articles by topic as they are archived, the total number of reports written on respiratory disease by the project analysts was used as the numerator. Each Argus report is based on one or more media reports. The rate of reporting was defined as the ratio of the number of written reports meeting the inclusion criteria to the total number of media reports in the archive from Mexican sources, which was computed as a function of time (in days) for the study period.
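The rate definition above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `daily_reporting_rate`, the data layout and the toy counts are hypothetical, and any scaling factor applied to the ratio is not specified in the text.

```python
from collections import Counter
from datetime import date

def daily_reporting_rate(report_dates, media_counts):
    """Return {day: written reports / archived media reports} for each day.

    report_dates: one date per written report meeting the inclusion criteria.
    media_counts: {day: total media reports archived from Mexican sources}.
    Days with no archived media reports are skipped to avoid division by zero.
    """
    written = Counter(report_dates)
    return {
        day: written.get(day, 0) / total
        for day, total in media_counts.items()
        if total > 0
    }

# Hypothetical toy data, for illustration only.
report_dates = [date(2009, 4, 1), date(2009, 4, 1), date(2009, 4, 2)]
media_counts = {
    date(2009, 4, 1): 8000,
    date(2009, 4, 2): 10000,
    date(2009, 4, 3): 9000,
}

rates = daily_reporting_rate(report_dates, media_counts)
mean_rate = sum(rates.values()) / len(rates)  # mean reporting rate per day
print(rates, mean_rate)
```

Dividing by the number of archived media reports normalises for changes in archive volume, which matters when comparing seasons with different collection totals.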
Descriptive statistics, including reporting frequency and mean reporting rate, were also computed. Argus heuristic report staging was also analysed for the study period and descriptive statistics computed, including the frequency of each stage and mean stage.
The Shapiro–Wilk test was used to assess the normality of the data. The reporting rate and stage data for each season, 1 October 2007 to 31 May 2008 and 1 October 2008 to 31 May 2009, were not normally distributed (2007–8 rate: W=0.7067, p<0.0001; 2008–9 rate: W=0.647, p<0.0001; 2007–8 stage: W=0.5679, p<0.0001; 2008–9 stage: W=0.6879, p<0.0001). Thus the non-parametric Wilcoxon rank-sum test was used to assess the difference in mean rate of respiratory disease reporting and mean stage between the 2007–8 and 2008–9 seasons. All statistics were computed using R Version 2.9-0.
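For readers unfamiliar with the Wilcoxon rank-sum test, the following is a minimal sketch of how the W statistic and an approximate two-sided p value can be obtained. It is not the R code used in the study: it uses a plain normal approximation without tie or continuity correction, so its p values may differ slightly from those of a statistical package, and the function names and sample values are hypothetical.

```python
import math

def rank_with_ties(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Return (W, two-sided p) for samples x and y (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    # W is the rank sum of x minus its minimum possible value.
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, p

# Hypothetical daily rates for two seasons, for illustration only.
w, p = rank_sum_test([1, 2, 3], [4, 5, 6])
print(w, p)
```

Because the test compares ranks rather than raw values, it does not require the normality that the Shapiro–Wilk results above rule out.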
A sample of Argus reports (n=133) classified as stage 2 or greater from 1 January to 23 April 2009, i.e. before widespread reporting of the pandemic influenza in the international media, was randomly selected using a random number generator in R and assembled into a table of events. The sample was reviewed and compared with reports from the same period that were not selected by randomisation, and was determined to be representative of the larger dataset.
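The sampling step can be sketched as below. The study used a random number generator in R; this Python version only illustrates the idea of filtering by stage and drawing a reproducible sample without replacement. The record layout (`id`, `stage`), the function name `sample_reports` and the seed are assumptions.

```python
import random

def sample_reports(reports, n, min_stage=2, seed=0):
    """Draw up to n reports with stage >= min_stage, without replacement."""
    eligible = [r for r in reports if r["stage"] >= min_stage]
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    return rng.sample(eligible, min(n, len(eligible)))

# Hypothetical archive of 300 reports with stages cycling through 1, 2, 3.
reports = [{"id": i, "stage": 1 + i % 3} for i in range(300)]

sample = sample_reports(reports, n=133)
print(len(sample))
```

Fixing the seed makes the selection auditable, which helps when the sample is later compared with the unselected reports for representativeness.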
Selection of HealthMap reports
A table of events from HealthMap, based on English- and Spanish-language sources covering the United States and Canada, was also assembled. HealthMap uses automated crawling and filtering tools to identify relevant media reports, which are posted to the HealthMap website [2, 4, 13]. At least one of eight analysts employed by HealthMap reviews all information posted to the site for accuracy, relevance and correct categorisation. Any report from 21–23 April 2009 referring to swine influenza was included in the table of events.
Frequency of Argus reports on respiratory disease in Mexico
From an archive of 2.1 million Internet media reports collected in 2007–2008 and 2.0 million articles in 2008–2009, 722 Argus reports were identified, 684 of which met the inclusion criteria. Figure 1 shows an increase in reporting frequency during the 2008–9 respiratory disease season relative to that of 2007–8.
Figure 1. Mean rate per day of Argus reporting of respiratory disease in Mexico, based on Internet media reports, 1 October 2007 – 31 May 2009
Between 1 October 2007 and 31 May 2008, the mean rate of reporting per day was 2.19, whereas in the same period in 2008–2009, the mean rate per day was significantly higher (4.08) (W=3,985, p<0.0001). In 2008, the reporting rate declined by almost twofold from January to February and decreased further in April, whereas in 2009, the reporting rate also decreased from January to February by about twofold, though it remained higher than in the same period in 2008. The reporting rate started to increase in March 2009 and continued to rise in April, spiking more than fivefold by the end of the month. This higher rate of reporting from 1 January to 30 April 2009, compared with the same period in 2008, was also significant (W=780, p<0.001).
The increase in the number of Argus reports in the 2008–9 season (n=491) compared with the 2007–8 season (n=193) is shown in Figure 2 for Mexico as a whole, for reports not attributed to a state, and for each Mexican state.
Figure 2. Argus respiratory disease reports by Mexican state, based on Internet media reports, for the 2007–8 and 2008–9 respiratory disease seasons
Geographical focus of respiratory disease reporting in Mexico
No clear geographical focus of respiratory disease reporting during the 2007–8 respiratory disease season was discernible (Figure 2). Potential clusters of increased reporting in the 2008–9 season compared with the 2007–8 season were evident in the states of Chihuahua, Distrito Federal, Guanajuato, Hidalgo, Oaxaca, San Luis Potosí, Sonora, Tamaulipas, Tlaxcala, Veracruz-Llave and Zacatecas. Argus reports based on media reports from these states in the 2008–9 season represented a greater than 60% increase in reporting frequency compared with that in the 2007–8 season. Statistical significance of state trends was not determined.
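The season-on-season comparison behind the 60% criterion is a simple percentage change, sketched below. The national totals (193 and 491 reports) come from the text; the per-state counts and the function names are hypothetical, since state-level numbers are only shown in Figure 2.

```python
def percent_increase(before, after):
    """Percentage change from before to after; None when the baseline is zero."""
    if before == 0:
        return None
    return 100.0 * (after - before) / before

def flag_states(counts_prev, counts_curr, threshold=60.0):
    """Return states whose report count rose by more than threshold percent."""
    flagged = []
    for state in sorted(counts_curr):
        change = percent_increase(counts_prev.get(state, 0), counts_curr[state])
        if change is not None and change > threshold:
            flagged.append(state)
    return flagged

# National totals reported in the text: 193 reports (2007-8), 491 (2008-9).
national_change = percent_increase(193, 491)

# Hypothetical per-state counts, for illustration only.
prev = {"State A": 10, "State B": 10}
curr = {"State A": 17, "State B": 12, "State C": 3}
flagged = flag_states(prev, curr)
print(round(national_change, 1), flagged)
```

States with no reports in the earlier season have no defined percentage change and are left unflagged here; how such cases were handled in the study is not stated.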
An assessment of outbreak severity via media reports provides context for interpreting the events. As described above, Argus does this by staging reports according to a heuristic model of societal disruption (Table 2). As depicted in Figure 3, reports on respiratory disease outbreaks in Mexico were predominantly stage 2 and occasionally stage 3 (mean=2.35) throughout the 2008–9 season, whereas in the corresponding period the previous year, reports were consistently staged at 2 or less (mean=1.80). Thus, using this staging system, the mean stage of the reports in the 2008–9 season was significantly higher than that of the reports in the 2007–8 season (W=30,184.5, p<0.0001), indicating higher social disruption in the later season.
Figure 3. Mean stage of Argus reports of respiratory disease, based on Mexican Internet media reports, 1 October 2007 – 31 May 2009
Unusual timing and extent of respiratory disease reporting in Mexico
We reviewed the Argus reports to estimate when respiratory illness reporting frequency became prominent and anomalous in the Mexican media in the 2008–9 respiratory disease season. Table 3 illustrates the timing of events, based on a random selection of reports classed as stage 2 and greater from 1 January to 23 April 2009, before widespread reporting of 2009 pandemic influenza A(H1N1) in the international media. It illustrates that respiratory disease was prevalent in parts of Mexico, and reported as unusual, much earlier than the microbiological identification of the pandemic virus in late April 2009.
Table 3. Emergence and evolution of respiratory disease in Mexico, 1 January – 23 April 2009
Emergence of pandemic influenza A(H1N1) in the United States and Canada
In the United States, two cases developed symptoms of swine influenza A(H1N1) in late March 2009 in California and were reported in mid-April. The timing of the emergence and evolution of the pandemic influenza in the United States and Canada, based on data collected by the HealthMap system from 21 to 23 April 2009, before recognition of the novel virus and reporting by the international media, is depicted in Table 4. As can be seen, the United States media began reporting on the two cases on 21 April [2 and sources cited therein, 15]. This was followed by the reporting of three additional cases in California and two in Texas on 23 April. The following day, there were 75 suspected cases in Queens, New York. By 27 April, widespread informal reporting across the United States was observed by HealthMap. Canadian reporting had significant overlap with United States reporting on this event from 21 to 23 April 2009. Thus the recognition of the novel virus in the United States and Canadian media occurred only days before widespread recognition in the international media, in contrast to indications of emergence of the event much earlier in the Mexican media.
Table 4. Emergence and evolution of swine-origin influenza in the United States and Canada, 21–23 April 2009
Discussion and conclusion
Increased Argus event reporting frequency, longer duration of the respiratory disease season and significantly increased stage (social disruption) of Argus event reports together provide evidence of an anomalous respiratory disease season in 2009. The timing of events, based on media reports, suggests that respiratory disease was prevalent in parts of Mexico, and was reported as unusual, much earlier than the microbiological identification of the pandemic virus. While it is impossible to estimate the earliest date of emergence of the 2009 pandemic influenza A(H1N1) virus in the absence of historical serological collections and microbiological test results, Figure 1 illustrates the significantly higher frequency of respiratory disease reporting in January to April in the 2008–9 influenza season than in the 2007–8 season. Likewise, Figure 3 shows the significantly higher mean stage in the 2008–9 season than in the 2007–8 season. These observations suggest a connection between the anomalous respiratory disease season in Mexico in 2008–9, detected through event-based biosurveillance, and the 2009 pandemic influenza A(H1N1). The connection remains unclear, however, without laboratory confirmation of the cases described in the media reports and without more historical laboratory data on the time frame of the pandemic emergence. Nonetheless, event-based biosurveillance provides a tool for early detection of emerging outbreaks complementary to traditional public health approaches, though quantitative analysis of such data is in the early stages. To the best of our knowledge, this is the first study to attempt to quantitatively analyse event-based biosurveillance data over time.
Event-based biosurveillance systems function as cueing and alerting systems to government officials and public health professionals by disseminating reports of potentially emerging biological events. Public health professionals can act on these surveillance reports mainly when they are considered within the context of societal, epidemiological and immunological factors. Biosurveillance reports, such as those analysed in this study, provide information on these factors. Such event-based surveillance data can cue public health professionals to investigate an emerging or changing pathogen earlier than they otherwise would, as well as provide a means for monitoring the spread of disease and its severity in a population or region. Examples of early public health response include directed sample collection for laboratory confirmation, deployment of diagnostic test kits to affected regions, and initiation of preventive measures, such as border closings, wearing of face masks and limiting public events.
Event-based biosurveillance, such as that described in this study, is a tool complementary to traditional public health surveillance methods used to identify an outbreak and track its progression. Media reports can sometimes be difficult to confirm, highlighting the need for clinic-based syndromic surveillance in conjunction with microbiological surveillance for verification and diagnosis of disease(s) present in suspected outbreaks. In nations where traditional public health surveillance is not possible, event-based biosurveillance data may be helpful in situational awareness until other methods are deployed.
The value added by event-based biosurveillance is that media reports encompass both direct indicators of disease occurrence and indirect indicators based on societal response to an emerging event, and they are produced in real time. Indirect indicators may allow identification of media signals of anomalous disease activity, and tracking such anomalous events in the media may provide clues to emerging events. In July 2009, WHO recommended that countries monitor unexpected, unusual or notable changes in patterns of influenza transmission or severity of the disease, including spikes in rates of absenteeism from schools or workplaces, or increases in the number of emergency department visits. This type of monitoring became more crucial as the 2009 influenza pandemic evolved, particularly as global diagnostic capabilities became overwhelmed and monitoring of case counts became increasingly problematic. Regional, cultural and linguistic expertise is key to tracking anomalous events through recognising local signatures of social disruption, identifying indirect indicators and assigning the appropriate event staging.
Our study made use of historical data available in the Argus archive from 2007 and for HealthMap from 2006. Though not presented in this study, HealthMap reports from the Spanish media during the 2006–7 season, including but not limited to Mexican sources, show a high frequency of reporting in March similar to that observed in March 2009. This highlights the need for improved baseline data in order to distinguish more precisely anomalous signal from background reporting. Such baseline data, which can be generated from quantitative studies evaluating archival multiyear surveillance data from Argus, HealthMap and other event-based surveillance systems, would provide a means for better understanding of media signatures, enabling more specific signal generation and the establishment of signal thresholds. Consequently, a more robust cueing and alerting capability would be available for officials responsible for determining when to trigger an investigation of a pertinent emerging event. Spatial and temporal modelling have been investigated as methods of distinguishing abnormal from normal patterns in syndromic surveillance data, and may be applicable to event-based biosurveillance. Such algorithms will become more applicable as additional biosurveillance records are collected from continuing global media coverage.
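One simple way a baseline could be turned into a signal threshold is sketched below, purely as an illustration of the idea discussed above. The study does not propose a specific rule: the mean-plus-k-standard-deviations threshold, the value k=2, the function names and all numbers here are assumptions.

```python
import statistics

def alert_threshold(baseline_rates, k=2.0):
    """Threshold = baseline mean + k sample standard deviations."""
    mean = statistics.mean(baseline_rates)
    sd = statistics.stdev(baseline_rates)
    return mean + k * sd

def flag_anomalies(current_rates, threshold):
    """Return the indices of observations that exceed the threshold."""
    return [i for i, rate in enumerate(current_rates) if rate > threshold]

# Hypothetical reporting rates, for illustration only.
baseline = [2, 2, 2, 4, 4, 4]   # e.g. the same weeks in earlier seasons
current = [3, 5, 6, 9]          # e.g. the weeks being monitored

threshold = alert_threshold(baseline)
anomalies = flag_anomalies(current, threshold)
print(round(threshold, 2), anomalies)
```

A rule of this kind is only as good as its baseline, which is why multiyear archival data are needed before thresholds can be set with any confidence; a high March reporting frequency in an earlier season, for example, would raise the threshold for March.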
This retrospective study highlights a number of factors that are important for developing an actionable event-based biosurveillance prospective methodology. Clearly defined inclusion criteria, validated on past outbreaks such as the 2009 influenza pandemic, are a basic component of such an approach. An established multiyear geospatial baseline of disease reporting will form the basis for quantifying anomaly. Lastly, continual quantification and assessment of the impact of different types of biosurveillance data and data sources upon system sensitivity, specificity and timeliness must be undertaken. An important goal is to make it possible to identify needed improvements in the operation of event-based biosurveillance systems, enabling desired performance targets to be achieved. The results of retrospective studies of event-based biosurveillance systems, in conjunction with lessons learnt from event detection and response, can be used to establish thresholds for early alerting of future pandemics, facilitating more timely official intervention and public health response.
We acknowledge the Argus staff of the Georgetown University Imaging Science and Information Systems Center for producing the reports that made this study possible. We thank Ronald A. Walters for his thorough review of the manuscript and valuable comments.
The Georgetown University authors acknowledge the financial support of the United States Government for this research. John S. Brownstein is supported by G08LM009776-01A2 from the National Library of Medicine, National Institutes of Health and a research grant from Google.org.
- Wilson K, Brownstein JS. Early detection of disease outbreaks using the Internet. CMAJ. 2009;180(8):829-31.
- Brownstein JS, Freifeld CC, Madoff LC. Influenza A (H1N1) virus, 2009--online monitoring. N Engl J Med. 2009;360(21):2156.
- Freifeld CC, Mandl KD, Reis BY, Brownstein JS. HealthMap: global infectious disease monitoring through automated classification and visualization of Internet media reports. J Am Med Inform Assoc. 2008;15(2):150-7.
- Brownstein JS, Freifeld CC, Reis BY, Mandl KD. Surveillance Sans Frontière: Internet-based emerging infectious disease intelligence and the HealthMap project. PLoS Med. 2008;5(7): e151.
- Hartley D. One year later: implementing the bio-surveillance requirements of the 9/11 Act. Statement by Dr David Hartley, Georgetown University Medical Center. Hearing before the Subcomm. on Emerging Threats, Cybersecurity, and Science and Technology of the House Comm. on Homeland Security, 110th Cong., 2nd Sess. (July 16, 2008). [Accessed 19 Jul 2010]. Available from: http://www.fas.org/irp/congress/2008_hr/biosurv.pdf
- McCallum A, Nigam K. A comparison of event models for naive Bayes text classification. In: AAAI/ICML-98 Workshop on learning for text categorization: 26–27 July, 1998, Madison, Wisconsin. p. 41-8. [Accessed 19 Jul, 2010]. Available from: http://www.cs.cmu.edu/~knigam/papers/multinomial-aaaiws98.pdf
- Collmann J, Robinson A. Designing ethical practice in biosurveillance: the project Argus doctrine. In: Castillo-Chavez C, Chen H, Lober WB, Thurmond M, Zeng D, editors. Infectious disease informatics and biosurveillance: research, systems and case studies. Springer. Forthcoming 2011.
- Wilson JM 5th, Polyak MG, Blake JW, Collmann J. A heuristic indication and warning staging model for detection and assessment of biological events. J Am Med Inform Assoc. 2008;15(2):158-71.
- McGrath JW. Biological impact of social disruption resulting from epidemic disease. Am J Phys Anthropol. 1991;84(4):407-19.
- Walters R, Harlan P, Nelson NP, Hartley DM. Data sources for biosurveillance. In: Voeller J, editor. Wiley handbook of science and technology for homeland security. New York: John Wiley & Sons; 2010. p. 1-17.
- Centers for Disease Control and Prevention (CDC). CDC global health e-brief, building USG interagency collaboration through global health engagement. First quarter 2008. [Accessed 27 Jul 2010]. Available from: http://www.cdc.gov/washington/EGlobalHealthEditions/E-brief_first_quarter_2008.pdf
- R Development Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2010.
- Brownstein JS, Freifeld CC. HealthMap: the development of automated real-time internet surveillance for epidemic intelligence. Euro Surveill. 2007; 12(48). pii=3322. Available from: http://www.eurosurveillance.org/ViewArticle.aspx?ArticleId=3322
- Cohen J. Swine flu outbreak. Out of Mexico? Scientists ponder swine flu’s origins. Science. 2009;324(5928):700-2.
- Centers for Disease Control and Prevention (CDC). Swine influenza A (H1N1) infection in two children--southern California, March-April 2009. MMWR Morb Mortal Wkly Rep. 2009;58(15):400-2. Available from: http://www.cdc.gov/mmwr/preview/mmwrhtml/mm5815a5.htm
- World Health Organization (WHO). Changes in reporting requirements for pandemic (H1N1) 2009 virus infection. Pandemic (H1N1) 2009 briefing note 3 (revised), 16 July 2009. [Accessed 24 Jul 2009]. Available from: http://www.who.int/csr/disease/swineflu/notes/h1n1_surveillance_20090710/en/index.html
- Mandl KD, Overhage JM, Wagner MM, Lober WB, Sebastiani P, Mostashari F, et al. Implementing syndromic surveillance: a practical guide informed by the early experience. J Am Med Inform Assoc. 2004;11(2):141-50.
- Hartley DM, Nelson NP, Walters R, Arthur R, Yangarber R, Madoff L, et al. The landscape of international event-based biosurveillance. Emerging Health Threats Journal 2009, 3:e3 doi: 10.3134/ehtj.10.003. [Accessed 19 Jul 2010]. Available from: http://www.eht-forum.org/JournalStore/Journal