HealthMap: the development of automated real-time internet surveillance for epidemic intelligence
1. Children’s Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology, Boston, Massachusetts, United States
2. Division of Emergency Medicine, Children’s Hospital Boston, Boston, Massachusetts, United States
3. Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, United States
With the recent entry into force of the new International Health Regulations (IHR 2005), there is still significant concern as to whether broad compliance will be feasible given the challenges associated with reporting mechanisms and multilateral coordination. Despite advances in indicator-based public health surveillance, lack of technical infrastructure and trained public health professionals is a significant roadblock in many parts of the world [2-4]. To help fill this gap, free or low-cost sources of event-based information, including Internet news outlets and online discussion sites, can provide detailed local and near real-time data on disease outbreaks [3-10]. Event-based surveillance now represents a critical source of epidemic intelligence: almost all major outbreaks investigated by the World Health Organization (WHO) are first identified through such sources [3-5]. However, most of this information is dispersed and largely unstructured, precluding an easily obtained global view of all ongoing disease threats. In order to construct such an integrated view of emerging infections, we developed HealthMap, a free multi-stream real-time knowledge management system that aggregates and maps health alerts across numerous key data sources.
The HealthMap project
HealthMap (http://www.healthmap.org) is a web-based system designed to collect and visualise outbreak data according to geography, time, and infectious disease agent [12-14]. The system provides a starting point for real-time information on a broad range of emerging infectious diseases and is of particular use to public health officials and international travelers. HealthMap thus serves to bring structure to an information flow that would otherwise overwhelm the user or obscure important and urgent elements. A freely available website operating since September 2006, HealthMap currently receives approximately 15,000 unique visitors per month from around the world. It is cited as a resource on the websites of, among others, the United Nations, the United States’ (US) Food and Drug Administration, Italy’s national epidemiology agency (Centro Nazionale di Epidemiologia, Sorveglianza e Promozione della Salute) and many library websites and university course materials. It has also been featured in a number of mainstream media publications, including Wired News and Science, indicating the broad utility of such a system, extending beyond public health practice [12,13]. Based on our usage tracking and subscriptions to our email announcement list, we infer that our most avid users tend to come from government-related domains, including the WHO, the US Centers for Disease Control and Prevention (CDC), the European Centre for Disease Prevention and Control (ECDC), and other national, state and local bodies around the world. A number of organisations, ranging from local health departments to national bodies such as US Health and Human Services and the US Department of Defense, use the HealthMap data stream for day-to-day surveillance activities.
The system integrates outbreak data from a variety of electronic sources: online news wires, Really Simple Syndication (RSS) feeds, expert-curated accounts (ProMED Mail), and validated official alerts (WHO). We also use Eurosurveillance, as it provides near real-time RSS publishing of multi-national outbreak news and often features research not found in other sources. Through this multi-stream approach, we have begun to achieve a unified and comprehensive view of current global infectious disease outbreaks. Data are acquired automatically every hour and characterised via text mining to determine the disease category and location of the outbreak. Currently, alerts are geocoded to the country scale, with province-, state-, or city-level resolution for select countries. Improved geographic resolution is a critical area of future development. Surveillance is multi-lingual and includes English, Spanish and French; other languages, including Portuguese, Chinese, and Russian, are in progress.
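The characterisation step can be pictured as dictionary matching over the text of each report. What follows is a minimal sketch under stated assumptions, not HealthMap's actual text-mining code: the dictionaries `DISEASE_TERMS` and `LOCATION_TERMS`, the functions `find_matches` and `characterise`, and the sample headline are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, tiny dictionaries; a real system would need far larger
# vocabularies covering synonyms and several languages.
DISEASE_TERMS = {
    "avian influenza": "Avian Influenza",
    "bird flu": "Avian Influenza",
    "cholera": "Cholera",
    "dengue": "Dengue",
}
LOCATION_TERMS = {
    "indonesia": "Indonesia",
    "jakarta": "Indonesia",  # city mapped to country-scale resolution
    "nigeria": "Nigeria",
    "brazil": "Brazil",
}


def find_matches(text, terms):
    """Return the sorted categories whose terms occur as whole words."""
    lowered = text.lower()
    found = set()
    for term, category in terms.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(category)
    return sorted(found)


def characterise(text):
    """Assign disease and location categories to one report (illustrative)."""
    return {
        "diseases": find_matches(text, DISEASE_TERMS),
        "locations": find_matches(text, LOCATION_TERMS),
    }


report = "Health officials in Jakarta confirm two new bird flu cases"
result = characterise(report)
print(result)
```

In this sketch, synonyms such as "bird flu" and city names such as "Jakarta" resolve to a single disease category and a country-level location, which mirrors the country-scale geocoding described above.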
Once collected, the data are aggregated by source, disease and geographic location, and overlaid on an interactive map for user-friendly access to the original report. HealthMap also addresses the computational challenges of integrating multiple sources of unstructured information by generating “meta-alerts” colour-coded based on the reliability of the data sources and volume of reports. While we collect a broad range of information relating to infectious disease outbreaks, not all information will be relevant for users. We are especially concerned with limiting information overload and providing focused news of immediate interest. Thus, after we have categorised location and disease, we apply article category tags for improved filtering. Our primary tags include:
- Breaking News: newly discovered outbreak;
- Warning: initial concerns of disease emergence, for example, in a conflict zone or natural disaster area;
- Follow-up: reference to a past, already known outbreak;
- Background/Context: information on disease context, such as vaccination campaigns, preparedness planning, research results; and
- Not Disease Related: information not relating to any disease or health condition.
Tags 2-5 in this list (Warning through Not Disease Related) are filtered from the display. We also remove duplicate articles by calculating a similarity score based on text and category matching. Finally, in addition to providing mapped content, each alert is linked to a “Related Information” window that provides details on articles with similar content as well as recent reports concerning either the same disease or location, and links for further research (for example, the WHO, CDC, PubMed, Wikipedia, etc.).
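The tag filtering and duplicate removal described above can be sketched as follows. This is an illustration under stated assumptions rather than the system's actual scoring: the Jaccard word-overlap measure, the 0.7 text weighting, the 0.8 threshold, the alert field names and the sample alerts are all hypothetical.

```python
import re

# Hypothetical: only the first tag in the list above is displayed.
DISPLAYED_TAGS = {"Breaking News"}


def tokens(text):
    """Lower-case word set used for the text comparison."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b, text_weight=0.7):
    """Blend Jaccard text overlap with disease/location category agreement.

    The Jaccard measure and the weighting are assumptions for illustration.
    """
    ta, tb = tokens(a["text"]), tokens(b["text"])
    union = ta | tb
    text_score = len(ta & tb) / len(union) if union else 0.0
    category_score = (
        (a["disease"] == b["disease"]) + (a["location"] == b["location"])
    ) / 2
    return text_weight * text_score + (1 - text_weight) * category_score


def filter_and_deduplicate(alerts, threshold=0.8):
    """Keep displayed tags, then drop alerts too similar to one already kept."""
    kept = []
    for alert in alerts:
        if alert["tag"] not in DISPLAYED_TAGS:
            continue
        if any(similarity(alert, other) >= threshold for other in kept):
            continue
        kept.append(alert)
    return kept


alerts = [
    {"text": "Cholera outbreak reported in northern Nigeria",
     "disease": "Cholera", "location": "Nigeria", "tag": "Breaking News"},
    {"text": "Cholera outbreak reported in northern Nigeria, officials say",
     "disease": "Cholera", "location": "Nigeria", "tag": "Breaking News"},
    {"text": "Nigeria plans cholera vaccination campaign",
     "disease": "Cholera", "location": "Nigeria", "tag": "Background/Context"},
    {"text": "Dengue cases rise in Brazil",
     "disease": "Dengue", "location": "Brazil", "tag": "Breaking News"},
]
unique = filter_and_deduplicate(alerts)
for alert in unique:
    print(alert["disease"], "-", alert["text"])
```

Combining a text score with category agreement means that two near-identical wire stories about the same disease and location collapse into one alert, while reports that merely share a few common words do not.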
HealthMap currently processes 133.5 disease alerts per day on average (95% confidence interval: 124.1-142.8), with approximately 50% categorised as Breaking News (65.3 reports/day). With a 30-day default window, the system may display over 800 Breaking News alerts on a given day. As of 20 November 2007, HealthMap had processed 35,749 alerts across 171 disease categories and 202 countries or semi-autonomous/overseas territories since its launch. The majority of alerts come from news media (92.8%), followed by ProMED reports (6.5%).
We currently have an extensive plan to address technological and methodological barriers to achieving a timely, synthesised, comprehensive and customised view of global health. Improvement of HealthMap will proceed along the four key components of surveillance: (1) data acquisition, (2) information characterisation, (3) signal interpretation, and (4) knowledge dissemination. While the automation of the system provides many advantages, HealthMap would be significantly enhanced by human analysis, as demonstrated by the success of GPHIN and ProMED. Taking inspiration from this model, we plan to implement collaborative networks as a mechanism for alert acquisition and classification. Our recent collaboration with ProMED (http://www.healthmap.org/promed) will help to pave the way for a bidirectional system of classification and moderation of information flow. We also plan to conduct rigorous evaluation and user testing to ensure that HealthMap successfully uses informal sources for epidemic intelligence. Our ultimate goal is for HealthMap to serve as a vital, freely available resource for both the public health community and the general public.