Epitweetr: Early warning of public health threats using Twitter data

Background: The European Centre for Disease Prevention and Control (ECDC) systematically collates information from sources to rapidly detect early public health threats. The lack of a freely available, customisable and automated early warning tool using data from Twitter prompted the ECDC to develop epitweetr, which collects, geolocates and aggregates tweets, generating signals and email alerts. Aim: This study aims to compare the performance of epitweetr with that of manually monitoring tweets for the early detection of public health threats. Methods: We calculated the general and specific positive predictive value (PPV) of signals generated by epitweetr between 19 October and 30 November 2020. The sensitivity, specificity, timeliness, accuracy and performance of the tweet geolocation and signal detection algorithms obtained from epitweetr and from the manual monitoring of 1,200 tweets were compared. Results: The epitweetr geolocation algorithm had an accuracy of 30.1% at national level and 25.9% at subnational level. The signal detection algorithm had a general PPV of 3.0% and a specific PPV of 74.6%. Compared with manual monitoring, epitweetr had greater sensitivity (47.9% and 78.6%, respectively) and reduced PPV (97.9% and 74.6%, respectively). The median validation time difference between the 16 common events detected by epitweetr and manual monitoring was −48.6 hours (IQR: −102.8 to −23.7). Conclusion: Epitweetr has shown sufficient performance as an early warning tool for public health threats using Twitter data. Since epitweetr is a free, open-source tool with configurable settings and a strong automated component, it is expected to increase in usability and usefulness to public health experts.


Supplement S2. Details of epitweetr signal detection algorithm
The algorithm is applied to the counts from the seven 24-hour blocks preceding the current 24-hour block of the signal detection. The running mean and the running standard deviation are calculated as

$$\bar{y}_0 = \frac{1}{7}\sum_{t=-7}^{-1} y_t \quad\text{and}\quad s_0 = \sqrt{\frac{1}{7-1}\sum_{t=-7}^{-1}\left(y_t - \bar{y}_0\right)^2},$$

where $y_t$, $t = \ldots, -2, -1, 0$, denotes the observed count time series, with time index 0 denoting the current block and time indices $-7, \ldots, -1$ denoting the seven blocks prior to the current block.
Under the null hypothesis of no spikes, it is assumed that the $y_t$ are independently and identically $N(\mu, \sigma^2)$ distributed with unknown mean $\mu$ and unknown variance $\sigma^2$. Hence, the upper limit of a simple one-sided $(1-\alpha)\times 100\%$ plug-in prediction interval for $y_0$ based on $y_{-7}, \ldots, y_{-1}$ is given as

$$U_0 = \bar{y}_0 + z_{1-\alpha}\, s_0,$$

where $z_{1-\alpha}$ is the $(1-\alpha)$-quantile of the standard normal distribution. An alert is raised if $y_0 > U_0$. With $\alpha = 0.025$, this corresponds to investigating whether $y_0$ exceeds the estimate of the mean plus 1.96 times the standard deviation. However, as pointed out by Allévius and Höhle (2017), the correct approach would be to compare the observation to the upper limit of a two-sided 95% prediction interval for $y_0$, because this considers both the sampling variation of a new observation and the uncertainty originating from the estimation of the mean and variance. Hence, the statistically appropriate form is to compute the upper limit as

$$U_0 = \bar{y}_0 + t_{1-\alpha}(k-1)\, s_0 \sqrt{1 + \frac{1}{k}},$$

where $t_{1-\alpha}(k-1)$ denotes the $(1-\alpha)$-quantile of the t-distribution with $k-1$ degrees of freedom and $k = 7$ is the number of baseline blocks.
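The two upper limits described above can be illustrated with a minimal sketch. It is not the epitweetr implementation: the counts are hypothetical, the function names are invented for illustration, and the prediction-interval limit is assumed to take the usual form of mean plus t-quantile times standard deviation times sqrt(1 + 1/k). The t-quantile is passed in as a table value so that only the Python standard library is needed.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def plug_in_upper_limit(baseline, alpha=0.025):
    """Running mean plus z_(1-alpha) times the running standard deviation."""
    z = NormalDist().inv_cdf(1 - alpha)
    return mean(baseline) + z * stdev(baseline)


def prediction_upper_limit(baseline, t_quantile):
    """Assumed form: mean + t quantile * sd * sqrt(1 + 1/k), k = len(baseline)."""
    k = len(baseline)
    return mean(baseline) + t_quantile * stdev(baseline) * sqrt(1 + 1 / k)


# Hypothetical tweet counts for the seven baseline blocks (t = -7, ..., -1).
baseline = [10, 12, 9, 11, 13, 10, 12]
current = 15  # hypothetical count for the current block (t = 0)

# 0.975 quantile of the t-distribution with 6 degrees of freedom (table value).
T_975_DF6 = 2.447

u_plug_in = plug_in_upper_limit(baseline)
u_prediction = prediction_upper_limit(baseline, T_975_DF6)

print(f"plug-in limit:    {u_plug_in:.2f}, alert: {current > u_plug_in}")
print(f"prediction limit: {u_prediction:.2f}, alert: {current > u_prediction}")
```

With only seven baseline blocks the t-based limit is noticeably higher than the plug-in limit, so it raises fewer alerts for the same alpha.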
If previous signals are included without modification in the historic values when calculating the running mean and standard deviation for the signal detection, then the estimated mean and standard deviation might become too large. This may mean that important current signals will not be detected.
To address this issue, epitweetr downweights previous signals, so that the estimation of the mean and standard deviation is adjusted for such outliers, using an approach similar to that of Farrington et al. (1996). Historic values that are not identified as previous signals are given a weight of 1. In contrast, historic values identified as signals are given a weight lower than 1, and a new fit is performed using these weights. Details of the downweighting procedure can be found in Annex I of the epitweetr user documentation.
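The effect of downweighting can be sketched as follows. The actual weights used by epitweetr are defined in the annex cited above and are not reproduced here. The weight of 0.1, the counts and the weighted standard deviation formula (sum of weights minus one in the denominator) are all illustrative assumptions.

```python
from math import sqrt


def weighted_mean_sd(values, weights):
    """Weighted mean and standard deviation (assumed frequency-weight form)."""
    total = sum(weights)
    w_mean = sum(w * y for w, y in zip(weights, values)) / total
    w_var = sum(w * (y - w_mean) ** 2 for w, y in zip(weights, values)) / (total - 1)
    return w_mean, sqrt(w_var)


# Hypothetical baseline in which the fifth block was itself a previous signal.
baseline = [10, 12, 9, 11, 40, 10, 12]
unit_weights = [1.0] * len(baseline)
down_weights = [1, 1, 1, 1, 0.1, 1, 1]  # 0.1 is an arbitrary illustrative weight

plain_mean, plain_sd = weighted_mean_sd(baseline, unit_weights)
adj_mean, adj_sd = weighted_mean_sd(baseline, down_weights)

print(f"unweighted:   mean={plain_mean:.2f}, sd={plain_sd:.2f}")
print(f"downweighted: mean={adj_mean:.2f}, sd={adj_sd:.2f}")
```

Downweighting the earlier spike lowers both the estimated mean and the estimated standard deviation, which lowers the alert threshold for the current block.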
Signal detection is carried out based on "days", which are moving windows of 24 hours, moving according to the detect span. The baseline is calculated on these "days" from -1 to -7, considering the current "day" as zero.
A key attribute of signal detection is the ability of an algorithm to detect true threats or events without overloading the investigators with too many false positives. The alpha parameter determines the threshold of the detection interval. If the alpha is high, more potential signals are generated; if the alpha is low, fewer potential signals are generated (but potential threats or events could be missed). The alpha is often set empirically and also depends on the resources of those investigating the signals and on the importance of missing a potential threat or event.
Currently, this attribute is purely statistical and considers only the parameters of the algorithm explained above.
To account for multiple testing, a Bonferroni correction can be applied. For country-specific signal detection, the alpha is divided by the number of countries by default. For continent-specific signal detection, the alpha is divided by the number of continents.
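A minimal sketch of the Bonferroni adjustment described above, with a hypothetical number of countries. The function name is invented for illustration and the normal quantile is shown only to indicate how the adjustment raises the detection threshold.

```python
from statistics import NormalDist


def bonferroni_alpha(alpha, n_tests):
    """Divide the significance level by the number of simultaneous tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


alpha = 0.025
n_countries = 50  # hypothetical number of countries monitored

adjusted = bonferroni_alpha(alpha, n_countries)

# Standard normal quantiles before and after the correction.
z_unadjusted = NormalDist().inv_cdf(1 - alpha)
z_adjusted = NormalDist().inv_cdf(1 - adjusted)

print(f"alpha={alpha}: threshold multiplier {z_unadjusted:.2f}")
print(f"alpha={adjusted}: threshold multiplier {z_adjusted:.2f}")
```

A smaller alpha gives a larger multiplier on the standard deviation, so fewer counts exceed the upper limit.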

Supplement S3. Additional methods: epitweetr evaluation
The manual monitoring (hereafter referred to as the manual method) consisted of screening, twice a day, the most recent tweets posted by a list of over 100 validated Twitter users followed by the ECDC epidemic intelligence (EI) team. In this method, we defined a signal as a tweet that fulfilled ECDC criteria and required further action (e.g., validation of the information). The time of this tweet was recorded as the signal time and, if several Twitter accounts tweeted about the same topic in the same screening round, the earliest tweet set the signal time.
The epitweetr screening consisted of screening, twice a day, the email alerts (i.e., unexpected increases in the number of tweets by topic, location and time) sent by epitweetr at approximately 04:30 and 13:30 Central European Time (CET). In this method, we defined a signal as an alert for a specific topic, location and time whose top words and other information included in the email suggested that it fulfilled ECDC criteria. The time of the earliest alert of the day was recorded as the signal time.
In both epitweetr and the manual method, an event was a validated signal, deemed trustworthy and reliable by an official source. The approximate time of this validation was recorded as the event time, for both positive and negative validations.
Two EI experts, working in alternate weeks, screened Twitter data twice a day using both methods and recorded the signals and events detected by each method until the minimum sample size was achieved.
Given that no prior estimate of the sensitivity and specificity was available, and allowing a maximal marginal error of 0.15 for both, we defined the minimum sample size as

$$n = \frac{z^{2}}{4d^{2}},$$

where $n$ is the minimum sample size, $z$ is the 97.5th percentile of the standard normal distribution, and $d$ is the maximal marginal error. This is the conservative form that assumes a proportion of 0.5; with $z \approx 1.96$ and $d = 0.15$ it gives $n \approx 42.7$, rounded up to 43.
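The sample size calculation can be checked with a short sketch. It assumes the standard formula for estimating a proportion, with 0.5 as the conservative value when no prior estimate exists. The function name and the parameter `p` are illustrative and do not come from the source.

```python
from math import ceil
from statistics import NormalDist


def minimum_sample_size(d, confidence=0.95, p=0.5):
    """Smallest n such that the margin of error for a proportion p is at most d."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / d ** 2)


n = minimum_sample_size(d=0.15)
print(n)
```

Under these assumptions the result is 43, which agrees with the thresholds of 43 events and 43 signals used to select the study period.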
To achieve the minimum sample size, a time period meeting the following criteria was selected: at least 43 different events found by either method, at least 43 signals found by each method, and at least 10 events found by both methods.
Evaluating the classification accuracy of the events generated by the two methods is difficult, because no independent gold standard exists and no information is available on all the events that both methods should detect. We therefore used the inter-rater agreement (IRA) between the two methods as a relative definition of sensitivity [1]. We defined the IRA of the manual method and the IRA of epitweetr, with their 95% confidence intervals (CI), in terms of a, the number of events detected by both methods, b, the number of events detected only by epitweetr, and c, the number of events detected only by the manual method.
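Since estimating the specificity was not feasible in this context, we calculated the PPV as the proportion of signals corresponding to a validated event. The manual method PPV and the epitweetr PPV, each with its 95% CI, were therefore defined as the number of signals of the respective method that corresponded to a validated event, divided by the total number of signals generated by that method.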
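The PPV calculation can be sketched as a simple proportion with a confidence interval. The counts below are hypothetical and are not the study data. The normal-approximation (Wald) interval is an assumption made for illustration, since the interval method used in the study is not stated in this section.

```python
from math import sqrt
from statistics import NormalDist


def ppv_with_ci(validated_signals, total_signals, confidence=0.95):
    """PPV as a proportion, with an assumed normal-approximation (Wald) interval."""
    if total_signals <= 0:
        raise ValueError("total_signals must be positive")
    ppv = validated_signals / total_signals
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(ppv * (1 - ppv) / total_signals)
    # Clip the interval to the valid range of a proportion.
    return ppv, max(0.0, ppv - half_width), min(1.0, ppv + half_width)


# Hypothetical counts: 30 of 40 signals corresponded to a validated event.
ppv, lower, upper = ppv_with_ci(30, 40)
print(f"PPV = {ppv:.1%} (95% CI: {lower:.1%} to {upper:.1%})")
```

The Wald interval is unreliable for proportions close to 0 or 1 or for small counts, where an exact or Wilson interval would be a safer choice.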