Research Open Access



With growing amounts of data available, identification of clusters of persons linked to each other by transmission of an infectious disease increasingly relies on automated algorithms. We propose that cluster finding be treated as a two-step process: first, possible transmission clusters are identified using a clustering algorithm; second, the plausibility that the identified clusters represent genuine transmission clusters is evaluated.


We aimed to introduce visual tools for assessing automatically identified clusters.


We developed tools to visualise: (i) clusters found in dimensions of time, geographical location and genetic data; (ii) nested sub-clusters within identified clusters; (iii) intra-cluster pairwise dissimilarities per dimension; (iv) intra-cluster correlation between dimensions. We applied our tools to notified mumps cases in the Netherlands with available disease onset date (January 2009 – June 2016), geographical information (location of residence), and pathogen sequence data (n = 112). We compared identified clusters to clusters reported by the Netherlands Early Warning Committee (NEWC).
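As a rough illustration of items (iii) and (iv), the sketch below computes intra-cluster pairwise dissimilarities per dimension and a rank correlation between two dimensions. It is not the authors' implementation: the case records, the great-circle distance for location, the Hamming distance for sequences, and the choice of Spearman's rho are all assumptions made for the example.

```python
from datetime import date
from itertools import combinations
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points (assumed metric)."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def hamming(s, t):
    """Number of differing positions between two aligned sequences (assumed metric)."""
    return sum(x != y for x, y in zip(s, t))


def pairwise_dissimilarities(cases):
    """Return per-dimension dissimilarities for every pair of cases in one cluster."""
    out = {"time": [], "space": [], "genetic": []}
    for a, b in combinations(cases, 2):
        out["time"].append(abs((a["onset"] - b["onset"]).days))
        out["space"].append(haversine_km(a["loc"], b["loc"]))
        out["genetic"].append(hamming(a["seq"], b["seq"]))
    return out


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks; NaN if a dimension is constant."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else float("nan")


# Hypothetical cluster of three cases; none of these values come from the study.
cases = [
    {"onset": date(2010, 1, 1), "loc": (52.37, 4.90), "seq": "ACGTAC"},
    {"onset": date(2010, 1, 15), "loc": (52.09, 5.12), "seq": "ACGTAT"},
    {"onset": date(2010, 3, 1), "loc": (51.44, 5.47), "seq": "ACTTAT"},
]
d = pairwise_dissimilarities(cases)
print(spearman(d["time"], d["genetic"]))
```

Pairwise dissimilarities within a cluster are not independent of one another, so a correlation computed this way is better read as a descriptive summary than as a formal test.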


We identified five mumps clusters. Three clusters were considered plausible. One was questionable because, in phylogenetic analysis, the genetic sequences related to it segregated into two groups. One was implausible: it had no smaller nested clusters, high intra-cluster dissimilarities on all dimensions, and low intra-cluster correlation between dimensions. The NEWC reports concurred with our findings: the plausible and questionable clusters corresponded to reported outbreaks; the implausible cluster did not.


Our tools for assessing automatically identified clusters allow outbreak investigators to rapidly spot plausible transmission clusters for mumps and other human-to-human transmissible diseases. This fast information processing potentially reduces workload.





