Research Open Access



Whole genome sequencing (WGS) is a reliable tool for studying tuberculosis (TB) transmission. WGS data are usually processed by custom-built analysis pipelines with little standardisation between them.


We aimed to compare how variability between several internationally used WGS analysis pipelines affects the detection of epidemiologically linked TB cases.


We included 535 Mycobacterium tuberculosis complex (MTBC) strains isolated in the Netherlands in 2016. Epidemiological information obtained from municipal health services was available for all cases clustered by mycobacterial interspersed repetitive unit-variable number of tandem repeat (MIRU-VNTR) typing. WGS data were analysed using five different pipelines: one core genome multilocus sequence typing (cgMLST) approach and four single nucleotide polymorphism (SNP)-based pipelines developed in Oxford, United Kingdom; Borstel, Germany; Bilthoven, the Netherlands; and Copenhagen, Denmark. WGS clusters were defined using a maximum pairwise distance of 12 SNPs/alleles.
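The clustering criterion can be illustrated with a minimal sketch. It assumes single-linkage grouping (an isolate joins a cluster if it lies within the threshold of at least one member) and an inclusive cut-off of 12; neither detail is stated above, and the function name, isolate labels and distances are hypothetical. It is not the implementation used by any of the five pipelines.

```python
def cluster_by_threshold(distances, threshold=12):
    """Group isolates by single linkage on pairwise SNP/allele distances.

    `distances` maps an (isolate_a, isolate_b) pair to a distance.
    Assumption: pairs at or below `threshold` are linked (inclusive).
    """
    isolates = sorted({name for pair in distances for name in pair})
    parent = {name: name for name in isolates}

    def find(x):
        # Follow parent pointers to the group representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every pair that falls within the threshold.
    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in isolates:
        groups.setdefault(find(name), []).append(name)

    # Only groups of two or more isolates count as clusters.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical distances between four isolates (not study data).
example = {
    ("A", "B"): 3,
    ("A", "C"): 14,
    ("B", "C"): 11,
    ("A", "D"): 40,
    ("B", "D"): 38,
    ("C", "D"): 35,
}

clusters = cluster_by_threshold(example)
print(clusters)
```

Under single linkage, A and C fall in the same cluster through B even though they are 14 apart. A stricter rule requiring every pair in a cluster to be within 12 would split them, which is one way that pipelines applying the same nominal threshold can still report different clusters.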


The cgMLST approach and the Oxford pipeline clustered all epidemiologically linked cases; in the other three SNP-based pipelines, one epidemiological link was missed because of insufficient coverage. Genetic distances varied between pipelines, which was reflected in different clustering rates: the cgMLST approach clustered 92 cases, followed by 84, 83, 83 and 82 cases in the SNP-based pipelines from Copenhagen, Oxford, Borstel and Bilthoven, respectively.


Concordance in ruling out epidemiological links was high between pipelines, which is an important step in the international validation of WGS data analysis. To increase accuracy in identifying TB transmission clusters, standardisation of crucial WGS criteria and creation of a reference database of representative MTBC sequences would be advisable.





