Research Open Access



Improving the surveillance of tuberculosis (TB) is especially important for multidrug-resistant (MDR) and extensively drug-resistant (XDR) TB. The large amount of publicly available whole genome sequencing (WGS) data for TB gives us the chance to re-use data and to perform additional analyses at a large scale.


We assessed the usefulness of raw WGS data of global MDR/XDR isolates available from public repositories to improve TB surveillance.


We extracted raw WGS data and the related metadata of isolates available from the Sequence Read Archive. We compared this public dataset with WGS data and metadata of 131 MDR and XDR isolates collected in Germany in 2012 and 2013.


We aggregated a dataset that included 1,081 MDR and 250 XDR isolates, among which we identified 133 molecular clusters. In 16 clusters, the isolates were from at least two different countries. For example, Cluster 2 included 56 MDR/XDR isolates from Moldova, Georgia and Germany. When comparing the WGS data from Germany with the public dataset, we found that 11 clusters contained at least one isolate from Germany and at least one isolate from another country. We could, therefore, connect TB cases despite missing epidemiological information.
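The abstract does not state how molecular clusters were delineated, so the following is only a minimal sketch of one common approach: single-linkage grouping of isolates by pairwise SNP distance, followed by flagging clusters whose isolates come from at least two countries. The threshold of 5 SNPs, the isolate identifiers, the distances and the function names are all hypothetical and are not taken from the study.

```python
from collections import defaultdict

# Hypothetical cut-off; the abstract does not report the threshold used.
SNP_THRESHOLD = 5


def single_linkage_clusters(distances, threshold):
    """Group isolates linked by any pairwise distance <= threshold.

    `distances` maps (isolate_a, isolate_b) pairs to SNP distances.
    Returns a list of sets; isolates without a close neighbour are dropped.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path compression.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), dist in distances.items():
        root_a, root_b = find(a), find(b)
        if dist <= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for isolate in list(parent):
        groups[find(isolate)].add(isolate)
    # A molecular cluster needs at least two isolates.
    return [members for members in groups.values() if len(members) >= 2]


def multi_country_clusters(clusters, country_of):
    """Keep clusters whose isolates originate from two or more countries."""
    return [c for c in clusters if len({country_of[i] for i in c}) >= 2]


# Invented toy data, for illustration only.
distances = {
    ("DE-1", "MD-1"): 3,
    ("MD-1", "GE-1"): 4,
    ("DE-1", "GE-1"): 9,   # linked indirectly through MD-1
    ("DE-2", "DE-3"): 2,
    ("DE-2", "MD-1"): 40,
    ("DE-4", "DE-1"): 50,  # DE-4 stays unclustered
}
country_of = {
    "DE-1": "Germany", "DE-2": "Germany", "DE-3": "Germany",
    "DE-4": "Germany", "MD-1": "Moldova", "GE-1": "Georgia",
}

clusters = single_linkage_clusters(distances, SNP_THRESHOLD)
international = multi_country_clusters(clusters, country_of)
```

In this toy example, DE-1, MD-1 and GE-1 form one cluster spanning three countries, while DE-2 and DE-3 form a purely domestic cluster. Single linkage is chosen here only because it is simple to illustrate; it can chain distantly related isolates together, so the choice of threshold and linkage rule strongly affects the resulting cluster counts.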


We demonstrated the added value of using raw WGS data from public repositories to contribute to TB surveillance. By comparing the German dataset with the public dataset, we identified potential international transmission events. This approach might therefore support the interpretation of national surveillance results in an international context.





