The utility of multiple molecular methods including whole genome sequencing as tools to differentiate Escherichia coli O157:H7 outbreaks

BM Berenger 1,2, C Berry 3, T Peterson 3, P Fach 4, S Delannoy 4, V Li 1, L Tschetter 3, C Nadon 3, L Honish 5, M Louie 1, L Chui 1,6
1. Alberta Provincial Laboratory for Public Health, Alberta, Canada
2. University of Alberta Department of Medical Microbiology and Immunology, Edmonton, Alberta, Canada
3. Public Health Agency of Canada National Microbiology Laboratory, Winnipeg, MB, Canada
4. ANSES (French Agency for Food, Environmental and Occupational Health and Safety), Food Safety Laboratory, Maisons-Alfort, France
5. Environmental Public Health, Alberta Health Services, Alberta, Canada
6. University of Alberta Department of Laboratory Medicine, Edmonton, Alberta, Canada


Introduction
Shiga toxin-producing Escherichia coli (STEC), consisting of O157 and non-O157 serogroups, are a major public health concern. Cattle and other ruminants are natural reservoirs for STEC and shed the organisms in their faeces, which can contaminate food and/or water [1]. Consumption of contaminated meat, dairy products, vegetables/fruit or water, contact with animals [1] and person-to-person transmission [2] have all been associated with STEC infections. Infection may be asymptomatic or can cause gastrointestinal symptoms ranging from mild diarrhoea to haemorrhagic colitis [3]. Post-diarrhoea haemolytic uraemic syndrome (HUS), which is characterised by haemolytic anaemia, thrombocytopenia and kidney injury or failure, occurs in 5 to 20% of infected patients [2,3]. Paediatric and elderly patients are at greatest risk of developing systemic STEC complications, which are not limited to HUS and can include cardiac, central nervous system, pancreatic and pulmonary complications [3-5]. Shiga toxins (Stx1 and Stx2) are the major virulence determinants responsible for symptoms associated with both haemorrhagic colitis and systemic infections [5].
Due to the public health importance of STEC infections, epidemiological and molecular surveillance systems are essential for early outbreak detection. In recent years, rapid advancements in the use of molecular typing methods have improved STEC surveillance and outbreak detection. The application of these tools helps to identify disease clusters, refine outbreak case definitions, facilitate case finding, and link human cases to environmental sources. In order to achieve these outcomes, molecular typing assays must possess the discriminatory power required to distinguish between related and unrelated bacterial isolates, have high reproducibility, and be easy to perform. Furthermore, the results generated need to be easy to interpret, portable and allow inter-laboratory comparison. All typing results, especially during an outbreak, must be able to correlate with epidemiological data for accurate interpretation [6-8].
In Alberta, Canada, all E. coli O157:H7 are routinely typed by pulsed-field gel electrophoresis (PFGE) and multilocus variable-number tandem repeat analysis (MLVA) under surveillance practices. In July 2014, a unique cluster of E. coli O157:H7 was identified by PFGE with concordant MLVA analysis. The subsequent molecular and epidemiological investigation revealed that the cluster was associated with one of the largest human E. coli O157:H7 outbreaks in Canada since the implementation of PulseNet Canada (PNC; a national molecular subtyping network for food-borne disease surveillance) in 2000 (Linda Chui, PNC internal communications). To investigate the relatedness of isolates, whole genome sequencing (WGS) and virulence gene profiling were performed separately in real-time with concomitant analysis. The objectives of this study were twofold. (i) We sought to determine the relatedness of this large outbreak event to a concurrent, albeit smaller, outbreak as well as to all sporadic cases occurring in the summer of 2014 in Alberta. In addition, a representative panel of isolates from a socially and economically significant 2012 beef-associated outbreak was included for comparison. (ii) Using combined PFGE and MLVA profiles as a molecular typing standard, we assessed the individual ability of WGS-based methods (core single nucleotide variant (SNV) and k-mer analysis) or virulence gene profiling to differentiate sporadic cases from simultaneously occurring E. coli O157:H7 outbreak clusters.

Molecular detection of Escherichia coli O157:H7 outbreaks in Alberta
Following established protocols from frontline microbiology diagnostic laboratories for enteric bacteria isolation, all presumptive E. coli O157:H7 isolates are forwarded to the Alberta Provincial Laboratory for Public Health (ProvLab) for serotype confirmation and molecular typing. Routinely, all E. coli O157:H7 isolates are subjected to PFGE and MLVA using standardised PulseNet protocols (www.pulsenetinternational.org). For E. coli, XbaI endonuclease is the primary restriction enzyme used for chromosomal DNA digestion and is followed by secondary enzyme digestion with BlnI. Images (tagged image file format) of the PFGE profiles for all isolates are uploaded to the secure PNC national database at the Public Health Agency of Canada National Microbiology Laboratory (PHAC-NML) in Winnipeg, Manitoba, for national pattern designation. Participating PNC public health laboratories across Canada are alerted of clusters (n = 2 indistinguishable patterns) through the PNC web discussion board. In Alberta, the identification of STEC PFGE clusters triggers a public health investigation involving the local public health authority and the Alberta ProvLab. MLVA analysis is performed at PHAC-NML on all E. coli O157:H7-confirmed isolates. A PFGE and MLVA cluster is defined as isolates with indistinguishable PFGE and MLVA patterns.
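The cluster definition above (indistinguishable PFGE and MLVA patterns) amounts to grouping isolates on their combined profile. The following is a minimal sketch of that grouping, not the PulseNet Canada implementation; the function name, the record layout, the isolate identifiers and the minimum cluster size of two are assumptions made for illustration.

```python
from collections import defaultdict


def find_clusters(isolates, min_size=2):
    """Group isolates whose XbaI, BlnI and MLVA patterns are all identical.

    `isolates` is a list of dicts with the (hypothetical) keys
    'id', 'xbai', 'blni' and 'mlva'.
    """
    groups = defaultdict(list)
    for iso in isolates:
        key = (iso["xbai"], iso["blni"], iso["mlva"])
        groups[key].append(iso["id"])
    # Keep only profiles shared by at least `min_size` isolates.
    return {k: sorted(v) for k, v in groups.items() if len(v) >= min_size}


# Invented isolate identifiers; pattern strings follow the formats used in the text.
isolates = [
    {"id": "iso-1", "xbai": "ECXAI.0634", "blni": "ECBNI.0430", "mlva": "13_9_19_2_8_6_11_8"},
    {"id": "iso-2", "xbai": "ECXAI.0634", "blni": "ECBNI.0430", "mlva": "13_9_19_2_8_6_11_8"},
    {"id": "iso-3", "xbai": "ECXAI.0634", "blni": "ECBNI.0430", "mlva": "13_9_20_2_8_6_11_8"},
]
clusters = find_clusters(isolates)
```

Here iso-3 shares both PFGE patterns with the other two isolates but differs at one MLVA locus, so under this strict definition it falls outside their cluster.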

Outbreak identification
In Alberta, the Medical Officer of Health is notified of each case of E. coli O157:H7 in the province, which prompts an investigation into the case by an Alberta Health Services Environmental Health Officer who uses a standard questionnaire to identify potential sources of exposure. An outbreak investigation is initiated when a group of E. coli O157:H7 cases is identified with a common source or the aforementioned molecular typing criteria are met. ProvLab and public health officials are updated on the epidemiological and laboratory investigations through teleconferences, an online portal, and the distribution of line lists.

Figure 2 notes: Restriction enzyme digestion was done using XbaI and BlnI. The cluster-α pattern was found in a clinical and an environmental isolate associated with outbreak B (see results for detail). Two sporadic isolates are also included for reference (sporadic I and sporadic II). Pulsed-field gel electrophoresis (PFGE) national pattern designation is represented as follows: ECXAI (4 numerals) for the XbaI restriction pattern and ECBNI (4 numerals) for the BlnI restriction pattern.

Genomic DNA isolation for virulence factor detection
A

Shiga toxin typing
The Stx genes stx1 and stx2 were detected using a real-time multiplex polymerase chain reaction (PCR) assay consisting of two separate reactions run on an ABI Prism 7500FAST Sequence Detection System (Life Technologies, Inc., Burlington, ON, Canada) as previously described [9,10]. Conventional PCR was used to subtype stx1 and stx2 using primers from the World Health Organization Collaborating Centre for Reference and Research on Escherichia and Klebsiella [11].

Genome sequencing and assembly
WGS of isolates was performed at the PHAC-NML Core Genomics facility. Sample libraries were prepared using the Nextera XT library preparation kit (Illumina, Inc., San Diego, CA, US). Sequencing was performed on the Illumina MiSeq platform with the MiSeq Reagent Kit V2 to achieve an average genome coverage greater than 50x for all isolates. Raw sequence reads are available under National Center for Biotechnology Information (NCBI) Bioproject PRJNA291542.
Sequencing reads were de novo assembled into contigs using SPAdes [17] and annotated with Prokka [18]. SPAdes-assembled contigs smaller than 1 kb were removed from the analysis.

Core single nucleotide variant data preprocessing, quality control and data reduction
All read data available for each genome were processed using the following steps: (i) Fastq files were converted to Sanger quality-encoded Fastq format, and (ii) all Fastq files for each isolate were concatenated into one Fastq file per isolate. The concatenated Fastq files were next subjected to a quality control step using a custom Perl script that trims up to a maximum of 10 bases from either end of a read if the average base call quality in that region is below 25. Next, reads shorter than 36 bp and reads with an average base call quality below 25 were discarded from the analysis. Following the quality control step, all data for each isolate were reduced to a maximum of 200x coverage (estimated based on the total bp length of the E. coli O157:H7 strain Sakai genome, including a 92,721 bp plasmid, pO157) by random selection of reads.
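The quality control and data reduction steps above can be sketched as follows. This is an illustration only and not the custom Perl script used in the study: the trimming rule is one possible reading of the description (trim the longest end window of at most 10 bases whose mean quality is below 25), and the function names, the read representation as (sequence, quality list) pairs and the fixed random seed are assumptions.

```python
import random

MIN_QUAL = 25   # mean base-quality cut-off given in the text
MAX_TRIM = 10   # maximum number of bases trimmed from each end
MIN_LEN = 36    # minimum read length kept after trimming


def mean(values):
    return sum(values) / len(values)


def bases_to_trim(quals):
    """Length of the longest leading window (<= MAX_TRIM) with mean quality < MIN_QUAL."""
    best = 0
    for n in range(1, min(MAX_TRIM, len(quals)) + 1):
        if mean(quals[:n]) < MIN_QUAL:
            best = n
    return best


def qc_read(seq, quals):
    """Trim both ends, then return (seq, quals) or None if the read is discarded."""
    left = bases_to_trim(quals)
    seq, quals = seq[left:], quals[left:]
    right = bases_to_trim(quals[::-1])
    if right:
        seq, quals = seq[:-right], quals[:-right]
    if len(seq) < MIN_LEN or mean(quals) < MIN_QUAL:
        return None
    return seq, quals


def downsample(reads, genome_length, max_coverage=200, seed=0):
    """Randomly select reads until the base budget for `max_coverage` is reached."""
    budget = genome_length * max_coverage
    if sum(len(seq) for seq, _ in reads) <= budget:
        return list(reads)
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    kept, bases = [], 0
    for read in shuffled:
        if bases + len(read[0]) > budget:
            break
        kept.append(read)
        bases += len(read[0])
    return kept
```

The genome length is left as a parameter so that the coverage estimate can be based on whichever reference length is appropriate, as the text does with the Sakai chromosome and plasmid.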

Core single nucleotide variant calling
Core genome analysis was performed using the PHAC-NML bioinformatics custom Single Nucleotide Variant Phylogenomics pipeline (SNVPhyl) [19], consisting of open-source software and custom Perl scripts. Briefly, sequencing reads were mapped against the complete reference genome, E. coli O157:H7 strain Sakai, using SMALT v.0.7.0.1 [20] with a k-mer size of 13, a step size of 6, and a minimum alignment fraction of 0.5. Variants were called using FreeBayes v0.9.8 [21] with variant reporting for all variants, no complex variants, a minimum mapping quality of 30, a minimum base quality of 30, a minimum alternate fraction of variant bases in agreement of 75%, and a minimum coverage of 20 reads at every position in the reference sequence.
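To make the thresholds above concrete, the sketch below applies them to a summarised candidate site. It is a simplification for illustration and does not reproduce FreeBayes, which applies mapping and base quality filters to individual reads and bases; the `Candidate` record and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    depth: int      # reads covering the position
    alt_reads: int  # reads supporting the alternate base
    mapq: int       # summarised mapping quality at the site (simplification)
    baseq: int      # summarised base quality at the site (simplification)


def is_high_quality(c, min_depth=20, min_alt_fraction=0.75, min_mapq=30, min_baseq=30):
    """Return True if a candidate variant meets all thresholds quoted in the text."""
    if c.depth < min_depth:
        return False
    if c.mapq < min_mapq or c.baseq < min_baseq:
        return False
    return c.alt_reads / c.depth >= min_alt_fraction
```

For example, a site covered by 40 reads of which 36 support the alternate base passes (90% agreement), whereas the same site with only 20 supporting reads (50%) does not.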

Core single nucleotide variant phylogeny
For each of the variant call format files created by FreeBayes, complex variant calls were split into single variant calls to create new variant call format files. All variant calls were merged into a single alignment file. SAMtools v 0.1.1.18 [22] was used to investigate positions where not every genome had a variant call. Positions where no variant was called with SAMtools mpileup and with a minimum coverage of 20 were assigned the reference base in the alignment. Positions where a variant was called with SAMtools mpileup and with a minimum coverage of 20 were excluded from the alignment. All other positions were excluded. The alignment files were used to generate a phylogenetic tree with PhyML v3.0 [23] using a generalised time reversible (GTR) model and the best of both nearest neighbour interchange (NNI) and subtree pruning and regrafting (SPR) tree topology searching strategies. Horizontally transferred elements arising from recombination events were identified using Progressive Mauve [24], PHAge Search Tool (PHAST) [25] and Island Viewer [26]. These coordinates were then masked from the SNVPhyl analysis, as were repetitive regions identified using the nucmer programme in the MUMmer sequence alignment package.
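The position rules above can be read as a per-column decision: a position enters the alignment only if every genome has either a high-quality variant call or confident reference support. The sketch below encodes that reading; it is not the SNVPhyl code, and the function name and the per-genome record keys ('hq_alt', 'pileup_variant', 'depth') are hypothetical.

```python
def core_column(ref_base, per_genome, min_depth=20):
    """Build one alignment column, or return None if the position is excluded.

    Each entry of `per_genome` is a dict with:
      'hq_alt'         -- alternate base from a high-quality call, or None
      'pileup_variant' -- True if the pileup check reported a variant
      'depth'          -- read depth at the position
    """
    column = []
    for g in per_genome:
        if g["hq_alt"] is not None:
            # High-quality variant call: use the alternate base.
            column.append(g["hq_alt"])
        elif not g["pileup_variant"] and g["depth"] >= min_depth:
            # No variant and enough coverage: assign the reference base.
            column.append(ref_base)
        else:
            # Conflicting or poorly covered position: drop the whole column.
            return None
    return column
```

Dropping the whole column whenever a single genome is ambiguous is what restricts the analysis to positions shared confidently by all isolates.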

Core single nucleotide variant minimum spanning trees
The high-quality (hq) core SNVs for each isolate, identified in the SNVPhyl pipeline and used to generate the core SNV phylogenies, were also visualised using minimum spanning trees (MST) generated with the open source Phyloviz goeBURST algorithm [27]. Each unique set of hq core SNVs and its corresponding isolates were assigned a unique identifier or sequence type, and a table of core SNV positions and the unique sequence types was entered into the Phyloviz goeBURST algorithm. Additional metadata, including outbreak event, MLVA and PFGE patterns, were annotated onto the MST.
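The two steps described above, assigning a sequence type to each unique hq core SNV profile and linking the sequence types in a minimum spanning tree, can be sketched as below. Plain Prim's algorithm on Hamming distances is used as a stand-in; it is not the goeBURST algorithm, which applies its own tie-breaking rules. The function names and the toy profiles are invented for illustration.

```python
def assign_sequence_types(profiles):
    """Map each unique SNV profile to a sequence type (1, 2, ...) and each isolate to its type."""
    st_of_profile, st_of_isolate = {}, {}
    for isolate in sorted(profiles):
        profile = profiles[isolate]
        if profile not in st_of_profile:
            st_of_profile[profile] = len(st_of_profile) + 1
        st_of_isolate[isolate] = st_of_profile[profile]
    return st_of_profile, st_of_isolate


def hamming(a, b):
    """Number of core SNV positions at which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(st_of_profile):
    """Prim's algorithm; returns edges as (source type, target type, SNV distance)."""
    nodes = sorted(st_of_profile.items(), key=lambda kv: kv[1])
    if not nodes:
        return []
    in_tree, remaining, edges = [nodes[0]], nodes[1:], []
    while remaining:
        dist, src, dst = min(
            (hamming(p, q), s, t) for p, s in in_tree for q, t in remaining
        )
        edges.append((src, dst, dist))
        node = next(n for n in remaining if n[1] == dst)
        remaining.remove(node)
        in_tree.append(node)
    return edges


# Toy profiles: the bases of each isolate at four core SNV positions.
profiles = {"a": "AAAA", "b": "AAAA", "c": "AAAT", "d": "GGCT"}
st_of_profile, st_of_isolate = assign_sequence_types(profiles)
edges = minimum_spanning_tree(st_of_profile)
```

Isolates a and b share a profile and therefore collapse into one node, which is why the tree has fewer nodes than there are isolates.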

Figure legend: Outbreaks A, B (including cluster-α) and C are labelled by a specified colour. Sporadic and E. coli O157:H7 str. Sakai branches are in black. 14-6547 and 14-6537 (in black) are a cluster of two isolates, unrelated to other isolates, with exposure to ground beef.

k-mer clustering
The frequencies of all nucleotide sequences of a predefined length (k-mers) in the entire genome of each isolate were compared with the k-mer frequencies in all other isolates to build a k-mer phylogenetic tree. The alignment-free feature frequency profile (FFP) method [28] was applied to the SPAdes-assembled contigs of each isolate at a k-mer length of 25 nt. The optimal k-mer length was determined using the ffpvocab utility of the FFP package, with ffpreprof -e32 to determine the upper bound and ffpvprof -f 2 to determine the lower bound. Phylogenetic trees were inferred from the resulting divergence distance matrix using the neighbour-joining method implemented in the Phylip package [29].
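The idea behind an alignment-free k-mer comparison can be illustrated with a small sketch: each genome is summarised as a k-mer frequency profile and two profiles are compared with a divergence measure. The code below is not the FFP package; it assumes the Jensen-Shannon divergence as the distance, ignores reverse complements, and uses invented function names and toy sequences with a short k instead of the 25 nt used in the study.

```python
import math
from collections import Counter


def kmer_profile(sequences, k):
    """Relative frequency of every k-mer found in a list of contig sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no k-mers found; sequences are shorter than k")
    return {kmer: n / total for kmer, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency profiles."""
    div = 0.0
    for kmer in set(p) | set(q):
        pi, qi = p.get(kmer, 0.0), q.get(kmer, 0.0)
        mi = (pi + qi) / 2
        if pi:
            div += 0.5 * pi * math.log2(pi / mi)
        if qi:
            div += 0.5 * qi * math.log2(qi / mi)
    return div


# Toy example with k = 2.
same = js_divergence(kmer_profile(["ACGTACGT"], 2), kmer_profile(["ACGTACGT"], 2))
disjoint = js_divergence(kmer_profile(["AAAA"], 2), kmer_profile(["CCCC"], 2))
```

With base-2 logarithms the divergence ranges from 0 for identical profiles to 1 for profiles that share no k-mers; a matrix of such pairwise values is the kind of input a neighbour-joining method expects.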

Description of the outbreaks
From 14 July to 17 September 2014, 149 clinical isolates of E. coli O157:H7 were received by the Alberta ProvLab for molecular typing (Figure 1). Two outbreaks (designated as A and B) were identified in Alberta during this time period, along with multiple sporadic isolates that were unrelated to either outbreak based on definitions involving epidemiological and PFGE/MLVA profiles. For comparison to an outbreak not temporally associated with outbreaks A and B, a 2012 outbreak (outbreak C) was also included in this study.
The first outbreak (outbreak A) involved ten patients (four of whom developed HUS) with isolates collected from 14 to 22 July. Epidemiological investigations revealed a common exposure of visiting the same summer fair. There was one unique PFGE pattern combination and a total of three MLVA patterns among outbreak A isolates (Table and Figure 2). No food or environmental isolates were recovered in relation to this outbreak.
The second outbreak (outbreak B) was of particular interest because it was the largest in Alberta since PNC's inception in the year 2000. In 2014, as of 17 September, 182 clinical isolates of E. coli O157:H7 were collected and referred to the Alberta ProvLab, whereas the mean annual case number for the previous five years was 87.6 (95% confidence interval: ± 10.99).
One hundred and eleven of these cases were associated with outbreak B during this time period, of which five developed HUS. Epidemiological investigations revealed a common exposure to contaminated pork products that were produced and distributed in Alberta [30]. This finding was confirmed by matching PFGE and MLVA profiles in human and food/environmental outbreak isolates (five food isolates and one environmental isolate were received by the Alberta ProvLab in relation to this outbreak).
PulseNet Canada routine surveillance using PFGE analysis revealed the presence of two concurrent clusters with closely related PFGE patterns (ECXAI.0023, ECBNI.0430 and ECXAI.0634, ECBNI.0430) (Figure 2). As more isolates were received at the ProvLab, PFGE and MLVA subtyping revealed that the outbreak consisted of several isolates with variant, yet closely related, PFGE and MLVA patterns (Figure 2). Outbreak B included a cluster of two isolates that was designated cluster-α. One isolate, 14-7110, recovered from a swab of a food tray at a distributor under investigation for a possible link with outbreak B, had a BlnI DNA restriction pattern indistinguishable from the primary outbreak B pattern (ECBNI.0430), but a variant, though closely related, XbaI pattern (ECXAI.2098) and an MLVA pattern (15_9_18_3_8_7_6_8) that differed from the other outbreak B isolates (Table and Figure 2). This isolate (14-7110) was collected one month after an isolate (14-5378) with an indistinguishable PFGE/MLVA profile was recovered from an individual. An epidemiological linkage between cluster-α and outbreak B was demonstrated in part by the recovery of an isolate (14-5369) with the most predominant PFGE/MLVA profile in outbreak B from an individual who dined at the same restaurant within three days of the individual from whom 14-5378 was isolated.

Whole genome sequencing
The genetic relationships among outbreak-associated isolates were also determined by WGS using the PHAC-NML bioinformatics SNVPhyl pipeline and by k-mer tree analysis. All aforementioned clinical, food, and environmental isolates were sequenced for each outbreak. For outbreak C, isolates from eight human cases that had occurred in Alberta and two food isolates were available for sequencing. One clinical isolate was excluded because it did not meet the sequence quality threshold. The sporadic isolates (no epidemiological or PFGE/MLVA link to an outbreak) that were sequenced included a cluster of two isolates unrelated to any outbreak with suspected exposure to ground beef (14-6537 and 14-6547), one isolate (14-5400) that came from an individual who worked with pork and one isolate (14-5618) that was randomly chosen.

Core single nucleotide variant phylogenetic analysis
The interrogation of hq core SNVs revealed that clinical human isolates within an outbreak event varied by 0-5 SNVs from one another (0-5 SNVs for outbreak A and 0-5 SNVs for outbreak B) (Figure 3). Food/environmental isolates from outbreak B also clustered within 0-5 SNVs of the clinical human isolates. Outbreak A and outbreak B isolates clustered into distinct and well-defined branches, separated by a distance of 231-257 SNVs (Figure 3). Both outbreaks also clustered away from the sporadic isolates and the reference strain, E. coli O157:H7 str. Sakai.
Within outbreak B, the major PFGE/MLVA clusters ECXAI.0634, ECBNI.0430/13_9_19_2_8_6_11_8 and ECXAI.0023, ECBNI.0430/13_9_18_2_8_6_11_8 were separated by 2-4 SNVs (Table and Figure 3). The HUS-associated isolates from outbreak B clustered with the other outbreak isolates, with four in the cluster of 74 isolates and one in the cluster of 23 isolates (Figure 3). Upon further examination of the core SNV phylogenetic trees, the two isolates composing cluster-α were separated by 22-27 hq core SNVs from all other outbreak B isolates and by only one core SNV from each other (Figure 3). The isolate that was not part of cluster-α, but was isolated from a diner at the same restaurant (14-5369), was found in the cluster of 74 isolates.
Core SNV analysis of the seven Alberta clinical human isolates involved in outbreak C (the 2012 beef-related outbreak) and two food isolates revealed no core SNV differences between these nine outbreak-associated isolates. Moreover, this method differentiated the branch corresponding to the outbreak C isolates from the outbreak B branch by a distance of 157-161 core SNVs and from outbreak A by 74-77 core SNVs.
Several sporadic isolates were identified concurrent with outbreaks A and B, and possessed variant PFGE/MLVA profiles. These isolates had a minimum genetic distance of 57 hq core SNVs from all three outbreak clusters. Notably, one sporadic case (14-5400) recovered from an individual who worked directly with pork had a similar, but distinguishable, BlnI pattern to outbreak A and a distinct MLVA profile (ECXAI.3108, ECBNI.0181/9_10_12_7_7_6_3_6) (denoted as Sporadic I in Figure 2). Core SNV phylogenetic analysis demonstrated that this isolate was distant from outbreak B isolates by a high number of core SNVs (214-237).

k-mer clustering
The k-mer method also delineated outbreak A from outbreak B, with each group of outbreak isolates located on distantly related nodes (Figure 4). There were isolates found within the outbreak B node (orange in Figure 4) with greater horizontal branch distances than the average branch lengths for outbreak B isolates, demonstrating additional resolution of certain isolates from other outbreak B isolates. These more distant isolates included ones with the most and least frequently observed MLVA and PFGE profiles. Cluster-α was also distinguished from the main outbreak B branch and was identified in a separate node, but it originated from the same node as the outbreak B branch and thus shares a most recent common ancestor with it (Figure 4). In concordance with core SNV analysis, the isolate that was not part of cluster-α, but was isolated from a diner at the same restaurant (14-5369), was not differentiated from other outbreak B isolates by k-mer clustering. k-mer analysis also did not indicate a close relationship between sporadic isolates and outbreak-associated isolates.

Shiga toxin genes typing
The presence of stx1 and stx2 was determined for all isolates received during the sampling period. For stx1 and stx2 subtyping, 36 isolates were selected from the 2014 sampling period, comprising outbreak-related food isolates (n = 3), HUS-associated isolates (n = 2 for outbreak A and n = 5 for outbreak B), all isolates from household members of HUS cases (based on address) (n = 4, all outbreak B), both the environmental and clinical isolate from cluster-α, a representative isolate for each outbreak B variant PFGE and MLVA profile not covered by other selection criteria (n = 19), and one isolate (14-6547) from the sporadic cluster of two individuals.
In addition, all clinical and food isolates from outbreak C were subtyped (n = 10). All outbreak A isolates (n = 3) were stx1 negative and stx2 positive (subtype stx2a). Most outbreak B isolates (27 of 31), including HUS-associated isolates, were positive for both stx genes and subtyped as stx1a and stx2a. The clinical human isolate from cluster-α (14-5378) tested positive for stx1a, stx2a and stx2c, and the environmental isolate in this cluster (14-7110) was stx1a and stx2 positive, but no subtype was identified for stx2. The other variant stx subtypes in outbreak B included two clinical isolates, one stx1a/stx2 untypeable and one stx1 negative/stx2 untypeable. All outbreak C isolates and isolate 14-6547 from the sporadic cluster were stx1a and stx2a positive.

Virulence gene profiling
Virulence gene profiling was performed on all isolates (n = 155) received by ProvLab for molecular typing from 14 July to 17 September 2014. Outbreak A, outbreak B and sporadic isolates were indistinguishable based on gene profiling of 49 STEC virulence genes. All isolates were negative for sfpA, the pilin subunit gene found in sorbitol-fermenting STEC, and all but six isolates were positive for all other genes tested. Two outbreak B isolates, 14-6543 and 14-5377, tested negative for the putative proteins Z2096 and Z2098. In the k-mer analysis, these isolates clustered within outbreak B, but both were distinguishable from each other and from all other outbreak B isolates (Figure 4). A single outbreak B isolate, 14-5380, tested negative for the secreted effector proteins espK, espN, espX7 and espO1-1 and was found to be related to outbreak B, but distinguishable from all other outbreak B isolates, by k-mer analysis (Figure 4). All of the outbreak B isolates with genes not detected on the virulence gene array were indistinguishable from other outbreak B isolates using SNV analysis. The other three isolates that typed negative for virulence genes were sporadic isolates, with the first testing negative for espX7, the second negative for espP, and the third negative for efa1, efa2, ent, nleB, nleE, pagC and Z4331. None of the HUS cases were negative for any virulence genes other than sfpA.

Discussion
Technologies such as high-throughput screening of virulence genes [32,33] and WGS have the potential to be used for early outbreak detection and characterisation. To be used as such, these methods must be compared with current international standards of bacterial typing. For E. coli O157:H7, the currently validated and widely employed standard typing methods are MLVA and PFGE. Furthermore, a bacterial typing method must have the following characteristics: accuracy, inter- and intra-laboratory reproducibility (including with multiple passaging of isolates), high discriminatory power, concordance with epidemiological data, rapidity and ease of use, cost effectiveness, and amenability to computerised analysis [34]. This work demonstrates that WGS is a suitable method for typing E. coli O157:H7 that already meets many of these criteria, whereas virulence gene profiling with the genetic targets used in this study is not.
No virulence factors tested in this work could reliably distinguish between outbreak and non-outbreak strains. The high prevalence in E. coli O157:H7 of the genetic markers used in this study ([13] and this study) prevents the use of any assay that detects the presence or absence of these genes for accurate discrimination between isolates. Shiga toxin subtyping was able to differentiate isolates only in cases where the subtypes differed from the most prevalent subtypes (stx1a and stx2a), a result similarly observed in an analysis of Albertan E. coli O157:H7 isolates collected from 2004 to 2012 [15].
Phylogenetic analysis using whole genome or core SNVs derived by a variety of methods is the most frequently published method to determine relatedness between isolates of E. coli or other Enterobacteriaceae such as Salmonella species [33-38]. To the best of our knowledge, this is the first example of a real-time, large-scale study comparing virulence gene profiling, hq core SNV or k-mer analysis to MLVA and PFGE profiling.
Using different bioinformatics pipelines and groups of E. coli isolates, four previously published studies have demonstrated the ability of WGS to discriminate between isolates of E. coli [35-38]. Core SNV phylogeny can differentiate not only STEC isolates from other STEC isolates, but also uropathogenic E. coli from one another [35]. Two other studies have used MLVA as the 'standard' reference method and demonstrated core SNV phylogenies to be equivalent to or better than MLVA at identifying outbreaks [36,37]. One of these two studies used E. coli O157:H7 isolates from the United Kingdom and identified outbreaks using core SNV analysis that were missed by epidemiological investigations and MLVA analysis [37]. Another study analysing isolates from a beef-associated outbreak of E. coli O157:H7 in Denmark demonstrated that core SNV phylogenetic analysis and nucleotide distance-based trees (built using 17-base k-mers) were each capable of differentiating between outbreak isolates and concurrently occurring, non-outbreak sporadic E. coli O157:H7 and non-O157 STEC isolates. In the Danish study, PFGE was also performed on selected isolates, but its utility for outbreak detection was not compared with the two types of WGS approaches [38]. The study herein adds to this knowledge by demonstrating that core SNV or k-mer phylogenies alone are concordant with combined PFGE and MLVA data when used to differentiate between concurrently occurring outbreak and non-outbreak isolates of E. coli O157:H7.
k-mer analysis showed concordance with PFGE and MLVA profiling while advantageously revealing increased discriminatory power when compared with PFGE and MLVA profiling or core SNV analysis. Feature frequency profiling generates k-mer-based profiles using the entire genomic sequence of test organisms to determine relatedness, whereas core SNV analysis uses only the 'conserved' portion of the genome; one may therefore hypothesise that k-mer analysis would provide additional discriminatory power. With this increased discriminatory power, there is the concomitant risk that k-mer analysis may be too discriminatory by including not only 'core' conserved features, but also genetic elements resulting from horizontal gene transfer. These elements can be easily lost and gained as isolates evolve naturally and are passaged in the laboratory. Therefore, the inclusion of these elements in the analysis has the potential to mask core phylogenetic inference, but no evidence of this was observed in our study.
Cluster-α consisted of two isolates that differed from all other outbreak B isolates by at least 22 SNVs, but differed from each other by only one core SNV. This cluster was included in outbreak B based on PFGE and epidemiological links (an outbreak B implicated restaurant), but was differentiated from other outbreak B isolates by MLVA, k-mer and core SNV analysis. Furthermore, the stx2 subtypes of these two isolates differed from each other and from the other subtyped outbreak B isolates. This cluster likely represents different strains contributing to the outbreak. Similar observations have been made previously in our laboratory as well as by Gilmour et al. [39]. These observations emphasise the need for careful correlation of epidemiological data with molecular subtyping data.
The core SNV genetic distances observed between cluster-α and outbreak B exemplify the utility of predefining an SNV threshold for isolate relatedness. Cluster-α differed from outbreak B by 22-27 SNVs and other outbreak B isolates differed by 0-5 SNVs, whereas outbreak-unrelated, sporadic isolates differed by ≥ 58 SNVs. Despite using different SNV calling and phylogeny methods, other studies have also identified ≤ 5 SNVs as a potential threshold for genetic relatedness among E. coli O157 [36,37]. In one study of extended-spectrum beta-lactamase-producing E. coli outbreak-associated isolates, 0 SNVs were identified between outbreak-associated isolates [35]. It is more difficult to assign a numerical 'threshold' or pattern for relatedness with k-mer analysis, which produces trees based on a distance matrix generated from k-mer profiles and their presence or absence. Unlike MLVA and/or PFGE using PulseNet standardised methods, neither k-mer nor core SNV methods generate a numerical 'barcode' for the organisms, which makes it difficult to compare isolates from different laboratories. Therefore, a large shared and curated database will likely be required to generate cluster identifiers that could act as the organism's 'barcode'.
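A predefined SNV threshold can be turned into cluster assignments by linking every pair of isolates whose distance is at or below the threshold. The sketch below uses single-linkage grouping with a threshold of 5 SNVs; the linkage rule, the function name, the isolate labels and the distances are assumptions chosen to mirror the ranges reported above, not data from the study.

```python
def cluster_by_threshold(names, distance, threshold=5):
    """Single-linkage clustering: isolates within `threshold` SNVs are joined."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if distance[frozenset((a, b))] <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


# Invented pairwise core SNV distances for five hypothetical isolates.
names = ["B1", "B2", "alpha1", "alpha2", "S1"]
pairs = {
    ("B1", "B2"): 3, ("alpha1", "alpha2"): 1,
    ("B1", "alpha1"): 24, ("B1", "alpha2"): 25,
    ("B2", "alpha1"): 23, ("B2", "alpha2"): 24,
    ("S1", "B1"): 60, ("S1", "B2"): 61,
    ("S1", "alpha1"): 58, ("S1", "alpha2"): 59,
}
distance = {frozenset(k): v for k, v in pairs.items()}
clusters = cluster_by_threshold(names, distance)
```

With these values the threshold separates the two cluster-α-like isolates from the main group even though an epidemiological link might place them in the same outbreak, which illustrates why the text stresses interpreting thresholds together with epidemiological data.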
Currently, PulseNet networks worldwide conduct timely surveillance and facilitate outbreak detection. For WGS to be used in this manner, standardised protocols, quality validation metrics and a robust method to determine isolate relatedness will need to be established. Published reports use different sequencing platforms and bioinformatics pipelines to assemble and analyse WGS data, prohibiting direct comparisons between studies. Head-to-head comparison of these different methods will help determine the appropriate standard(s). The inter-laboratory comparison of WGS data also requires a communal database of isolates, which in turn requires adequate computing infrastructure and secure electronic networks capable of transmitting large datasets. Other considerations encompass the ethical, legal or political barriers to sharing complete genomics data between various health authorities.
In conclusion, WGS holds significant potential to replace current gold-standard typing methods such as PFGE for the routine surveillance and detection of enteric outbreaks. This shift will be driven by the advantages offered by WGS such as increased discriminatory power and genetic resolution. However, before this technology can be widely implemented, certain barriers remain to be addressed, such as initial capital expense, computing infrastructure, and validated, automated, user-friendly WGS analysis software. Moreover, quality metrics and standardised protocols, including standardised definitions for isolate relatedness, are required before routine application of WGS within public health laboratories.

Figure 1
Distribution of Escherichia coli O157:H7 cases in residents of Alberta according to time and outbreak, Canada, 14 July-17 September 2014 (n=149)

Figure 3
Core single nucleotide variant analysis, represented as a minimum spanning tree, of Escherichia coli O157:H7 sequences from isolates submitted to the Alberta Provincial Laboratory for Public Health, Canada, 14 July-17 September 2014 (n=140 sequences)

Figure 4
k-mer phylogeny, represented as a minimum spanning tree, of Escherichia coli O157:H7 isolates submitted to the Alberta Provincial Laboratory for Public Health, 14 July-17 September 2014 (n=140 isolates)

The most predominant difference in MLVA profiles was at the third locus, which varied from 17 to 20 repeats (Table). Further complicating the analysis were isolates collected from household members that were found to possess variant PFGE/MLVA profiles. For instance, isolates from one family with a HUS case all had the PFGE pattern ECXAI.0230, ECBNI.0430; however, two MLVA patterns were observed, 13_9_19_2_8_6_11_8 (n=1) and 13_9_20_2_8_6_11_8 (n=2, including the HUS case). Another family with two infected members had their isolates collected on the same day with different PFGE/MLVA profiles (ECXAI.0230, ECBNI.0430/13_9_18_2_8_6_11_8 and ECXAI.0634, ECBNI.0430/13_9_19_2_8_6_11_8). No temporal associations were observed with the variant PFGE/MLVA profiles. There were also no distinct PFGE or MLVA patterns among HUS cases, which had the PFGE patterns ECXAI.0230 or ECXAI.0634, ECBNI.0430, and MLVA profile 13_9_19_2_8_6_11_8, except for one HUS isolate with the MLVA profile 13_9_20_2_8_6_11_8. Overall, 23 different PFGE and MLVA profiles and five clusters (two or more isolates with the same profile) were identified in outbreak B.