Proof-of-concept study for successful inter-laboratory comparison of MLVA results

Multiple-locus variable-number of tandem repeats analysis (MLVA) is widely used for typing of pathogens. Methods such as MLVA that determine DNA fragment size by capillary electrophoresis have an inherent problem: a considerable offset between measured and real (sequenced) lengths is commonly observed. This discrepancy arises from variation in the laboratory set-up used for fragment analysis. To obtain comparable results between laboratories using different set-ups, some form of calibration is a necessity. A simple approach is to use a set of calibration strains with known allele sizes and determine what compensation factors need to be applied under the chosen set-up conditions in order to obtain the correct allele sizes. We present here a proof-of-concept study showing that using such a set of calibration strains makes inter-laboratory comparison possible. In this study, 20 international laboratories analysed 15 test strains using a five-locus Salmonella enterica serovar Typhimurium MLVA scheme. When using compensation factors derived from a calibration set of 33 isolates, 99.4% (1,461/1,470) of the MLVA alleles of the test strains were assigned correctly, compared with 64.8% (952/1,470) without any compensation. After final analysis, 97.3% (286/294) of the test strains were assigned correct MLVA profiles. We therefore recommend this concept for obtaining comparable MLVA results.


Introduction
Multiple-locus variable-number of tandem repeats analysis (MLVA) has become an increasingly popular method for fast, reproducible and inexpensive subtyping of many bacterial species including Salmonella enterica serovar Typhimurium [1,2]. The principle of MLVA is a concurrent analysis of loci with tandem repeated DNA sequences (variable number of tandem repeats, VNTRs). Polymerase chain reaction (PCR) is used to amplify DNA containing the VNTR sites and electrophoresis is used to distinguish the alleles according to their sizes. In S. Typhimurium, the majority of informative loci are relatively short, 6-9 base pairs (bp), requiring capillary electrophoresis (CE) for reliable length measurement. CE, as employed by common sequencing equipment, is notorious for a set-up-dependent discrepancy between measured and real (sequenced) fragment lengths [3-6]. Production of data that are comparable between laboratories is crucial for the usefulness of typing methods for food-borne pathogens, e.g. to enable detection of common outbreaks in different regions or countries and to track the pathogens in the food production chain.
This study is a follow-up to a previous study that provided recommendations for the MLVA nomenclature of S. Typhimurium, a scheme that is based on the actual number of repeats in each locus and where the MLVA profile is described as a string of five numbers [7]. The objective of this study was to test whether comparable MLVA results can be obtained between laboratories by the use of a set of calibration strains. In this report, we show that MLVA results from 20 laboratories using different laboratory MLVA primers and/or CE equipment can be compared in a relevant way by the use of calibration strains.
In all, 20 public health, food and veterinary institutes agreed to participate and were provided with two sets of strains: a calibration set comprising 33 strains and a set of 15 test strains (Table 1). Along with the shipment of strains came a suggested protocol [8] and Excel templates that could be used for adjusting test results based on the participants' calibration results. Participants were not obliged to use the suggested protocol but were free to use methods and primers as they wished. The only requirements were to analyse the allele sizes for the same five loci and to report results as the number of repeats at each of these loci. A total of 19 participants used the primers described by Lindstedt et al. [1] and one participant used primers from the PulseNet US protocol [9].
The strains in the calibration panel are either S. Typhimurium or a monophasic variant O:4,5,12;i:-. The strains were selected from the Danish public health and food database to provide a good coverage of the alleles known to occur in each MLVA locus. These strains should not be seen as a representative selection of the Danish or any other S. Typhimurium population.

Test panel
The strains in the test panel (Table 1) were chosen among strains obtained through the Danish public health surveillance. The test set was designed to fulfil four criteria: (i) include alleles not present in the calibration set; (ii) include identical profiles from patient clusters; (iii) include profiles very similar to each other, i.e. single locus variants; and (iv) provide a good distribution of allele sizes in order to test whether the calibration set is good enough to fulfil its role for calibration of short and long alleles.

Allele assignment
Participants were asked to determine the number of repeats in each locus of the test strains in accordance with the previously suggested nomenclature [7]. The conversion of measured fragment size into correct allele assignment was to be done by using the results obtained from analysing the fragment sizes of the various VNTRs for the calibration strains with sequenced alleles. The participants were free to use any method for this. However, as a suggested help, two Excel files with calculations were provided. The first used the results from testing the 33-strain calibration set to convert the discrepancies between real and measured fragment length into a matrix with compensation factors for each possible length. The second was a template that used the compensation matrix to calculate real fragment lengths from the apparent fragment lengths of test strains. In this second file, the compensated fragment lengths were also converted into repeat counts. This two-phase approach makes it possible to assign repeat counts to alleles that are not present in the calibration set.
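The two-phase approach can be sketched in a few lines of Python. This is a minimal illustration, not a reconstruction of the Excel templates: linear interpolation between calibration points is an assumption, the function names are hypothetical, and the 6 bp repeat, 100 bp flanking region and all fragment lengths are invented example values.

```python
from bisect import bisect_left


def build_compensation(calibration):
    """Phase 1: turn (measured_bp, real_bp) calibration pairs for one locus
    into a sorted table of (measured_bp, offset_bp), offset = real - measured."""
    return sorted((measured, real - measured) for measured, real in calibration)


def compensate(measured_bp, table):
    """Phase 2: estimate the real length of a test fragment.
    The offset is interpolated linearly between the two nearest calibration
    points (an assumption) and clamped outside the calibrated range."""
    xs = [m for m, _ in table]
    i = bisect_left(xs, measured_bp)
    if i == 0:
        offset = table[0][1]
    elif i == len(xs):
        offset = table[-1][1]
    else:
        (x0, o0), (x1, o1) = table[i - 1], table[i]
        offset = o0 + (o1 - o0) * (measured_bp - x0) / (x1 - x0)
    return measured_bp + offset


def repeat_count(real_bp, flank_bp, repeat_bp):
    """Convert a compensated fragment length into a repeat count."""
    return round((real_bp - flank_bp) / repeat_bp)


# Hypothetical locus: 6 bp repeat, 100 bp flanking region.
table = build_compensation([(155.0, 160), (178.2, 184), (201.5, 208)])
measured = 166.6  # an allele that is not in the calibration set
print(repeat_count(compensate(measured, table), 100, 6))  # compensated call
print(repeat_count(measured, 100, 6))                      # uncompensated call
```

With these invented numbers the compensated call is 12 repeats while the uncompensated call is 11, which is the kind of one-repeat error the calibration set is meant to remove.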
Secondary DNA structure formation and stability was calculated with mfold [10].
The amplification of STTR6 using PulseNet International ST-5 primers in order to investigate the discrepancy in amplification of this locus was performed according to the recommended protocol [9].

Results
Of the 20 participants, one responded with results from two different CE set-ups, so the study comprised 21 data sets in all. One of the test strains, Test-11, was not viable or was missing in several strain shipments and was therefore excluded from the results analysis.

Calibration set analysis
The laboratory set-up of each laboratory and a summary of the results are presented in Table 2. Four participants had strains that had lost a repeat in a single locus. One of these strains was probably a mixed population when shipped, since two participating laboratories found the same allele difference and an additional laboratory detected a double peak corresponding to the two sizes. Laboratory 4 reported a peak at the wrong coordinates. This was found to be an error from reading the chromatogram. Laboratory 8 had a general problem with the accuracy of their CE equipment, which affected the results obtained from both the calibration and test sets to such a degree that creation of reliable compensation factors and correct assignment of alleles was not possible.
Laboratory 13 was the only participant that used the PulseNet US primers and produced data by using two CE machines of different brands. The use of alternative primers created different results for two loci in a minority of the strains. This laboratory did not detect STTR3 alleles in STm-SSI21 and STm-SSI31 (alleles 0314 and 0511). The explanation for this was that the PulseNet primers produced fragments that were longer than the largest fragment of their size marker. Furthermore, a distinct STTR6 fragment in STm-SSI03 was detected with the PulseNet ST5 primers. This allele was not amplified with the Lindstedt et al. primers [1].
In order to investigate this discrepancy in STTR6 fragment production, we tested all available strains (222 of 380) from Danish surveillance of human infections (from 2001 to 2011), in which STTR6 was not amplified by the Lindstedt et al. primers. Using the corresponding PulseNet ST5 primers, a product was amplified from 51 (23%) of the 222 strains (data not shown). The total number of S. Typhimurium and monophasic variant MLVA-typed strains obtained through Danish surveillance during these years was 6,007, resulting in an MLVA typing uncertainty of approximately 1.5% when using the different primer set.
The range of compensation needed is visualised in Figure 1, where the five VNTRs from all datasets are plotted. The equipment used by each of the participants is listed in Table 2. Figures 1 and 2 show that different equipment set-ups generate very different results for the same strain set. When using the same equipment and marker, the results were similar for most laboratories and the difference between real and measured sizes followed a fairly smooth progression for STTR9, 5, 6 and 10. The STTR3 locus comprises a combination of 27 bp and 33 bp repeats. The plotted error curves for STTR3 are more erratic and, when analysed in detail, the 27 bp repeats migrate differently from the 33 bp repeats in this locus (in Figure 1 panel F, allele numbers as a combination of the number of 27 bp and 33 bp repeats are indicated below the data points). This means that the STTR3 locus is harder to compensate for when it comes to alleles not present in the calibration set.
Regarding choice of size marker, it is noted that all laboratories using the Chimerx Geneflo 625 marker (both ROX and TAMRA labelled) experienced an erratic area between 150 and 350 bp, seen in Figure 2. It is most likely that this is due to the size marker since the same pattern is seen in all loci with different polymers, filter sets and primers. This suspicion is strengthened when plotting instrument time against size marker fragment length, where the same 'roller coaster'-like trend is seen (Robert Söderlund, personal communication, 5 May 2012). This roller coaster-like curve is not observed by participants using the GeneScan ladders.
The participating laboratories also provided data on fluorophores used for labelling primers. The analysis indicates that variations in labelling have a negligible impact on the measured results.

Test set analysis
In order to compare with a situation in which no allele compensation factors were applied, the participants' raw data were translated directly into number of repeats with the simple calculation: (fragment length − flanking region size)/repeat size. The results of this showed that 64.8% (952/1,470) of all fragment sizes were converted to the correct number of repeats and 3.4% (10/294) of the strains were assigned the correct MLVA profile.
When applying compensation factors derived from the calibration set, the participants initially scored correctly 97.5% (1,433/1,470) of the alleles and assigned the correct MLVA profiles to 90.1% (265/294) of the test strains. Most of the errors were not related to the calibration method itself. They occurred in four laboratories (3, 5, 7, 15) making entry errors in the response scheme and one laboratory (15) that had an allele that had lost a repeat. Four laboratories (3, 16, 20, 21) did not notice allele changes in their calibration set, which subsequently affected the analysis of the test set.
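The uncompensated calculation is easy to reproduce, and doing so shows why short repeats are the most vulnerable to a measurement offset. The sketch below is illustrative only: the function name is hypothetical and the flanking sizes, repeat sizes and fragment lengths are invented values, not data from the study.

```python
def naive_repeat_call(fragment_bp, flank_bp, repeat_bp):
    """Apply (fragment length - flanking region size) / repeat size.
    Returns the rounded repeat count and the distance in bp between the
    measured fragment and the exact length of that repeat count."""
    exact = (fragment_bp - flank_bp) / repeat_bp
    count = round(exact)
    residual_bp = (exact - count) * repeat_bp
    return count, residual_bp


# A 3.5 bp under-measurement on a hypothetical 6 bp repeat locus
# (real length 172 bp = 100 bp flank + 12 repeats) gives the wrong count...
print(naive_repeat_call(168.5, 100, 6))
# ...while the same offset on a hypothetical 33 bp repeat locus
# (real length 370 bp = 73 bp flank + 9 repeats) still rounds correctly.
print(naive_repeat_call(366.5, 73, 33))
```

An offset larger than half the repeat size always changes the call, so a 6 bp repeat tolerates less than 3 bp of error.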

Figure 2
Examples of how laboratory equipment affects the discrepancy between real and measured fragment lengths, five-locus Salmonella enterica serovar Typhimurium MLVA
MLVA: multiple-locus variable-number of tandem repeats analysis. Data in all three panels were obtained using an ABI3130XL. Panel A uses filter set G5 and GeneScan 600LIZ, panel B uses filter set D and the Geneflo625-ROX marker, and panel C uses filter set D and the Geneflo625-ROX marker but with the PulseNet primer set. The area between 150 and 350 base pairs shows a 'roller coaster'-like profile in all loci in panels B and C.

Laboratory 16 failed altogether to include compensation from the calibration set and consequently scored only one isolate correctly out of 14. Other errors were related to raw input data and could consequently not be amended by any calibration analysis. As mentioned above, Laboratory 8 had a very large general variation, which caused four alleles to be erroneously read. Laboratory 3 detected alleles in four situations where none should be found and initially failed to detect one STTR3 peak. This laboratory recorded very large differences between peak intensities, which were probably the cause of these problems. Laboratory 18 performed their initial analysis with presumably poor DNA preparations, which resulted in erroneous data.
In one instance, a laboratory (Laboratory 12) observed a fragment (compensated length 387.9 bp) for the STTR3 allele for Test-2, which was low compared with the expected compensated size of 391 bp and so a corresponding allele name could not be assigned. The allele was subsequently sequenced in duplicate by the Statens Serum Institut in Denmark and was confirmed to have the 0208 allele as expected. The participant was supplied with a new sample of Test-2 and again found a fragment slightly too short for making a secure allele assignment.
After it was indicated to the nine affected laboratories that they had problems in a particular area of the analysis, the participants re-analysed their data and the proportion of correct MLVA profiles rose from 90.1% (265/294) to 97.3% (286/294) (from 97.5% (1,433/1,470) to 99.4% (1,461/1,470) when counting individual alleles).
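The denominators and percentages reported in this section follow from 21 data sets, 14 scored test strains (Test-11 excluded) and five loci. The short check below recomputes them; the variable and function names are illustrative.

```python
DATA_SETS, TEST_STRAINS, LOCI = 21, 14, 5

profiles = DATA_SETS * TEST_STRAINS  # MLVA profiles scored
alleles = profiles * LOCI            # individual alleles scored


def pct(numerator, denominator):
    """Percentage rounded to one decimal, as reported in the text."""
    return round(100 * numerator / denominator, 1)


print(profiles, alleles)
print("no compensation:", pct(952, alleles), pct(10, profiles))
print("initial, compensated:", pct(1433, alleles), pct(265, profiles))
print("after re-analysis:", pct(1461, alleles), pct(286, profiles))
```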

Discussion
A total of 20 laboratories from multiple continents participated in this inter-laboratory study to evaluate the efficacy of using a set of calibration strains for obtaining comparable MLVA results despite the use of different laboratory set-ups. A wide spectrum of CE machines, size markers and dye sets was represented. This proof-of-concept study was based on the widely used five-locus MLVA for S. Typhimurium developed by Lindstedt et al. [1], but the concept of using calibration strains has also been suggested for other MLVA protocols [11,12]. Most participating laboratories used the originally published primers; however, the principle of using the actual number of repeats in each locus as the universal nomenclature [7] allows for the use of alternative primers. The primers of the PulseNet US protocol [9] were used by one laboratory performing the analysis with two different laboratory set-ups.
In principle, no steps in the data analysis or laboratory procedures were standardised between the laboratories. As expected, the raw data obtained by participants varied considerably and were not useful for direct comparison of results. A difference in measured fragment length of up to 13 bp was seen for the same allele depending on the CE machine, size marker, dye set, etc. When using the calibration strains with known fragment lengths to produce a specific compensation system for each laboratory, all laboratories were able to obtain comparable results for most loci of the test strains.
Due to the nature of MLVA analysis, the VNTRs are not perfectly stable [13,14]. It is therefore not unexpected to occasionally find single locus variants, also in the calibration set. Four laboratories had a single calibration strain with a single repeat change in one locus. This is not detrimental to creating correct calibration factors as long as the changes are accounted for when calculating the compensation factors. The same is true in the case where one participant detected an STTR6 allele with an alternative set of primers when the allele could not be detected with the other primer set. However, if the changes are not noticed, the compensation factors will be offset and the subsequent allele assignment loses some fidelity. It should be emphasised that laboratories using a calibration set should be careful to check whether there are any repeat changes in their particular set. This is most easily done visually via a scatter plot, like the one in Figure 2. If a locus has lost or gained repeats, this will be readily visible.
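The visual check can be complemented by a simple automated screen. The following sketch is one possible approach, not a procedure used in the study: it assumes that the measurement offset changes smoothly along a locus, compares each calibration strain's offset with the median offset of its neighbours in size, and flags deviations of more than half a repeat. The function name, the window size and the strain data are hypothetical.

```python
from statistics import median


def flag_repeat_changes(calibration, repeat_bp, window=3):
    """calibration: list of (strain, expected_bp, measured_bp) for one locus.
    Returns (strain, estimated repeat change) for strains whose offset
    deviates from the local trend by more than half a repeat."""
    rows = sorted(calibration, key=lambda row: row[1])
    offsets = [measured - expected for _, expected, measured in rows]
    flagged = []
    for i, (strain, _, _) in enumerate(rows):
        # Offsets of up to `window` neighbours on each side, excluding self.
        neighbours = offsets[max(0, i - window):i] + offsets[i + 1:i + 1 + window]
        if not neighbours:
            continue
        deviation = offsets[i] - median(neighbours)
        if abs(deviation) > repeat_bp / 2:
            flagged.append((strain, round(deviation / repeat_bp)))
    return flagged


# Hypothetical 6 bp repeat locus; cal-C measures about one repeat too short.
calibration = [
    ("cal-A", 160, 155.0),
    ("cal-B", 172, 166.6),
    ("cal-C", 184, 172.3),
    ("cal-D", 196, 189.9),
    ("cal-E", 208, 201.5),
]
print(flag_repeat_changes(calibration, repeat_bp=6))
```

A flagged strain should be re-examined rather than silently dropped, since a true repeat change can still be used for calibration once its expected length is corrected.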
As previously stated, participants could freely choose how to use the calibration strain set together with the test strains. The calibration set is a general solution, with flexibility to deal with a large variation in set-up conditions, and it can readily be used also to assign alleles not present in the set itself. But, as seen in the results, it is not the only possible way to achieve a correct allele assignment. An alternative approach is the one taken by the US Centers for Disease Control and Prevention (CDC) [15], where instead of compensating for different laboratory set-ups, the testing protocol is standardised to a few precisely defined set-ups. One participant used this latter approach to carefully craft a table with bins from their own large data set and controlled allele nomenclature by sequenced alleles within these bins. This approach requires thorough standardisation at both the equipment and method levels. As can be seen in Figure 1, when standardising to the same CE machine, polymer, primer set and size marker, most of the laboratories in this study showed results with high similarity, but there were also deviant results, e.g. in STTR3 (Figure 1, panels E-F), where the same equipment set-ups resulted in up to 6 bp difference between laboratories. Another participant in this study chose to use only part of the supplied calibration set. The correct size of a useful calibration set depends on how linear the progressive error is in a particular set-up. With a very linear plot, such as panel A of Figure 2, the number of calibration strains can be reduced considerably.
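One way to judge how linear the progressive error is in a given set-up is to fit a straight line to the calibration offsets and inspect the largest residual. The sketch below uses an ordinary least-squares fit; the function names and the example offsets are hypothetical, and any acceptance threshold (for instance, well below half the repeat size) would be a laboratory's own choice rather than a recommendation from this study.

```python
def linear_fit(points):
    """Ordinary least-squares line through (real_bp, offset_bp) points.
    Returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_residual(points):
    """Largest absolute deviation (bp) of any point from the fitted line."""
    slope, intercept = linear_fit(points)
    return max(abs(y - (slope * x + intercept)) for x, y in points)


# Hypothetical offsets (measured - real) for two set-ups on the same locus.
smooth = [(160, -5.0), (184, -5.8), (208, -6.6), (232, -7.4)]
erratic = [(160, -5.0), (184, -2.0), (208, -7.5), (232, -3.0)]
print(max_residual(smooth))   # close to zero: few calibration strains needed
print(max_residual(erratic))  # several bp: a dense calibration set is needed
```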
The migration discrepancies between real and measured fragment length are likely a function of secondary structure formation. Examples of this are STTR6 and STTR10, where the former migrates as a progressively shorter fragment and STTR10 as a longer fragment. When modelling these repeats with mfold [10], the STTR6 repeat sequence readily forms stable secondary structures while STTR10 hardly forms any internal base pairing at all, hence the trend for STTR10 to migrate as a longer fragment in the electrophoresis.
For the STTR3 locus, the 27 bp repeat has a stronger tendency to form stable secondary structures than the 33 bp repeat, resulting in erratic discrepancy plots as the 27 bp repeats migrate differently from the 33 bp repeats. Consequently the STTR3 locus is harder to compensate for when it comes to alleles not present in the calibration set. This effect is seen in the single error that could not be prevented by correct data analysis: the low 0208 allele in Test-2 when analysed by Laboratory 12. Looking at the calibration set, the alleles closest in size to 0208 (theoretical 391 bp using Lindstedt et al. primers) are 0009 and 0011 (370 and 436 bp, respectively). These are both without 27 bp repeats and hence will be expected to be measured as longer by CE. The calibration values for 0208 are therefore calculated wrongly and 0208 is not compensated enough. This is a deficit in the calibration set, which can be amended by adding a strain having this repeat to the calibration set. With the exception of STTR3, there is very little mutational variation in the repeat regions, as previously shown [7], and therefore the variation in measured fragment length due to mutations is negligible for these other STTRs.
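The theoretical STTR3 lengths quoted above are consistent with reading the four-digit allele name as two digits for the number of 27 bp repeats followed by two digits for the number of 33 bp repeats, plus a constant flanking region. The sketch below reproduces them. The 73 bp flanking size for the Lindstedt et al. primers is not stated in the text; it is inferred here from allele 0009 (370 bp − 9 × 33 bp), and the function name is hypothetical.

```python
STTR3_FLANK_BP = 73  # inferred from allele 0009: 370 - 9 * 33 (assumption)


def sttr3_length(allele):
    """Theoretical fragment length (bp) for an STTR3 allele name such as
    '0208' = 2 repeats of 27 bp and 8 repeats of 33 bp."""
    n27, n33 = int(allele[:2]), int(allele[2:])
    return STTR3_FLANK_BP + 27 * n27 + 33 * n33


for name in ("0009", "0208", "0011"):
    print(name, sttr3_length(name))
# prints 370, 391 and 436 bp respectively
```

Allele 0208 falls between two calibration alleles that contain no 27 bp repeats, which is why a compensation value interpolated from them does not capture the different migration of the 27 bp repeats.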
The absence/presence of null alleles can be quite troublesome when standardising. This was shown clearly with the calibration set using the PulseNet primers, where in one case an apparent fragment was amplified whereas all participants using the Lindstedt et al. primers had an obvious null allele. Null alleles should perhaps be regarded as absence of information rather than information of absence.
Participants had access to a standard operating procedure [8] that included suggested laboratory procedures as well as guidance on data analysis. Without any further guidance, the test set was perfectly analysed in 13 of the 21 submitted datasets. Several of the participants did not use MLVA routinely, while others ran this assay every week. Errors in the analysis were made by inexperienced as well as experienced participants. All but one of the erroneously analysed alleles would not have occurred with a well-standardised workflow. They involved keyboard entry errors, false peaks due to intensity problems, failing to actually use the calibration data, general precision problems and cases where calibration strains had lost a repeat and hence gave a faulty compensation for the test strains. As with other types of analyses, it is important to look critically at the results and use checkpoints to control the quality. A guide outlining the most common pitfalls should be written to alleviate most of these problems.
The use of the previously suggested nomenclature [7], together with the calibration approach validated in this study, makes the MLVA profiles unambiguous and directly comparable, thereby making exchange of profiles independent of any central reference type repository.
After the problems were pointed out to the eight participants without an initial 100% score, they submitted a new analysis. This resulted in a perfect analysis score for 18 of the 21 data sets. The remaining three were Laboratory 8 (with general accuracy problems), Laboratory 3 (with intensity problems) and Laboratory 12 (with an actual analysis problem in a single allele).
In conclusion, we have provided a comprehensive tool that enables laboratories to compare the vast majority of their MLVA results regardless of what hardware, software, primers and conditions they are using. The participants assigned the correct MLVA profiles to 97.3% (286/294) of the strains, they could correctly assign allele names to alleles not present in the calibration set, they could group identical profiles together, and they were able to separate out single locus variants. We therefore recommend the concept described in this paper for obtaining inter-laboratory comparable MLVA results.

Participants
Participants of an expert consultation in Copenhagen, Denmark, in May 2011, organised by the United States (US) Centers for Disease Control and Prevention, the European Centre for Disease Prevention and Control, the Association of Public Health Laboratories in the United States, the Public Health Agency of Canada and the Statens Serum Institut, Denmark, and additional interested parties were invited to participate in this study.

Table 1
Strains in the five-locus Salmonella enterica serovar Typhimurium MLVA test panel
MLVA: multiple-locus variable-number of tandem repeats analysis; NA: locus not present (no polymerase chain reaction (PCR) product obtained). Alleles were verified via direct sequencing. Test-6 and Test-15 are from the same cluster and have identical profiles. Test-4 is a one-locus variant of Test-8. Alleles marked in grey cells are not found in the calibration set.

Table 2
Participating laboratories, equipment, primers and detected discrepancies in the five-locus Salmonella enterica serovar Typhimurium MLVA
bp: base pairs; MLVA: multiple-locus variable-number of tandem repeats analysis.
a Primer set 1 is described by Lindstedt et al. [1]; primer set 2 is from the PulseNet United States (US) protocol [9].
b Laboratory set-up groups were assigned based on size marker family, dye set and primer set. Group G5 (ABI 3000 series instrument using G5 filters and GeneScan LIZ markers), group D (ABI 3000 series but with D filters and Geneflo 625 ROX markers), group D-alt (same as D but with PulseNet US primers), group D-mm (same as D but with MapMarker 100 marker), group C (ABI 310 with filter set C), group B (Beckman instrument) and group B-alt (Beckman instrument with PulseNet US primers).

Figure 1
Measured error for all calibration results in the five-locus Salmonella enterica serovar Typhimurium MLVA
MLVA: multiple-locus variable-number of tandem repeats analysis. The laboratory set-up groups were defined as group G5 (ABI 3000 series instrument using G5 filters and GeneScan LIZ markers), group D (ABI 3000 series but with D filters and Geneflo 625 ROX markers), group D-alt (same as D but with PulseNet United States (US) primers), group D-mm (same as D but with MapMarker 100 marker), group C (ABI 310 with filter set C), group B (Beckman instrument) and group B-alt (Beckman instrument with PulseNet US primers). It can be seen that one red line deviates from the general trend for group G5 in STTR9, 5, 6 and 10: this is the same participant in all cases. In panel F, allele numbers as a combination of the number of 27 bp and 33 bp repeats are indicated below the data points.