Genome sequence analysis of Ebola virus in clinical samples from three British healthcare workers, August 2014 to March 2015

A Bell1,2, K Lewandowski (kuiama.lewandowski@phe.gov.uk)1,2, R Myers3, D Wooldridge3, E Aarons1, A Simpson1, R Vipond1,4, M Jacobs5, S Gharbia3,4, M Zambon3
1. Public Health England, Porton Down, Salisbury, United Kingdom
2. These authors contributed equally to the work and are joint first authors
3. Public Health England, Colindale, London, United Kingdom
4. NIHR Health Protection Research Unit in Emerging and Zoonotic Infections, Liverpool, United Kingdom
5. Department of Infection, Royal Free London NHS Foundation Trust, London, United Kingdom

We determined complete viral genome sequences, directly from clinical samples, from three British healthcare workers infected with Ebola virus (EBOV) in Sierra Leone. These sequences closely resemble those previously observed in the current Ebola virus disease outbreak in West Africa, with the glycoprotein and polymerase genes showing the most sequence variation. Our data indicate that current PCR diagnostic assays remain suitable for detecting EBOV in this epidemic and support their continued use in diagnosis.
Monitoring the evolution of the viral genome during the ongoing outbreak of Ebola virus disease (EVD) in West Africa is crucial for the early detection of mutants that may evade sequence-based diagnostics and for monitoring the efficacy of therapeutic options. We present here our analysis of EBOV sequences obtained from blood samples from three British healthcare workers (HCWs) who were infected with EBOV in Sierra Leone.
Two were repatriated from Sierra Leone and the third became symptomatic upon return to the UK. All three were transferred to the specialist isolation ward at the Royal Free Hospital in London, where they subsequently recovered. Informed consent for viral whole genome sequencing and publication of the findings was sought and received from each patient. Viral genomes from pre-intervention whole blood and EDTA plasma samples (Table 1) were sequenced and analysed to provide a baseline for any subsequent transmission of EBOV in the UK and to identify and monitor mutations that may affect the sensitivity of treatment and diagnostics.

Sequence analysis
RNA was extracted from patient samples using the EZ1 RNA Universal Tissue Kit (QIAGEN). EVD diagnosis was confirmed in all three patients using PCR assays targeting the NP gene [1]. Samples for sequencing were treated with DNase I (Life Technologies) and purified using an RNA Clean and Concentrator kit (Zymo). Single primer isothermal linear amplification (SPIA) cDNA was prepared from total RNA following the Ovation RNA-Seq V2 (NuGEN) protocol [2], except that RNA was denatured for 5 min at 85 °C before first-strand synthesis. Samples were purified using a MinElute column (QIAGEN). Following amplification, paired-end libraries were prepared from 1.5 ng of SPIA cDNA for Illumina MiSeq sequencing, following the Nextera XT protocol. Reads were quality-trimmed to a minimum Phred score of 30 (Q30). Reads were mapped to the reference genome KM233113.1 using BWA 0.7.5, and consensus sequences were called with Quasibam 1.0, using a local instance of The Galaxy Project [3][4][5]. Consensus sequences were produced at a minimum depth of five reads and single nucleotide polymorphisms (SNPs) at a minimum depth of 20 reads. Ambiguous bases were called when an alternative base was present in at least 20% of reads.
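The depth and frequency thresholds above can be illustrated with a minimal per-position consensus caller. This is a sketch under stated assumptions, not the Quasibam algorithm: it assumes that positions covered by fewer than five reads are masked as N, that every base present in at least 20% of reads enters an IUPAC ambiguity code, and that a difference from the reference is only reported as a SNP at a depth of 20 or more. The function names `call_base` and `reportable_snp` are hypothetical.

```python
from collections import Counter

# IUPAC nucleotide codes for every multi-base combination of A, C, G, T.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

MIN_CONSENSUS_DEPTH = 5     # assumed: positions with fewer reads are masked as N
MIN_SNP_DEPTH = 20          # assumed: depth needed to report a difference from the reference
MIN_AMBIGUITY_PERCENT = 20  # assumed: a base in >= 20% of reads enters the ambiguity code


def call_base(pileup: str) -> str:
    """Consensus base for one position, given the read bases stacked on it."""
    counts = Counter(b for b in pileup.upper() if b in "ACGT")
    depth = sum(counts.values())
    if depth < MIN_CONSENSUS_DEPTH:
        return "N"
    # Integer comparison avoids floating-point edge cases at exactly 20%.
    kept = {b for b, n in counts.items() if n * 100 >= MIN_AMBIGUITY_PERCENT * depth}
    if len(kept) == 1:
        return next(iter(kept))
    return IUPAC[frozenset(kept)]


def reportable_snp(consensus_base: str, ref_base: str, depth: int) -> bool:
    """True if a consensus base differs from the reference with enough depth."""
    return consensus_base != ref_base and depth >= MIN_SNP_DEPTH


print(call_base("AAAAAAAAGG"))  # 2 of 10 reads carry G (20%): prints R
```

Because at least one of the four bases always reaches 25% of reads, the set of retained bases is never empty once the depth threshold is met.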
Full viral genome sequences were obtained from the samples of all three infected HCWs and were submitted to GenBank (accession numbers are listed in Table 1). Across the length of the EBOV genome, UK3 showed the most nucleotide variation, differing from UK1 and UK2 by 22 and 23 SNPs, respectively, with no insertions or deletions (Figure 1). These SNPs gave rise to seven and eight amino acid changes, respectively.
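Pairwise SNP counts of the kind summarised in Figure 1 can be sketched as follows. This is a minimal illustration, assuming the genomes have already been aligned to equal length; positions where either sequence carries a gap or an ambiguity code are skipped, which is a design choice of this sketch rather than a documented step of the study. The helper names and the toy sequences are hypothetical.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned genomes carry different unambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in "ACGT" and y in "ACGT" and x != y
    )


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], int]:
    """Symmetric pairwise SNP counts, keyed by (name1, name2)."""
    matrix = {}
    for n1, n2 in combinations(sorted(seqs), 2):
        d = snp_distance(seqs[n1], seqs[n2])
        matrix[(n1, n2)] = matrix[(n2, n1)] = d
    return matrix


# Hypothetical toy alignment, not the study genomes.
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TCGTACCA"}
for (x, y), d in sorted(distance_matrix(toy).items()):
    if x < y:
        print(x, y, d)
```

Skipping ambiguous positions means a site reported as, for example, R in one genome does not count as a difference, so these counts are a lower bound.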
No nucleotide changes were observed within the open reading frames (ORFs) of the virion protein (VP) 40, VP30 and VP24 genes. Within the coding region of the nucleoprotein (NP) gene, no SNPs were seen between UK1 and UK2, although UK3 showed one non-synonymous SNP (P to S at position 1,957). One synonymous SNP was seen between UK1 and UK2 in the VP35 ORF, while UK3 showed two non-synonymous SNPs relative to UK1 and UK2 (S to R at position 3,371 and E to G at position 3,380).
The GP gene showed no SNPs between UK1 and UK2, and three non-synonymous SNPs in UK3 relative to UK1 and UK2 (R to K at position 6,932, R to S at 7,265 and L to E at 7,352). The most SNPs within a single ORF were found in the viral polymerase (L) gene, with UK1 and UK2 differing at four nucleotide positions and UK3 showing five changes with respect to UK1 and UK2. These SNPs account for fewer than one third of all SNPs found, in a gene that comprises 36% of the total genome. These data suggest that the L gene is conserved, with only two non-synonymous SNPs: one amino acid change from UK2 to UK1 and UK3 (A to T at 17,848) and one from UK3 to UK1 and UK2 (T to A at 16,894). Combined, UK3 differs from UK1 at one amino acid position and from UK2 at two positions in the L protein.
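The distinction between synonymous and non-synonymous SNPs used above can be sketched by translating aligned codons with the standard genetic code. This is a minimal illustration, not the authors' pipeline: it assumes two aligned, in-frame ORFs of equal length, skips codons containing gaps or ambiguity codes, and gives every differing base in a codon the codon-level classification. The `cds_start` parameter (genome coordinate of the first ORF base) and the toy sequences are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), built from the usual
# TCAG-ordered 64-character amino-acid string; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snps(ref_cds: str, alt_cds: str, cds_start: int = 1) -> list:
    """List (position, ref_aa, alt_aa, kind) for each nucleotide difference
    between two aligned, in-frame coding sequences of equal length.

    cds_start is the (hypothetical) genome coordinate of the first ORF base,
    so reported positions are genome coordinates.
    """
    ref_cds, alt_cds = ref_cds.upper(), alt_cds.upper()
    if len(ref_cds) != len(alt_cds) or len(ref_cds) % 3:
        raise ValueError("expected aligned, in-frame sequences of equal length")
    results = []
    for offset in range(0, len(ref_cds), 3):
        ref_codon, alt_codon = ref_cds[offset:offset + 3], alt_cds[offset:offset + 3]
        # Skip identical codons and codons containing gaps or ambiguity codes.
        if ref_codon == alt_codon or ref_codon not in CODON_TABLE or alt_codon not in CODON_TABLE:
            continue
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
        kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
        # Each differing base in the codon inherits the codon-level classification.
        for k in range(3):
            if ref_codon[k] != alt_codon[k]:
                results.append((cds_start + offset + k, ref_aa, alt_aa, kind))
    return results


# Hypothetical toy ORF: CCT->TCT changes P to S; GAA->GAG keeps E.
for snp in classify_snps("ATGCCTGAA", "ATGTCTGAG", cds_start=100):
    print(snp)
```

For the toy input this prints `(103, 'P', 'S', 'non-synonymous')` followed by `(108, 'E', 'E', 'synonymous')`.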
A phylogenetic tree based on the three UK sequences and all available published sequences was generated using a heuristic maximum likelihood algorithm (Figure 2). The three UK sequences fall within one large Sierra Leonean clade, with UK2 and UK3 in a different subclade from UK1. UK3 appears to share a common ancestor with the group in which UK2 sits. Sequences from Mali and Liberia form a distinct outgroup to the Sierra Leonean clade.
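The Figure 2 legend names the HKY model of nucleotide substitution. For reference, the standard HKY85 instantaneous rate matrix is shown below, with rows and columns ordered A, C, G, T. The symbols are introduced here for illustration and are not values reported in this study: $\pi_A, \pi_C, \pi_G, \pi_T$ are equilibrium base frequencies, $\kappa$ is the transition/transversion rate ratio and $\mu$ is an overall scaling factor. Transitions (A to G, C to T and their reverses) are weighted by $\kappa$, and each diagonal entry makes its row sum to zero.

```latex
Q = \mu \begin{pmatrix}
 -(\pi_C + \kappa\pi_G + \pi_T) & \pi_C & \kappa\pi_G & \pi_T \\
 \pi_A & -(\pi_A + \pi_G + \kappa\pi_T) & \pi_G & \kappa\pi_T \\
 \kappa\pi_A & \pi_C & -(\kappa\pi_A + \pi_C + \pi_T) & \pi_T \\
 \pi_A & \kappa\pi_C & \pi_G & -(\pi_A + \kappa\pi_C + \pi_G)
\end{pmatrix},
\qquad P(t) = e^{Qt}
```

A maximum likelihood search scores candidate topologies and branch lengths $t$ using the substitution probabilities $P(t)$; the heuristics FastTree uses to explore tree space are beyond this sketch.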

Discussion
The ongoing EVD outbreak in West Africa is the largest known, with over 25,000 recorded cases up to April 2015 [6]. In response to the outbreak, a large number of international civilian and military aid teams have been deployed alongside local workers at multiple treatment and diagnostic centres in Guinea, Sierra Leone and Liberia. Over 860 HCWs are known to have been infected [6]. Monitoring the evolution of the viral genome during outbreaks is crucial for the early detection of mutations that may affect disease virulence or transmissibility, or reduce the sensitivity of widely used sequence-based detection assays. The high viral loads seen in EBOV-infected individuals shortly after symptom onset make their samples well suited to whole genome sequencing by next-generation sequencing. More than 450 EBOV genome sequences derived by whole genome sequencing have been reported from samples collected in Guinea, Sierra Leone, Mali and Liberia [7][8][9]. Analysis of 78 genomes from patients in Sierra Leone sampled between May and June 2014 suggested an observed evolutionary rate double that seen in previous EVD outbreaks [10], and highlighted the importance of tracking sequence variation in relation to molecular detection strategies. More recent analysis, however, identified an observed evolutionary rate equivalent to that of past outbreaks [11].
In this study, sequence analysis of the NP gene, the target of widely used diagnostic detection assays [1], identified no SNPs within the regions where diagnostic primers bind. The GP gene encodes the viral surface glycoprotein, which mediates attachment to and entry into host cells and is the target of neutralising antibodies. Synonymous SNPs are present at the primer and probe binding sites of real-time detection assays based on the GP gene [1] (Table 2).
The observation of SNPs within the primer and probe binding sites of the GP gene is consistent with other sequences obtained from this outbreak in West Africa (data not shown). These SNPs are not expected to affect primer binding, although this has yet to be formally determined; their presence reinforces the need for regular review of diagnostic detection strategies against available sequence information. A recent analysis of nine EBOV sequences from Mali and other available sequences also indicated no effect of SNPs on PCR-based detection assays [12,13].
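Screening primer and probe binding sites against new genomes, as recommended above, can start with a simple mismatch scan. This is a minimal sketch, assuming ungapped matching: it slides each oligonucleotide along a target genome (using the reverse complement for reverse primers) and reports the best-matching window with the positions of any mismatches. It only counts mismatches; it does not predict melting temperature or whether binding is actually affected. The helper `best_site` and all sequences shown are hypothetical, not the Trombley et al. oligonucleotides listed in Table 2.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (IUPAC codes are left unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def best_site(oligo: str, target: str, reverse: bool = False) -> tuple[int, int, list[int]]:
    """Return (start, mismatch_count, mismatch_offsets) of the best-matching window.

    For a reverse primer the reverse complement is matched against the
    forward-strand target. Ties are resolved in favour of the leftmost window.
    """
    probe = (reverse_complement(oligo) if reverse else oligo).upper()
    target = target.upper()
    n = len(probe)
    best = None
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        offsets = [i for i, (p, t) in enumerate(zip(probe, window)) if p != t]
        if best is None or len(offsets) < len(best[1]):
            best = (start, offsets)
    if best is None:
        raise ValueError("oligo is longer than target")
    return best[0], len(best[1]), best[1]


# Hypothetical target carrying a SNP (T -> A) inside a forward-primer site.
print(best_site("ACGTTAGC", "GGGACGATAGCCC"))  # (3, 1, [3])
```

Any ambiguity code in the target is counted as a mismatch here, which errs on the side of flagging a site for review.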
Cases 2 and 3, from whom UK2 and UK3 were obtained, respectively, worked at the same treatment centre before infection, and this is reflected in the close phylogenetic relationship of their sequences. The patient from whom UK1 was obtained worked elsewhere; the UK1 sequence more closely resembles those reported by Gire et al. [10], who sampled at the same location.
During the intensive and widespread EVD epidemic in West Africa, the evolution of EBOV in Sierra Leone has been driven by person-to-person transmission in community settings, with a high number of HCW infections. Because strict infection control and health monitoring allow infected HCWs to be identified rapidly, HCW infections are less likely to lead to further transmission events. Diagnostic detection strategies based on the NP gene, which are widely and increasingly used, have so far remained suitable. Molecular detection strategies based on the GP gene require close attention, so that SNPs arising in this gene, possibly as a result of host selective pressure, are evaluated for their impact on detection. Viral sequences from any further cases of EVD in UK nationals, or cases imported into the UK, will continue to be sequenced and analysed to ensure the continued effectiveness of EVD diagnosis and to monitor viral genome evolution.

Figure 2
Phylogenetic subtree of 233 near full-length Ebola virus genomes from the West African outbreak that started in 2014. The subtree is from a larger tree containing 258 sequences, which includes sequences from earlier outbreaks of Ebola virus disease. Ebola virus genomes from patients outside the United Kingdom (UK) were obtained from GenBank (n = 255). The tree was generated using a heuristic maximum likelihood algorithm (FastTree version 2.1.8) and the HKY model of nucleotide substitution. Sequences from the three patients who returned to the UK from Sierra Leone are shown in red and labelled UK1, UK2 and UK3. Sequences from patients in Guinea are shown in green, those from Mali in blue and those from Liberia in purple. The remaining sequences (black) are from patients in Sierra Leone.

Figure 1
Heatmaps showing nucleotide and amino acid variation between three Ebola virus isolates from three British healthcare workers infected in Sierra Leone, August 2014-March 2015

Table 1
Sample details from three British healthcare workers with Ebola virus disease infected in Sierra Leone, August 2014-March 2015
Case | Medical centre worked at in Sierra Leone | Date sampled | Sample type | Genome length (bp) | Isolate name

Table 2
Ebola virus real-time PCR assay primers and probes designed by Trombley et al. [1]. Single nucleotide polymorphisms in Ebola virus sequences from three patients from the United Kingdom infected in Sierra Leone are shown in bold.