Improving surveillance of sexually transmitted infections using mandatory electronic clinical reporting: the genitourinary medicine clinic activity dataset, England, 2009 to 2013

A new electronic surveillance system for sexually transmitted infections (STIs) was introduced in England in 2009. The genitourinary medicine clinic activity dataset (GUMCAD) is a mandatory, disaggregated, pseudoanonymised data return submitted by all STI clinics across England. The dataset includes information on all STI diagnoses made and services provided alongside demographic characteristics for every patient attendance at a clinic. The new system enables the timely analysis and publication of routine STI data, detailed analyses of risk groups and longitudinal analyses of clinic attendees. The system offers flexibility so new codes can be introduced to help monitor outbreaks or unusual STI activity. From January 2009 to December 2013 inclusive, over twenty-five million records from a total of 6,668,648 patients of STI clinics have been submitted. This article describes the successful implementation of this new surveillance system and the types of epidemiological outputs and analyses that GUMCAD enables. The challenges faced are discussed and forthcoming developments in STI surveillance in England are described.


Introduction
Sexually transmitted infections (STIs) are a major public health concern in England. The Department of Health (DH) has committed to improving the sexual health and wellbeing of the whole population and 'to continue to work to reduce the rate of STIs using evidence-based preventative interventions and treatment initiatives' [1,2]. Monitoring STIs and the impact of any public health initiatives requires high-quality and timely surveillance data to determine which specific population groups are at particular risk of STIs and how STI trends respond to interventions. Prior to 2009, STI surveillance in England depended on the collection of aggregated data on a paper-based form (known as the KC60 statistical return) from all genitourinary medicine (or STI) clinics in England.
Although the KC60 return provided reasonably robust data on STI trends, it was severely limited because: (i) it was not timely; (ii) it collected no information on patient area of residence, so that local area governments were unable to determine the extent and nature of sexual health problems in their residents; (iii) it collected only limited information on patient characteristics which are critical for identifying population groups at high risk ('core groups'); and (iv) as the KC60 return was aggregated it was not possible to link individual patient records for longitudinal studies or risk factor analyses.
The genitourinary medicine clinic activity dataset (GUMCAD) was developed to address these concerns and replace the KC60 return. This dataset is an electronic, pseudo-anonymised (i.e. it contains the patient's gender, age and clinic/hospital number, but does not contain patient-identifiable information such as name, date of birth, or postcode of residence) patient-level data return that contains information on all STI diagnoses made and services provided in STI clinics in England, along with patient demographic information. We describe this new surveillance system and the approach taken to implement it, and discuss how barriers and difficulties were overcome. We also present some of the epidemiological analyses which are now being used to guide STI prevention activities in England and speculate on GUMCAD's future role in second and third generation surveillance.

Methods
Development, approval and implementation of the genitourinary medicine clinic activity dataset
Good planning, stakeholder engagement and adequate resources are crucial for the successful delivery of any major programme of work and were key to the successful implementation of this major new national surveillance system for STIs. The planning and approval processes for GUMCAD, funded by the DH, started in 2005 and involved the participation and agreement of a wide range of stakeholders led by the Health Protection Agency (HPA) (now part of Public Health England, PHE). Two key groups were established at the start: a steering group to advise on dataset items, coordinate approval processes, and monitor rollout; and an implementation group (overseen by the steering group) to coordinate software upgrades in clinics, resolve any technical issues that arose, and oversee the initial collections from all 205 STI clinics in England. The steering group included representation from PHE, DH, the British Association for Sexual Health and human immunodeficiency virus (HIV) (BASHH), which represents sexual health clinicians, and other key public health bodies, service commissioners (i.e. those who plan and pay for sexual health services and who therefore require high-quality data to assess the sexual health needs of their local population) and academics. The implementation group comprised primarily PHE national and regionally based information managers and clinic software providers. As GUMCAD is an electronic return, all patient management software providers (i.e. private software companies who are contracted to provide clinics' patient management systems) were involved at an early stage to identify technical and practical concerns, contribute to their [...]
As part of this process, the recording, collection, extraction and analysis of GUMCAD was piloted in ten STI clinics between March and September 2007. These ten clinics were distributed throughout England and used the three patient management software providers with the majority (>90%) of the market share. After the pilot, clinic staff were invited to provide structured feedback on the practicalities, time burden and technical challenges of recording and reporting the required data, and to comment on their experiences. Questionnaires were returned by seven of the ten pilot sites, five of which had already been collecting all the proposed GUMCAD data items. The two remaining clinics collected all but one of the data items, but indicated that collecting the additional item (country of birth and sexual orientation, respectively) would have a minimal impact (<20 seconds per patient registration) on the time taken to record patients' details. Overall, the feedback received was positive and resulted in minor revisions to the proposed collection. It also demonstrated that, in addition to providing more detailed and timely surveillance data, the recording and reporting of an electronic data return used considerably less staff time (50-67%) and resource to extract and submit data than the existing paper-based system. This evidence enabled PHE to better advocate the value of GUMCAD

Dataset specification
The GUMCAD dataset consists of twelve variables, all of which are mandatory and must be submitted by the clinic. To ensure interoperability between different service providers and end-users regardless of the software or platforms used, all variables and codes specified in GUMCAD were developed in accordance with national standards defined by the NHS data model and dictionary [3]. The following patient demographic data are collected: gender (a self-defined classification of the current sex of the patient), age, sexual orientation, ethnicity, country of birth and area of residence (Table 1); with the exception of ethnicity and area of residence, these variables are all included in the enhanced set of variables for European Union-wide STI surveillance [4]. Residence information is collected at lower super output area (LSOA) level, a geographical area with a mean population of 1,620, which is derived at the clinic from the patient's address [5]. Information on diagnoses and the services provided is coded using a combination of 68 available sexual health and HIV activity property type (SHHAPT) codes. Each record contains a local patient identifier number that allows patient records within a given clinic to be linked, enabling longitudinal analyses.

Ensuring submission compliance and data quality
Generating high-quality surveillance information relies on high-quality data. For GUMCAD, this was achieved by developing rigorous data validation checks, data cleaning and quality assurance systems. Each clinic is required to generate and submit to PHE a quarterly data extract of all patient attendances and associated diagnoses within six weeks of the end of each calendar quarter. The dataset must be submitted to PHE in a standardised pre-defined format through a secure web-based interface. Data submissions undergo basic automated checks for errors in data format, coding and duplication and are accepted into the database only if they are more than 90% free of errors. Records with errors are automatically returned to the clinics for correction and resubmission. Before epidemiological analysis and publication, data submissions undergo a further cleaning process which includes the generation of unique episodes of care. For example, an individual patient is permitted only one record of gonorrhoea in a six-week period; repeat codes for gonorrhoea within this period are removed to prevent over-counting of diagnoses [6].
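The acceptance threshold and the episode-of-care rule described above can be illustrated with a minimal Python sketch. It is not the PHE implementation: the tuple layout, the function names and the decision to measure the six-week window from the last retained record are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Six-week window taken from the gonorrhoea example in the text.
EPISODE_WINDOW = timedelta(weeks=6)


def accept_submission(n_records, n_error_records, threshold=0.90):
    """Return True if more than `threshold` of the records are error-free."""
    if n_records <= 0:
        return False
    error_free = (n_records - n_error_records) / n_records
    return error_free > threshold


def episodes_of_care(records):
    """Collapse repeat diagnosis codes into unique episodes of care.

    `records` holds (clinic_id, patient_id, diagnosis, attendance_date) tuples
    (a hypothetical layout). A record is dropped when the same patient at the
    same clinic already has a retained record for the same diagnosis less than
    six weeks earlier.
    """
    last_kept = {}
    kept = []
    for rec in sorted(records, key=lambda r: r[3]):
        key = rec[:3]
        previous = last_kept.get(key)
        if previous is None or rec[3] - previous >= EPISODE_WINDOW:
            kept.append(rec)
            last_kept[key] = rec[3]
    return kept
```

Under these assumptions, a patient coded for gonorrhoea on 1 January and again on 20 January contributes one episode, whereas a further code on 1 March starts a new one.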
As was the case with KC60, no financial incentives are given to report surveillance data; however, each annual STI data publication includes a list of reporting sites with the proportion submitting all four quarters of data [7]. Additionally, after substantial health system reform in 2013 [8], local government is required to contribute to national surveillance for public health and must ensure that all contracts with sexual health service providers include provision to collect and supply mandatory data including GUMCAD [9]; high-quality, timely local STI data are vital for service planning. Finally, each STI clinic is sent a comprehensive automated feedback report which, in addition to providing demographic breakdowns, STI trends, rates of STI reinfection and HIV test uptake and coverage of their patients, provides comprehensive information on the quality, completion and timeliness of information submitted, and how this compares with national standards.

Patient confidentiality
All staff within PHE have a legal duty to keep patient information confidential. Information on STIs is considered particularly sensitive, and the patient's right to confidentiality must be maintained at all times and balanced against the need to collect information for public health action. Although no patient-identifiable information such as name, date of birth or postcode was specified in GUMCAD, the inclusion of pseudo-anonymised data (i.e. the patient's clinic ID number) meant the data were considered highly sensitive. Guidelines for publishing and sharing the data were developed to ensure that the risk of deductive disclosure of individuals due to small cell sizes would be negligible [10].
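One common way of limiting deductive disclosure from small cell sizes is to suppress small counts before a table is released. The Python sketch below illustrates that general technique only; the threshold of 5, the treatment of zero counts and the function name are assumptions, not rules taken from the PHE guidelines [10].

```python
def suppress_small_cells(table, threshold=5, marker="*"):
    """Mask counts between 1 and threshold - 1 in a {label: count} table.

    The threshold and the choice to leave zero counts visible are
    illustrative assumptions, not the published PHE rules.
    """
    return {
        label: marker if 0 < count < threshold else count
        for label, count in table.items()
    }
```

For example, a table with counts of 0, 3, 5 and 120 would be released as 0, *, 5 and 120 under these assumed settings.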

Reporting timeliness
Compliance with the required data submission deadline was relatively poor in 2009, but has improved considerably: as of 31 December 2013, 85% (177/208) of clinics reported within eight weeks of the end of the calendar quarter (Figure 1).
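The timeliness measure can be sketched in Python as follows: a submission counts as timely if it arrives within a given number of weeks of the end of the calendar quarter it covers. The function names and the (year, quarter, submission date) input layout are hypothetical.

```python
from datetime import date, timedelta


def quarter_end(year, quarter):
    """Return the last day of a calendar quarter (quarter is 1 to 4)."""
    first_month_next = quarter * 3 + 1
    if first_month_next > 12:
        return date(year, 12, 31)
    return date(year, first_month_next, 1) - timedelta(days=1)


def submitted_within(year, quarter, submitted_on, weeks):
    """True if the submission arrived within `weeks` of the quarter end."""
    return submitted_on <= quarter_end(year, quarter) + timedelta(weeks=weeks)


def proportion_within(submissions, weeks):
    """Proportion of (year, quarter, submitted_on) submissions that were timely."""
    if not submissions:
        return 0.0
    timely = sum(submitted_within(y, q, d, weeks) for y, q, d in submissions)
    return timely / len(submissions)
```

Applied to the figures quoted above, 177 of 208 clinics corresponds to 85% when rounded to the nearest whole percentage.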

National and local reports
The aim of any infectious disease surveillance system is to provide 'information for action'. One of the key objectives of GUMCAD was to enable the production of timely outputs and analyses to inform national and local STI service planning, needs assessments and the development of tailored prevention initiatives.
Responding to evolving priorities and needs has been an ongoing process, which has been resource-intensive and, at times, technically challenging.
PHE publishes national-level STI tables (the official STI statistics) annually on the PHE website [7] and has also developed sexual and reproductive health profiles, a publicly available interactive tool using the Fingertips webtool, that present sexual health data at different geographical levels [11]. More detailed data are

Understanding geographical inequalities using spatial mapping
By collecting information on patient area of residence and socio-demographic characteristics, GUMCAD allows detailed geographical mapping of the burden of STIs and of testing and treatment services, which can help local government assess and plan improvements to service provision (Figures 2-3). Such geographical comparisons can be made using the sexual and reproductive health profiles described above [11]. More in-depth exploratory analyses of the inequalities associated with STIs are also possible by combining GUMCAD data with other data sources such as the index of multiple deprivation (IMD, a measure of area-level deprivation in England) [12,13] (Figure 4), and a Bayesian spatial modelling approach has been used to identify local sexual network effects associated with gonorrhoea in London [14].

Improving knowledge on risk groups and emerging infections
GUMCAD provides comprehensive data on patient age, gender, sexual orientation, ethnic group and country of birth, which facilitates assessments of the burden of sexual ill-health in high-risk, often vulnerable populations. These data have shown that men who have sex with men (MSM), young people and certain black ethnic minorities experience particularly high rates of STIs in England [15,16]. The collection of data on single year of age rather than age group has helped provide evidence that the decline in diagnoses of genital warts seen in women aged 15 to 19 years between 2009 and 2013 may partly be the result of a protective effect of human papillomavirus (HPV)-16/18 vaccination against genital warts [17,18]. Furthermore, GUMCAD data have been used to develop exceedance algorithms and a spatio-temporal model to detect outbreaks of STIs in local areas [19].
Crucially, the flexibility of the coding system enables new codes to be introduced in response to need and to help monitor outbreaks or unusual STI activity. For example, codes for sex workers and prisoners were introduced in 2011, allowing routine national surveillance of STIs in these particularly vulnerable populations for the first time [20]. During the London Olympics in 2012, temporary codes were introduced to STI clinics in London and Weymouth to record Olympics-related attendances and thereby assess the impact of the games on sexual health services [21]. PHE has recently received approval for codes for Shigella spp. infection, which has become endemic among MSM in England [22], as well as a suite of dummy codes for release upon recognition of emerging public health concerns.

Longitudinal analyses
An important advantage of electronic patient-level data is that data in this form facilitate record linkage and longitudinal analyses, such as Cox proportional hazards modelling to determine risk factors for STI/HIV co-infections and repeat infections. This can be used to develop targeted clinic-based interventions by determining the characteristics of those at particularly high risk of STIs or HIV and how this changes over time. Thus far, GUMCAD has been used to estimate risk factors associated with HIV incidence, STI acquisition among those who are HIV-positive, and repeat infection with gonorrhoea [6,23,24].
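The record linkage step that underlies such analyses can be sketched in Python. The example below does not reproduce the Cox models used in the cited studies; it only links diagnoses by clinic and local patient identifier and flags patients with a repeat diagnosis. The input layout, the function name and the one-year follow-up window are assumptions, and the input is assumed to have already been reduced to unique episodes of care.

```python
from collections import defaultdict
from datetime import date, timedelta


def repeat_infections(diagnoses, within=timedelta(days=365)):
    """Return the (clinic_id, patient_id) keys with a repeat diagnosis.

    `diagnoses` holds (clinic_id, patient_id, diagnosed_on) tuples for one
    infection (a hypothetical layout). A patient is flagged when a later
    diagnosis falls within `within` of their first recorded diagnosis.
    """
    by_patient = defaultdict(list)
    for clinic_id, patient_id, diagnosed_on in diagnoses:
        # The local identifier is only meaningful within a clinic, so the
        # clinic is part of the linkage key.
        by_patient[(clinic_id, patient_id)].append(diagnosed_on)

    repeats = set()
    for key, dates in by_patient.items():
        dates.sort()
        if any(later - dates[0] <= within for later in dates[1:]):
            repeats.add(key)
    return repeats
```

Because the linkage key includes the clinic, the same identifier used at two clinics is treated as two patients, which mirrors the within-clinic nature of the local patient identifier.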

Discussion
The introduction of GUMCAD has ensured that England now has a timely, comprehensive and sophisticated STI surveillance system which compares favourably with STI surveillance systems in other western industrialised countries [25,26]. It is particularly noteworthy because of the large population it covers and the level of detail collected, a major accomplishment given that there were ca 450,000 diagnoses of STIs in 2013 in England [16]. Its introduction was facilitated by having an established network of open-access (i.e. anyone can attend without a referral), publicly funded STI clinics in England. These services dominate STI and HIV healthcare in England and are linked to an influential clinical professional body (BASHH) with a strong public health focus.
There are, of course, limitations to GUMCAD. Longitudinal patient data are only available within a particular clinic or service: attendances by the same patient at different clinics cannot be monitored. While one of the strengths of GUMCAD is that it is a mandatory surveillance system, this also means that all proposed changes need to be piloted and then go through a formal approval process by the Standardisation Committee for Care Information (SCCI). The volume of records held in GUMCAD also leads to technical challenges for data storage, manipulation and analysis. Recruiting and retaining staff with the required technical and scientific expertise is vital for maximising GUMCAD's potential.
The implementation of a new surveillance system is often complex and challenging and GUMCAD was no exception. A key lesson is that planning for a new surveillance system should start many years before the anticipated start date. A long lead-in time is required to ensure engagement and awareness are widespread among stakeholders and data providers, and that there is adequate time for software development, piloting, feedback and resolution of technical and other issues. The GUMCAD steering and implementation groups spent many months ensuring that the relevant professional bodies and clinicians were fully engaged with and supportive of the proposal. Regular newsletters providing updates on progress were sent to STI clinics and other interested parties, and these remain a useful tool for disseminating important updates and providing feedback. Furthermore, clinic-specific data quality and epidemiology reports enabled clinics to easily identify and resolve persistent data quality issues, and were particularly well received by consultant clinicians. These reports have been one of the key levers in ensuring GUMCAD's success.
However, improvements in data quality also led to issues with data continuity that had not been anticipated. Unlike the aggregate KC60 return, GUMCAD enabled errors in data coding to be identified and corrected. This raised concerns about the interpretation of long-term time trends, as following removal of duplicate records the number of STI diagnoses and services reported reduced on average by ca 3%. To enable fair time-trend analyses over the transition between these two surveillance systems, numbers of diagnoses reported through KC60-based surveillance in years before 2009 had to be statistically back-adjusted using an algorithm based on the percentage difference in diagnoses reported through GUMCAD and KC60 during parallel running in 2008 and 2009. The need for this adjustment resulted in a significant delay in the annual publication of official STI statistics in 2009.
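The text does not specify the back-adjustment algorithm beyond its basis in the percentage difference between the two systems during parallel running. A minimal Python sketch of one plausible form, a single ratio applied to historical KC60 counts, is given below; the function names and all counts are hypothetical and chosen only to illustrate a reduction of about 3%.

```python
def adjustment_factor(gumcad_parallel, kc60_parallel):
    """Ratio of GUMCAD to KC60 diagnoses over the parallel-running period."""
    if kc60_parallel <= 0:
        raise ValueError("KC60 count must be positive")
    return gumcad_parallel / kc60_parallel


def back_adjust(kc60_counts_by_year, factor):
    """Scale historical KC60 counts so they are comparable with GUMCAD."""
    return {
        year: round(count * factor)
        for year, count in kc60_counts_by_year.items()
    }


# Hypothetical counts: GUMCAD reports 3% fewer diagnoses than KC60.
factor = adjustment_factor(97_000, 100_000)
adjusted = back_adjust({2006: 50_000, 2007: 60_000}, factor)
```

In practice such a factor would probably need to be estimated separately for each diagnosis, since the extent of duplicate coding is unlikely to be uniform across infections; that refinement is also an assumption here.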
While the vast majority of sexually transmitted infections in England are diagnosed either at a GUM clinic or are referred to a GUM clinic from general practice, there are a growing number of other services that offer STI testing, diagnosis and treatment [27,28]. These include specific young people's clinics or other sexual health and reproductive services which primarily provide contraception services and STI testing [28]. Since 2012, GUMCAD has been rolled out to these services and data collection is underway (the new system with the inclusion of these additional data is known as GUMCADv2). These data are currently, as of August 2014, being checked and validated, and will be published for the first time in 2015.
GUMCAD is a huge advance on its predecessor paper-based system, but all surveillance systems should evolve and adapt to changing technical, political, epidemiological and microbiological developments. GUMCADv2 has already broadened system coverage. Planning for the next version of GUMCAD (GUMCADv3), which aims to capture information on sexual risk behaviours, drug and alcohol use, and partner notification outcomes, is already underway and includes a pilot in nine STI clinics across England [29]. In the future, GUMCADv3 will be linked to other healthcare datasets to enable greater understanding of care pathways (the 'patient journey') and identification of missed intervention opportunities, and with microbiological datasets (including whole genome sequencing data on STIs) to investigate the behavioural and contextual factors which are associated with poor sexual health outcomes, and the sexual network effects associated with rapid STI and resistance spread. Indeed, GUMCAD has already been linked with molecular typing data from the Gonococcal Resistance to Antimicrobials Surveillance Programme in England and Wales, an annual patient survey at sentinel STI clinics monitoring trends in gonococcal resistance, to demonstrate rapid clonal spread of strains with reduced sensitivity to cephalosporins in dense sexual networks [30]. GUMCADv3 will also facilitate linkage with one-off quantitative or qualitative clinic-based surveys. In England, information on reproductive health and contraceptive services is also collected from sexual health service providers. A move towards harmonisation of this dataset with GUMCAD would be welcomed by commissioners and service providers, and is now a priority of PHE.
There is considerable inequality in the distribution of STIs across the population. Prevention efforts, such as improved health promotion, better sexual health education, greater STI screening coverage and easier access to sexual health services, are vital for controlling infection transmission. Underpinning all these efforts is the need to have good-quality and timely surveillance data showing the groups most at risk of infection to better target prevention activities and to monitor their effectiveness. The successful introduction of GUMCAD has been an important step towards better STI control in England.

Figure 1
Proportion of sexually transmitted infection clinics submitting data to the genitourinary medicine clinic activity dataset (GUMCAD) within 6 and 8 weeks of the end of each calendar quarter, England, 2009-2013

Figure 2
Diagnosis rates of selected sexually transmitted infections by lower-tier local authority of patient residence, England, 2013

available through a restricted-access web-portal which allows local public health professionals involved in service planning and commissioning to view and download their local quarterly STI data aggregated by risk group, time and place from two weeks after the submission deadline. Since 2011, PHE has also produced confidential detailed epidemiological reports for each of the 208 STI clinics and 326 local government authorities using their local data. They include numbers and population-based rates of new STI diagnoses by risk group, LSOA of residence and over time, as well as repeat infection rates and HIV testing uptake and coverage as markers of intervention effectiveness. These epidemiological reports facilitate robust assessment of local service needs and priorities for targeted prevention. They have been programmed in statistical software (Stata v13.0, StataCorp LP, College Station, Texas, US) to enable rapid production and dissemination.

Figure 4
Diagnosis rate of selected sexually transmitted infections by quintiles of the index of multiple deprivation (IMD), England, 2013

Table 1
Data items collected in the genitourinary medicine clinic activity dataset (GUMCAD), England

Table 2
Percentage of the genitourinary medicine clinic activity dataset individual attendance records with 'known' information for selected variables, England, 2008-2013
LSOA is a geographical area with a mean population of 1,620 and is derived from the patient's address.
a Number of genitourinary medicine clinic activity dataset individual attendance records.