A validation of the use of names to screen for risk of chronic hepatitis B in Victoria, Australia, 2001 to 2010

J H MacLachlan (jennifer.maclachlan@mh.org.au)1,2, Y J Wang3, B C Cowie1,3,4
1. World Health Organisation Regional Reference Laboratory for Hepatitis B, Victorian Infectious Diseases Reference Laboratory, Victoria, Australia
2. Department of Medicine, University of Melbourne, Victoria, Australia
3. Burnet Institute, Victoria, Australia
4. Victorian Infectious Diseases Service, Royal Melbourne Hospital, Victoria, Australia


Introduction
The global prevalence of chronic hepatitis B (CHB) has been estimated at 350 million people, with the greatest burden among those living in Asia and the Pacific [1]. Despite comprising only around 5% of the Australian population [2], migrants from the Asia-Pacific region represent nearly 40% of the estimated 218,000 Australians living with CHB [3,4]. Without treatment, 15-40% of people with CHB will suffer serious complications of liver disease [5], including primary liver cancer, the fastest increasing cause of cancer mortality in Australia [6].
Early detection of CHB enables positive interventions for individual patients and the wider community. Effective antiviral treatment can significantly reduce the complications of chronic infection such as cirrhosis and hepatocellular carcinoma (HCC) [6,7], and diagnosis in an individual facilitates screening and vaccination of susceptible contacts. Such screening and treatment initiatives have been demonstrated to be cost-effective [8,9]. Clinical guidelines recommend routine screening for those born in intermediate (2-8%) and high (>8%) CHB prevalence countries [5,10]; however, undiagnosed CHB is common [11,12]. It has recently been estimated that 45% of those living with CHB in Australia have not been diagnosed [3].
The need for improved identification and management of CHB in the primary healthcare setting is well recognised [13]; however, the lack of information on country of birth in patient records in Australian general practices is a substantial barrier to systematic identification of those most at risk [14]. Improving recording of country of birth is important, and educating clinicians about which patients should be routinely offered testing is a key consideration [15]. However, further decision support and prompting regarding testing for hepatitis B virus (HBV) is likely to be necessary, given the large number of Australians with CHB who remain undiagnosed despite recommendations and education programmes that have been in place for years.
Linking patient name to country of birth is a method that has previously been used to determine ethnicity, using a list of names derived from census records, and has been shown to be predictive of country of birth for both Hispanic [16] and Asian American [17-19] patients. In addition, clinical indicators (such as body mass index) were similar between self-identified and name list-identified groups, suggesting that this method identified a representative sample of the population within each ethnicity [19].
Software tools such as Nam Pehchan and OnoMAP have also been used in the United Kingdom to assign ethnicity based on names for notified cases of CHB [20] and cancer registry records [21]. However, there has been no examination of this approach in the Australian context, nor, to our knowledge, has there been an assessment of the utility of a name list as a screening tool for risk of having a chronic communicable disease associated with birth country. This analysis aimed to determine whether a considerable proportion of people attending general practices would be effectively identified using a name list as a screening tool, by examining the sensitivity and specificity of the name list when compared with notifications. More broadly, this study aimed to bridge the gap in evidence and evaluate the utility of the name list in identifying risk of CHB associated with country of birth, to promote and support systematic diagnosis in general practice.

Asian-Pacific name list
The list of names used in this analysis was derived in the United States from Social Security and Medicare administration records containing country of birth information on a pool of over 400 million applicants, with the original aim of creating a method for classification of ethnicity within the broad group of Asian Americans (the full process of which is described in reference [17]).
Individual name lists were derived for the six major Asian ethnic groups in the United States: Chinese (including those from Hong Kong and Taiwan), Japanese, Filipino, Korean (North and South), Indian and Vietnamese. Names were chosen for inclusion on the basis of their predictive quality, based on association with the specific country of birth and frequency, i.e. names were only included if the majority of people with a certain name were associated with a given origin, and names occurring fewer than five times were excluded.
The full list contains a total of 20,693 unique names, each associated with a specific country of birth but aggregated for the purpose of this analysis as predictive of birth in any of the countries included.
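The inclusion rule described above (majority association with one origin, and a minimum frequency of five) can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of reference [17]: the function name `derive_name_list`, the `target_origins` parameter, the upper-casing of names and the toy records are all hypothetical.

```python
from collections import Counter, defaultdict

MIN_COUNT = 5  # names occurring fewer than five times are excluded


def derive_name_list(records, target_origins, min_count=MIN_COUNT):
    """Build {name: origin} from (name, origin) pairs.

    A name is kept only if it occurs at least `min_count` times and a strict
    majority of its bearers share one origin that is in `target_origins`.
    Upper-casing names is an assumption made for this sketch.
    """
    origins_by_name = defaultdict(Counter)
    for name, origin in records:
        origins_by_name[name.strip().upper()][origin] += 1

    name_list = {}
    for name, origins in origins_by_name.items():
        total = sum(origins.values())
        if total < min_count:
            continue  # too rare to be predictive
        top_origin, top_count = origins.most_common(1)[0]
        if top_origin in target_origins and top_count * 2 > total:
            name_list[name] = top_origin
    return name_list


# Hypothetical toy records, for illustration only.
records = (
    [("Nguyen", "Vietnam")] * 6 + [("Nguyen", "Other")]
    + [("Lee", "Korea")] * 3 + [("Lee", "China")] * 3 + [("Lee", "Other")] * 4
    + [("Tanaka", "Japan")] * 4
)
targets = {"China", "Japan", "Philippines", "Korea", "India", "Vietnam"}
print(derive_name_list(records, targets))
```

In this toy input, one name passes both criteria, one has no majority origin, and one occurs too rarely, so only the first is retained.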

Data matching: surveillance notifications
Infection with HBV is notifiable to the Department of Health in Victoria, with notification including limited patient demographic and disease information required of both the diagnosing clinician and laboratory within five days [22]. Notifications are reported as either newly acquired (acute infection) or unspecified (chronic infection) according to patient history and serological evidence. The case definition for hepatitis B notification in Australia requires detection of hepatitis B surface antigen (HBsAg) or hepatitis B DNA, with acute infections differentiated by the presence of high levels of IgM to hepatitis B core antigen (anti-HBc) and/or demonstrated absence of prior infection [23].
Notification data for CHB were extracted from the Victorian Notifiable Infectious Disease Surveillance (NIDS) database by staff from the Communicable Diseases Prevention and Control Unit, Department of Health, Victoria, and compared with the name list to produce a de-identified dataset with a variable indicating a name match. Notified cases of salmonellosis were subjected to the same matching procedure with the name list and used as a control group. Salmonellosis is an acute gastroenteritis with a short incubation period and no particular association with country of birth [24].
Records were assessed for completeness in reporting, and basic demographic characteristics (median age, sex ratio, proportion born overseas) analysed for both diseases.
To determine the effectiveness of the list for the identification of persons at risk of CHB, we tested the name list as a screening tool (see Table 1). Notification data were used as the source of diagnosed cases, with notifications for CHB representing true positives, and notified cases of salmonellosis representing true negatives (analogous to a gold standard diagnostic test); the presence of a name matching the supplied list in a given case was considered a positive result in the screening test. This construct was used to calculate sensitivity and specificity of the name list when using an algorithm that matched both given name and surname ('match both') and one that matched either given name or surname ('match either').
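The screening construct can be illustrated with a short sketch that applies the 'match either' and 'match both' rules to two groups of notifications and computes sensitivity and specificity. The helper names, the case-insensitive comparison and the toy records are assumptions made for illustration; the matching procedure actually used on the surveillance data is not described in that detail.

```python
def name_match(given_name, surname, name_list, mode="either"):
    """Return True if the record screens positive under the chosen rule."""
    given_hit = given_name.strip().upper() in name_list
    surname_hit = surname.strip().upper() in name_list
    if mode == "both":
        return given_hit and surname_hit
    if mode == "either":
        return given_hit or surname_hit
    raise ValueError("mode must be 'either' or 'both'")


def sensitivity_specificity(chb_cases, control_cases, name_list, mode):
    """CHB notifications are treated as diseased, salmonellosis as non-diseased."""
    true_pos = sum(name_match(g, s, name_list, mode) for g, s in chb_cases)
    false_pos = sum(name_match(g, s, name_list, mode) for g, s in control_cases)
    sensitivity = true_pos / len(chb_cases)
    specificity = (len(control_cases) - false_pos) / len(control_cases)
    return sensitivity, specificity


# Hypothetical toy data, for illustration only.
name_list = {"NGUYEN", "TRAN", "WANG", "MEI"}
chb = [("Mei", "Wang"), ("Anh", "Nguyen"), ("John", "Smith"), ("Thi", "Tran")]
salmonellosis = [("Mary", "Jones"), ("Peter", "Wang"), ("Sue", "Brown"), ("Tom", "Lee")]

for mode in ("either", "both"):
    se, sp = sensitivity_specificity(chb, salmonellosis, name_list, mode)
    print(f"match {mode}: sensitivity={se:.2f}, specificity={sp:.2f}")
```

Even in this toy example the trade-off is visible: requiring both names to match lowers sensitivity and raises specificity.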
As the positive and negative predictive values (PPV and NPV) of a screening tool depend on the prevalence of disease in the target population, and the sample of notifications used here included roughly equal numbers of hepatitis B and salmonellosis notifications, PPV and NPV needed to be adjusted to the prevalence of hepatitis B that would be expected in the screened population. CHB prevalence has been demonstrated in a recent serosurvey to vary considerably by geographic region, largely related to the proportion of residents who were born in endemic areas [25]. Estimates of 1.5%, 3% and 6% prevalence were selected to reflect the expected number of people living with CHB attending a clinical practice within a general, moderate and high prevalence area, respectively. Adjusted PPV and NPV values were calculated using the following formulae:
\[ \mathrm{PPV} = \frac{Se \times Pr}{Se \times Pr + (1 - Sp) \times (1 - Pr)} \]
\[ \mathrm{NPV} = \frac{Sp \times (1 - Pr)}{Sp \times (1 - Pr) + (1 - Se) \times Pr} \]
where Se is sensitivity, Sp is specificity, and Pr is prevalence.
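As a worked illustration of the prevalence adjustment, the sketch below applies the standard Bayes' rule expressions for PPV and NPV in terms of Se, Sp and Pr. The input values are assumptions for illustration: a 'match either' sensitivity of 0.61 (the rounded figure quoted in the Discussion) and the reported specificity of 0.864, so outputs will agree with the published estimates only approximately.

```python
def adjusted_ppv(se, sp, pr):
    """Positive predictive value at prevalence pr."""
    return se * pr / (se * pr + (1 - sp) * (1 - pr))


def adjusted_npv(se, sp, pr):
    """Negative predictive value at prevalence pr."""
    return sp * (1 - pr) / (sp * (1 - pr) + (1 - se) * pr)


SE_EITHER = 0.61   # assumed: rounded 'match either' sensitivity
SP_EITHER = 0.864  # reported 'match either' specificity

for prevalence in (0.015, 0.03, 0.06):
    ppv = adjusted_ppv(SE_EITHER, SP_EITHER, prevalence)
    npv = adjusted_npv(SE_EITHER, SP_EITHER, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV={ppv:.1%}, NPV={npv:.1%}")
```

Because the false positive term (1 - Sp) is weighted by the large non-diseased fraction (1 - Pr), PPV falls steeply as prevalence drops, while NPV stays high.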

Statistical analysis
Sensitivity was estimated according to the 'match either' algorithm by sex, age group and across the time period of notifications used. For those notifications where country of birth information was available, sensitivity measures were calculated individually for each of the six name list countries, as well as for the 10 most commonly identified countries of birth that were not originally used to develop the name list.
The chi-square test was used to test the significance of differences in sensitivity according to notification type (CHB compared to salmonellosis), sex, age group, year of notification, as well as differences in country of birth reporting according to sex and presence of name list match. The Wilcoxon rank sum test was used to evaluate differences in age distribution between groups. Exact binomial 95% confidence intervals (CI) were calculated around screening test measures. Data were handled and graphically presented using Microsoft Excel, with statistical analyses conducted using Stata 11.
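For readers without Stata, two of the procedures named above can be sketched with the Python standard library alone: a Pearson chi-square test for a 2x2 table and an exact (Clopper-Pearson) binomial confidence interval. This is an illustrative stand-in, not the authors' analysis code; the function names are hypothetical and the example counts are invented.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def _binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space; needs 0 < p < 1."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _bisect_increasing(f, iterations=60):
    """Root on (0, 1) of a function that increases from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n trials."""
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: (1 - _binom_cdf(k - 1, n, p)) - alpha / 2)
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: alpha / 2 - _binom_cdf(k, n, p))
    return lower, upper


# Invented counts, for illustration only.
print(chi_square_2x2(10, 20, 20, 10))
print(exact_binomial_ci(5, 10))
```

The exact interval is found by bisection on the binomial tail probabilities, which avoids the need for a beta quantile function at the cost of some speed for very large samples.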

Ethics
Ethical approval for this research was granted by the Royal Australian College of General Practitioners' National Research and Evaluation Ethics Committee as a component of a broader study (NREEC 10-011).

Results
Between [...] 27-46 years) for HBV, p<0.001) and less likely to be male (47.7% of salmonellosis cases were male, compared with 54.2% of CHB cases, p<0.001). The majority of notifications for salmonellosis (83%) were patients born in Australia; however, despite representing only around a quarter of the total Victorian population [26], those born overseas made up the vast majority (91%) of CHB notifications. Completeness of this information was similar for salmonellosis and CHB, with greater than 99% of notifications reporting sex and age of cases, but less than one in five recording country of birth.
The sensitivity of the name list varied substantially depending on the type of match assessed (Table 2). While around 60% of those with a notification for CHB had either a given name or a surname matching the name list, just under one third had both names matching the list. In contrast, fewer than 15% and 2% of salmonellosis notifications matched either name and both names, respectively (p<0.001 for both comparisons).
Specificity was correspondingly higher for matching both names instead of either name; in those with a salmonellosis notification (i.e. not at increased risk of CHB), only 1.8% of persons were identified as being at risk based on this name list (specificity of 98.2%). This proportion, a measure of false positives, increased to 13.6% when matching either given name or surname (specificity decreased to 86.4%).
The differing sensitivity and specificity values for the two types of match are reflected in the positive and negative predictive values, with a patient who matched both names much more likely to have a diagnosis of CHB than one who matched either name (PPV in a high prevalence population 51.8% for both names, compared with 22.3% for either name). The inverse was true for the NPV, which increased with improving sensitivity; however, the difference was much less marked, with the proportion of non-matches who did not have CHB (true negatives) only increasing from 95.7% to 97.2% when matching either name instead of both (Table 2).
As expected, PPV was heavily affected by reduced CHB prevalence. Using an average CHB prevalence (1.5%), PPV was calculated to be 6.39% for matching either name and 20.4% for matching both, while using moderate prevalence (3%) resulted in PPV values of 12.2% (match either) and 34.2% (match both). PPV was greatest in high prevalence areas (6%), at 22.3% (match either) and 51.8% (match both). Prevalence had little impact on NPV, with all estimates above 95% regardless of CHB prevalence or match type (see Table 2).
Although demonstrating no trend over time (data not shown), the sensitivity of the name list differed substantially by age group, being only 33.6% in those younger than 10 years compared with 61.0% in those aged 10 years or older (p<0.001, Figure). Sensitivity was also slightly higher in women (62.3% compared with 59.5% in men, p<0.001).
The sensitivity of the name list varied substantially by country of birth. The vast majority (over 96%) of those with CHB born in China and Vietnam were identified as matching either name on the list, and these two countries alone made up around three quarters of total CHB notifications with a name list match. Sensitivity was moderate for those born in Korea (76.0%), India (60.4%) and the Philippines (48.5%), as well as Asia-Pacific countries not originally used to derive the name list, such as Malaysia, Singapore, East Timor and Laos (Table 3). Analysis of factors potentially associated with the country of birth not being reported in the notification dataset found no difference according to sex; however, those with no country of birth recorded were on average older, and more likely to have a name matching the list than those with a country of birth recorded (name list match in 39.5% compared with 32.7%, p<0.001, Table 4).

Discussion
When assessed as a screening tool, the name list evaluated here detected the majority (61%) of the 17,438 notified cases of CHB in Victoria, Australia between 2001 and 2010. Sensitivity and NPV were highest when either name was matched, at the cost of reduced specificity and PPV. Predictive values varied according to the estimated prevalence of CHB among primary care practices. Women and those in older age groups with CHB were more likely to match the name list, and the list was most sensitive for those born in Vietnam and China, with moderate sensitivity for other name list and a number of non-name list countries.
The use of surveillance data in this analysis provided a very large sample of over 33,000 notifications, resulting in narrow confidence intervals around sensitivity and specificity estimates as well as substantial power to detect differences over time and between subgroups.
It is difficult to ascertain what proportion of diagnosed cases are notified to health authorities, or if this varies according to disease, clinic, or patient demographics, but reporting is a legal requirement and compliance is thought to be reasonably high, particularly for laboratories [27]. However, relying on passive surveillance data limited control over quality and completeness, demonstrated by the high proportion of notifications with no country of birth reported, limiting sub-analysis by country. The finding that records without a specified country of birth were more likely to match the name list may also indicate a bias in non-reporting towards those from Asia-Pacific countries.
This validation study also necessarily limited evaluation of the screening value of the name list to those who have already been identified and diagnosed, and therefore the results may not be generalisable to the undiagnosed population that the screening test would be applied to. The evidence of differences in name list sensitivity according to demographic factors such as age and specific country of birth indicates that the effectiveness of the name list screening tool is dependent on the characteristics of the population.
In addition, other analyses of notifications data in Victoria have shown that a notable proportion of diagnoses arise from targeted testing programmes (such as antenatal and humanitarian migrant screening), which may be associated with birth in countries on the name list and affect resultant sensitivity estimates. The effect of targeted screening programmes on the representativeness of notification data has been observed previously, with women aged 20 to 29 years and migrants from countries such as Sudan and Burma (Myanmar) making up a greater than expected proportion of CHB notifications in Victoria due to, respectively, antenatal and humanitarian entrant screening [4,28]. As the migrants from countries on the name list currently entering the country are not usually part of humanitarian migration streams, they are under-represented in notifications, and this may have led to underestimation of the sensitivity of the name list in detecting the population with CHB. The higher sensitivity of the name list for those born in China and Vietnam, observed here and in other studies [17,19], is particularly valuable, as people born in these two countries combined represent more than a third of people living with CHB in Australia [13,28] and a substantial proportion of migrants with CHB in other settings [29].
The screening test construct used here is also limited by the categorisation of salmonellosis notifications as those without disease, when in fact these people may have undiagnosed CHB. However, given that the majority of notifications for salmonellosis occurred among those born in Australia and among young children, the prevalence of undiagnosed CHB in this population is likely to be considerably lower than in the general population (e.g. less than 1%) [3] and would therefore not have had a substantial effect on estimates of sensitivity and specificity.
Much of the difference in name list sensitivity according to age can be explained by the differing migration and demographic patterns according to country of birth. Those born in Asian countries made up a much smaller proportion of notifications in those aged 0 to 9 years compared with other regions of birth. In addition, migrants from the Middle East/North Africa and Sub-Saharan Africa regions are more likely to be younger than 15 years than those from Asian countries [26]. The name list may also be identifying younger people with Asian names who were born in Australia and whose risk of CHB may be lower, particularly since the implementation of universal infant vaccination.
This analysis is the first to investigate the validity of the name list to identify CHB cases in an Australian setting. The name list may have the potential for application in other countries where migrants born in Asia experience a disproportionate burden of CHB, and this validation process could be carried out in jurisdictions with similar communicable disease surveillance systems. These results support the application of this name list predictive tool in general practice management software to trigger testing, an initiative that is currently being piloted in practices in Melbourne, Victoria situated in areas identified with a high prevalence of CHB [25]. The higher PPV of the name list as a screening tool when applied in higher prevalence populations suggests that practices serving communities with a higher burden of CHB would be optimal sites for implementing this approach.
The use of computer programs to trigger testing in primary care based on patient characteristics has been shown to be effective in various contexts [30,31] and may be particularly effective in this case, given the need expressed by Australian clinicians for improved knowledge about HBV, particularly regarding whom to test [32,33]. The results of this pilot project will allow assessment of the practical utility of the name list as a screening tool, and estimation of the sensitivity of the list in a previously untested clinical cohort, as opposed to a surveillance registry of people known to be infected. Post-implementation assessment could also provide the opportunity for improvement such as variation of match algorithm to balance sensitivity and specificity, or expansion to other countries.
The implementation of this screening in primary care settings with high CHB prevalence could help to improve access to preventive care, which is particularly imperative given the generally lower uptake of these programmes (for example cancer screening) among Australia's migrant populations [34,35]. Supporting improved delivery of primary care-based opportunistic testing also minimises the potential for stigmatisation of minority groups that could result from broader public campaigns highlighting their increased risk of a chronic infectious disease.
Despite differences in the migration patterns between the United States (where the name list was developed) and Australia, the six ethnicities represented in the list make up 65% of the total Asian-born population of Victoria [2], and estimates put the total burden of chronic hepatitis B in migrants from the name list countries at 60,000 to 100,000 people nationally [13,28]. However, there is still potential for inclusion of a more complete selection of countries that people living with CHB in Australia have migrated from, such as Thailand, Fiji and Indonesia [4,13]. This novel screening concept could also be applied to other diseases (communicable and non-communicable) that are associated with country of birth or ethnicity, possibly involving the development of name lists for other regions.
Systematically increasing diagnostic testing through the application of any screening process, including this name list, must consider the cost-effectiveness of doing so. There has been increasing evidence that screening and appropriate treatment for CHB is cost-effective; a recent study from the United States [36] found that routine screening for CHB may be cost-effective down to prevalence levels as low as 0.3%. This is lower than the average Australian prevalence of 1.02% [3], and substantially lower than the prevalence of CHB in several parts of Melbourne [25].
In conclusion, the name list evaluated here shows potential as a screening tool to trigger testing of at-risk patients for HBV in primary care situations, being associated with CHB notifications and identifying a considerable proportion of those diagnosed. Although the links between name and country of birth, and country of birth and disease risk have been individually established, this analysis bridges the gap by clarifying the direct association between name and disease, a finding that may have relevance for public health screening initiatives in the future.

Table 1
Screening test construct for name list analysis

Table 2
Sensitivity, specificity, and positive and negative predictive values based on estimated prevalence of chronic hepatitis B, by type of match, Victoria, 2001-2010 (n=32,303)

Table 3
Proportion of chronic hepatitis B notifications matching name list (either surname or given name), by country of birth, Victoria, 2001-2010 (n=2,167)
a For the purposes of the name list derivation, Chinese names included those born in the People's Republic of China, Taiwan and Hong Kong, and Korean names included both the Republic (South) and Democratic People's Republic (North).
b Suppressed for people born in Japan due to low numbers and excluded from name list total.

Table 4
Country of birth recording of chronic hepatitis B and salmonellosis cases according to demographic factors and presence of name list match (for all notifications), Victoria, 2001-2010 (n=32,303)