Identification of risk factors associated with national transmission and late presentation of HIV-1, Denmark, 2009 to 2017

Background: Despite the availability of pre-exposure prophylaxis (PrEP), the incidence of HIV-1 in the World Health Organization European Region has remained stable. Reduction of new HIV-1 infections requires more knowledge about the profiles of high-risk transmitters and late presenters (LP). Aim: We aimed to investigate risk factors associated with HIV-1 transmission clusters and late presentation with HIV-1 in Denmark. Methods: Blood samples and epidemiological information were collected from newly diagnosed HIV-1 patients between 2009 and 2017. We genotyped pol genes and performed phylogenetic analyses to identify clusters. Risk factors for clustering and LP were investigated with partial proportional odds and logistic regression. Covariates included transmission mode, HIV-1 subtype, age, origin and cluster activity. Results: We included 1,040 individuals in the analysis; 59.6% were infected with subtype B and 48.4% were in a cluster. Risk factors for clustering included Danish origin (odds ratio (OR): 2.95; 95% confidence interval (CI): 2.21–3.96), non-LP (OR: 1.44; 95% CI: 1.12–1.86), and men who have sex with men (MSM). Increasing age and non-B subtype infection decreased risk (OR: 0.69; 95% CI: 0.50–0.94). Risk for late presentation was lower for active clusters (OR: 0.60; 95% CI: 0.44–0.82) and Danish origin (OR: 0.43; 95% CI: 0.27–0.67). Non-Danish MSM had a lower risk than non-Danish heterosexuals (OR: 0.34; 95% CI: 0.21–0.55). Conclusion: HIV-1 transmission in Denmark is driven by early diagnosed, young, subtype B-infected MSM. These may benefit most from PrEP. Non-Danish heterosexual HIV-1 patients could benefit from improved communication to achieve earlier diagnosis and treatment.


Introduction
The use of highly active antiretroviral therapy (HAART) has substantially increased the survival of people infected with HIV-1 [1,2] and has also been used to prevent transmission between partners [3,4] and from mother to child [5]. Pre-exposure prophylaxis (PrEP) has been shown to be both effective in reducing the number of new HIV-1 infections among people at risk [6] and cost-effective [7]. Despite the benefits of PrEP, the incidence rates of HIV-1 in the World Health Organization (WHO) European Region remained unaltered at 8.3-8.4 new HIV cases diagnosed per 100,000 inhabitants per year between 2008 and 2017 [8], primarily driven by steady incidence rates in non-European Union/European Economic Area countries. In Denmark, the rate decreased slightly from 5.2 to 4.3 per 100,000 inhabitants per year during the same time period. Recent phylogenetic investigations have provided new insights into the dynamics and drivers of local transmissions [9-12], including that Danish national transmissions are still mainly caused by HIV-1 subtype B [11]. In order to reduce the number of new HIV-1 infections, more knowledge is needed about which risk groups are causing this persistent transmission.
Patients who present late with HIV-1, the so-called late presenters (LP), are defined as patients with a CD4+ T-cell count below 350 cells/µL or the presence of an acquired immunodeficiency syndrome (AIDS)-defining illness upon HIV diagnosis [13]. Their prevalence has previously been reported to vary between 38.3% and 49.8% in different European countries [14] and is currently at 49% in the WHO European Region [8]. In Denmark, the prevalence of LP is ca 50.5% and is higher among heterosexual HIV-1 patients (67.3%) compared with homosexual HIV-1 patients (34.9%) [8].
Despite their high prevalence, LP in Denmark were not found to contribute substantially to the ongoing transmission of HIV-1 in Europe [15]. However, late presentation with HIV-1 is a missed opportunity for early therapy initiation and is associated with increased risk of non-infectious multi-morbidities [16] and mortality [17].
In order to devise effective public health strategies aimed at reducing the ongoing transmission of HIV-1 and the high LP prevalence, knowledge about risk profiles and characteristics of national HIV-1 transmitters and LP is crucial. Furthermore, identifying these risk factors will inform targeted promotion and application of PrEP, behavioural or other intervention strategies to those with a high likelihood of transmission and those who benefit from early therapy. Similarly, better identification of vulnerable groups at risk for late presentation with HIV-1 would allow for improved targeting of HIV-1 screening strategies.
The aim of this study was to identify HIV-1 transmission clusters in Denmark between 2009 and 2017 by phylogenetic analysis; and to investigate whether origin (here used as born in Denmark or not), age, transmission mode, presentation status, and/or HIV-1 subtype are risk factors associated with being in a national cluster and whether these risk factors have changed during the study period. Correlates of late presentation of HIV-1 with any of these risk factors were also investigated.

Study population and characteristics
Blood samples from 1,225 newly diagnosed HIV-1 patients between 2009 and 2017, along with clinical and epidemiological information, were sent to Statens Serum Institut in Copenhagen, Denmark from infectious disease and HIV treatment centres as part of the long-running HIV-1 surveillance SERO project [18].
The SERO project forms the Danish sentinel framework for surveillance of transmitted HIV-1 resistance and molecular epidemiology [19,20]. Participation is voluntary and can include an analysis of the first sample within a year from diagnosis from newly diagnosed HIV-1 patients with no prior history of antiviral therapy.
Genotypic characterisation of the pol gene (protease and reverse transcriptase) was performed, as described previously [21]. Inclusion criteria were HIV-1 positive patients with a serum sample obtained no later than 6 months after the first positive HIV test conducted in Denmark and no previous history of HAART. Presentation status was assigned to patients in accordance with the consensus definition [13]: patients with a CD4+ T-cell count below 350 cells/µL or with an AIDS-defining illness, regardless of CD4+ T-cell count, were classified as LP; all others were designated as non-late presenters (NLP). In a sub-analysis, and in addition to the above groups, we defined patients with a CD4+ T-cell count below 200 cells/µL as very late presenters (VLP). Exclusion criteria were patients with unknown origin, transmission mode or infection status. Furthermore, we focused on sexual transmission and thus excluded people who inject drugs (PWID), transmission through blood transfusion or other transmission of non-sexual nature. When several possible transmission modes were stated, we assigned them to one category in the following overriding order: men who have sex with men (MSM) over PWID over blood transfusion over heterosexual HIV-1 patients (HSX). Whenever bisexual contact was mentioned as transmission mode, we considered this to be MSM. There were 17 patients with more than one transmission route, all HSX, and of these nine were excluded from the study. Patients registered as both male and female were assumed to be of the male sex for the purpose of the statistical analysis; no gender-neutral patients were included in the study. Figure 1 presents the selection of the study population.
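The classification rules in this paragraph can be written down as a short sketch. This is illustrative only and is not the authors' code: the function names, the string labels and the handling of a missing CD4+ count are assumptions; the thresholds and the overriding order of transmission modes are taken from the text.

```python
from typing import Iterable, Optional

LP_CD4_THRESHOLD = 350   # cells/µL, consensus definition of late presentation
VLP_CD4_THRESHOLD = 200  # cells/µL, very late presenters (sub-analysis)

# Overriding order when several transmission modes are reported.
MODE_PRIORITY = ["MSM", "PWID", "blood transfusion", "HSX"]


def presentation_status(cd4: Optional[float], aids_defining: bool) -> str:
    """Return 'LP' or 'NLP' following the consensus definition."""
    if aids_defining:
        return "LP"  # regardless of the CD4+ T-cell count
    if cd4 is None:
        # Assumption: without either criterion the status cannot be assigned.
        raise ValueError("CD4 count needed when no AIDS-defining illness")
    return "LP" if cd4 < LP_CD4_THRESHOLD else "NLP"


def is_very_late_presenter(cd4: Optional[float]) -> bool:
    """VLP flag from the sub-analysis: CD4+ count below 200 cells/µL."""
    return cd4 is not None and cd4 < VLP_CD4_THRESHOLD


def assign_transmission_mode(reported: Iterable[str]) -> str:
    """Collapse several reported modes into one category."""
    # Bisexual contact is treated as MSM, as stated in the text.
    modes = {"MSM" if m == "bisexual" else m for m in reported}
    for mode in MODE_PRIORITY:
        if mode in modes:
            return mode
    raise ValueError("no recognised transmission mode")
```

Note that PWID and blood transfusion appear in the priority list only to resolve multiple reports; patients assigned to those categories were subsequently excluded from the study.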

Data analysis

Transmission cluster identification
The pol sequences from the commonly observed subtypes A (n = 100), B (n = 713), C (n = 83) and circulating recombinant forms (CRF)01 (n = 124) and CRF02 (n = 84) were aligned separately in MAFFT version 6.0 [22] and phylogenetic analysis was performed using maximum likelihood with the general time reversible model with 100 bootstrap replicates in MEGA 6.0 [22]. Less common subtypes and CRF (n = 121) were aligned together. Clusters were identified with Cluster Picker [23] using the initial and main support threshold of 0.9 and a genetic distance of 4.5. Transmission clusters were classified as active if they contained a patient sampled between 2015 and 2017; all other clusters were considered inactive in terms of HIV-1 transmission. For analysis of cluster size, we differentiated between no cluster, clusters of two patients and clusters of three or more patients. Other groupings were tested but did not add relevant information and are not shown here. These were: no cluster and clusters of two, three to four, five to six and seven or more; two, three to four and five or more; two, three and four or more; and two, three, four to five, six to 10 and 11 or more.
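As a minimal sketch of the two derived cluster variables described above (cluster size category and cluster activity), assuming each cluster is represented simply by the list of sampling years of its members. The function names and category labels are hypothetical.

```python
from typing import Sequence

ACTIVE_PERIOD = (2015, 2017)  # sampling years that make a cluster "active"


def cluster_size_category(n_members: int) -> str:
    """Three-level size variable used in the main analysis."""
    if n_members < 2:
        return "no cluster"
    if n_members == 2:
        return "two"
    return "three or more"


def cluster_activity(sample_years: Sequence[int]) -> str:
    """Active if any member was sampled between 2015 and 2017."""
    if len(sample_years) < 2:
        return "no cluster"
    first, last = ACTIVE_PERIOD
    if any(first <= year <= last for year in sample_years):
        return "active"
    return "inactive"
```

For example, a cluster with members sampled in 2010, 2012 and 2016 would be labelled "three or more" and "active", whereas a pair sampled in 2009 and 2011 would be labelled "two" and "inactive".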

Genetic distance
To test the dependence of our results on the genetic distance used in the clustering algorithm in the Cluster Picker analysis, we also tested genetic distances of 1.0, 1.5, 3.0 and 4.5 using only subtype B sequences. Lowering the genetic distance threshold means that only sequences separated by a smaller genetic distance will still be grouped, and these might indicate transmissions closer in time. Using various genetic distances may thus give an idea about the clusters with higher transmission rates.
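The effect of tightening the distance threshold can be illustrated with a toy example. The sketch below is not Cluster Picker's algorithm: it ignores the tree and the bootstrap support requirement, it assumes the thresholds are percentages, and it uses a simple p-distance (percentage of differing sites), which may differ from the distance measure actually used. It only checks whether the maximum pairwise distance within a candidate group stays within a threshold.

```python
from itertools import combinations
from typing import Sequence


def p_distance_percent(seq_a: str, seq_b: str) -> float:
    """Percentage of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip gaps and ambiguous bases; only compare unambiguous sites.
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differing / compared


def within_threshold(seqs: Sequence[str], max_dist_percent: float) -> bool:
    """True if every pair in the group is within the distance threshold."""
    return all(
        p_distance_percent(a, b) <= max_dist_percent
        for a, b in combinations(seqs, 2)
    )


# Toy aligned sequences of 100 sites (hypothetical, not study data).
base = "ACGT" * 25
one_change = "C" + base[1:]        # 1 differing site from base
four_changes = "TTAA" + base[4:]   # 4 differing sites from base

group = [base, one_change, four_changes]
print(within_threshold(group, 4.5))               # grouped at 4.5
print(within_threshold(group, 3.0))               # split at 3.0
print(within_threshold([base, one_change], 1.0))  # pair survives at 1.0
```

In this toy case the three sequences form one group at a threshold of 4.5 but not at 3.0, while the closely related pair remains grouped at 1.0, mirroring how larger clusters can break into sub-clusters at smaller genetic distances.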

Statistical analysis
The analyses consisted of two parts. In the first analysis we focused on risk factors for being part of a cluster. In the second analysis we focused on risk factors for LP status.

Risk factors for being part of a cluster
To identify risk factors for being part of a cluster, the associations between cluster size, as identified by the phylogenetic analysis, and potential risk factors were investigated using ordinal logistic regression. More specifically, we fitted partial proportional odds models with a logit link and transmission mode as nominal effect.
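To make the model structure concrete, the sketch below computes category probabilities under one common parameterisation of a partial proportional odds model, logit P(Y ≤ j) = θ_j − β·x − γ_j·z, where x holds covariates with proportional effects and z is a covariate with a nominal (threshold-specific) effect such as transmission mode. The sign convention and all numeric values are assumptions for illustration; they are not the fitted model from this study.

```python
import math
from typing import List, Sequence


def logistic(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def category_probabilities(
    thresholds: Sequence[float],  # theta_j, one per cut-point
    beta: Sequence[float],        # proportional effects, shared across cut-points
    x: Sequence[float],           # covariates with proportional effects
    gamma: Sequence[float],       # nominal effects, one per cut-point
    z: float,                     # covariate with nominal effect (e.g. 0/1)
) -> List[float]:
    """Return P(Y = category) for each ordered category."""
    linear = sum(b * xi for b, xi in zip(beta, x))
    cumulative = [
        logistic(theta_j - linear - gamma_j * z)
        for theta_j, gamma_j in zip(thresholds, gamma)
    ]
    cumulative.append(1.0)
    probs = [cumulative[0]]
    probs += [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]
    if any(p < 0 for p in probs):
        raise ValueError("cumulative probabilities are not ordered")
    return probs


# Hypothetical parameters for three ordered categories:
# no cluster, cluster of two, cluster of three or more.
thresholds = [0.0, 1.0]
beta = [1.0]        # one proportional-effect covariate
gamma = [0.3, 0.6]  # effect of z differs between the two cut-points

print(category_probabilities(thresholds, beta, [1.0], gamma, 1.0))
```

With this parameterisation, exp(β) is the odds ratio for being in a higher cluster-size category and is the same at every cut-point, whereas the effect of z is allowed to differ between cut-points, which is what relaxes the proportional odds assumption for that covariate.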

Risk factors for late presentation status
Risk factors for LP status were analysed using logistic regression. Transmission mode, HIV-1 subtype, age at the time of sample collection, origin and cluster activity were included as covariates. The latter variable distinguished between no cluster, clusters with recent transmission that contained a sample from the period 2015 to 2017 and older clusters which contained no sample from 2015 to 2017. Since patients who are part of a cluster are more likely to have shared characteristics, it is possible that the risk estimates are biased. To test for this bias, we also performed a subset analysis using only patients within clusters. In this subset we tested for the same associations using a multi-level logistic regression model with cluster id as the random effect and compared the results to a naïve logistic regression model without random effects. Intra-class correlation coefficients were calculated.
For both analyses we also investigated the potential role of interactions between covariates. Statistical significance of covariates and interactions was assessed using the 95% confidence intervals (CI) for the odds ratio (OR) along with p values. Best fitting models were selected based on Akaike information criterion (AIC) values and associated chi-squared model comparison tests in the case of nested models, as well as the statistical significance of included coefficients. All analyses were performed in R version 3.3.3 (R Foundation, Vienna, Austria).
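The text does not state how the correlation coefficient for the multi-level model was computed. One common choice for a random-intercept logistic model is the latent-variable formulation, in which the residual variance on the logit scale is fixed at π²/3. The sketch below assumes that formulation; the function names are hypothetical.

```python
import math

# Variance of the standard logistic distribution (latent residual variance).
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def latent_icc(random_intercept_variance: float) -> float:
    """Share of latent-scale variance attributable to the cluster level."""
    if random_intercept_variance < 0:
        raise ValueError("variance cannot be negative")
    return random_intercept_variance / (
        random_intercept_variance + LOGISTIC_RESIDUAL_VARIANCE
    )


def variance_for_icc(icc: float) -> float:
    """Random-intercept variance implied by a given coefficient."""
    if not 0 <= icc < 1:
        raise ValueError("coefficient must be in [0, 1)")
    return icc * LOGISTIC_RESIDUAL_VARIANCE / (1 - icc)
```

Under this assumption, a coefficient close to zero means that cluster membership explains little of the variation in LP status beyond the covariates, so the naïve and multi-level models would be expected to give similar odds ratios.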
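For readers less familiar with these quantities, the sketch below shows how an odds ratio with a 95% CI is obtained from a logistic regression coefficient and its standard error, and how AIC is computed from a log-likelihood. It assumes Wald-type intervals; the text does not state whether Wald or profile-likelihood intervals were used, and the numeric inputs are hypothetical.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_with_ci(beta: float, se: float, z: float = Z_95) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) from a log-odds coefficient and its SE."""
    if se < 0:
        raise ValueError("standard error cannot be negative")
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical coefficient and standard error, not taken from the study.
odds_ratio, lower, upper = odds_ratio_with_ci(beta=math.log(2.0), se=0.2)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

Because the interval is symmetric on the log scale, the reported OR is the geometric mean of its confidence limits, which offers a quick plausibility check on published estimates.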

Ethical statement
According to the Danish Act on Research Ethics Review of Health Research Projects, this study does not require approval by an ethics committee as it does not cause increased health risk or discomfort to patients. This was confirmed by the Committee on Health Research Ethics for the Region of Southern Denmark in a specific waiver of approval (VF20020258). Data were collected, stored and analysed as approved by the Danish Data Protection Agency (J.nr. 2015-57-0102).

Study population and descriptive statistics
The study population consisted of 1,225 patients, of which 1,040 (85%) were eligible for analysis given completeness of the associated epidemiological information (Table 1 and Figure 1).

Risk factors associated with patients in national transmission clusters
The odds of being in any size category of cluster were higher for patients of Danish origin compared with those of non-Danish origin (OR: 2.95; 95% CI: 2.21-3.96; p < 0.00; Figure 2). Non-late presenters had higher odds of being part of a cluster than LP (OR: 1.44; 95% CI: 1.12-1.86); see also Figure S1.

Risk factors associated with late presentation status
Similar risk factors were investigated for late presentation status. Patients in active clusters with recent transmission events showed a considerably lower risk of being LP than those not part of a cluster (OR: 0.60; 95% CI: 0.44-0.82; p < 0.00), while patients in a non-active cluster did not show a significantly different risk compared with patients who were not part of a cluster (OR: 0.79; 95% CI: 0.56-1.14; p = 0.21; Figure 3).
To check for influences of clustering, we focused on cases within clusters (n = 503). In this sub-group analysis, only age and transmission mode were statistically significant with ORs of 1.03 (95% CI: 1.01-1.05; p < 0.00) and 0.54 (95% CI: 0.36-0.82; p < 0.00) respectively. Correcting for cluster id in a multi-level analysis did not meaningfully change our results, with corrected ORs of 1.04 (95% CI: 1.02-1.06; p < 0.00) and 0.55 (95% CI: 0.35-0.86; p < 0.00) for age and transmission mode, respectively. The intra-class correlation coefficient was 0.07, indicating a small correlation within clusters and thus lending support for our overall analyses above.
In the main analyses, VLP were included in the LP group and subsequent analyses did not show a statistical distinction between LP and VLP for the association with cluster size (p = 0.62). However, HSX were more at risk of being VLP (OR: 2.87; 95% CI: 1.72-4.89; p < 0.00) than MSM.
Table 2 shows descriptive statistics of patients clustered according to genetic distance cut-off points of 4.5, 3.0, 1.5, and 1.0. Only cluster size and cluster activity showed statistically significant differences between genetic distances. An association with cluster size is to be expected as reducing the genetic distance naturally decreases the number of larger clusters that may be more spread out in time and thus have larger genetic distances; indeed, there were proportionally more clusters of size ≥ 11 and fewer of smaller sizes at genetic distance 4.5 compared with smaller genetic distances. Contrary to our expectations, there were proportionally more active clusters at genetic distance 4.5 than at genetic distances 3.0 and 1.0. When comparing the model results for HIV subtype B between various genetic distances, the main difference is seen at genetic distance 1.5, where more recent infections are associated with being part of a cluster (see Supplementary Tables S1 and S2).

Discussion
In this study we found that 48.4% of patients diagnosed with HIV-1 in Denmark were part of a cluster and of those, 29.0% were part of a cluster containing more [...] Age has also been observed as an important factor in ongoing HIV-1 transmission in other studies [12,24]. Our finding that a substantial part of HIV-1 transmission occurs within networks of MSM infected with subtype B has also been reported in studies from other European countries, including Portugal [9,12]. In Denmark, the HIV epidemic has gradually become driven by subtype B infections.
Non-Danish origin seemed to be more related to smaller clusters or no cluster, indicating that some of these patients may have been infected abroad and contribute to Danish clusters only to a limited degree. Interestingly, diagnosis between 2015 and 2017 vs an earlier period was not a significant risk factor in our study, suggesting that the number and size of clusters are not increasing. In general, the risk profile for being part of a cluster is not surprising but does highlight a clear profile for those most at risk. Although comparable risk factors for national transmission have been identified in other European countries, recent studies from Spain [25] and Italy [26] show that newly introduced non-B subtypes can be established and transmitted relatively quickly. This highlights the importance of ongoing national surveillance in order to identify changes in the epidemic.
In total, 48% of the patients in the study were characterised as LP. Late presentation was significantly associated with not being part of an active cluster, age, and origin (overall, patients of Danish origin had lower odds of being LP). Our analysis identified two high-risk groups for LP status: HSX of non-Danish origin who were not part of an active cluster, and individuals of Danish origin, for whom the risk also increases with age. These risk profiles are not unexpected, since transmission in Denmark is to a large extent driven by clusters of young MSM of Danish origin. These patients generally have full access to the Danish healthcare system and are thus more likely to be tested regularly and, in case of infection, to be detected early. In contrast, those of non-Danish origin are more often infected abroad or immigrate to Denmark while already infected. This group is more commonly mixed, with a large proportion having been infected following heterosexual contact. Our finding that those in clusters are less likely to be LP indicates that those who are part of clusters are more likely to be tested for HIV-1 than those not in clusters, who may have a lower risk awareness of HIV. Interestingly, the odds of being an LP increased with age. This may indicate that young patients are testing more regularly compared with older patients or that those of higher age simply have had more time to become an LP.
The choice of genetic distance does not seem to have substantially affected our analysis. Several differences were found between genetic distances of 4.5 and 3.0; a smaller genetic distance may indicate diagnoses closer in time and can thus help identify highly active transmission clusters or superspreader events. However, the link between genetic distance and time is complex and we were not able to find clear differences between the various genetic distances. The choice of genetic distance does, nevertheless, heavily influence the size of the clusters identified. For example, a genetic distance of 1.5 was unable to identify any of the large clusters of 11 or more patients in our data. When looking into the larger clusters more closely (see Supplementary Figure S2), these often seemed to consist of sub-clusters that could be identified when using smaller genetic distances.
These differences seem to imply that larger clusters with a genetic distance of 3.0 or more might depict a continuously growing transmission chain, whereas smaller clusters with a small genetic distance show the subset of transmission events more closely spaced in time. Contrary to our expectations, a smaller genetic distance was also associated with relatively fewer active clusters. It is possible that the sub-clusters are more often inactive, which implies that a few patients connect smaller clusters with one another over longer periods of time. Alternatively, it could indicate more recent changes in the dynamics of HIV-1 transmission in Denmark towards longer time intervals between infection events. However, in the current study, it was not possible to identify the reasons for this unexpected finding and future studies are needed to address this further. In addition, the epidemiological information was collected through a self-reported questionnaire, which could have impacted the accuracy of some answers. While the questionnaire included questions about risk behaviour, it did not include questions about the number of sex partners. Since the answers to these questions were often missing, they were not included in the analysis and therefore residual confounding may be present in our results. Furthermore, information on HIV positive test results is requested as part of the routine SERO survey, albeit responses on test outcomes are rarely provided. Active sentinel surveillance systems like this are commonly used for surveillance purposes in cases where it is not possible to obtain a more comprehensive surveillance system, for instance if participation is voluntary.
Another possible limitation is that we used the consensus definition for late presentation [13], which has been shown to overrepresent the proportion of LP [27]. Since HIV negative or positive test dates were only available for a small proportion of patients in this study, other means of establishing time since infection were not possible. In addition, since the consensus definition is commonly used, its use in this study will also make the results more comparable to other studies.
Despite these limitations, our analysis highlights the added value of HIV-1 surveillance systems, such as the SERO project, to public health monitoring, especially when they include the sampling of biological material. Such systems allow for more detailed analysis, which can improve the active control of HIV-1 spread, through identification of potential clusters, monitoring of the development of these clusters as well as specifically targeting the risk groups involved with intervention and communication strategies.

Conclusion
Our study identified several target groups for PrEP and communication/advice on PrEP or behavioural intervention strategies. Specifically, young Danish MSM in active clusters are key in the ongoing transmission of HIV in Denmark, and targeting this group with PrEP/behavioural interventions may achieve the most benefit in reducing the ongoing national transmission. We also found that HSX of non-Danish origin who are not part of a transmission cluster are at high risk of presenting late at HIV diagnosis and may therefore benefit from improved communication and information about the benefits of early diagnosis and treatment initiation.