Predicting and mapping human risk of exposure to Ixodes ricinus nymphs using climatic and environmental data, Denmark, Norway and Sweden, 2016

Background Tick-borne diseases have become increasingly common in recent decades and present a health problem in many parts of Europe. Control and prevention of these diseases require a better understanding of vector distribution. Aim Our aim was to create a model able to predict the distribution of Ixodes ricinus nymphs in southern Scandinavia and to assess how this relates to risk of human exposure. Methods We measured the presence of I. ricinus tick nymphs at 159 stratified random lowland forest and meadow sites in Denmark, Norway and Sweden by dragging 400 m transects from August to September 2016, representing a total distance of 63.6 km. Using climate and remote sensing environmental data and boosted regression tree modelling, we predicted the overall spatial distribution of I. ricinus nymphs in Scandinavia. To assess the potential public health impact, we combined the predicted tick distribution with human density maps to determine the proportion of people at risk. Results Our model predicted the spatial distribution of I. ricinus nymphs with a sensitivity of 91% and a specificity of 60%. Temperature was one of the main drivers in the model followed by vegetation cover. Nymphs were restricted to only 17.5% of the modelled area but, respectively, 73.5%, 67.1% and 78.8% of the human populations lived within 5 km of these areas in Denmark, Norway and Sweden. Conclusion The model suggests that increasing temperatures in the future may expand tick distribution geographically in northern Europe, but this may only affect a small additional proportion of the human population.


Introduction
Ticks are one of the most important vectors for pathogens, impacting a wide range of vertebrates, and transmit more pathogens than any other arthropod [1,2]. In Europe, the main vector for tick-borne pathogens is Ixodes ricinus [3,4], which is also the most common tick species in Scandinavia [3][4][5]. Over the last decades, the incidence and geographical range of tick-borne diseases have increased [3,6,7] and pose a risk to both human and animal health. Scandinavia constitutes the edge of the northern distributional range of I. ricinus [4]. The incidence of Lyme borreliosis (LB) and tick-borne encephalitis (TBE) is increasing in both Norway and Sweden [5,[8][9][10]. In Norway, LB and TBE have mostly been reported along the coastline in the southern parts of the country [5,8]. However, tick-borne encephalitis virus (TBEV) has been found in I. ricinus nymphs as far north as ca 115 km from the Arctic Circle [11,12]. In Sweden, LB is widespread in the southern and eastern regions [4,13,14], whereas TBE is concentrated in the south-central and coastal regions, with the annual TBE incidence around Stockholm exceeding 4 per 100,000 inhabitants [9,10,15,16]. In Denmark, LB seems endemic and widespread [3], whereas TBEV-infected ticks have only been confirmed Forest includes the cover types: broad-leaved forest, coniferous forest and mixed forest. Meadow includes: land principally occupied by agriculture with significant areas of natural vegetation, natural grasslands, moors and heathland, and transitional woodland-shrub. The lines divide each country into equally sized northern and southern strata. Only parts of Norway and Sweden were included in the field study for logistic reasons. on the island of Bornholm and at one emerging site in northern Zealand with two human cases [17,18].
The increase in incidence and geographical range of pathogens and their tick vector is likely to be a combination of several factors, e.g. climate and availability of host species [6,7], which all affect the ticks' life cycle and therefore their distribution and the possibility of tick-borne diseases being present in specific regions [7,19]. Many hard ticks, as I. ricinus, are sensitive to climate and weather [1,6], and are restricted to live in areas with high rainfall and vegetation that keeps a humidity of at least 80%, to prevent desiccation when the ticks are off-host [1,7]. Knowing the distribution of ticks may help pinpoint potential risk areas for disease transmission and guide health authorities in determining where to focus surveillance efforts, where to use preventive measures, or where to put emphasis on informing people.
Determining tick distribution can be a difficult task depending on the size of the area of interest. Throughout Scandinavia, there have been several field studies on ticks and their associated pathogens [3,8,12,[20][21][22][23][24], but in order to predict tick presence in unsampled regions in the present but potentially also for the future, we need repeatable survey methods and to find factors associated with tick abundance that can aid us in developing models with high predictive power. In Norway, Jore et al. [2] used sheep serum antibody-positive for tick-borne Anaplasma phagocytophilum as a proxy for tick presence, finding effects of temperature, abundance of large cervids and farm animals as well as land cover on tick distribution. Studies in Sweden found significant effects of climate, vegetation parameters and length of vegetation period on tick abundance and distribution [13,14]. In Denmark, Jensen [23] found that I. ricinus nymph abundance was significantly affected by the interaction between soil water capacity and the number of hunted roe deer. Population counts per 1 km 2 a For each variable, the Fourier processing output includes mean, minimum, maximum, variance in raw data, combined variance in annual, biannual, and tri-annual cycles as well as amplitude, phase and variance of annual, bi-annual and tri-annual cycle. All predictors come as raster files with a resolution of 1 km 2 .
Several other studies from Europe and North America have also found a link between environmental factors and tick distribution, such as temperature, vegetation indexes and vapour pressure [25][26][27].
Although climate, land cover and host abundance may all play a role in tick distribution, it can often be difficult to obtain extensive data on host species, whereas environmental, weather and climate data are more readily available from satellite images and weather models. Machine learning techniques are increasingly used in developing models for vector predictions as they are flexible, can account for nonlinearity and interactions and can handle different types of predictor variables, such as satellite images of environmental data [28,29]. Machine learning techniques combined with environmental predictors have been used in modelling biting midges (Culicoides sp.) [30][31][32][33], and mosquitoes [28,34,35], and studies on ticks include modelling tick distribution or abundance [36][37][38] as well as the distribution of tick-borne human diseases [15,39].
The risk of human exposure to ticks, and potentially tick-borne diseases, depends on tick and host dynamics as well as human behaviour [40]. Several studies have reported that living in areas in close proximity to forest increases the risk of LB or TBE [41][42][43] as I. ricinus is more abundant in forest habitats [21,40].
We here present a novel map of nymphal I. ricinus distribution for Scandinavia using machine learning algorithms applied to field data, collected in a strict standardised design in the period from 15 August to 30 September 2016. Furthermore, we relate our modelling results of tick distribution to public data on human population density and to the distance to the predicted suitable tick habitats, in order to assess the potential public health impact.

Stratification of study region and site selection
This study was part of a larger study, where additional objectives were to measure tick abundance and collect nymphs for pathogen detection in Denmark, Norway and Sweden. The field collection region for I. ricinus nymphs was for logistical reasons limited to 274,660 km 2 including all of Denmark, southern Norway and southern Sweden as well as the Swedish eastern coastal zone ( Figure 1). Within this area, we excluded all altitudes of 450 m above sea level and higher (19,926 km 2 ), where ticks are rare or absent [5]; these altitudes were also excluded from the final prediction map.
We stratified the remaining land area (234,191 km 2 , excluding lakes and waterways) using Fourier processed satellite imagery of the normalised difference vegetation index (NDVI) [44] and Corine land cover data (1 km 2 resolution) [45] to define forest and meadow habitats. Other land cover categories were not sampled for ticks and were left out of the prediction map. For details about the stratification and Fourier-processed satellite imagery, see the Supplement.
We randomly selected 30 first-priority sample sites (80% forest and 20% meadow, Supplementary Table  S2) in each of the three countries (R 3.4.2 [46] and sampleStratified in the raster package). This number was logistically the maximum number of sites feasible to visit within a reasonable timeframe. We decided to collect 80% of the samples from forested areas, as forest areas are the most important tick habitat [21,40]. Furthermore, 10 alternative sites for each first-priority site were randomly selected in the same stratum and ordered in priority after shortest distance to original site. These alternative sites were created in case of problems with access to the priority area or difficulties collecting nymphs (for pathogen detection). If a priority area could not be sampled, we would move on to the first alternative site and so forth, keeping the abundance data from the original site if available. For each meadow site, we additionally created 10 alternative forest sites, to be sampled should it prove impossible to collect ticks in meadows.
Because we were interested in investigating tick abundance along the Oslo Fjord in detail, we chose a further 20 random sites along the fjord (maximum distance of 800 m from the coast), with 10 alternatives for each of the 20 sites (same setup as above, Supplementary  Table S3).

Field study
For logistical reasons, we conducted the field study between 15 August and 30 September 2016. We

Human risk of tick exposure
After identifying a final prediction map, we used the Gridded Population of the World dataset (raster with 1 km 2 resolution [47], Table 1), to identify the number of people living in areas within various distances to forest and meadows where the PP was higher or equal to 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90%. We chose distances from 1 km to 5 km to depict people living in close proximity to potential tick habitats. Details can be found in the Supplement.

Field study
We measured tick abundance at 37 sites in Denmark, 75 sites in Sweden and 47 sites in Norway. The 159 sites constitute 63.6 km of dragged transects (Table 2, Figure 1).

Presence/absence modelling
The final BRT model had an accuracy of 0.85, a sensitivity of 91% and a specificity of 60% (given a fixed cut-off of 50% PP). The area under the curve for the receiver operating characteristic was 0.86 (Supplementary Figure S2) [29]. As specificity was only 60% (with the default PP cut-off of 50%), we plotted the prediction errors (observed data -mean predicted probability of presence over the folds and the repeats) in order to visualise a potential spatial pattern (Supplementary Figure S3). From the spatial map, we concluded that the low specificity was mainly due to sites in Denmark and Norway (close to the Swedish border). The final prediction map encompassed 100%, 68.4% and 85.8% of Denmark, Norway and Sweden's total land area, respectively ( Figure 2). We only made predictions for forest and meadow habitats that corresponded to our sampling sites. Habitats with at least 50% PP of tick nymph presence (17.5% of the total modelled area) constituted 15.7% of Denmark's, 7.4% of Norway's and 23.9% of Sweden's land area within the modelled region. Assuming that tick presence in the areas of northern Norway and Sweden not included in the modelled region was below 50% PP, the percentage of a predicted tick risk of at least 50% was 5.1% and 20.5% of the total land area of Norway and Sweden, respectively.
The most important predictors in the final model were day-and night-time land surface temperatures and other parameters related to temperature, land cover (lower PP in transitional woodland-shrub compared with the other cover types), the middle infrared index and related parameters, and parameters related to the vegetation indices enhanced vegetation index (EVI) and NDVI (see plots of the top 5 predictors, Supplementary Figure S4).

Human risk of tick exposure
The modelled region incorporating all altitudes included 19.4 million people, with 5.5 million (28.4%), 4.5 million (23.2%) and 9.4 million (48.5%) in Denmark, Norway and Sweden, respectively, which corresponded to 100% of the total Danish population, 91% of the total Norwegian population and 97% of the total Swedish population (based on the population density raster file). The proportion of people living within 1 km of forest and meadow was consistently lower for Denmark (ranging from 11% to 7% with increasing PP) than for Norway (ranging from 37% to 13% with increasing PP) and Sweden (ranging from 37% to 26% with increasing PP) for all PP values (Figure 3). This number increased consistently as distance to forest and meadow reached 5 km with 76-61%, 88-44% and 85-73% of the regional population living within 5 km of forest or meadow with PP values ranging from 0.1 to 0.9 for Denmark, Norway and Sweden, respectively ( Figure 3). Figure 4 depicts

Probability of nymph presence
This map depicts the predicted region (100%, 68.4% and 85.8% of Denmark's, Norway's and Sweden's total land area). White areas within Denmark, Norway and Sweden are altitudes above 450 m or lakes, rivers and streams, or habitats other than our sampled forest and meadow habitats (not predicted).
areas where people live within 1, 3 and 5 km of forest or meadow for a fixed PP value of 50%.

Discussion
Using the machine learning technique Boosted Regression Trees, we were able to create maps of the probability of nymphal I. ricinus presence in Scandinavia with high predictive power based on a standardised repeatable procedure. The predicted distribution corresponded well with what is generally believed about tick distribution in Scandinavia, assuming that a PP lower than 50% is a true absence. The higher probabilities of presence around the southern Norwegian coast line is in agreement with the distribution maps known for Norway [5,24]. In Sweden, we found higher PPs in the southern parts, with a boundary north of the large lakes, above which PP was low. This border coincides well with the biogeographical and climatic boundary called Limes Norrlandicus (LN) that separates the species-rich boreo-nemoral zone with shorter and milder winters in the south, from the boreal zone in the northern parts of Sweden [48]. Before the 1980s, LN used to be the range limit for I. ricinus in Sweden [4], but since then, the range of I. ricinus has expanded beyond this biogeographical border albeit at low abundances [4]. Our model reflected this pattern, showing higher PP below LN and a quick drop in PP above LN, but with a low PP throughout this northern region. The distinct patches of low PP below the great lakes in Sweden follow observed lower temperatures at these two elevated areas (Supplementary Figure S5). The PP was high throughout Denmark, except for the dry heathlands and sandy habitats of central and western Jutland. This pattern corresponds well with what we know about tick biology and the need for a high relative humidity to sustain ticks in a given habitat [7].
Our model had low specificity compared with the sensitivity. Since the main priority of this study was prediction of true presences, we refrained from increasing the specificity, which could have been obtained by choosing a higher cut-off value than the fixed 50%. In general, certainty of true absence can be hard to obtain, as presence/absence is always dependent on the sampling effort. Our recorded absences may not have been true absences and our model may still have predicted presence based on the environmental variables for that specific site. Conversely, high local abundance of deer hosts may facilitate establishment of ticks in areas for which the model predicted absence. In our data, we had a low proportion of absences (21%) and for Denmark alone, this number was 15.6%. Even though we used balancing methods to account for this disproportion, it is possible that our empirically collected sample could not feed the model with enough absence data to learn how to accurately predict absences, thus resulting in low specificity.
We were able to create a model with high predictive power based on environmental predictors. We found that land surface temperatures as well as measures of high vegetation cover (middle infrared light is absorbed by leaves and vegetation, thus densely vegetated areas reflect less middle infrared light) positively influenced the probability of nymph presence. However, the resulting modelled distribution may be due to other environmental factors correlated with these predictors, such as the climatic impact on vegetation and host species. Although ticks can be directly affected by temperatures and humidity [1,4,6,7], they are also dependent on their host species for survival and dispersal [4,7,9]. Abundance of host species may in turn be directly and indirectly affected by climate and weather [4,7,13], thus making it hard to separate factors into causal and confounding. Despite lacking fine resolution data on host abundance, we were able to use environmental predictors to create a biologically plausible model for I. ricinuspresence/absence in Scandinavia.
Overlaying our distribution maps for tick nymphs with human population density maps revealed the proportion of people potentially at risk for tick exposure. Based on studies estimating the risk of LB or TBE in relation to landscape characteristics around residential homes [41][42][43], we set the maximum distance from forest or meadow to 5 km. In general, we found that a large percentage of the population in the region live within 5 km of forest and meadows with a risk of tick presence, even if we set the cut-off for PP to be higher than the default 50%. Particularly for Norway, our model predicted high probability of nymph presence only for a very small area around the coast line; with a 50% PP cut-off, this area amounted to just 5.1% of Norway's total land area. Whereas this small area seems negligible, human population densities in Norway are relatively higher in these areas, exposing more people to tick habitats than we would expect by looking at the area alone, as 67% of the Norwegian population live within 5 km of forest and meadow

Figure 4
Areas with people living at different distances to forest/meadow that have a probability of presence of nymphal with PP ≥ 50%. That changing the PP cut-off value had a larger effect on the percentage of people at risk in Norway compared with Denmark and Sweden is probably due to a steep temperature gradient as we move away from the coast, caused by elevation-dependent temperatures (Supplementary Figure S5).
In the United States, Glass et al. [42] found that the odds of contracting LB increased within ca 1 km of living close to forested habitats. The proportion of people living within 1 km of forest or meadow is particularly low for Denmark no matter the PP cut-off (11-7%). This may however be a gross underestimation of exposure risk as Denmark has many fragmented small forest patches interspersed with agricultural fields and urban areas and these small patches may not show up in our coarse resolution of 1 km 2 . However, little is known about how likely these non-sampled areas are as tick habitats. In Norway and Sweden, a higher proportion of the population (between 37% and 13% at different PP cut-off values) are living within 1 km of forest or meadow.
This study showed that given the current distribution of ticks in Scandinavia, a high percentage of inhabitants are already exposed to the risk of tick bites (within a distance of 5 km to forest or meadow with a 50% PP, respectively 73.5%, 67.1% and 78.8% of the Danish, Norwegian and Swedish population may be at risk). The northward expansion of ticks and tick-borne pathogens in Norway and Sweden is a considerable public health concern [9]. However, human population densities in northern Norway and Sweden are low compared with the southern regions, and a tick range expanding north will therefore affect a smaller proportion of the human population. Our results therefore suggest that it may be desirable to target our surveillance and preventive measures in areas with high human population density and where ticks are well established, i.e. the whole of Denmark, the southern coastal parts of Norway, southern Sweden and Sweden's densely populated eastern coast along the Bothnian Bay.
Machine learning techniques allowed us to produce models and maps with high accuracy and predictive sensitivity for the whole region without having to sample every habitat. These models have highlighted areas at high risk of tick exposure and thus potentially of vector-borne diseases, and can help in targeting these areas for costly surveillance and preventive measures. It is important to note that our model reflects a moment in time, and does not show annual variation in tick distribution or how a future potential increase in temperatures may affect tick distribution and thus the potential for human exposure. Still, the study design is consistent between sites and repeatable, ensuring reliable future comparisons of tick distribution, and the produced maps allow for easy external validation. The resolution used to create our models may be too coarsely grained to catch small hotspots of tick presence/absence and the potential for human exposure. This is particularly evident for Denmark, which, throughout the country, has numerous small forest fragments smaller than 1 km 2 .