Research Open Access



Despite the early development of Google Flu Trends in 2009, standards for digital epidemiology methods have not been established and research from European countries is scarce.


In this article, we study the use of web search queries to monitor influenza-like illness (ILI) rates in the Netherlands in real time.


In this retrospective analysis, we simulated the weekly use of a prediction model for estimating the then-current ILI incidence across the 2017/18 influenza season, based solely on Google search query data. Each week, we took the weekly ILI data as reported to The European Surveillance System (TESSy) and removed the then-last 4 weeks from our dataset. We then fitted a prediction model to the then-most-recent search query data from Google Trends to fill the 4-week gap ('nowcasting'). Lasso regression, in combination with cross-validation, was applied to select predictors and to fit the 52 models, one for each week of the season.
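The weekly simulation described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names (`soft_threshold`, `fit_lasso`, `predict`, `nowcast_season`) are hypothetical, the data are synthetic, and the lasso penalty `lam` is fixed by hand, whereas the study chose it by cross-validation.

```python
import math
import random


def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_lasso(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre predictors and outcome so the intercept can be recovered at the end.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Covariance of predictor j with the residual that excludes its own effect.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_bj
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def predict(model, row):
    intercept, beta = model
    return intercept + sum(b * x for b, x in zip(beta, row))


def nowcast_season(queries, ili, first_week, lam=0.1, gap=4):
    """For each week t, hide the last `gap` ILI values, refit, and fill the gap.

    Returns {t: [estimates for weeks t-gap+1 .. t]}; the last entry is week t.
    """
    nowcasts = {}
    for t in range(first_week, len(ili)):
        train_end = t - gap + 1  # ILI is treated as known for weeks 0 .. t-gap
        if train_end < 2:
            raise ValueError("not enough weeks of ILI data before the gap")
        model = fit_lasso(queries[:train_end], ili[:train_end], lam)
        nowcasts[t] = [predict(model, queries[w]) for w in range(train_end, t + 1)]
    return nowcasts


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: column 0 mimics an informative keyword, column 1 is noise.
    queries = [[10 + 8 * math.sin(w / 6), rng.uniform(0, 10)] for w in range(60)]
    ili = [1.0 + 0.5 * q[0] + rng.gauss(0, 0.3) for q in queries]
    nowcasts = nowcast_season(queries, ili, first_week=20, lam=0.2)
    errors = [abs(nowcasts[t][-1] - ili[t]) for t in nowcasts]
    print(f"mean absolute error: {sum(errors) / len(errors):.2f}")
```

Refitting the model every week, rather than once, mirrors the abstract's statement that one model was fitted per week of the season. A larger `lam` drives more coefficients to exactly zero, which is how the lasso performs predictor selection.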


The models provided accurate predictions, with a mean absolute error of 1.40 (95% confidence interval: 1.09–1.75) and a maximum absolute error of 6.36 per 10,000 population. The onset, peak and end of the epidemic were predicted with an error of 1, 3 and 2 weeks, respectively. The number of search terms retained as predictors ranged from three to five, with one keyword, 'griep' ('flu'), having the most weight in all models.
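The accuracy measures reported above can be computed along the following lines. The abstract does not say how the confidence interval was obtained, so this sketch assumes a percentile bootstrap over the weekly absolute errors; the function names and the example numbers are hypothetical and are not the study's data.

```python
import random


def absolute_errors(predicted, observed):
    """Per-week absolute difference between nowcast and reported ILI incidence."""
    return [abs(p - o) for p, o in zip(predicted, observed)]


def bootstrap_mean_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample the errors with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    # Made-up weekly values per 10,000 population, for illustration only.
    predicted = [2.1, 3.4, 5.0, 8.2, 12.5, 9.9, 6.0, 3.2]
    observed = [1.8, 3.9, 6.1, 7.5, 14.0, 11.2, 5.1, 3.0]
    errors = absolute_errors(predicted, observed)
    low, high = bootstrap_mean_ci(errors)
    print(f"mean absolute error: {sum(errors) / len(errors):.2f}")
    print(f"maximum absolute error: {max(errors):.2f}")
    print(f"95% interval for the mean: {low:.2f} to {high:.2f}")
```

Resampling weekly errors independently ignores any autocorrelation between consecutive weeks, so the interval from this sketch may be narrower than one from a method that accounts for it.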


This study demonstrates the feasibility of accurate, real-time ILI incidence predictions in the Netherlands using Google search query data.





