Research Open Access

Abstract

Background

The wide application of machine learning (ML) holds great potential to improve public health by supporting data analysis that informs policy and practice. Its application, however, is often hampered by data fragmentation across organisations and by strict regulation under the General Data Protection Regulation (GDPR). Federated learning (FL), a decentralised approach to ML, has received considerable interest as a means to overcome the fragmentation of data, but it is not yet clear to what extent this approach complies with the GDPR.

Aim

Our aim was to understand the potential data protection implications of the use of federated learning for public health purposes.

Methods

Building upon semi-structured interviews (n = 14) and a panel discussion (n = 5) with key opinion leaders in Europe, including both FL and GDPR experts, we explored how GDPR principles would apply to the implementation of FL within public health.

Results

Whereas this study found that FL offers substantial benefits such as data minimisation, storage limitation and effective mitigation of many of the privacy risks of sharing personal data, it also identified various challenges. These challenges mostly relate to the increased difficulty of checking data at the source and the limited understanding of potential adverse outcomes of the technology.

Conclusion

Since FL is still at an early stage and under rapid development, knowledge of its impracticalities is expected to increase rapidly, potentially addressing the remaining challenges. In the meantime, this study reflects on the potential of FL to align with data protection objectives and offers guidance on GDPR compliance.

/content/10.2807/1560-7917.ES.2024.29.38.2300695
2024-09-19
2024-10-03