Leader: Mario Bochicchio (UniBA); Other collaborator(s): Giovanni Paragliola (ICAR-CNR)
In the context of the new models adopted for decentralized clinical trials, we will explore new solutions based on privacy-preserving federated learning techniques and Generative Adversarial Networks to acquire patient data and process it on demand without violating the constraints imposed by the GDPR.
Brief description of the activities and of the intermediate results
Practices and tools adopted at the international/European/Italian level for collecting, sharing and processing clinical and wellness monitoring data were explored, with a focus on the new and promising class of privacy-friendly federated learning techniques. The analysis and testing of such techniques at different levels of granularity (individual subject/patient, physician, hospital, etc.) is aimed at the possible definition and validation of innovative and more flexible approaches for predictive health analysis.
A collaborative approach based on private data allows ML models to be trained without transferring the subjects' data to a central site for analysis. The decision model is trained where the data reside, so that data privacy is preserved.
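As an illustration of this idea, the following minimal Python sketch simulates federated averaging (FedAvg), one common aggregation scheme, on a toy linear regression task. It is not taken from the project code: the function names (`local_update`, `federated_average`, `make_site_data`), the synthetic data and all hyperparameters are assumptions chosen for readability. Each simulated site trains on its own data, and only model parameters reach the aggregator.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent epochs on one site's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Average site models, weighting each by its number of samples."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def make_site_data(rng, n, true_w=2.0, true_b=-1.0):
    """Synthetic (x, y) pairs standing in for one site's records."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = true_w * x + true_b + rng.gauss(0.0, 0.05)
        data.append((x, y))
    return data


rng = random.Random(0)
# Three hypothetical sites with different amounts of data.
sites = [make_site_data(rng, n) for n in (20, 50, 100)]

global_weights = (0.0, 0.0)
for _ in range(100):
    # Raw data never leaves a site: only the updated parameters are shared.
    updates = [local_update(global_weights, data) for data in sites]
    global_weights = federated_average(updates, [len(d) for d in sites])

print(global_weights)
```

In a real deployment the local update would run at each participating center, and an FL framework would handle communication and orchestration.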
Several available frameworks based on the idea of remote procedure calls have been investigated: Substra, Flower, FedML, OpenFL and others. Substra and OpenFL in particular have been used for biomedical applications.
From the initial analysis it emerges that Substra is potentially a good solution:
- It has been proven in real production environments (e.g. the MELLODDY and HealthChain projects)
- It supports various FL scenarios, including horizontal FL, vertical FL and federated transfer learning
- It provides tools for data preprocessing, model evaluation, and secure aggregation of model updates
- It supports both PyTorch and TensorFlow and can be deployed on premises or in the cloud
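To make the notion of secure aggregation of model updates more concrete, the sketch below shows pairwise additive masking, one well-known construction. It is an illustrative assumption only and does not reproduce Substra's implementation or API. The names `pairwise_masks`, `mask_update` and `aggregate` are hypothetical, and a single seeded random generator stands in for the pairwise key agreement a real protocol would use.

```python
import random

MODULUS = 2 ** 32   # all arithmetic is done modulo this value
SCALE = 10 ** 4     # fixed-point scaling for real-valued updates


def encode(vec):
    """Map real values to integers modulo MODULUS (fixed-point)."""
    return [round(v * SCALE) % MODULUS for v in vec]


def decode(vec):
    """Map integers modulo MODULUS back to signed real values."""
    out = []
    for v in vec:
        if v >= MODULUS // 2:
            v -= MODULUS
        out.append(v / SCALE)
    return out


def pairwise_masks(n_clients, dim, seed):
    """Build one mask per client such that all masks sum to zero."""
    rng = random.Random(seed)
    masks = [[0] * dim for _ in range(n_clients)]
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                # Client i adds the shared value, client j subtracts it.
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks


def mask_update(update, mask):
    """What a client would send: its encoded update plus its mask."""
    return [(u + m) % MODULUS for u, m in zip(encode(update), mask)]


def aggregate(masked_updates):
    """Server side: masks cancel in the sum, revealing only the total."""
    dim = len(masked_updates[0])
    total = [sum(m[k] for m in masked_updates) % MODULUS for k in range(dim)]
    return decode(total)


updates = [[0.5, -1.25], [1.5, 0.25], [-1.0, 2.0]]
masks = pairwise_masks(n_clients=3, dim=2, seed=42)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
summed = aggregate(masked)
print(summed)
```

The server sees only masked vectors, yet their sum equals the sum of the true updates. Client dropouts and key management, which a production protocol must handle, are deliberately left out of this sketch.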
Main policy, industrial and scientific implications
Federated learning could be implemented for robust privacy-preserving systems in health informatics.
Federated learning has been studied as an attractive solution to enable decentralized nodes to collectively train shared machine learning models without the need to transmit sensitive data to a central database.
In health informatics, the need for robust privacy-preserving mechanisms is critical, and it becomes particularly significant when dealing with predictive diagnosis and analysis in personalized medicine, precision medicine, risk stratification, and longitudinal monitoring. We explore the applications of federated learning frameworks in the cloud-edge context in healthcare, and we identify real-world settings in which to assess the benefits and challenges of personalized federated learning. The challenges include data imbalance, usability, replicability, security, environmental impact (greenness), and overall efficiency.
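Data imbalance across centers is often studied in FL experiments by splitting a labelled dataset with a Dirichlet distribution, where a small concentration parameter produces strongly skewed sites. The sketch below illustrates this under stated assumptions: `dirichlet_partition`, its parameters and the toy labels are hypothetical and are not part of the project's experimental setup.

```python
import random


def dirichlet_partition(labels, n_sites, alpha, seed=0):
    """Split sample indices across sites with Dirichlet-skewed class shares."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    sites = [[] for _ in range(n_sites)]
    for label, indices in sorted(by_class.items()):
        rng.shuffle(indices)
        # A Dirichlet sample is a vector of Gamma draws normalized to sum to 1.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_sites)]
        total = sum(draws)
        start = 0
        cumulative = 0.0
        for s in range(n_sites):
            cumulative += draws[s] / total
            # The last site takes the remainder so that no sample is lost.
            if s == n_sites - 1:
                end = len(indices)
            else:
                end = round(cumulative * len(indices))
            sites[s].extend(indices[start:end])
            start = end
    return sites


# Toy example: 300 samples, 3 classes, 4 hypothetical centers.
labels = [i % 3 for i in range(300)]
sites = dirichlet_partition(labels, n_sites=4, alpha=0.3)
for s, indices in enumerate(sites):
    counts = [sum(1 for i in indices if labels[i] == c) for c in range(3)]
    print(f"site {s}: class counts {counts}")
```

Lower values of `alpha` give more uneven class counts per site, while large values approach a uniform split.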
In relation to multi-component interventions, we have initiated an analysis of Retrieval Augmented Generation (RAG) systems as components for the development of autonomous conversational agents that assist pre-frail and frail individuals through monitoring and cognitive stimulation. The first results of this research were presented at the international workshop SYNERGY 2024.
- In collaboration with project partners, the study of the signal and data analysis techniques of interest has been deepened, focusing in particular on time series analysis using Convolutional Neural Networks. Special attention was given to the application of eXplainable AI (XAI) principles and techniques to the detection of arrhythmias in cardiology patients at high risk of heart failure. This activity is documented in a paper presented at the European Conference on Artificial Intelligence (ECAI), Santiago de Compostela, October 19-24, 2024.
- In close collaboration with the Milano-Bicocca research group, the acquisition process for 250 Garmin Vivosmart 5 wearable devices was completed. These devices will be used to monitor pre-frail and frail patients in accordance with the protocol agreed upon with the clinical study project leaders. The acquisition procedures for other necessary devices and services to carry out the activities are ongoing.
- As a specific contribution to the project, the use of Privacy-Preserving Federated Learning (PPFL) techniques has been explored as a potential enabling element to achieve sufficient data quantities for the training of Machine Learning systems with certified quality, given that the data relate to patients under medical supervision. Specifically, we examined the main issues and challenges of PPFL techniques (heterogeneous use of monitoring techniques and devices, non-uniform distribution of cases among participating centers, computational and communication overhead, convergence problems, etc.). We also analyzed the main frameworks available for research, with the aim of starting an in-depth experimentation and comparison between Substra and Flower. This activity is documented in Bochicchio, M., Zeleke, S. N. (2024, April). Personalized Federated Learning in Edge-Cloud Continuum for Privacy-Preserving Health Informatics: Opportunities and challenges.
- Lastly, we completed the selection process for research fellow Amin Tuni Gure, who began service in October 2024, and, through an international selection process, qualified PhD student Sileshi Nibret Zeleke to participate in a semester-long research internship focused on XAI and PPFL topics at Penn State University, USA, in coordination with Professor Fenglong Ma. This activity is also valuable for the dissemination of the research results achieved and the creation of international collaboration.
- Nibret Zeleke, S., Fentie Jember, A., & Bochicchio, M. (2025). Integrating Explainable AI for Effective Malware Detection in Encrypted Network Traffic. arXiv e-prints, arXiv-2501.
- Bochicchio, M. A., Corciulo, S., Symbiosis and Synesthesia in Proactive Conversational Agents for Healthy Ageing, in Proceedings of the 1st International Workshop on Designing and Building Hybrid Human-AI Systems (SYNERGY 2024), Arenzano (Genoa), Italy, June 03, 2024, CEUR Workshop Proceedings. URL: https://ceur-ws.org/Vol-3701/paper10.pdf.
- Bochicchio, M., Zeleke, S.N. (2024). Personalized Federated Learning in Edge-Cloud Continuum for Privacy-Preserving Health Informatics: Opportunities and Challenges. In: Barolli, L. (eds) Advanced Information Networking and Applications. AINA 2024. Lecture Notes on Data Engineering and Communications Technologies, vol 203. Springer, Cham. https://doi.org/10.1007/978-3-031-57931-8_36
- Zeleke, S. N., Bochicchio, M. Towards Explainable Federated Learning in Healthcare: A Focus on Heart Arrhythmia Detection. Proceedings of the First Workshop on Explainable Artificial Intelligence for the Medical Domain (EXPLIMED 2024), co-located with the 27th European Conference on Artificial Intelligence (ECAI 2024), Santiago de Compostela, Spain, October 20, 2024.
- S. N. Zeleke, A. Fentie Jember and M. Bochicchio, "Encrypted Malicious Network Traffic Detection: Leveraging Attention Mechanism and Markov Chain Sequencing," 2024 International Conference on Information and Communication Technology for Development for Africa (ICT4DA), Bahir Dar, Ethiopia, 2024, pp. 148-153, doi: 10.1109/ICT4DA62874.2024.10777138.
- S. N. Zeleke and M. Bochicchio, "Federated Kolmogorov-Arnold Networks for Health Data Analysis: A Study Using ECG Signal," 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 2024, pp. 8070-8077, doi: 10.1109/BigData62323.2024.10825188.