Leader: Sandro Locati (Beta80); Other collaborator(s): M. Bochicchio (UNIBA)
The system architecture design will define the reference architecture of a service-based data platform, to be securely published over the Internet, that performs:
1) automatic data acquisition from local data collectors;
2) queries of raw data;
3) management of data access for health professionals (clinicians, health operators, etc.) with controlled permissions.
The main objective is to provide strategic guidelines for implementing a software platform that enables access to heterogeneous data distributed throughout the territory.
This software platform, in addition to encouraging the sharing of data and knowledge, offers a single point of reference where an indicator of fragility can be identified and a personalized intervention plan can be defined.
The prototype development will deliver a simplified implementation of the reference architecture, based on services provided by a standard Public Cloud Provider. The aim is to provide data acquisition, processing, and query capabilities tailored to the research needs of the other WPs of Spoke 8.
Brief description of the activities and of the intermediate results
An innovative ICT system for data collection, management, and analysis has been designed and developed to meet the requirements of Spoke 8. This ICT system is made up of several collaborating subsystems.
All the subsystems whose goal is to manage and process data can be collectively referred to as the “Data Platform”, whose architecture and design have been described in the report corresponding to the first milestone of WP4.
Moreover, since the overall high-level process is a common pattern in applied research and, in particular, in several other spokes of the “Age-IT” project, the challenge of designing a flexible and reusable “Data Platform” has been taken into account. The aim is to provide multiple instances/tenants of the same “Data Platform”, each able to ingest data from different Collection Subsystems (which must be tailored to the research environment and goals), to handle flexible data models, to configure alternative processing, and to supply customized data sets to different Data Provider Subsystems.
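As a rough illustration of the multi-tenant idea described above, the following Python sketch models one configuration object per tenant and a registry that keeps tenants separate. It is not taken from the actual platform design: the class names (TenantConfig, TenantRegistry), the tenant identifiers, and the source, processing, and export labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantConfig:
    """Hypothetical per-tenant settings for one Data Platform instance."""
    tenant_id: str
    collection_sources: tuple[str, ...]          # systems the tenant ingests from
    processing_steps: tuple[str, ...] = ()       # configurable processing chain
    export_formats: tuple[str, ...] = ("json",)  # formats offered to consumers


class TenantRegistry:
    """Stores tenant configurations, addressable by tenant identifier."""

    def __init__(self) -> None:
        self._tenants: dict[str, TenantConfig] = {}

    def register(self, config: TenantConfig) -> None:
        # Refuse duplicates so one tenant cannot silently overwrite another.
        if config.tenant_id in self._tenants:
            raise ValueError(f"tenant already registered: {config.tenant_id}")
        self._tenants[config.tenant_id] = config

    def get(self, tenant_id: str) -> TenantConfig:
        return self._tenants[tenant_id]


# Example with made-up tenants and labels.
registry = TenantRegistry()
registry.register(TenantConfig(
    tenant_id="spoke8",
    collection_sources=("redcap", "wearable"),
    processing_steps=("validate", "pseudonymize"),
    export_formats=("excel", "json", "xml"),
))
registry.register(TenantConfig(
    tenant_id="other_spoke",
    collection_sources=("file_upload",),
))
```

Keeping the configuration immutable (frozen) is one possible design choice: a tenant's sources and processing chain can then only change by registering a new configuration, which makes differences between instances easier to audit.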
The resulting system includes the collecting systems, identified to conduct the trials and gather the data, and the target tools, used by researchers to visualize, observe, extract, and share the outcomes.
The identified collection subsystems are REDCap, an Electronic Data Capture System for building and managing surveys, and Garmin Vivosmart 5, a wearable fitness tracker. The data platform will be able to integrate data directly from both systems. Moreover, the data platform will be able to load additional structured records from standard files (Excel, JSON, XML), either exported from REDCap or used in addition to it. Furthermore, Garmin Vivosmart 5 devices might be managed by Fitrockr, a comprehensive data management platform designed to support innovative research projects using wearable devices, which acts as a proxy for the data flows before they are loaded into the data platform.
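The loading of structured records from standard files could look roughly like the Python sketch below, which uses only the standard library. The file layouts are assumptions made for illustration (a JSON array of flat objects, and XML with one `record` element per row); they are not REDCap's actual export schema, and the field names are invented. Excel files are left out because reading them requires a third-party library.

```python
import json
import xml.etree.ElementTree as ET


def load_json_records(text: str) -> list[dict]:
    """Parse a JSON array of flat objects into a list of records."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    return [dict(item) for item in data]


def load_xml_records(text: str, record_tag: str = "record") -> list[dict]:
    """Parse XML in which each <record> has one child element per field.

    All values are returned as strings, since XML carries no type information.
    """
    root = ET.fromstring(text)
    return [
        {child.tag: (child.text or "").strip() for child in element}
        for element in root.iter(record_tag)
    ]


# Example input with hypothetical field names.
json_text = '[{"participant_id": "P001", "age": 72}]'
xml_text = (
    "<records>"
    "<record><participant_id>P001</participant_id><age>72</age></record>"
    "</records>"
)

json_records = load_json_records(json_text)
xml_records = load_xml_records(xml_text)
```

Both loaders return the same list-of-dictionaries shape, so later processing steps would not need to know which file format a record came from. Type conversion for XML values (for example, turning "72" into a number) would be a separate step.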
All researchers and stakeholders will access the data products either through the visualization and business intelligence tools built into the data platform or by extracting them to standard files (Excel, JSON, XML).
Main policy, industrial and scientific implications
All aspects related to data protection have been considered together with the data protection office of Milano-Bicocca, resulting in a document that regulates the data flow of the clinical trials. In particular, the data platform architecture has been designed following the privacy-by-design approach.
In this period, in order to define and implement the ingestion methods and criteria for the CRFs related to the clinical trials conducted in WP1 and WP2, a collaboration has been established with the University of Milano-Bicocca, both for the formalization of GDPR-related obligations and for defining the specific content of the CRFs that will be uploaded to the platform.
Within the task of system architecture design and operation, progress has been made in the implementation of the data platform. The implementation is based on the use of Azure cloud services and Databricks, which provides a scalable and collaborative environment for developing big data and machine learning solutions. Currently, the app for downloading data has been developed, while the system to enable data ingestion is still under development.
We have conducted an in-depth analysis of data ingestion methods, reviewing the format and frequency of submissions and exploring automation mechanisms to streamline data submission and minimize errors. Additionally, we examined the processes for extracting data from the platform, defining the required user interfaces and the level of granularity of data access for each user role.
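One way to picture role-dependent granularity of access is a field-level filter applied before data leaves the platform. The Python sketch below is only an illustration under stated assumptions: the role names, the field names, and the mapping between them are hypothetical and do not reflect the roles actually defined for the platform.

```python
# Hypothetical mapping from user role to the record fields that role may read.
ROLE_VISIBLE_FIELDS: dict[str, set[str]] = {
    "clinician": {"participant_id", "age", "fragility_indicator", "heart_rate"},
    "researcher": {"age", "fragility_indicator", "heart_rate"},  # no direct identifier
    "stakeholder": {"fragility_indicator"},                      # derived indicator only
}


def project_record(record: dict, role: str) -> dict:
    """Return only the fields of `record` that `role` is allowed to read.

    Unknown roles get an empty result (deny by default).
    """
    allowed = ROLE_VISIBLE_FIELDS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}


# Example record with made-up values.
record = {
    "participant_id": "P001",
    "age": 72,
    "fragility_indicator": 0.4,
    "heart_rate": 68,
}

clinician_view = project_record(record, "clinician")
researcher_view = project_record(record, "researcher")
unknown_view = project_record(record, "guest")
```

Denying access by default for unrecognized roles is a deliberate choice in this sketch: a missing or misspelled role then exposes nothing rather than everything.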
On the development side, we have made further progress in implementing the platform for WP1 and initiated the development for WP2. Moreover, we have carried out the first tests for importing data generated by wearable sensors, marking a key step towards the platform implementation.