Collaborations between data providers and data scientists
To develop new algorithms or improve their existing models, data scientists need a large amount of data. But the data is sometimes not accessible to them because it is too confidential to be shared by the organizations that collected it. Data can also be spread among a large number of different suppliers.
HOW CAN STARTUPS, COMPANIES AND RESEARCHERS BE GIVEN ACCESS TO NEW DATA WITHOUT COMPROMISING DATA CONFIDENTIALITY?
MOVE ALGOS AND MODELS, NOT DATA
Thanks to distributed learning, it is not necessary to transfer the data to a centralized server.
The data remain on the data provider’s own infrastructure. Only models and algorithms move.
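To make the idea concrete, here is a minimal sketch of one common distributed learning scheme, federated averaging. It is an illustration under stated assumptions, not a description of any particular platform: the function names (`local_update`, `federated_average`, `train`, `make_provider`), the tiny linear model and the synthetic data are all hypothetical. Each provider trains on its own data and sends back only model weights.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one provider's private data.

    Only the updated weights are returned; `data` never leaves the provider.
    The model is a hypothetical 1-D linear regression y = w * x + b.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Average the providers' weights, weighted by dataset size."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return (w, b)


def train(providers, rounds=50):
    """Coordinate training: the model travels, the datasets stay put."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in providers]
        weights = federated_average(updates, [len(d) for d in providers])
    return weights


def make_provider(n, seed):
    """Synthetic private dataset following y = 3x + 1 (assumption for the demo)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3.0 * x + 1.0) for x in xs]


providers = [make_provider(20, 0), make_provider(30, 1), make_provider(50, 2)]
w, b = train(providers)
print(f"learned w={w:.3f}, b={b:.3f}")
```

In this sketch the coordinator only ever sees pairs of weights, never the `(x, y)` records held by each provider. Real deployments usually add further protections on top of this basic exchange.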
COLLABORATE WITH COMPETING DATA PROVIDERS
For your ML needs, you establish collaborations with data providers, some of which may be competitors. With a distributed approach, each provider keeps its data on its own side.
Because competitors' data never has to be centralized, new collaborations become possible.
VALUE YOUR DATA AT THEIR FAIR VALUE
Each data provider can be remunerated according to its contribution to the performance of the model. We are working on different ways of measuring the contributivity of datasets (see on GitHub).
Data producers can value their data to the extent of their contribution.
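As an illustration of what measuring contributivity can look like, the sketch below computes exact Shapley values, one well-known way to share a joint result among participants. The text above does not say which measures are used, so this choice is an assumption; the provider names and the `SCORES` table (model score obtained when training on each subset of providers) are made up for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, performance):
    """Exact Shapley value of each player for a coalition score function.

    `performance` maps a frozenset of players to a model score.
    The cost grows exponentially with the number of players, so this
    exact form only suits a handful of providers.
    """
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Probability weight of a coalition of size k joining before p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal gain from adding p's data to coalition s.
                total += weight * (performance(s | {p}) - performance(s))
        values[p] = total
    return values


# Hypothetical model scores for every subset of three providers.
SCORES = {
    frozenset(): 0.50,
    frozenset({"A"}): 0.70,
    frozenset({"B"}): 0.60,
    frozenset({"C"}): 0.55,
    frozenset({"A", "B"}): 0.78,
    frozenset({"A", "C"}): 0.75,
    frozenset({"B", "C"}): 0.65,
    frozenset({"A", "B", "C"}): 0.80,
}

contrib = shapley_values(["A", "B", "C"], SCORES.__getitem__)
for name, value in contrib.items():
    print(name, round(value, 4))
```

By construction, the Shapley values add up to the gain of the full coalition over the empty one, so a fixed reward can be split in proportion to them. In practice each entry of the score table requires training a model, which is why cheaper approximations are often preferred.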
Want to know more? Find out how the HealthChain consortium brings together hospitals, research laboratories, innovative startups and Substra Foundation to develop AI models based on clinical data.