Collaborations between data providers and data scientists

To develop new algorithms or improve their existing models, data scientists need large amounts of data. But that data is sometimes out of reach because it is too confidential for the organizations that collected it to share. It can also be spread across many different providers.

HOW CAN STARTUPS, COMPANIES, AND RESEARCHERS GET ACCESS TO NEW DATA WITHOUT COMPROMISING DATA CONFIDENTIALITY?


A new definition of data access

With Substra, data providers can make their data available to data scientists in such a way that the data scientists can train new models on it but cannot see the data itself.


MOVE ALGOS AND MODELS, NOT DATA

Thanks to distributed learning, it is not necessary to transfer the data to a centralized server.

The data remains on the data provider's own infrastructure. Only models and algorithms move.
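As an illustration of this idea, here is a minimal sketch of one common distributed learning scheme, federated averaging, written in plain Python. It is not Substra's API: the function names, the toy linear model, and the two "hospital" datasets are all hypothetical. Each provider trains on its own data, and only the model weights travel to be averaged.

```python
from typing import List, Tuple

Weights = Tuple[float, float]      # (slope, intercept) of a toy linear model
Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one provider


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.05, epochs: int = 20) -> Weights:
    """Run gradient descent on one provider's data; only weights are returned."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Combine local models, weighting each by its provider's sample count."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return w, b


def train(providers: List[Dataset], rounds: int = 50) -> Weights:
    """Send the current model to every provider, then average what comes back."""
    weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in providers]
        weights = federated_average(updates, [len(d) for d in providers])
    return weights


# Hypothetical private datasets, both drawn from y = 2x + 1.
hospital_a: Dataset = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
hospital_b: Dataset = [(3.0, 7.0), (4.0, 9.0)]

slope, intercept = train([hospital_a, hospital_b])
```

In this sketch the raw `(x, y)` pairs never leave `local_update`; the coordinator only ever sees weights. A real deployment would add the security, traceability, and orchestration that a framework such as Substra is designed to provide.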


COLLABORATE WITH COMPETING DATA PROVIDERS

To meet your ML needs, you set up collaborations with data providers, some of whom may be competitors. When the data stays distributed, each provider keeps control of its own data.

Because competitors' data never has to be centralized, new collaborations become possible.


GET FAIR VALUE FOR YOUR DATA

Each data provider can be remunerated according to its contribution to the performance of the model. We are working on different ways of measuring the contributivity of datasets (see on GitHub).

Data providers can be rewarded for their data in proportion to their contribution.
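To make the notion of contribution concrete, here is a small sketch of one possible measure, the Shapley value, which averages the performance gain each provider brings over every order in which providers could join. This is only an assumption about what such a measure could look like, not the method used by the project; the `shapley_values` function and the scores below are hypothetical, made-up illustrations.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, List


def shapley_values(providers: List[str],
                   score: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Average each provider's marginal gain in score over all join orders."""
    totals = {p: 0.0 for p in providers}
    orders = list(permutations(providers))
    for order in orders:
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += score(with_p) - score(coalition)
            coalition = with_p
    return {p: t / len(orders) for p, t in totals.items()}


# Made-up model scores for each subset of two hypothetical providers.
TOY_SCORES = {
    frozenset(): 0.0,
    frozenset({"A"}): 0.60,
    frozenset({"B"}): 0.50,
    frozenset({"A", "B"}): 0.80,
}

values = shapley_values(["A", "B"], TOY_SCORES.__getitem__)
```

With these toy numbers the two values add up to the score of the full collaboration, so a reward pool could be split in the same proportions. Exact computation needs a model score for every subset of providers, which becomes expensive quickly as the number of providers grows; in practice approximations are typically used.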


Explore new horizons

Data too confidential, data too scattered across organizations...

With the Substra framework, these hurdles no longer stand in the way of your data science collaborations.

As an organization that manages data, you can unlock the value of your data securely and at its fair value.

Want to know more? Find out how the HealthChain consortium brings together hospitals, research laboratories, innovative start-ups and Substra Foundation to develop AI models based on clinical data.