Machine Learning Challenges on Sensitive Data

Machine learning challenges are a cornerstone of the data science landscape.

The best-known platform, Kaggle (owned by Google), has built a community of more than one million users since 2010 and offers prize pools that can exceed $100,000.

THESE CHALLENGES BENEFIT BOTH SIDES: THE COMPANIES THAT PROVIDE DATA AND THE DATA SCIENTISTS WHO TAKE PART.


FOR DATA PROVIDERS...

These challenges allow data providers (companies, research centres, etc.) to present their problems, gather new ideas and estimate the best performance achievable with state-of-the-art techniques.

FOR DATA SCIENTISTS...

For competing data scientists, these challenges are an opportunity to build their machine learning skills, work on real data and demonstrate their ability to develop machine learning algorithms.

The limits of these challenges...

However, many data providers cannot make sensitive data (strategic data, personal data, etc.) publicly available unless they heavily transform it first (anonymization, standardization of values, etc.), which makes it less interesting to work on.

THE SUBSTRA FRAMEWORK SOLVES THIS PROBLEM AND THUS MAKES MACHINE LEARNING CHALLENGES ON SENSITIVE DATA POSSIBLE

NO SHARED DATA

The data provider hosts a Substra node where it stores its data. The provider is the only party that can access and view the data.

DATA MADE AVAILABLE FOR TRAINING

Competing data scientists can then develop algorithms and send them to be trained on the data provider's node.

PERFORMANCE MONITORING

The traceability built into the framework makes it possible to track the performance of every algorithm that has been sent to the platform.
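To make the three ideas above concrete, here is a minimal toy sketch in Python. It is not the Substra API: the ProviderNode class, its submit and leaderboard methods, and the two example algorithms are all hypothetical names chosen for illustration. The sketch assumes the node runs each submitted algorithm locally and returns only a score, and that a real deployment would enforce the isolation this toy merely imitates.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

# A model maps one feature value to a prediction.
Model = Callable[[float], float]
# An algorithm receives training pairs and returns a fitted model.
Algorithm = Callable[[Sequence[Tuple[float, float]]], Model]


@dataclass
class ProviderNode:
    """Hypothetical stand-in for a data provider's node (not the Substra API)."""
    _train: List[Tuple[float, float]]
    _test: List[Tuple[float, float]]
    ledger: List[Tuple[str, float]] = field(default_factory=list)

    def submit(self, name: str, algorithm: Algorithm) -> float:
        # Training runs on the node: the algorithm travels, the data does not.
        model = algorithm(self._train)
        # Score on held-out data; only this number is returned to the caller.
        mae = sum(abs(model(x) - y) for x, y in self._test) / len(self._test)
        # Record every submission so performance can be traced later.
        self.ledger.append((name, mae))
        return mae

    def leaderboard(self) -> List[Tuple[str, float]]:
        # Lower mean absolute error ranks first.
        return sorted(self.ledger, key=lambda entry: entry[1])


def mean_baseline(pairs):
    # Always predicts the average training target.
    mean_y = sum(y for _, y in pairs) / len(pairs)
    return lambda x: mean_y


def least_squares_line(pairs):
    # Ordinary least squares fit of y = slope * x + intercept.
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    variance = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Made-up data that, in this scenario, only the provider would hold.
node = ProviderNode(
    _train=[(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)],
    _test=[(5.0, 10.0), (6.0, 12.0)],
)
node.submit("mean_baseline", mean_baseline)
node.submit("least_squares_line", least_squares_line)

for name, mae in node.leaderboard():
    print(f"{name}: mean absolute error = {mae:.2f}")
```

In this toy, competitors only ever receive a score, and the ledger plays the role of the traceability record. Because everything runs in one Python process, nothing actually stops an algorithm from inspecting the data it is handed; that guarantee is assumed to come from the real node.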

It's time to harness all the talent...

With Substra, the barriers fall. You can now let the best data scientists, wherever they are, work on your data without ever giving them access to it.